This page records the current status, plans, and work items for the Impala documentation in the context of the Apache incubation process. Everything here is subject to change based on guidance from the Apache mentors and the Impala community. For the moment, the content is mainly "thinking out loud" on the part of John, who has been the sole author of the Impala docs up to now.

Historically, the Impala documentation has lived on the Cloudera web site. Until we disentangle the Impala docs from the CDH-specific ones, here is a link for reference to the Impala doc starting page on cloudera.com.

Most Impala-related subjects are covered underneath that starting page. You can see them by expanding the "Impala" branch of the navigation tree on the left.

A few Impala-related topics are scattered elsewhere throughout the Cloudera docs. These tend to have more CDH and Cloudera Manager dependencies and will probably require more work to disentangle:

  • Installing Impala. Focuses largely on installing through Cloudera Manager. Perhaps we'll rewrite this from scratch.
  • Impala security. Split among CDH-wide discussions of authentication, authorization, auditing, and so on, rather than grouped together in one place.
    • Authentication with Kerberos and LDAP should be fairly generic.
    • Authorization with Sentry relies on another Apache project and should have relatively few CDH dependencies.
    • Auditing is related to Cloudera Navigator. We can document the "hooks" in the Apache context, but the usage instructions for auditing are probably tool-specific.
  • Configuration. The Apache context might offer the opportunity to flesh out this material some more, for example to do a comprehensive list of all the startup flags for all the daemons. (Many of the flags are intended for debugging and diagnosing during Impala development, and so are more appropriate for developer-centric docs than user-facing docs.)
  • JDBC and ODBC. Currently, this info mainly covers the Cloudera-specific drivers and is mostly targeted towards administrators (for example, how to install the drivers). In the Apache context, we could flesh out the developer-oriented docs around writing JDBC and ODBC applications.
  • Upgrading Impala. Again, a lot of material about the Cloudera Manager path for doing upgrades.
  • Starting/stopping the daemons. Again, targeted mainly towards administrators. The Apache context might offer an opportunity to dive deeper into daemon internals and troubleshooting.
  • Release notes. Currently the Impala new features, known issues, fixed issues, and incompatible changes are scattered among the CDH release notes for all the components. We have always kept the focus on the IMPALA- JIRA issues in the public tracking system, so the release notes should hang together well when we centralize them again.

Authoring Logistics

The source of the main Impala documentation (the SQL Reference and so on) is written in XML using the DITA format, and can be built with an open source toolchain.

The docs have been under git version control for some time now, so the source files should slot nicely into git in the Apache project, and the authoring workflow should stay mostly the same.

The Impala-related docs are relatively self-contained. They have few if any direct xrefs to non-Impala topics that would break the build when built outside the CDH library; any such references are typically http:// links that can be conditionalized out or removed for Apache purposes. We kept the ability to publish a standalone Impala library even after merging the Impala docs into the larger CDH library, which happened with the Impala 2.0 release (part of CDH 5.2).
