  • Installing Impala. Focuses largely on installing through Cloudera Manager. Perhaps we'll rewrite this from scratch.
  • Impala security. Currently split among CDH-related discussions of authentication, authorization, auditing, and so on, rather than grouped together.
    • Authentication with Kerberos and LDAP should be fairly generic.
    • Authorization with Sentry relies on another Apache project and should have relatively few CDH dependencies.
    • Auditing is related to Cloudera Navigator. We can document the "hooks" in the Apache context, but the usage instructions for auditing are probably tool-specific.
  • Configuration. The Apache context might offer the opportunity to flesh out this material some more, for example by providing a comprehensive list of all the startup flags for all the daemons. (Many of the flags are intended for debugging and diagnosing during Impala development, and so are more appropriate for developer-centric docs than user-facing docs.)
  • JDBC and ODBC. Currently, this info mainly covers the Cloudera-specific drivers and is mostly targeted towards administrators (how to install the drivers, and so on). In the Apache context, we could flesh out the developer-oriented docs around writing JDBC and ODBC applications.
  • Upgrading Impala. Again, a lot of material about the Cloudera Manager path for doing upgrades.
  • Starting/stopping the daemons. Again, targeted mainly towards administrators. The Apache context might offer the opportunity to dive deeper into daemon internals and troubleshooting.
  • Release notes. Currently the Impala new features, known issues, fixed issues, and incompatible changes are scattered among the CDH release notes for all the components. We have always kept the focus on the IMPALA- JIRA issues on the public tracking system, so the release notes should hang together pretty well when we centralize them again.

Authoring Logistics

The source of the main Impala documentation (SQL Reference and such) is XML in the DITA format, buildable with an open source toolchain.

The docs have been under git version control for some time now, so the source files should slot nicely into git in the Apache project and the authoring workflow should stay mostly the same.

The Impala-related docs are relatively self-contained. They have few if any direct xrefs to non-Impala topics that would cause build breakage when built outside the CDH library; any such references are typically http:// links that can be conditionalized out or removed for Apache purposes. We kept the ability to publish a standalone Impala library even after merging the Impala docs into the big CDH library (which happened with the Impala 2.0 release, which became part of CDH 5.2).
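
As a rough illustration of how the remaining http:// references could be located before conditionalizing or removing them, the following sketch scans a DITA topic for xref elements whose href is an absolute URL. The function name find_external_xrefs, the sample topic, and the example.com URL are all hypothetical; real Impala topics may use other link elements or attributes, so treat this as a starting point rather than a description of the actual source tree.

```python
import xml.etree.ElementTree as ET


def find_external_xrefs(dita_source: str) -> list[str]:
    """Return the href of every <xref> that points at an http(s) URL.

    Hypothetical helper: it only inspects <xref> elements and only
    recognizes absolute http:// or https:// targets.
    """
    root = ET.fromstring(dita_source)
    hrefs = []
    for xref in root.iter("xref"):
        href = xref.get("href", "")
        if href.startswith(("http://", "https://")):
            hrefs.append(href)
    return hrefs


# Made-up topic: one xref to another Impala topic, one to an outside page.
SAMPLE = """<concept id="impala_example">
  <title>Example topic</title>
  <conbody>
    <p>See <xref href="impala_select.xml#select">SELECT</xref> and
    <xref href="http://example.com/cdh-guide" scope="external"
          format="html">a vendor guide</xref>.</p>
  </conbody>
</concept>"""

if __name__ == "__main__":
    for href in find_external_xrefs(SAMPLE):
        print(href)
```

Running the helper over each topic file would give a list of candidate links to review; whether each one is removed or wrapped in a conditional attribute remains an editorial decision.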