...

We would like to accomplish the following:

  • Move clients to Java to fix Scala problems
    • Javadoc
    • Scala version incompatibility
    • Readability by non-Scala users
    • Scary stack traces
    • Leakage of Scala classes/interfaces into the Java API
  • Code cleanup and embeddability
    • Both producer and consumer code are extremely hard to understand
    • Redo the request serialization layer to avoid all the custom request definition objects
    • Eliminate the "simple" consumer API and have only a single consumer API with the capabilities of both
    • Remove all threads from the consumer
    • Have a separate client jar with no dependencies
  • Generalize APIs
    • Producer
      • Give back a return value containing error code, offset, etc.
    • Consumer
      • Enable static partition assignment for stateful data systems
      • Handle consuming from known partitions
      • Handle consumer-driven offset changes
      • Get rid of the need for the simple consumer
  • Better support non-Java consumers
    • Move to a high-level protocol for consumer group management to centralize complexity on the server for all clients
  • Improve performance and operability
    • Make the producer fully async to allow issuing sends to all brokers simultaneously and having multiple in-flight requests simultaneously. This will dramatically reduce the impact of latency on throughput (which is important with replication).
    • Moving to server-side offset management will allow us to scale this facility, which is currently a big scalability problem for high-commit-rate consumers because ZooKeeper does not scale to that write load.
    • Server-side group membership will be more scalable with the number of partitions than the current consumer coordination protocol
  • Remove all library dependencies from clients to avoid version clashes
  • Create an RPC-based protocol for partition assignment to replace the direct ZooKeeper usage as prototyped here.
    • This will allow a number of simplifications
      • Centralization of all complexity on the server side (clients get easier to write in all languages)
      • Easier for languages with poor ZooKeeper compatibility
      • More scalable with the number of partitions
      • Easier to get correctness under partial-failure conditions
  • Modularize the clients so that the client and server do not share a jar
  • Rewrite the clients in Java
    • Though Scala is a nice language, it has proven to be a painful dependency for people wanting to integrate the client. The server would remain in Scala for our convenience, but the clients would move to Java. The following are the Scala complaints:
      • Bad stack traces
      • Leakage of Scala classes into the Java API
      • Nonexistent Scala compatibility (binary compatibility breaks every 6 months)
      • Hard to get Javadocs
      • People can't read the code. Ideally we want people using the client to be able to read the client code.
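
To make the producer goals above more concrete, here is a minimal sketch of what a fully async send with a return value might look like. It is not the proposed API: the class name AsyncProducerSketch, the SendResult fields, and the error-code values are all assumptions made for illustration, and the "broker" is just an in-memory counter per topic-partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: send() never blocks and hands back an offset or an error code. */
class AsyncProducerSketch {

    /** Result returned to the caller; the field names are assumptions. */
    static final class SendResult {
        final String topic;
        final int partition;
        final long offset;      // -1 when the send failed
        final short errorCode;  // 0 means success in this sketch

        SendResult(String topic, int partition, long offset, short errorCode) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.errorCode = errorCode;
        }

        boolean hasError() {
            return errorCode != 0;
        }
    }

    // Stand-in for the brokers: the next offset to assign per topic-partition.
    private final Map<String, AtomicLong> nextOffsets = new ConcurrentHashMap<>();

    /** Returns immediately; the future completes once the simulated broker has acknowledged. */
    CompletableFuture<SendResult> send(String topic, int partition, byte[] value) {
        if (value == null) {
            // Errors are reported through the return value, not thrown from send().
            return CompletableFuture.completedFuture(
                    new SendResult(topic, partition, -1L, (short) 1));
        }
        return CompletableFuture.supplyAsync(() -> {
            long offset = nextOffsets
                    .computeIfAbsent(topic + "-" + partition, k -> new AtomicLong())
                    .getAndIncrement();
            return new SendResult(topic, partition, offset, (short) 0);
        });
    }

    public static void main(String[] args) {
        AsyncProducerSketch producer = new AsyncProducerSketch();
        // Issue sends to two partitions before waiting on either of them.
        CompletableFuture<SendResult> first =
                producer.send("events", 0, "a".getBytes(StandardCharsets.UTF_8));
        CompletableFuture<SendResult> second =
                producer.send("events", 1, "b".getBytes(StandardCharsets.UTF_8));
        SendResult r1 = first.join();
        SendResult r2 = second.join();
        System.out.println(r1.topic + "-" + r1.partition + " offset=" + r1.offset);
        System.out.println(r2.topic + "-" + r2.partition + " offset=" + r2.offset);
    }
}
```

Because the caller only blocks when it chooses to join a future, several requests can be in flight at once, which is the property the async producer goal is after.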

The idea would be to roll out the new API as a separate jar, leaving the existing client intact but deprecated for one or two releases before removing the old client. This should allow a gradual migration.
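
The consumer goals listed earlier (static partition assignment, consumer-driven offset changes, and no threads inside the consumer) could be sketched roughly as follows. This is only an illustration under stated assumptions: PollingConsumerSketch and its assign, seek, and poll methods are hypothetical names, and the log is an in-memory map standing in for the brokers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a consumer that does all of its work on the caller's thread. */
class PollingConsumerSketch {

    // Stand-in for broker state: partition number -> messages in offset order.
    private final Map<Integer, List<String>> log;

    // Next offset to fetch for each assigned partition (sorted for stable iteration).
    private final Map<Integer, Long> positions = new TreeMap<>();

    PollingConsumerSketch(Map<Integer, List<String>> log) {
        this.log = log;
    }

    /** Static assignment: the caller names its partitions; there is no group coordination. */
    void assign(int... partitions) {
        positions.clear();
        for (int p : partitions) {
            positions.put(p, 0L);
        }
    }

    /** Consumer-driven offset change: move the fetch position of an assigned partition. */
    void seek(int partition, long offset) {
        if (!positions.containsKey(partition)) {
            throw new IllegalStateException("partition not assigned: " + partition);
        }
        positions.put(partition, offset);
    }

    /** Returns up to maxRecords messages and advances positions; no background threads. */
    List<String> poll(int maxRecords) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : positions.entrySet()) {
            List<String> messages = log.getOrDefault(e.getKey(), Collections.emptyList());
            long pos = e.getValue();
            while (pos < messages.size() && out.size() < maxRecords) {
                out.add(messages.get((int) pos));
                pos++;
            }
            e.setValue(pos);
        }
        return out;
    }
}
```

A caller would assign its partitions once, call poll in its own loop, and call seek whenever it wants to rewind or skip ahead. That combination covers the use cases that currently require the separate "simple" consumer.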

      ...