Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The columnar encoding type for the data generated for a topic by the producer. The default is none (i.e., no columnar encoding). Valid values are Parquet for now and future to include ORC. When the value is not none, the key.serializer and value.serializer are is disabled because the column encoder will serialize and deserialize the data. Column encoding is for full batches of data, so the efficacy of batching will also impact the efficiency of columnar encoding (more batching means better encoding).

...

  1. Regression test - The configuration ‘columnar.encoding’ is set to 'none’, run all the tests in Uber staging env which includes but is not limited to read/write with different scalescales
  2. Added feature - The configuration ‘columnar.encoding’ is set to 'parquet’.
    1. Verify the data is encoded as Parquet format 
    2. The producer, broker, and consumer all work as before functionality-wide. No exceptions are expected. 
    3. The newly added consumer API should be able to return the whole segment as Parquet format directly. 

...

  1. Both producer and consumer have the proposed changes 
    1. The feature is turned off in configuration, all all regression tests should work as before.
    2. When the producer turns on this feature, the consumer and replicator can consume as before. 
  2. Producer has the proposed changes, but the consumer doesn’t 
    1. The feature is turned off in configuration, all all regression tests should work as before.
    2. When the producer turn turns on this feature, the consumer and replicator throw an exception 
  3. Producer doesn’t have the proposed changes, but the consumer does 
    1. All the regression tests should pass 

...