...

When the configuration columnar.encoding is set to a value other than 'none', the producer must call the new setSchema() API to register a schema. A consumer without the proposed change can still poll records as before. Optionally, the consumer can call the new pollBuffer() method to retrieve the whole segment in the columnar encoding format.
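To make the call pattern concrete, here is a minimal, self-contained sketch of the proposed API shape. The ColumnarProducer and ColumnarConsumer classes, the schema string, and the placeholder segment bytes are illustrative stand-ins, not the actual Kafka client types or the real implementation.

```java
import java.nio.ByteBuffer;
import java.util.Properties;

public class ColumnarSketch {

    // Hypothetical producer exposing the proposed setSchema() hook.
    static class ColumnarProducer {
        private String schema;
        void setSchema(String schema) { this.schema = schema; }
        String schema() { return schema; }
    }

    // Hypothetical consumer exposing the proposed pollBuffer() method,
    // which returns a whole segment in the columnar encoding.
    static class ColumnarConsumer {
        ByteBuffer pollBuffer() {
            // Placeholder bytes standing in for an encoded segment.
            return ByteBuffer.wrap(new byte[] {1, 2, 3});
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Any value other than "none" enables the columnar path.
        props.put("columnar.encoding", "parquet");

        ColumnarProducer producer = new ColumnarProducer();
        producer.setSchema("{\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        ColumnarConsumer consumer = new ColumnarConsumer();
        ByteBuffer segment = consumer.pollBuffer();
        System.out.println(producer.schema() != null && segment.remaining() == 3);
    }
}
```

An unmodified consumer would simply never call pollBuffer() and keep using the existing poll() path.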

Test Plan

...

Functional tests 

  1. Regression test - With the configuration 'columnar.encoding' set to 'none', run all the tests in the Uber staging environment, including but not limited to read/write tests at different scales.
  2. Added feature - With the configuration 'columnar.encoding' set to 'parquet':
    1. Verify the data is encoded in Parquet format.
    2. The producer, broker, and consumer all work functionally as before; no exceptions are expected.
    3. The newly added consumer API should be able to return the whole segment in Parquet format directly.
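For reference, the two runs above differ only in one client property; the fragment below is illustrative, with the property name taken from this proposal:

```
# Regression run: feature disabled
columnar.encoding=none

# Added-feature run: Parquet encoding enabled
columnar.encoding=parquet
```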

Performance tests 

Run tests against topics with different data types and scales.

  1. Benchmark the encoded data size as the number of rows per batch changes.
  2. Benchmark CPU utilization on the producer and consumer.

Compatibility tests

Test compatibility among the producer, consumer, and replicator with and without the proposed changes.

  1. Both producer and consumer have the proposed changes
    1. With the feature turned off in configuration, all regression tests should pass as before.
    2. When the producer turns on this feature, the consumer and replicator can consume as before.
  2. Producer has the proposed changes, but the consumer doesn't
    1. With the feature turned off in configuration, all regression tests should pass as before.
    2. When the producer turns on this feature, the consumer and replicator throw an exception.
  3. Producer doesn't have the proposed changes, but the consumer does
    1. All the regression tests should pass.

Rejected Alternatives

The alternative is to apply columnar encoding and compression outside the Kafka clients. The application can buffer records to form a batch, apply columnar encoding and compression, and then put the result into the (K, V) of the ProducerRecord. The benefit of this approach is that it avoids changes to the Kafka client, but it has the problems outlined below:
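The application-side batching described above can be sketched as follows. This is a self-contained illustration: GZIP stands in for a real columnar codec such as Parquet, the row format is invented, and no actual Kafka client is involved; the encoded bytes would become the value of an ordinary ProducerRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class AppSideBatching {

    // Batch rows, "encode", and compress entirely in the application.
    // A real implementation would pivot the rows into a columnar layout
    // (e.g. Parquet) here; we simply concatenate for illustration.
    static byte[] encodeBatch(List<String> rows) throws IOException {
        String batch = String.join("\n", rows);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(batch.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> buffer = new ArrayList<>();
        buffer.add("id=1,city=SF");
        buffer.add("id=2,city=NYC");

        // This byte[] would be shipped as the V of a ProducerRecord(K, V),
        // leaving the Kafka client itself unchanged.
        byte[] value = encodeBatch(buffer);
        System.out.println(value.length > 0);
    }
}
```

The Kafka client sees only opaque bytes in this scheme, which is exactly why the proposal rejects it: the broker and standard consumers lose all visibility into the batch.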

...