...

Note: this is joint work proposed by Xinli Shang, Qichao Chu, Zhifeng Chen, and @Yang Yang.

Motivation

Kafka is typically used with a serialization format such as Avro, JSON, or Protobuf to serialize/deserialize data record by record. In the producer client, records are buffered into a segment (record batch), and compression is optionally applied to the segment. As the number of records per segment grows, a columnar format such as Apache Parquet becomes more efficient because compression typically achieves a better ratio on columnar data. This reduces traffic to brokers and consumers and saves disk space on brokers.
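The record batching and compression described above are governed by standard Kafka producer configuration. A minimal sketch (the specific values are illustrative, not recommendations):

```properties
# Buffer up to 1 MiB of records per partition batch before sending;
# larger batches give columnar/compressed encodings more data to work with.
batch.size=1048576

# Wait up to 50 ms to let a batch fill before flushing it.
linger.ms=50

# Compress each record batch; zstd generally gives the best ratio
# among the built-in codecs (none, gzip, snappy, lz4, zstd).
compression.type=zstd
```

Because compression is applied per record batch, the ratio improves as more records of similar shape accumulate in a batch, which is the same effect a columnar layout like Parquet exploits further by grouping values column by column.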

...