
Status

Current state: "Under Discussion"

Discussion thread: Google Doc



JIRA: TBD

Released: TBD

Motivation

Apache Thrift (along with Protocol Buffers) is widely adopted as a de facto standard for high-throughput network traffic protocols. Historically, companies like Pinterest have used Thrift to encode strongly typed Kafka messages and persist them to object storage as sequence files in the data warehouse.

The major benefits of this approach are:

  • Versioned Thrift schema files serve as a schema registry, allowing producers and consumers across languages to encode and decode with the latest schema.
  • Minimal overhead of maintaining translation ETL jobs that flatten the schema or add accessory fields during ingestion.
  • Lower storage footprint.


Beyond missing the optimizations that come with storage format conversion, running Flink jobs against this unsupported Thrift format also poses maintenance and upgrade challenges, given the:

  • lack of backward-compatible Thrift encoding/decoding support in Flink (see the sketch after this list)
  • lack of Table schema DDL inference support
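
To illustrate the first gap, below is a minimal sketch of what a Thrift-aware DeserializationSchema could look like when written by hand today. The class name ThriftDeserializationSchema and the choice of TBinaryProtocol are assumptions for illustration only; nothing like this ships with Flink, and the actual design would be specified under Proposed Changes.

import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.thrift.TBase;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TBinaryProtocol;

// Illustrative sketch only: a user-defined DeserializationSchema that decodes
// Thrift-encoded Kafka records into generated Thrift structs.
public class ThriftDeserializationSchema<T extends TBase<?, ?>> implements DeserializationSchema<T> {

    private final Class<T> thriftClass;
    private transient TDeserializer deserializer; // TDeserializer is not serializable, create lazily

    public ThriftDeserializationSchema(Class<T> thriftClass) {
        this.thriftClass = thriftClass;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        try {
            if (deserializer == null) {
                deserializer = new TDeserializer(new TBinaryProtocol.Factory());
            }
            // Instantiate the generated Thrift struct and populate it from the raw bytes.
            T record = thriftClass.getDeclaredConstructor().newInstance();
            deserializer.deserialize(record, message);
            return record;
        } catch (Exception e) {
            throw new IOException("Failed to deserialize Thrift record", e);
        }
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        // Without dedicated Thrift type information, Flink falls back to generic serialization,
        // which is one reason schema evolution is not backward compatible out of the box.
        return TypeInformation.of(thriftClass);
    }
}

A Kafka source could then be constructed with, e.g., new ThriftDeserializationSchema<>(Event.class), where Event stands in for any generated Thrift class. Such hand-rolled schemas still leave Thrift-specific TypeInformation and Table DDL schema inference unsolved, which is what this FLIP aims to address.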

Proposed Changes

Compatibility, Deprecation, and Migration Plan

Test Plan

Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
