Status
Current state: "Under Discussion"
Discussion thread: TBD
JIRA: TBD
Released: TBD
Motivation
Apache Thrift (along with Protocol Buffers) is widely adopted as a de facto standard for high-throughput network traffic protocols. Historically, companies like Pinterest have used Thrift to encode strongly typed Kafka messages and persist them to object storage as sequence files in the data warehouse.
The major benefits of this approach are:
- Versioned Thrift schema files serve as a schema registry, allowing producers and consumers across languages to encode/decode with the latest schema.
- Minimal overhead of maintaining translation ETL jobs that flatten the schema or add accessory fields during ingestion.
- Lower storage footprint.
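For illustration, a minimal Thrift IDL sketch (struct and field names are hypothetical, not taken from an actual production schema) of how versioned schemas stay backward compatible: new fields are added as optional with fresh field IDs, so consumers compiled against an older schema version simply skip them:

```thrift
// Hypothetical event schema; names are illustrative only.
namespace java com.example.events

struct UserEvent {
  1: required i64 userId;
  2: required string eventType;
  3: optional i64 timestampMs;
  // Added in a later schema version: optional, with a new field ID,
  // so readers using the old schema decode the rest and skip this field.
  4: optional map<string, string> attributes;
}
```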
Beyond missing the optimizations that come with storage format conversion, running jobs against the unsupported Thrift format also poses maintenance and upgrade challenges for Flink jobs, given the:
- lack of backward-compatible Thrift encoding/decoding support in Flink
- lack of Table schema inference and DDL support
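To make the second gap concrete, DDL along these lines is what such support could enable; note that the 'thrift' format identifier and the 'thrift.class' option shown here are hypothetical and do not exist in Flink today:

```sql
-- Hypothetical DDL sketch: 'thrift' format and 'thrift.class' are proposed, not existing, options.
CREATE TABLE user_events (
  userId BIGINT,
  eventType STRING,
  timestampMs BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_events',
  'format' = 'thrift',                             -- proposed format identifier
  'thrift.class' = 'com.example.events.UserEvent'  -- generated Thrift class to bind rows to
);
```

With schema inference, the column list could even be derived from the generated Thrift class rather than declared by hand.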
Proposed Changes
Compatibility, Deprecation, and Migration Plan
Test Plan
Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.