Current state: "Under Discussion"

Discussion thread: TBD

Google Doc: https://docs.google.com/document/d/1EhHewAW39pm-TX6fuUZLogWHK7vtgJ616WRwH7OOrgg/edit#

JIRA: TBD

Released: TBD

Motivation

Apache Thrift (along with Protocol Buffers) is widely adopted as a de facto standard for high-throughput network traffic protocols. Historically, companies like Pinterest have been using Thrift to encode strongly typed Kafka messages and to persist them to object storage as sequence files in the data warehouse.

The major benefits of this approach were:

  • Versioned Thrift schema files served as a schema registry where producers and consumers across languages could encode/decode with the latest schema.
  • Minimal overhead from maintaining translation ETL jobs that flatten schemas or add accessory fields during ingestion
  • Lower storage footprint


Besides missing out on the optimizations that come with storage format conversion, running jobs against an unsupported Thrift format also makes maintaining and upgrading Flink jobs challenging, given the

  • lack of backward-compatible Thrift encoding/decoding support in Flink
  • lack of Table schema DDL inference support

Proposed Changes

After running multiple large-scale, high-tier production use cases, we propose to work closely with the community and industry peers to upstream our patches, including but not limited to:

  • an efficient partial Thrift encoding/decoding format that makes room for future column-pruning optimizations in both streaming and batch
  • a backward/forward-compatible Thrift binary <-> Row converter that can handle highly nested, complex typed schema definitions without concern for upstream schema changes or state restore
  • extensions to Table DDL, and potentially View DDL, to infer the schema
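To make the first two items concrete, here is a minimal sketch of what partial, schema-tolerant decoding could look like. It is not the proposed patch and uses neither Flink nor the Thrift library. The class `ThriftRowSketch` and its methods are hypothetical names, and it assumes a flat struct of primitive fields written in the Thrift binary protocol (a one-byte type id, a two-byte field id, then the value, terminated by a STOP byte). Field ids the reader did not ask for are skipped without being materialized, and requested ids that the writer never emitted stay null.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

/** Hypothetical illustration only; handles flat structs with a few primitive types. */
public class ThriftRowSketch {
    // Type ids used by the Thrift binary protocol (subset).
    static final byte STOP = 0, BOOL = 2, BYTE = 3, DOUBLE = 4, I16 = 6, I32 = 8, I64 = 10, STRING = 11;

    /** Decodes one struct into a row; fieldIdToPos maps Thrift field ids to row positions. */
    static Object[] decode(byte[] payload, Map<Short, Integer> fieldIdToPos, int arity) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // big-endian, matching the binary protocol
        Object[] row = new Object[arity];          // fields missing from the payload stay null
        while (buf.hasRemaining()) {
            byte type = buf.get();
            if (type == STOP) break;
            short id = buf.getShort();
            Integer pos = fieldIdToPos.get(id);
            if (pos == null) skip(buf, type);      // unknown or pruned field: never materialized
            else row[pos] = read(buf, type);
        }
        return row;
    }

    static Object read(ByteBuffer buf, byte type) {
        switch (type) {
            case BOOL: return buf.get() != 0;
            case BYTE: return buf.get();
            case DOUBLE: return buf.getDouble();
            case I16: return buf.getShort();
            case I32: return buf.getInt();
            case I64: return buf.getLong();
            case STRING:
                byte[] bytes = new byte[buf.getInt()];
                buf.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            default: throw new IllegalArgumentException("type not covered by this sketch: " + type);
        }
    }

    static void skip(ByteBuffer buf, byte type) {
        int len;
        switch (type) {
            case BOOL: case BYTE: len = 1; break;
            case I16: len = 2; break;
            case I32: len = 4; break;
            case I64: case DOUBLE: len = 8; break;
            case STRING: len = buf.getInt(); break; // length prefix, then that many bytes
            default: throw new IllegalArgumentException("type not covered by this sketch: " + type);
        }
        buf.position(buf.position() + len);
    }

    /** Builds a sample payload: 1: i64 = 42, 2: string = "pin", 3: i32 = 7. */
    static byte[] sample() {
        byte[] name = "pin".getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(64);
        out.put(I64).putShort((short) 1).putLong(42L);
        out.put(STRING).putShort((short) 2).putInt(name.length).put(name);
        out.put(I32).putShort((short) 3).putInt(7);
        out.put(STOP);
        return Arrays.copyOf(out.array(), out.position());
    }

    public static void main(String[] args) {
        // The reader selects ids 1 and 3, skips id 2, and expects id 4, which the writer never sent.
        Map<Short, Integer> projection = Map.of((short) 1, 0, (short) 3, 1, (short) 4, 2);
        System.out.println(Arrays.toString(decode(sample(), projection, 3))); // prints [42, 7, null]
    }
}
```

Keying the row layout on Thrift field ids rather than on field order is what lets the reader and writer schemas drift apart in this sketch. A real converter would also need to handle nested structs, containers, and default values.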

Public Interfaces

TBD

Compatibility, Deprecation, and Migration Plan

...

Test Plan

Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know that nothing broke?

...