...

Current state: "Under Discussion"

Discussion thread: https://lists.apache.org/thread/54x1pz3qm0jh3ncsrjonp6t09o3r50z2

JIRA: TBD

Released: TBD

A Google Doc version of this proposal is also available.

Motivation

Apache Thrift (along with Protocol Buffers) is widely adopted as a de facto standard for high-throughput network traffic protocols. Historically, companies like Pinterest have used Thrift to encode strongly typed Kafka messages and persist them to object storage as sequence files in the data warehouse.

...

To support Flink SQL workloads reading from these Kafka topics, we propose the following data type mapping from the Thrift type system to the Flink Row type system, as shown in the table below. Favoring debuggability and user readability, we map enum to the string type.

Thrift type      Flink type

bool             DataTypes.BOOLEAN()
byte             DataTypes.TINYINT()
i16              DataTypes.SMALLINT()
i32              DataTypes.INT()
i64              DataTypes.BIGINT()
double           DataTypes.DOUBLE()
string           DataTypes.STRING()
enum             DataTypes.STRING()
list             DataTypes.ARRAY()
set              DataTypes.ARRAY()
map              DataTypes.MAP()
struct           DataTypes.ROW()
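As an illustration only (the class and method names here are hypothetical, and the non-i64 field types are inferred from the example values and the IDL fragment later on this page), the row type for the Xtruct3 struct used in the examples below could be derived from this mapping as follows:

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

// Sketch: Flink row type for Xtruct3, one field per Thrift field in field-id order.
public final class Xtruct3RowType {
    public static DataType get() {
        return DataTypes.ROW(
            DataTypes.FIELD("string_thing", DataTypes.STRING()),   // string
            DataTypes.FIELD("changed",      DataTypes.INT()),      // i32 (inferred)
            DataTypes.FIELD("i32_thing",    DataTypes.INT()),      // i32 (inferred)
            DataTypes.FIELD("i64_thing",    DataTypes.BIGINT()));  // i64
    }
}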

public final class TType {

  public static final byte STOP = 0;

...

An example of index matching can be found below.

Xtruct3
    string_thing = "boo"
    changed = 0
    i32_thing = unset
    i64_thing = -1

Row
    <"boo", 0, null, -1>

Note that, for runtime performance, we propose maintaining a cache from each Thrift struct type to its sorted field information.
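A minimal sketch of such a cache, assuming libthrift's metadata API (FieldMetaData.getStructMetaDataMap) is available for the generated classes; the class and method names below are illustrative only, and enum, nested struct, and container conversion are omitted:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.types.Row;
import org.apache.thrift.TBase;
import org.apache.thrift.TFieldIdEnum;
import org.apache.thrift.meta_data.FieldMetaData;

// Sketch: caches each struct class's field descriptors, sorted by Thrift field id,
// so metadata is resolved once per type instead of once per record.
@SuppressWarnings({"rawtypes", "unchecked"})
public final class ThriftFieldInfoCache {

    private static final Map<Class<? extends TBase>, List<TFieldIdEnum>> CACHE =
        new ConcurrentHashMap<>();

    public static List<TFieldIdEnum> sortedFieldIds(Class<? extends TBase> structClass) {
        return CACHE.computeIfAbsent(structClass, clazz -> {
            List<TFieldIdEnum> ids =
                new ArrayList<>(FieldMetaData.getStructMetaDataMap(clazz).keySet());
            ids.sort(Comparator.comparingInt(f -> f.getThriftFieldId()));
            return ids;
        });
    }

    // Index matching as in the Xtruct3 example above: Row position i corresponds to the
    // i-th field in field-id order, and unset fields become null.
    public static Row toRow(TBase struct) {
        List<TFieldIdEnum> ids = sortedFieldIds(struct.getClass());
        Row row = new Row(ids.size());
        for (int i = 0; i < ids.size(); i++) {
            TFieldIdEnum field = ids.get(i);
            // Enums, nested structs, and containers would need further conversion per the mapping table.
            row.setField(i, struct.isSet(field) ? struct.getFieldValue(field) : null);
        }
        return row;
    }
}

With this layout, the metadata lookup and sort are paid at most once per struct class; subsequent records of the same type only perform a map read.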

...

  11: i64    i64_thing,

  12: string new_thing

}

Xtruct3
    string_thing = "boo"
    changed = 0
    i32_thing = unset
    i64_thing = -1

Row
    <"boo", 0, null, -1, null>
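For concreteness, a tiny sketch (class name hypothetical) of what this second example's Row looks like programmatically:

import org.apache.flink.types.Row;

public final class Xtruct3EvolutionExample {
    public static void main(String[] args) {
        // An Xtruct3 record written before "12: string new_thing" existed, decoded after
        // the IDL change. Positions keep field-id order; the unset i32_thing and the
        // newly added new_thing both surface as null.
        Row oldRecord = Row.of("boo", 0, null, -1L, null);
        System.out.println(oldRecord.getArity());  // 5 positions after the schema change
    }
}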

Handling Nested Fields in Row

...