...
Current state: "Under Discussion"
Discussion thread: https://lists.apache.org/thread/54x1pz3qm0jh3ncsrjonp6t09o3r50z2
JIRA: TBD
Released: TBD
Motivation
Apache Thrift (along with Protocol Buffers) is widely adopted as a de facto standard for high-throughput network traffic protocols. Historically, companies like Pinterest have used Thrift to encode strongly typed Kafka messages and to persist them to object storage as sequence files in the data warehouse.
...
In order to support FlinkSQL workloads reading from Kafka topics, we propose the following data type mapping from the Thrift type system to the Flink Row type system. In favor of debuggability and user readability, we map enum to the string type.
Thrift type | Flink type |
bool | DataTypes.BOOLEAN() |
byte | DataTypes.TINYINT() |
i16 | DataTypes.SMALLINT() |
i32 | DataTypes.INT() |
i64 | DataTypes.BIGINT() |
double | DataTypes.DOUBLE() |
string | DataTypes.STRING() |
enum | DataTypes.STRING() |
list | DataTypes.ARRAY() |
set | DataTypes.ARRAY() |
map | DataTypes.MAP() |
struct | DataTypes.ROW() |
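The mapping table above can be sketched as a plain lookup. The class and method names below are illustrative only (not the actual connector API); real code would return `org.apache.flink.table.api.DataTypes` instances rather than strings, which are used here to keep the example self-contained.

```java
import java.util.Map;

// Illustrative sketch of the Thrift -> Flink type mapping table above.
// Strings stand in for org.apache.flink.table.api.DataTypes instances
// so the example runs without a Flink dependency.
public class ThriftTypeMapping {
    private static final Map<String, String> TYPE_MAP = Map.ofEntries(
        Map.entry("bool",   "DataTypes.BOOLEAN()"),
        Map.entry("byte",   "DataTypes.TINYINT()"),
        Map.entry("i16",    "DataTypes.SMALLINT()"),
        Map.entry("i32",    "DataTypes.INT()"),
        Map.entry("i64",    "DataTypes.BIGINT()"),
        Map.entry("double", "DataTypes.DOUBLE()"),
        Map.entry("string", "DataTypes.STRING()"),
        // enum maps to string for debuggability and user readability
        Map.entry("enum",   "DataTypes.STRING()"),
        Map.entry("list",   "DataTypes.ARRAY()"),
        Map.entry("set",    "DataTypes.ARRAY()"),
        Map.entry("map",    "DataTypes.MAP()"),
        Map.entry("struct", "DataTypes.ROW()"));

    public static String toFlinkType(String thriftType) {
        String flinkType = TYPE_MAP.get(thriftType);
        if (flinkType == null) {
            throw new IllegalArgumentException("Unsupported Thrift type: " + thriftType);
        }
        return flinkType;
    }
}
```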
public final class TType {
  public static final byte STOP = 0;
...
An example of index matching can be found below:
Thrift struct instance | Flink Row |
Xtruct3 string_thing = "boo", changed = 0, i32_thing = unset, i64_thing = -1 | Row<"boo", 0, null, -1> |
Note: for runtime performance, we propose maintaining a cache from struct type to its sorted field information, so the sort is computed once per type rather than once per record.
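A minimal sketch of such a per-type cache is shown below; `FieldMeta` and the method names are hypothetical, not part of any existing API. Fields are sorted by Thrift field id once and the result is reused on every subsequent lookup for the same struct class.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Illustrative sketch: cache sorted field metadata per Thrift struct class,
// so sorting happens once per type rather than once per deserialized record.
public class FieldInfoCache {
    // Hypothetical field metadata: Thrift field id plus field name.
    public record FieldMeta(short id, String name) {}

    private static final Map<Class<?>, List<FieldMeta>> CACHE = new ConcurrentHashMap<>();

    public static List<FieldMeta> sortedFields(Class<?> structClass, List<FieldMeta> unsorted) {
        // computeIfAbsent sorts only on the first call for a given class.
        return CACHE.computeIfAbsent(structClass, c ->
            unsorted.stream()
                    .sorted(Comparator.comparingInt(FieldMeta::id))
                    .collect(Collectors.toUnmodifiableList()));
    }
}
```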
...
11: i64 i64_thing,
12: string new_thing
}
Xtruct3 string_thing = "boo", changed = 0, i32_thing = unset, i64_thing = -1 | Row<"boo", 0, null, -1, null> |
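The index-matching behavior above (unset fields become null, and fields newly added to the schema but absent from older serialized structs are padded with trailing nulls) can be sketched as follows. The class name and method signature are hypothetical; field ids 1, 4, 9, 11, 12 follow the Xtruct3 example.

```java
import java.util.Map;

// Illustrative sketch of index matching: the Row has one slot per declared
// field id (in sorted order); any field unset on the Thrift struct, including
// fields newly added to the schema, stays null in the resulting Row.
public class IndexMatcher {
    // declaredIds: sorted Thrift field ids of the current schema.
    // presentValues: field id -> value actually set on the struct.
    public static Object[] toRow(short[] declaredIds, Map<Short, Object> presentValues) {
        Object[] row = new Object[declaredIds.length];
        for (int i = 0; i < declaredIds.length; i++) {
            row[i] = presentValues.get(declaredIds[i]); // null when unset
        }
        return row;
    }
}
```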
Handling Nested Fields in Row
...