Page properties | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Table of Contents
Background and Motivation
...
The SQL is the most popular API of Apache Flink user that connects so many other engines(i.e. Apache Hive, MySQL, PostgreSQL). Flink would have better compatibility with sql query to the underlying engines it adapters to if it has had the built-in support for implicit type coercion.
There are already some JIRA issues and user mails that are relative with related this topic more or less.
Use Cases of Type Coercion:
- the inter-operation within builtin SQL operators, like binary arithmetic operations (PLUS, MINOUS and so on), binary comparison, IN, CASE WHEN, IF, SET operator (UNION, INTERSECT ..)
- the built in functions type coercion for input parameters
- the SQL INSERT source that target connector row type when are
- there are row elements type mismatch
- mismatches
- the compatibility with other popular DB engines, like MySQL/Hive/PostgreSQL and so on
Popular DB Type Conversion Rules
...
Data type conversion behaviors for MS-SQL
Proposed Design
...
The Common Type
The whole rules to find the proper conversion type:
...
We try the best to find the tightest common type that does not lose precision, for example DOUBLE and INT return DOUBLE, DATE and TIMESTAMP return TIMESTAMP; If we can not find the tightest common type, then try to coerce all the operands to VARCHAR type which may lose some precision(or 2 DECIMALs with wider precision/scale).
Conversion
...
Context and
...
Strategy
SQL Contexts | Expression subtype | Strategies |
Set Operation | union/except/intersect | compare the data type of each branch row to find the common type of each fields pair |
Arithmetic Expression | binary arithmetic: [+, -, &, |, ^, /, %, pmod] | 1. promote string operand to data type of the other numeric operand; 2. two strings would all be coerced to DECIMAL. |
binary comparison: [=, <, <=, >, >=, <>] | 1. promote string and timestamp to timestamp; 2. make 1=true and 0=false always evaluates true; 3. find common type for both operands if there is numeric type operand. | |
IN Expression | with subquery | compare type of LHS and RHS, find the common type, if it is struct type, find wider type for every field |
without subquery | if RHS is a expr list, compare every expr to find the wider type | |
Special AGG Function | promote string all to decimal type | |
Case When Expression | case when expression | find then and else operands common wider type |
Colesce FUNC | same as case when | |
[Date|Timestamp|String] +/- interval | promote string to timestamp | |
Function with Expected Inputs Type | builtin functions | look up the families registered in the operand type checker, find the family default type if rule allows it |
UDF/UDAF | try to coerce based on the argument operands types of eval() func |
...
For example, for binary arithmetic with strings, we coerce the string operand to the type of the other numeric operand: for '9' / 2 (INT), we coerce '9' to int type, and the result type is also INT (like PostgreSQL and MS-SQL did). While MySQL and Oracle would coerce all the string operands to DOUBLE.
Another case is the IN operator, we see the IN operands comparison same with the binary comparison: for 1 in ('abc', 2), Apache Flink coerce the 'abc' to INT following PostgreSQL and MS-SQL, while MySQL and Apache Spark coerces 1 and 2 to Char(3).
Type Conversion Matrix
The table below illustrates the implicit type coercion rules for all kinds of engines:
...