Status
Page properties |
---|
Status:
...
|
...
JIRA:
...
|
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Table of Contents |
---|
Motivation
Currently some temporal function behaviors are weird to users.
...
As we knew some functions' behavior is wrong currently, but after we correct these function, the we plan to drop legacy behavior should still work in old code.
This FLIP introduce an option for compatibility consideration, given two option names, I prefer the first one, the second one is like Impala style[1].
table.exec.fallback-legacy-time-function = false/true
table.exec.use-utc-for-unixtime-conversion = false/true
Proposed Changes
immediately.
Proposed Changes
1. Correct 1. Correct the return value of time functions
...
function | Flink current behavior | Flink proposed changes | other SQL vendors' behavior |
CURRENT_TIMESTAMP | return type: TIMESTAMP #session timezone: UTC | return type: TIMESTAMP WITH LOCAL TIME ZONE #session timezone: UTC | In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP In Pg, Presto, the function NOW() and CURRENT_TIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE In Snowflake the function CURRENT_TIMESTAMP/LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE |
...
This FLIP proposes option 1 which changes the return type, the proposed changes as following, the current wall-clock is 2020-12-29 07:52:52 in Beijing time(UTCGMT+808:00):
function | existed problem | current behavior | proposed changes |
CURRENT_DATE | returns UTC date, but user expects current date in session time zone | return type: DATE #session timezone: UTC 2020-12-28 #session timezone: UTCGMT+808:00 2020-12-28 | return current date in session time zone, the return type should be DATE #session timezone: UTC 2020-12-28 #session timezone: UTCGMT+808:00 2020-12-29 |
CURRENT_TIME | returns UTC time, but user expects current time in session time zone | return type: TIME #session timezone: UTC 23:52:52 #session timezone: UTCGMT+808:00 23:52:52 | return current time in session time zone, the return type should be TIME #session timezone: UTC 23:52:52 #session timezone: UTCGMT+808:00 07:52:52 |
CURRENT_TIMESTAMP | returns UTC timestamp, but user expects current timestamp in session time zone | return type: TIMESTAMP #session timezone: UTC 2020-12-28 23:52:52 #session timezone: UTCGMT+808:00 2020-12-28 23:52:52 | return current timestamp in session time zone, the return type should be TIMESTAMP WITH LOCAL TIME ZONE #session timezone: UTC 2020-12-28 23:52:52 #session timezone: UTCGMT+808:00 2020-12-29 07:52:52 |
NOW() | returns UTC timestamp, but user expects current timestamp in session time zone | return type: TIMESTAMP #session timezone: UTC 2020-12-28 23:52:52 #session timezone: UTCGMT+808:00 2020-12-28 23:52:52 | return current timestamp in session time zone, the return type should be TIMESTAMP WITH LOCAL TIME ZONE #session timezone: UTC 2020-12-28 23:52:52 #session timezone: UTCGMT+808:00 2020-12-29 07:52:52 |
PROCTIME() | returns UTC timestamp, but user expects current timestamp in session time zone | return type: TIMESTAMP *PROCTIME* #session timezone: UTC 2020-12-28 23:52:52 #session timezone: UTCGMT+808:00 2020-12-28 23:52:52 | return current timestamp in session time zone for PROCTIME(), the return type should be TIMESTAMP WITH LOCAL TIME ZONE *PROCTIME* #session timezone: UTC 2020-12-28 23:52:52 #session timezone: UTCGMT+808:00 2020-12-29 07:52:52 |
After the proposal, the function NOW() and CURRENT_TIMESTAMP become synonyms, the function CURRENT_TIME and LOCALTIME become synonyms, you can also lookup all time function behaviors in reference [2].
...
function | current behavior | existed problem | migration plan |
CAST(44 AS TIMESTAMP) | TIMESTAMP(0) NOT NULL #session timezone: UTC 1970-01-01 00:00:44 #session timezone: UTCGMT+808:00 1970-01-01 00:00:44 | The time in BIGINT type usually represents a unixtime semantic, which represents the elapsed time since java epoch(1970-01-01 00:00:00 UTC+0), when convert to a timestamp we should consider local time zone | This is an invalid behavior, disable the invalid CAST behavior, to get same behavior, user can workaround with: #session timezone: UTC
1970-01-01 00:00:44 #session timezone: UTCGMT+808:00 TO_TIMESTAMP(FROM_UNIXTIME(44 - 8 * 60 * 60)) 1970-01-01 00:00:44 |
CAST(TIMESTAMP ‘1970-01-01 00:00:44’ AS BIGINT) | BIGINT NOT NULL #session timezone: UTC 44 #session timezone: UTCGMT+808:00 44 | The inverse conversion of above, this conversion is used rarely. | UNIX_TIMESTAMP(TIMESTAMP ‘1970-01-01 00:00:44’) #session timezone: UTC 44 #session timezone: UTCGMT+808:00 -28756 |
3.
...
Disable legacy behavior of theses time functions
The return values of functions
...
The default value of table.exec.fallback-legacy-time-function is 'false' which means:
(1) the return values of function
- CURRENT_DATE
- CURRENT_TIME
- CURRENT_TIMESTAMP
- NOW()
- PROCTIME()
consider consider the local timezone and the return type of PROCTIME()/NOW()/CURRENT_TIMESTAMP is TIMESTAMP WITH LOCAL TIME ZONE_LTZ, the return type of CURRENT_DATE is DATE, the return type of CURRENT_TIME is TIME.
(2) The cast between NUMERIC and TIMESTAMP is forbidden
Users can set the option to 'true', at this moment:
(1) These functions would keep the legacy behavior.
(2) The cast between NUMERIC and TIMESTAMP is supportedThe legacy behavior of these function doesn't consider the local time zone, and the return type of PROCTIME()/NOW()/CURRENT_TIMESTAMP is TIMESTAMP, the return type of CURRENT_DATE is DATE, the return type of CURRENT_TIME is TIME.
4. Support defining row time attribute on TIMESTAMP
...
_LTZ
After this, we can support ROWTIME/PROCTIME on type TIMESTAMP WITH LOCAL TIME ZONE, which complements the unifcation.
...
proposed changes | note |
Support function TO_TIMESTAMP_LTZ(numeric_expr, [,scale]) return type: TIMESTAMP(p3) WITH LOCAL TIME ZONE #session timezone: UTC TO_TIMESTAMP_LTZ(44) 1970-01-01 00:00:44 #session timezone: UTCGMT+808:00 TO_TIMESTAMP_LTZ(-28756) 1970-01-01 00:00:44 | TO_TIMESTAMP_LTZ(numeric_expr [,scale]) TO_TIMESTAMP_LTZ(seconds, 0) TO_TIMESTAMP_LTZ(milliseconds, 3) TO_TIMESTAMP_LTZ(microSeconds, 6) TO_TIMESTAMP_LTZ(nanoSeconds, 9) |
...
5. Support more conversion classes for LocalZonedTimestampType(TIMESTAMP_LTZ)
After we use type TIMESTAMP WITH LOCAL TIME ZONE in above functions, we can support all conversion classes like java.time.LocalDateTime, javathe conversion class java.sql.Timestamp that TimestampType supported for LocalZonedTimestampType to for LocalZonedTimestampType, and we enable the implicit cast conversion between TIMESTAMP and TIMESTAMP_LTZ to resolve the UDF compatibility issue.
For example, if user used a UDF which parameter types contains LocalDateTime, and the SQL column data type is TIMESTAMP comes from PROCTIME()/CURRENT_TIMESTAMP, they can migrate to TIMESTAMP WITH LOCAL TIME ZONE _LTZ smoothly.
6. Support abbreviation name for all timestamp types
...
- TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP
- TIMESTAMP WITH LOCAL TIME ZONE <=> TIMESTAMP_LTZ
- TIMESTAMP WITH TIME ZONE <=> TIMESTAMP_TZ (supports them in the future until we support TIMESTAMP WITH TIME ZONE )
7. Support TIME(9) for Flink SQL
Due to the historical reason, we didn't not support TIME(9) yet, we think It's a good time point to support it in this FLIP.
8.
...
Use execution mode to hint time function evaluation, introduce CURRENT_ROW_TIMESTAMP function.
For time functionsan option table.exec.time-function-evaluation to control the materialize time point of time function value. The time function includes
- LOCALTIME
- LOCALTIMESTAMP
- CURRENT_DATE
- CURRENT_TIME
- CURRENT_TIMESTAMP
- NOW()
The default value of table.exec.time-function-evaluation includes 'auto', 'per-record', 'query-start'.
...
Flink evaluates above time function values according to execution mode, i.e. Flink evaluates time function value for
...
row level in Streaming mode, evaluates the time function value at query start for batch mode.
...
For batch users who want to obtain row level timestamp of now, we introduce function CURRENT_ROW_TIMESTAMP, the function can also be used in streaming mode.
It's similar to CURRENT_TIMESTAMP function, CURRENT_ROW_TIMESTAMP always returns current timestamp in session time zone, the return type should be TIMESTAMP WITH LOCAL TIME ZONE.
The difference is function CURRENT_ROW_TIMESTAMP always evaluates in row-level, but function CURRENT_TIMESTAMP evaluates according to the execution mode.
General Implementations
General Implementations
1.Change the codegen implementations for above five functions/cast conversions according to the value of introduced table option: table.exec.fallback-legacy-time-function
2. Supports all conversion classes like java.time.LocalDateTime, java.sql.Timestamp that TimestampType supported for LocalZonedTimestampType to resolve the UDF compatibility issue
...
After the proposal is finished, the above user cases will work smoothly. Assume users' local time zone is UTCGMT+808:00, the wall-clock is 2020-12-29 07:52:52.
...
This is an incompatible change, we introduce SQL/API option table.exec.fallback-legacy-time-function for compacting current wrong behavior, and set it to ‘false’. If users want to keep the legacy behavior, they need to set this option value to ’true’ manually, this would be add to release note.
Test Plan
...
that users must change their SQL if they used these time functions, user can also write UDF to obtain same of legacy time function return value.
Test Plan
Will add plan tests, unit tests, window operator harness tests as well as IT tests.
Future Plan
As a conservative plan, we don't introduce the config 'table.exec.time-function-evaluation' right now, but we have discussed that we should introduce it in the future once we receive enough user requirements / feedbacks.
The option is used to control the materialize time point of time function value. The time function includes:
- LOCALTIME
- LOCALTIMESTAMP
- CURRENT_DATE
- CURRENT_TIME
- CURRENT_TIMESTAMP
- NOW()
The valid value of table.exec.time-function-evaluation includes 'auto', 'row', 'query-start'.
- 'auto', the default value which means Flink evaluates above time function values according to execution mode, i.e. Flink evaluates time function value for per record in Streaming mode, evaluates the time function value at query start for batch mode.
- 'row', Flink evaluates time function value for per record.
- 'query-start', Flink evaluate above time function value at query-start, the time function always returns same value in a query.
Rejected Alternatives
1. Change the return type of function CURRENT_TIMESTAMP/NOW()/PROCETIME() to TIMESTAMP WITH TIME ZONE. This proposal needs to introduce a new type TIMESTAMP WITH TIME ZONE, and we think there are no enough benefits. If we do this, the return type of function CURRENT_TIME must be TIME WITH TIME ZONE for consistent consideration, we need to introduce another type.
2. Change the return value of function CURRENT_TIMESTAMP/NOW()/PROCETIME() to the timestamp value in local timezone. The proposal is fine if we only use it in FLINK SQL world, but we need to consider the conversion between Table/DataStream, assume a record produced in UTC+0 timezone with TIMESTAMP '1970-01-01 08:00:44' and the Flink SQL processes the data with session time zone 'UTCGMT+808:00', if the sql program need to convert the Table to DataStream, then we need to calculate the timestamp in StreamRecord with session time zone (UTCGMT+808:00), then we will get 44 in DataStream program, but it is wrong because the expected value should be (8 * 60 * 60 + 44). The corner case tell us that the ROWTIME/PROCTIME in Flink are based on UTC+0, when correct the PROCTIME() function, the better way is to use TIMESTAMP WITH LOCAL TIME ZONE which keeps same long value with time based on UTC+0 and can be expressed with local timezone.
...