Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

table.exec.fallback-legacy-time-function = false false/true 

table.exec.use-utc-for-unixtime-conversion = false    

The default value is false which means the return values of function

  • CURRENT_DATE
  • CURRENT_TIME
  • CURRENT_TIMESTAMP
  • NOW()
  • PROCTIME()

consider the local timezone. Users can set the option to 'true', thus these functions would keep the legacy behavior.

And when the option also influence the result of CAST conversion between NUMERIC TYPE and TIMESTAMP, by default,the CAST conversion is not supported anymore,  users can use TO_TIMESTAMP(milliseconds)  function to get a TIMESTAMP in local time zone from a java epoch milliseconds.  Users can set the option to 'true', thus these cast conversion would keep the legacy behavior./true    

Proposed Changes

1. Correct

...

the return value of time functions

I invested all Flink time-related functions current behavior and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] to organize them well.

...

All of those way are valid because SQL:2011 does not specify the function return type and every SQL engine vendor has its own implementation[2], for example CURRENT_TIMESTAMP.

functionflink current behavior

flink proposed changes

other SQL vendors' behavior

CURRENT_TIMESTAMP #session timezone: UTC
2020-12-28T23:52:52

#session timezone: UTC+8
2020-12-28T23:52:52

wall clock:
UTC+8:2020-12-29 07:52:52
#session timezone: UTC
2020-12-28T23:52:52

#session timezone: UTC+8
2020-12-29T07:52:52

In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP

In
Pg, Presto, the function NOW() and CURRENT_TIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE

In
Snowflake the function CURRENT_TIMESTAMP/LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE


This FLIP proposed proposes option ( 2 ) which only change the return valuethe proposed changes as following, the current wall-clock is 2020-12-29 07:52:52 in Beijing time(UTC+8):

function

existed problem

current behavior

proposed changes

CURRENT_DATE

returns UTC date, but user expects current date in session time zone

return type: DATE

#session timezone: UTC

2020-12-28

#session timezone: UTC+8

2020-12-28

 return current date in session time zone, the return type should be DATE

#session timezone: UTC

2020-12-28

#session timezone: UTC+8

2020-12-29

CURRENT_TIME

returns UTC time, but user expects current time in session time zone

return type:  TIME 

#session timezone: UTC

23:52:52

#session timezone: UTC+8

23:52:52

return current time in session time zone, the return type should be TIME

#session timezone: UTC

23:52:52

#session timezone: UTC+8

07:52:52

CURRENT_TIMESTAMP

returns UTC timestamp, but user expects current timestamp in session time zone

return type:  TIMESTAMP 

#session timezone: UTC

2020-12-28T23:52:52

#session timezone: UTC+8

2020-12-28T23:52:52

return current timestamp in session time zone, the return type should be TIMESTAMP

#session timezone: UTC

2020-12-28T23:52:52

#session timezone: UTC+8

2020-12-29T07:52:52

NOW()

returns UTC timestamp, but user expects current timestamp in session time zone

return type:  TIMESTAMP 

#session timezone: UTC

2020-12-28T23:52:52

#session timezone: UTC+8

2020-12-28T23:52:52

return current timestamp in session time zone, the return type should be TIMESTAMP

#session timezone: UTC

2020-12-28T23:52:52

#session timezone: UTC+8

2020-12-29T07:52:52

PROCTIME()

returns UTC timestamp, but user expects current timestamp in session time zone

return type:  TIMESTAMP *PROCTIME*

#session timezone: UTC

2020-12-28T23:52:52

#session timezone: UTC+8

2020-12-28T23:52:52

return current timestamp in session time zone for PROCTIME(), the return type should be TIMESTAMP  *PROCTIME*

#session timezone: UTC

2020-12-28T23:52:52

#session timezone: UTC+8

2020-12-29T07:52:52

After the proposal, the function NOW(), CURRENT_TIMESTAMP and LOCALTIMESTAMP become  synonyms, the function CURRENT_TIME and LOCALTIME become synonyms, you can also lookup all time function behaviors in reference [2]. 

2.

...

Disable CAST

...

between 

...

NUMERIC  and TIMESTAMP

...

Currently, the following CAST conversion behaviors are wrong which does not consider the session time zone. It should use session time zone when cast between NUMERIC type and TIMESTAMP type if there’re strong requirements to support this , the numeric type include TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, they keep same conversion behavior, for example the cast between BIGINT and TIMESTAMP. Although we didn’t expose this feature in the document, some users may use them.

These cast conversions have wrong behavior and problematic semantics,  because SQL:2011 does not contains these cast specification and we never expose to user.

This FLIP propose to disable the cast between numeric and timestamp a better way to replace these functionalities is the and introduce a the function TO_TIMESTAMP(seconds) function which accepts a BIGINT seconds and return a TIMESTAMP in local time zone

function

current behavior

existed problem

proposed changes

CAST(44 AS TIMESTAMP) 


TIMESTAMP(6) NOT NULL

#session timezone: UTC

1970-01-01T00:00:44 

#session timezone: UTC+8

1970-01-01T00:00:44 

The time in BIGINT type usually represents a unixtime semantic, which represents the elapsed time since java epoch(1970-01-01 00:00:00 UTC+0), when convert to a timestamp we should consider local time zone

Support function  TO_TIMESTAMP(seconds)

return type: TIMESTAMP(3)

#session timezone: UTC

TO_TIMESTAMP(44)

1970-01-01T00:00:44 

#session timezone: UTC+8

TO_TIMESTAMP(-28756)

1970-01-01T00:00:44

CAST(TIMESTAMP ‘1970-01-01 00:00:44’ AS BIGINT) 

BIGINT NOT NULL

#session timezone: UTC

44 

#session timezone: UTC+8

44

The inverse conversion of above, this conversion is used rarely.

UNIX_TIMESTAMP(TIMESTAMP ‘1970-01-01 00:00:44’)

#session timezone: UTC

44

#session timezone: UTC+8

-28756


3. Consider the timezone offset in conversion between Table and DataStream

The row time field is stored  in a separate field with Long type of StreamRecord in DataStream world, and the value with the timestamp is the milliseconds since java epoch(1970-01-01 00:00:00 UTC+0), which definitely represents an Instant semantics. 

When converting a Table contains row time from/to DataStream, the timestamp in StreamRecord will exchange with Table’s row time column.

We need to consider the time zone offset just like the cast conversion between the TIMESTAMP with wall-clock semantics and BIGINT with Instant semantics.  This way can make the both DataStream and SQL/API users feel intuitive.


4. Introduce an option that enable fallback to legacy behavior

 The default value of table.exec.fallback-legacy-time-function is 'false' which means

(1) the return values of function

  • CURRENT_DATE
  • CURRENT_TIME
  • CURRENT_TIMESTAMP
  • NOW()
  • PROCTIME()

consider the local timezone.

(2) The cast between NUMERIC and TIMESTAMP is forbidden

(3) The session timezone offset will be considered when convert a Table from/to DataStream.


Users can set the option to 'true', at this moment:

(1) These functions would keep the legacy behavior.

(2) The cast between NUMERIC and TIMESTAMP is supported.

(3)The session timezone offset will not be considered when convert a Table from/to DataStream.



General Implementations

1.Change the codegen implementations for above five functions/cast conversions according to the value of introduced table option: table.exec.fallback-legacy-time-function.enable 

2.Consider the session time zone offset when materializing a PROCTIME() attribute, including PROCTIME_MATERIALIZE function and the processing timestamp used to register timer in the window operator, because after change the return value of PROCTIME()function,

the window based on processing time should be triggered by the changed processing time value.

3. When converting a Table contains rowtime from/to DataStream, the timestamp in StreamRecord will exchange with Table’s row time column. We need to consider the time zone offset just like the cast conversion between the TIMESTAMP and BIGINT.  This way can make the

both DataStream and SQL/API users feel intuitive.

Compatibility, Deprecation, and Migration Plan

...