Comment: Improvements around supporting arbitrary types

...

Even though this is an API breaking change, we aim for backwards compatibility. The new extraction is designed to support most of the old features and enables new features. Some slight adaptation of existing UDFs might be necessary. The new UDF design will only be supported in the newly introduced unified method defined in FLIP-64:

| Deprecated with old type inference | New type inference |
|---|---|
| `registerScalarFunction` / `registerAggregateFunction` / `registerTableFunction` | `createTemporaryFunction` |


It will enable all kinds of functions in the new `org.apache.flink.table.api.TableEnvironment`.

...

The following table lists the classes that are extracted by default:

| Class | Data Type |
|---|---|
| String | STRING |
| Boolean / boolean | BOOLEAN (NOT NULL) |
| Byte / byte | TINYINT (NOT NULL) |
| Short / short | SMALLINT (NOT NULL) |
| Integer / int | INT (NOT NULL) |
| Long / long | BIGINT (NOT NULL) |
| Float / float | FLOAT (NOT NULL) |
| Double / double | DOUBLE (NOT NULL) |
| java.sql.Date / java.time.LocalDate | DATE |
| java.sql.Time / java.time.LocalTime | TIME(0) / TIME(9) |
| java.sql.Timestamp / java.time.LocalDateTime | TIMESTAMP(9) |
| java.time.OffsetDateTime / java.time.ZonedDateTime | TIMESTAMP(9) WITH TIME ZONE |
| java.time.Instant | TIMESTAMP(9) WITH LOCAL TIME ZONE |
| java.time.Duration | INTERVAL SECOND(9) |
| java.time.Period | INTERVAL YEAR(4) TO MONTH |
| arrays of the above | ARRAY<E> |
| Map<K, V> | MAP<K, V> |
| POJOs and case classes | STRUCTURED TYPE |
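
To make the default mapping more concrete, the following is a minimal sketch of a class-to-type lookup in plain Java. It is not the extractor proposed here: the class `DefaultTypeMappingSketch` and its `extract` method are hypothetical names, only a subset of the table is included, and it assumes that the "(NOT NULL)" entries apply to the primitive classes while the boxed classes stay nullable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of any Flink API.
final class DefaultTypeMappingSketch {

    // Subset of the default mapping table, keyed by class.
    private static final Map<Class<?>, String> DEFAULTS = new LinkedHashMap<>();

    static {
        DEFAULTS.put(String.class, "STRING");
        // Assumption: boxed classes are nullable, primitives are NOT NULL.
        DEFAULTS.put(Boolean.class, "BOOLEAN");
        DEFAULTS.put(boolean.class, "BOOLEAN NOT NULL");
        DEFAULTS.put(Integer.class, "INT");
        DEFAULTS.put(int.class, "INT NOT NULL");
        DEFAULTS.put(Long.class, "BIGINT");
        DEFAULTS.put(long.class, "BIGINT NOT NULL");
        DEFAULTS.put(java.time.LocalDate.class, "DATE");
        DEFAULTS.put(java.time.Instant.class, "TIMESTAMP(9) WITH LOCAL TIME ZONE");
        DEFAULTS.put(java.time.Duration.class, "INTERVAL SECOND(9)");
    }

    // Returns the default data type as a string, or fails for unmapped classes.
    static String extract(Class<?> clazz) {
        if (clazz.isArray()) {
            // Arrays of mapped classes become ARRAY<E>.
            return "ARRAY<" + extract(clazz.getComponentType()) + ">";
        }
        String type = DEFAULTS.get(clazz);
        if (type == null) {
            throw new IllegalArgumentException(
                "No default data type for " + clazz.getName());
        }
        return type;
    }
}
```

In this sketch, `extract(String[].class)` yields `ARRAY<STRING>`, while a class outside the table, such as `java.math.BigDecimal`, is rejected with an exception.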


The list explicitly excludes the following types:

...

    input = @DataTypeHint("ANY"),

    isVarArgs = YES,

    output = @DataTypeHint("STRING"))

...

  • Defining a logical type with default conversion
    e.g. `@DataTypeHint("INT")`
  • Defining a data type with different conversion
    e.g. `@DataTypeHint(value = "TIMESTAMP(3)", bridgedTo = java.sql.Timestamp.class)`
  • Just parameterizing the extraction
    e.g. `@DataTypeHint(version = 1, allowAnyGlobally = true)`


Within a FunctionHint, an empty DataTypeHint (no logical type) is only allowed as top-level property default.

...

The following options for parameterizing the extraction are exposed through the annotation; we might add more in the future. The list might seem long at first glance, but keep in mind that extraction is not always performed on small, simple POJOs. It is sometimes performed on classes with 100+ fields that may have been generated using Avro or Protobuf:

| Parameter | Description |
|---|---|
| version | Logic version for future backwards compatibility. Current version by default. |
| allowAnyGlobally | General flag that defines whether the ANY data type should be used for classes that cannot be mapped to any SQL-like type, or whether an error should be raised. Set to false by default, which means that an exception is thrown for unmapped types. For example, `java.math.BigDecimal` cannot be mapped because the SQL standard defines that decimals have a fixed precision and scale. |
| allowAnyPattern | Patterns that enable the usage of an ANY type. A pattern is a prefix or a fully qualified name of `Class#getName()` excluding arrays. The general `allowAnyGlobally` flag must not be enabled for patterns. |
| forceAnyPattern | Patterns that force the usage of an ANY type. A pattern is a prefix or a fully qualified name of `Class#getName()` excluding arrays. `allowAnyGlobally` must not be enabled for forcing ANY types. |
| defaultDecimalPrecision | Sets a default precision for all decimals that occur. By default, decimals are not extracted. |
| defaultDecimalScale | Sets a default scale for all decimals that occur. By default, decimals are not extracted. |
| defaultYearPrecision | Sets a default year precision for year-month intervals. If set to 0, a month interval is assumed. |
| defaultSecondPrecision | Sets a default fractional second precision for timestamps and intervals that occur. This is useful, for example, because some planners don't support nanoseconds yet. |
| arbitraryInput | Determines whether arbitrary input should be allowed. If set to true, this behaves similarly to an input type validator that always passes. The bridging class must be Object. |
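
The pattern parameters above can be illustrated with a short plain-Java sketch. It is only one possible reading of the description: the class `AnyPatternSketch` and its methods are hypothetical, and the sketch assumes that "excluding arrays" means a pattern is compared against the name of the array's element class rather than the JVM array name.

```java
import java.util.List;

// Hypothetical illustration only; not part of any Flink API.
final class AnyPatternSketch {

    // True if the class name equals a pattern or starts with it.
    static boolean matches(Class<?> clazz, List<String> patterns) {
        // Assumption: arrays are unwrapped so that patterns see the element class.
        Class<?> element = clazz;
        while (element.isArray()) {
            element = element.getComponentType();
        }
        String name = element.getName();
        for (String pattern : patterns) {
            // startsWith covers both a prefix and a fully qualified name.
            if (name.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    // An unmapped class may use ANY if the global flag is set or a pattern matches.
    static boolean mayUseAny(
            Class<?> clazz, boolean allowAnyGlobally, List<String> allowAnyPatterns) {
        return allowAnyGlobally || matches(clazz, allowAnyPatterns);
    }
}
```

With the pattern `java.math.`, both `java.math.BigDecimal` and `java.math.BigDecimal[]` match in this sketch, whereas `java.lang.String` does not.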


Some examples:


public class ScalarFunction {

...

    public void eval(String prefix, @DataTypeHint("ANY") Object obj) {

        //...

...

→ Takes a string and an arbitrary input parameter and returns a STRING.


Note to the last example: The ANY type is a special logical type as it bridges the Java class hierarchy and the SQL type system. An ANY type is always connected to a class. In the example above, the ANY type is interpreted as `ANY<java.lang.Object>`. When an ANY type is translated into an input validation, the validation is class-based and uses the class given in the input data type. The ANY type is the only type that is validated this way. So `eval(java.lang.Object)` will accept any data type (including primitives) according to the JVM specification.
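
The class-based behavior can be observed in plain Java, independent of the planner: a parameter declared as `Object` accepts any reference type, and primitive arguments are boxed at the call site. The sketch below uses a hypothetical class `ArbitraryInputSketch` whose `eval` method only stands in for a function's evaluation method.

```java
// Hypothetical illustration only; not part of any Flink API.
final class ArbitraryInputSketch {

    // Stand-in for an eval method with an arbitrary second parameter.
    static String eval(String prefix, Object obj) {
        return prefix + obj.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(eval("type=", 42));      // int is boxed to Integer
        System.out.println(eval("type=", 1.5));     // double is boxed to Double
        System.out.println(eval("type=", true));    // boolean is boxed to Boolean
        System.out.println(eval("type=", "text"));  // String is passed as-is
    }
}
```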

Manual Definition

If the (possibly annotated) extraction cannot solve a certain use case, for example, because literal values of a function call need to be analyzed or the return type depends on the input type, more advanced users can override the `getTypeInference()` method.

...