Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Authors

Forward Xu, Jark Wu


Status

Current state"Under Discussion"

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-JSON-functions-in-Flink-SQL-td32674.html

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, Flink SQL does not support the SQL 2016 JSON function. and many of the message middleware data are in JSON format. For example: Kafka, Flink SQL can support JSON operation to more easily parse the data in Kafka. In addition, the data in some relational databases is also stored and ingested in JSON format.

...

The ARRAY WRAPPER clause defines how to represent a JSON array result in JSON_QUERY function. The following examples compare the wrapper behaviors.

Use in TableApi:

jsonExists

JAVA/PYTHON/SCALA

DESCRIPTION

STRING.jsonExists(path)

Whether a jsonValue satisfies a search criterion described using JSON path expression path. 

STRING.jsonExists(path

.Table API currently only support path string parameter, doesn't support ON ERROR clause.

,JsonErrorStrategy.ERROR)

JsonErrorStrategy.ERROR means ERROR ON ERROR.

STRING.jsonExists(path,JsonErrorStrategy.UNKNOWN)

JsonErrorStrategy.UNKNOWN means UNKNOWN ON ERROR.


jsonValue

JAVA/PYTHON/SCALA

DESCRIPTION

STRING.jsonValue(path)

Extract an SQL scalar from a jsonValue using JSON path expression

path.Table API currently only support

path

string parameter, doesn't support ON ERROR clause

.

STRING.

jsonQuery

jsonValue(path,JsonReturningStrategy.INTEGER)

Extract a JSON object or JSON array from jsonValue using the path JSON path expression.Table API currently only support path string parameter, doesn't support ON ERROR clause.

JsonReturningStrategy is an enum object that can be integer, boolean, etc.

STRING.jsonValue(path,JsonEmptyStrategy.NULL)

JsonEmptyStrategy.NULL means NULL ON EMPTY.

STRING.jsonValue(path,JsonEmptyStrategy.ERROR)

JsonEmptyStrategy.ERROR means ERROR ON EMPTY.

STRING.jsonValue(path,JsonEmptyStrategy.DEFAULT)

JsonEmptyStrategy.DEFAULT  means DEFAULT ON EMPTY.

STRING.jsonValue(path,JsonErrorStrategy.NULL)

JsonEmptyStrategy.NULL means NULL ON ERROR.

STRING.jsonValue(path,JsonErrorStrategy.ERROR)

JsonEmptyStrategy.ERROR means NULL ON ERROR.

STRING.jsonValue(path,JsonErrorStrategy.DEFAULT)

JsonEmptyStrategy.DEFAULT means NULL ON ERROR.


jsonQuery

JAVA/PYTHON/SCALA

DESCRIPTION

STRING.jsonQuery(path)

Extract a JSON object or JSON array from jsonValue using the path JSON path expression.

STRING.jsonQuery(path,JsonEmptyStrategy.NULL)

JsonEmptyStrategy.NULL means NULL ON EMPTY.

STRING.jsonQuery(path,JsonEmptyStrategy.ERROR)

JsonEmptyStrategy.ERROR means ERROR ON EMPTY.

STRING.jsonQuery(path,JsonEmptyStrategy.EMPTY_ARRAY)

JsonEmptyStrategy.EMPTY_ARRAY means EMPTY ARRAY ON EMPTY.

STRING.jsonQuery(path,JsonEmptyStrategy.EMPTY_OBJECT)

JsonEmptyStrategy.EMPTY_OBJECT means EMPTY OBJECT ON EMPTY.

STRING.jsonQuery(path,JsonErrorStrategy.NULL)

JsonEmptyStrategy.NULL means NULL ON ERROR.

STRING.jsonQuery(path,JsonErrorStrategy.ERROR)

JsonEmptyStrategy.ERROR means NULL ON ERROR.

STRING.jsonQuery(path,JsonErrorStrategy.EMPTY_ARRAY)

JsonErrorStrategy.EMPTY_ARRAY means  EMPTY ARRAY ON ERROR.

STRING.jsonQuery(path,JsonErrorStrategy.EMPTY_OBJECT)

JsonErrorStrategy.EMPTY_OBJECT means  EMPTY OBJECT ON ERROR.

STRING.jsonQuery(path,JsonWrapperStrategy.NONE)

JsonWrapperStrategy.NONE means WITHOUT WRAPPER

STRING.jsonQuery(path,JsonWrapperStrategy.NONE_ARRAY)

JsonWrapperStrategy.NONE_ARRAY means WITHOUT WRAPPER

STRING.jsonQuery(path,JsonWrapperStrategy.DEFAULT)

JsonWrapperStrategy.DEFAULT means WITH WRAPPER


Example Data:

{"a": "[1,2]", "b": [1,2], "c": "hi"}


Comparison:

OPERATOR

$.A

$.B

$.C

JSON_EXISTS

true

true

true

JSON_VALUE

[1, 2]

error

hi

JSON QUERY WITHOUT ARRAY WRAPPER

error

[1, 2]

error

JSON QUERY WITH UNCONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[ [1,2] ]

[ “hi” ]

JSON QUERY WITH CONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[1,2]

[ “hi” ]

Constructor Functions

Use in SQL:

SQL SYNTAX

DESCRIPTION

JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct JSON object using a series of key (name) value (value) pairs.

JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON object using a key (name) value (value) pair.

JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct a JSON array using a series of values (value). 

JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] )

Construct

Aggregate function to construct a JSON array using a value (value).


Use in TableApi:

jsonObject

JAVA/PYTHON/SCALA

DESCRIPTION

jsonObject(ANY1, ANY2, ...)

Construct JSON object using a series of

values (value). 

JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON array using a value (value).

key (name) value (value) pairs.Table API currently only supports JSON string operations.

jsonObject(ANY1, ANY2, …,JsonNullStrategy.NULL)

JsonNullStrategy.NULL means NULL ON NULL

jsonObject(ANY1, ANY2, …,JsonNullStrategy.ABSENT)

JsonNullStrategy.ABSENT means ABSENT ON NULL

jsonObject(ANY1, ANY2, …,JsonFormatStrategy.TRUE)

JsonFormatStrategy.TRUE means FORMAT JSON

jsonObjectAggUse in TableApi:

JAVA/PYTHON/SCALA

DESCRIPTION

jsonObject

jsonObjectAgg(ANY1,

ANY2, ...

ANY2)

Construct

Aggregate function to construct a JSON object using a

series of

key (name) value (value)

pairs

pair.Table API currently only

supports JSON string operations.jsonObjectAgg

supports JSON string operations.

jsonObjectAgg(ANY1, ANY2,JsonNullStrategy.NULL)

JsonNullStrategy.NULL means NULL ON NULL

jsonObjectAgg(ANY1, ANY2,,JsonNullStrategy.ABSENT)

JsonNullStrategy.ABSENT means ABSENT ON NULL

jsonObjectAgg(ANY1, ANY2,,JsonFormatStrategy.TRUE)

JsonFormatStrategy.TRUE means FORMAT JSON


jsonArray

JAVA/PYTHON/SCALA

DESCRIPTION

jsonArray(ANY1, ANY2, ...)

Aggregate function to construct

Construct a JSON

object

array using a

key (name) value

series of values (value)

pair

. Table API currently only supports JSON string operations.

jsonArray(ANY1, ANY2, …,JsonNullStrategy.NULL)

JsonNullStrategy.NULL means NULL ON NULL

jsonArray(ANY1, ANY2, …,JsonNullStrategy.ABSENT)

Construct a JSON array using a series of values (value). Table API currently only supports JSON string operations.

JsonNullStrategy.ABSENT means ABSENT ON NULL

jsonArray(ANY1, ANY2, …,JsonFormatStrategy.TRUE)

JsonFormatStrategy.TRUE means FORMAT JSON


jsonArrayAgg

JAVA/PYTHON/SCALA

DESCRIPTION

jsonArrayAgg(ANY)

Aggregate function to construct a JSON array using a value (value). Table API currently only supports JSON string operations.

jsonArrayAgg(ANY,JsonNullStrategy.NULL)

JsonNullStrategy.NULL means NULL ON NULL

jsonArrayAgg(ANY,JsonNullStrategy.ABSENT)

JsonNullStrategy.ABSENT means ABSENT ON NULL

jsonArrayAgg(ANY,JsonFormatStrategy.TRUE)

JsonFormatStrategy.TRUE means FORMAT JSON


Note
: The flag FORMAT JSON indicates the value is formatted as JSON character string. When FORMAT JSON is used, the value should be de-parse from JSON character string to a SQL structured value.

...

If ORDER BY clause is provided, JSON_ARRAYAGG sorts the input rows into the specified order before performing aggregation. 

Comparison:

OPERATOR

RESULT

JSON_OBJECT('id', 87, 'name', 'carrot')

{"id": 87, "name": "carrot"} 

JSON_ARRAY(1, "abc", NULL, TRUE)

[1, "abc", null, true]


Simple  Simple Data Table T1:

o_id

attribute

value

2

color

red

2

fabric

silk

3

color

green

3

shape

square


SELECT o_id, JSON_OBJECTAGG(attribute, value) FROM t1 GROUP BY o_id;

OPERATOR

o_id

AFTER AGG

JSON_OBJECTAGG(attribute, value)

2

{"color": "red", "fabric": "silk"}

JSON_OBJECTAGG(attribute, value)

3

{"color": "green", "shape": "square"}


SELECT o_id, JSON_ARRAYAGG(attribute) AS attributes FROM t1 GROUP BY o_id;

OPERATOR

o_id

AFTER AGG

JSON_ARRAYAGG(attribute)

2

["color", "fabric"] 

JSON_ARRAYAGG(attribute)

3

["color", "shape"]


Comparison Operators

Applications will frequently want to ensure that the data they expect to consume as JSON data is, indeed, JSON data. The IS JSON predicate determines whether the value of a specified string does or does not conform to the structural rules for JSON. The syntax of the IS JSON predicate is:

Use in SQL:

SQL SYNTAX

DESCRIPTION

jsonValue IS JSON [ VALUE ]

Whether jsonValue is a JSON value.Table API functions currently only support JSON strings as input parameters.

jsonValue IS NOT JSON [ VALUE ]

Whether jsonValue is not a JSON value.Table API functions currently only support JSON strings as input parameters.

jsonValue IS JSON SCALAR

Whether jsonValue is a JSON scalar value.Table API functions currently only support JSON strings as input parameters.

jsonValue IS NOT JSON SCALAR

Whether jsonValue is not a JSON scalar value.Table API functions currently only support JSON strings as input parameters.

jsonValue IS JSON OBJECT

Whether jsonValue is a JSON object.Table API functions currently only support JSON strings as input parameters.

jsonValue IS NOT JSON OBJECT

Whether jsonValue is not a JSON object.Table API functions currently only support JSON strings as input parameters.

jsonValue IS JSON ARRAY

Whether jsonValue is a JSON array.Table API functions currently only support JSON strings as input parameters.

jsonValue IS NOT JSON ARRAY

Whether jsonValue is not a JSON array.Table API functions currently only support JSON strings as input parameters.


Use in TableApi:

JAVA/PYTHON/SCALA

DESCRIPTION

ANY.isJsonValue()

Whether jsonValue is a JSON value.Table API functions currently only support JSON strings as input parameters.

ANY.isNotJsonValue()

Whether jsonValue is not a JSON value.Table API functions currently only support JSON strings as input parameters.

ANY.isJsonScalar()

Whether jsonValue is a JSON scalar value.Table API functions currently only support JSON strings as input parameters. 

ANY.isNotJsonScalar()

Whether jsonValue is not a JSON scalar value.Table API functions currently only support JSON strings as input parameters.

ANY.isJsonObject()

Whether jsonValue is a JSON object.Table API functions currently only support JSON strings as input parameters.

ANY.isNotJsonObject()

Whether jsonValue is not a JSON object.Table API functions currently only support JSON strings as input parameters.

ANY.isJsonArray()

Whether jsonValue is a JSON array.Table API functions currently only support JSON strings as input parameters.

ANY.isNotJsonArray()

Whether jsonValue is not a JSON array.Table API functions currently only support JSON strings as input parameters.

The test example:

tester.checkBoolean("'{}' is json value", true);

tester.checkBoolean("'{]' is json value", false);

tester.checkBoolean("'{}' is json object", true);

tester.checkBoolean("'[]' is json object", false);

tester.checkBoolean("'{}' is json array", false);

tester.checkBoolean("'[]' is json array", true);

tester.checkBoolean("'100' is json scalar", true);

tester.checkBoolean("'[]' is json scalar", false);

tester.checkBoolean("'{}' is not json value", false);

tester.checkBoolean("'{]' is not json value", true);

tester.checkBoolean("'{}' is not json object", false);

tester.checkBoolean("'[]' is not json object", true);

tester.checkBoolean("'{}' is not json array", true);

tester.checkBoolean("'[]' is not json array", false);

tester.checkBoolean("'100' is not json scalar", false);

tester.checkBoolean("'[]' is not json scalar", true);

Implementation Details

Table Api interface proposal

...