...

Example 1 (event time temporal join):

Java:
leftTable.join(rightTable.asOf($("order_time")), $("currency").isEqual("currency"))

Python:
left_table.join(right_table.as_of(left_table.order_time), left_table.currency == right_table.currency)

It’s equivalent to the following SQL:

...
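Event-time semantics can be illustrated with a minimal in-memory sketch. It assumes the usual definition of an event-time temporal join: each order is matched with the latest rate version whose update time is at or before the order's order_time, and orders with no such version are dropped (inner-join semantics, matching join). The function name, the dict-per-row layout and the (currency, update_time, conversion_rate) tuples are hypothetical and are not part of the proposed API.

```python
from bisect import bisect_right
from collections import defaultdict


def event_time_temporal_join(orders, rate_versions):
    """Join each order with the rate version valid at its order_time.

    orders: list of dicts with 'order_id', 'currency', 'order_time' (hypothetical layout).
    rate_versions: list of (currency, update_time, conversion_rate) tuples.
    Orders without a version at or before their order_time are dropped.
    """
    # Group versions per currency and sort them by update time.
    history = defaultdict(list)
    for currency, update_time, rate in rate_versions:
        history[currency].append((update_time, rate))
    for versions in history.values():
        versions.sort()
    times = {c: [t for t, _ in v] for c, v in history.items()}

    result = []
    for order in orders:
        ts = times.get(order["currency"], [])
        # Index of the latest version with update_time <= order_time.
        i = bisect_right(ts, order["order_time"]) - 1
        if i >= 0:
            rate = history[order["currency"]][i][1]
            result.append({**order, "conversion_rate": rate})
    return result
```

An order at time 5 with EUR versions at times 2 and 10 picks the version from time 2; an order at time 1 has no valid version yet and is not emitted.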


Example 2 (processing time temporal join):

Java:
leftTable.join(rightTable.asOf($("proctime")), $("currency").isEqual("currency"))

...


Python:
left_table.join(right_table.as_of(left_table.proctime), left_table.currency == right_table.currency)

It’s equivalent to the following SQL:

SELECT
  order_id,
  price,
  orders.currency,
  conversion_rate,
  order_time
FROM orders
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.proctime
ON orders.currency = currency_rates.currency
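Processing-time semantics differ from event time: each order is joined with whatever rate is current at the moment the order is processed, and results that were already emitted are not revised when the rate changes later. The sketch below simulates this over an interleaved stream, with LEFT JOIN behavior as in the SQL (orders with no rate yet get None). The event tuple format and the function name are hypothetical.

```python
def processing_time_temporal_join(events):
    """Simulate a processing-time LEFT temporal join over an interleaved stream.

    events: sequence in processing-time order of
      ("rate", currency, conversion_rate)  -> update of the rate table
      ("order", order_id, currency)        -> arriving order
    """
    latest = {}  # currency -> most recent conversion_rate seen so far
    result = []
    for event in events:
        if event[0] == "rate":
            _, currency, rate = event
            latest[currency] = rate
        else:
            _, order_id, currency = event
            # Join against the current state only; later updates never
            # change rows already appended to the result.
            result.append((order_id, currency, latest.get(currency)))
    return result
```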

...

SELECT
    (case when a = 1 then 3
            when a = 2 then 4
      else a end) as a,
    (case when b = 1 then 3
            when b = 2 then 4
      else b end) as b,
    c
FROM T
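The CASE expressions above rewrite the values 1 and 2 to 3 and 4 in columns a and b, leave every other value unchanged (the ELSE branch), and pass column c through. A small sketch of that row-level transformation follows; the helper name replace_values and the dict-per-row layout are hypothetical, since the Table API call this SQL corresponds to is not shown here.

```python
def replace_values(rows, mapping, columns):
    """Apply the same substitution as the CASE expressions, per row.

    rows: list of dicts (hypothetical row layout).
    mapping: old value -> new value, e.g. {1: 3, 2: 4}.
    columns: names of the columns to rewrite; other columns are copied as-is.
    Values missing from `mapping` are kept, like `ELSE a END`.
    """
    return [
        {k: (mapping.get(v, v) if k in columns else v) for k, v in row.items()}
        for row in rows
    ]
```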

Sampling

sample

API Specification:

Table sample(double fraction)

Table sample(double fraction, long seed)


Description:

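Takes a random sample of the table in which each row is kept with probability fraction, which must lie in [0.0, 1.0]. Because every row is selected independently, the number of returned rows is only approximately fraction times the table size. The optional seed makes the sample reproducible.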

Example:

table.sample(0.1)

It’s equivalent to the following SQL:

SELECT
  a, b, c
FROM T
WHERE RAND() < 0.1
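The SQL form describes Bernoulli sampling: one independent random draw per row, kept if it falls below fraction. A minimal Python sketch over an in-memory list, assuming that reading of RAND() < fraction; the helper name bernoulli_sample is hypothetical and not part of the proposed API.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Mirrors `WHERE RAND() < fraction`: random.random() is uniform in
    [0.0, 1.0), so fraction=1.0 keeps every row and 0.0 keeps none.
    The output size is random, with expected value fraction * len(rows).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    rng = random.Random(seed)  # a fixed seed gives a repeatable sample
    return [row for row in rows if rng.random() < fraction]
```

With table.sample(0.1) on 1000 rows, the result is around 100 rows, not exactly 100.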

split

API Specification:

Table[] split(double[] weights)

Table[] split(double[] weights, long seed)

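Description:

Randomly splits the table into multiple disjoint sub-tables according to the given weights. The weights are normalized by their sum, so they do not need to add up to 1.0; each row is assigned to exactly one sub-table.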

Example:

table.split(new double[] { 0.1, 0.2, 0.3 })

It’s logically equivalent to the following SQL:

CREATE VIEW TT AS
SELECT
  a, b, c, RAND(100) as d
FROM T;

CREATE VIEW TT1 AS
SELECT
  a, b, c
FROM TT
WHERE d < 0.1/(0.1 + 0.2 + 0.3);

CREATE VIEW TT2 AS
SELECT
  a, b, c
FROM TT
WHERE d >= 0.1/(0.1 + 0.2 + 0.3) and d < (0.1 + 0.2)/(0.1 + 0.2 + 0.3);

CREATE VIEW TT3 AS
SELECT
  a, b, c
FROM TT
WHERE d >= (0.1 + 0.2)/(0.1 + 0.2 + 0.3);

NOTE: Every RAND call must use the same seed so that each row receives the same random value d in every view that reads TT. This guarantees that each row belongs to exactly one sub-table (TT1, TT2, or TT3).
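The splitting idea can be sketched in Python over an in-memory list: draw one random value d per row, normalize the weights by their sum, and place the row in the bucket whose cumulative range contains d. For weights 0.1, 0.2 and 0.3 the bucket boundaries are 1/6 and 1/2, giving expected shares of 1/6, 1/3 and 1/2. Drawing d only once per row is what keeps the sub-tables disjoint. The helper name weighted_split is hypothetical and not part of the proposed API.

```python
import bisect
import random


def weighted_split(rows, weights, seed=None):
    """Split rows into len(weights) disjoint lists by normalized weights."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    # Cumulative upper bounds of all buckets except the last,
    # e.g. [1/6, 1/2] for weights (0.1, 0.2, 0.3).
    bounds = []
    acc = 0.0
    for w in weights[:-1]:
        acc += w
        bounds.append(acc / total)
    rng = random.Random(seed)
    parts = [[] for _ in weights]
    for row in rows:
        d = rng.random()  # a single draw per row, uniform in [0, 1)
        # d < bounds[0] -> bucket 0; bounds[i-1] <= d < bounds[i] -> bucket i;
        # d >= bounds[-1] -> last bucket.
        parts[bisect.bisect_right(bounds, d)].append(row)
    return parts
```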