...
Example 1 (event time temporal join):
Java: leftTable.join(rightTable.asOf($("order_time")), $("currency").isEqual("currency"))
Python: left_table.join(right_table.as_of(left_table.order_time), left_table.currency == right_table.currency)
|
It’s equivalent to the following SQL:
...
Example 2 (processing time temporal join):
Java: leftTable.join(rightTable.asOf($("proctime")), $("currency").isEqual("currency"))
|
...
Python: left_table.join(right_table.as_of(left_table.proctime), left_table.currency == right_table.currency)
|
It’s equivalent to the following SQL:
SELECT order_id, price, orders.currency, conversion_rate, order_time FROM orders LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.proctime ON orders.currency = currency_rates.currency |
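The lookup semantics of the two temporal joins above can be illustrated with a minimal plain-Java sketch. It does not use the Table API; `AsOfJoinSketch`, `rateAsOf`, and `latestRate` are hypothetical names, and the versioned table is assumed to be a sorted map from version start time to conversion rate for a single currency. The event time join picks the version valid as of the order's timestamp, while the processing time join simply uses the latest version known when the row is processed.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class AsOfJoinSketch {
    // Event time lookup: the rate whose version start is the greatest one <= orderTime.
    // Returns null when no version exists yet (what a LEFT JOIN would pad with NULL).
    static Double rateAsOf(NavigableMap<Long, Double> versions, long orderTime) {
        Map.Entry<Long, Double> entry = versions.floorEntry(orderTime);
        return entry == null ? null : entry.getValue();
    }

    // Processing time lookup: the most recent version currently in the table.
    static Double latestRate(NavigableMap<Long, Double> versions) {
        Map.Entry<Long, Double> entry = versions.lastEntry();
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical rate history for one currency: version start time -> rate.
        NavigableMap<Long, Double> eurRates = new TreeMap<>();
        eurRates.put(100L, 1.10);
        eurRates.put(200L, 1.20);

        System.out.println(rateAsOf(eurRates, 150L)); // 1.1
        System.out.println(rateAsOf(eurRates, 50L));  // null
        System.out.println(latestRate(eurRates));     // 1.2
    }
}
```

In the real join the lookup is additionally keyed by the join condition (here, `currency`); the sketch fixes the key to keep the example short.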
...
SELECT (case when a = 1 then 3 when a = 2 then 4 else a end) as a, (case when b = 1 then 3 when b = 2 then 4 else b end) as b, c FROM T |
Sampling
sample
API Specification:
Table sample(double fraction)
Table sample(double fraction, long seed) |
Description:
Takes a random sample of the table according to the given fraction, which must be in the range [0.0, 1.0].
Example:
table.sample(0.1) |
It’s equivalent to the following SQL:
SELECT a, b, c FROM T WHERE RAND() < 0.1 |
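As a rough illustration of the `RAND() < fraction` filter above, the following plain-Java sketch keeps each row independently with probability `fraction`. It is not the proposed implementation; `SampleSketch` and its `sample` helper are hypothetical names, and an in-memory list stands in for the table. Note that the result size is only approximately `fraction * rows.size()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SampleSketch {
    // Keeps each row with probability `fraction`, mirroring WHERE RAND() < fraction.
    // A fixed seed makes the selection reproducible.
    static <T> List<T> sample(List<T> rows, double fraction, long seed) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0.0, 1.0]");
        }
        Random rand = new Random(seed);
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            // nextDouble() is uniform in [0.0, 1.0), so 0.0 keeps nothing and 1.0 keeps everything.
            if (rand.nextDouble() < fraction) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }
        // Roughly 10% of the rows; the exact count depends on the seed.
        System.out.println(sample(rows, 0.1, 42L).size());
    }
}
```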
split
API Specification:
Table[] split(double[] weights)
Table[] split(double[] weights, long seed) |
Description:
Splits the table into multiple sub-tables according to the given weights.
Example:
table.split(new double[] { 0.1, 0.2, 0.3 }) |
It’s logically equivalent to the following SQL:
CREATE VIEW TT AS SELECT a, b, c, RAND(100) as d FROM T
CREATE VIEW TT1 AS SELECT a, b, c FROM TT WHERE d < 0.1/(0.1 + 0.2 + 0.3)
CREATE VIEW TT2 AS SELECT a, b, c FROM TT WHERE d >= 0.1/(0.1 + 0.2 + 0.3) and d < (0.1 + 0.2)/(0.1 + 0.2 + 0.3)
CREATE VIEW TT3 AS SELECT a, b, c FROM TT WHERE d >= (0.1 + 0.2)/(0.1 + 0.2 + 0.3)
NOTE: Every RAND call must use the same seed, so that each element is compared against the same random value in every sub-table filter. This guarantees that each element belongs to exactly one sub-table (e.g. TT1, TT2, or TT3). |
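The requirement in the note can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the proposed implementation: `SplitSketch` is a hypothetical name, an in-memory list stands in for the table, and the weights are assumed to be normalized into cumulative upper bounds. One random value is drawn per row and compared against those bounds, so each row falls into exactly one sub-table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SplitSketch {
    static <T> List<List<T>> split(List<T> rows, double[] weights, long seed) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        if (weights.length == 0 || !(total > 0.0)) {
            throw new IllegalArgumentException("weights must be non-empty with a positive sum");
        }
        // upper[i] = (w0 + ... + wi) / total, i.e. the cumulative share up to part i.
        double[] upper = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            upper[i] = running / total;
        }
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            parts.add(new ArrayList<>());
        }
        Random rand = new Random(seed);
        for (T row : rows) {
            // One draw per row, reused for every range check (the column d in view TT).
            double d = rand.nextDouble();
            int i = 0;
            // The last part catches the remainder, which also absorbs rounding in upper[].
            while (i < upper.length - 1 && d >= upper[i]) {
                i++;
            }
            parts.get(i).add(row);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 6000; i++) {
            rows.add(i);
        }
        // Part sizes are roughly in the ratio 1 : 2 : 3.
        for (List<Integer> part : split(rows, new double[] {0.1, 0.2, 0.3}, 100L)) {
            System.out.println(part.size());
        }
    }
}
```

Because the ranges are contiguous and non-overlapping, the part sizes always add up to the number of input rows.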