Status

Current state[One of "Under Discussion", "Accepted", "Rejected"]

Discussion thread: 

JIRA or Github Issue: 

Released: <Doris Version>

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

Support lambda expression function in Doris is motivated by the following:

       Lambda function is very convenient. Users can write the logic they want to achieve by freely. Now Doris starts to support some complex data types,eg: array, map, struct...

       But we can't do some operate with array type, like [1,2,3] + [2,3,4], or abs([1,-10,-9]).....

       because the functions of basic types are not reused, need to implement various functions again for complex types, It will take a lot of time and energy to implement it all over.

Related Research

       More and more databases have implemented this function,like in ClickHouse[1] it's named higher order function, and in Prestodb[2] it's named lambda function.

       Although there are some differences in the name and the way of use, But finally wanted to do is the same.

       Now the users use complex types, It has a big relationship with the number of functions we have implemented on those types. So it's time to implement lambda functions.

Detailed Design

what is lambda function

select array_map(x -> abs(x), [1,-2,-3]);
+-----------------------------------------------+
| array_map([x] -> abs(x(0)), ARRAY(1, -2, -3)) |
+-----------------------------------------------+
| [1, 2, 3]                                     |
+-----------------------------------------------++

select array_map((x,y) -> (x - y), [10,20,30], [10,25,50]);
+--------------------------------------------------------------------------+
| array_map([x, y] -> (x(0) - y(1)), ARRAY(10, 20, 30), ARRAY(10, 25, 50)) |
+--------------------------------------------------------------------------+
| [0, -5, -20]                                                             |
+--------------------------------------------------------------------------+
    
    
select array_filter([1,2,3,4],[1,0,1,0]);
+----------------------------------------------------+
| array_filter(ARRAY(1, 2, 3, 4), ARRAY(1, 0, 1, 0)) |
+----------------------------------------------------+
| [1, 3]                                             |
+----------------------------------------------------+

select array_map((x,y) -> (x >= y), [10,20,30], [10,25,50]);
+---------------------------------------------------------------------------+
| array_map([x, y] -> (x(0) >= y(1)), ARRAY(10, 20, 30), ARRAY(10, 25, 50)) |
+---------------------------------------------------------------------------+
| [1, 0, 0]                                                                 |
+---------------------------------------------------------------------------+    

select array_filter(x -> (x > 0), array_map((x,y) -> (x >= y), [10,20,30], [10,25,50])) as res;
+------+
| res  |
+------+
| [1]  |
+------+

How to implement the lambda function

  1. Need a new type for FE, when register and find functions.

  2. Need a new Expr when paser the user input lambda expression. we can separate into the lambda argument and real expr,

    Then could bind the X ,Y... to the array1,array2....., eg: the nested type of array, and id.

  3. When the function arrive to BE, first need to execute the children except children(0), because the lambda argument X,Y

    is bind to the params, need get the nested data column.

  4. Last execute the children(0) to calculate lambda expr, when get the result column, then need to warp an array column to return.

Scheduling

Now, I have a plan to implement this feature:

  1. Firstly, need to implent the array_map and array_filter lambda function, We can add them together to realize many interesting functions.

  2. Secondly, Implement various functions on a large scope, eg: arrayFirst, arrayLast, arrayFill.....

  3. Thirdly, can consider implementing lambda function other complex types, eg: map


[1]  https://clickhouse.com/docs/zh/sql-reference/functions/higher-order-functions/ 
[2]  https://prestodb.io/blog/2020/03/02/presto-lambda 


  • No labels