Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Status

Current state[One of "Under Discussion", "Accepted", "Rejected"]

...

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

Describe the problems you are trying to solve.

Related Research

some research related to the function, such as the advantages and disadvantages of the design, related considerations, etc.

Detailed Design

the detailed design of the function.

Scheduling

Support lambda expression function in Doris is motivated by the following:

       Lambda function is very convenient. Users can write the logic they want to achieve by freely. Now Doris starts to support some complex data types,eg: array, map, struct...

       But we can't do some operate with array type, like [1,2,3] + [2,3,4], or abs([1,-10,-9]).....

       because the functions of basic types are not reused, need to implement various functions again for complex types, It will take a lot of time and energy to implement it all over.

Related Research

       More and more databases have implemented this function,like in ClickHouse[1] it's named higher order function, and in Prestodb[2] it's named lambda function.

       Although there are some differences in the name and the way of use, But finally wanted to do is the same.

       Now the users use complex types, It has a big relationship with the number of functions we have implemented on those types. So it's time to implement lambda functions.

Detailed Design

what is lambda function

Code Block
languagesql
select array_map(x -> abs(x), [1,-2,-3]);
+-----------------------------------------------+
| array_map([x] -> abs(x(0)), ARRAY(1, -2, -3)) |
+-----------------------------------------------+
| [1, 2, 3]                                     |
+-----------------------------------------------++

select array_map((x,y) -> (x - y), [10,20,30], [10,25,50]);
+--------------------------------------------------------------------------+
| array_map([x, y] -> (x(0) - y(1)), ARRAY(10, 20, 30), ARRAY(10, 25, 50)) |
+--------------------------------------------------------------------------+
| [0, -5, -20]                                                             |
+--------------------------------------------------------------------------+
    
    
select array_filter([1,2,3,4],[1,0,1,0]);
+----------------------------------------------------+
| array_filter(ARRAY(1, 2, 3, 4), ARRAY(1, 0, 1, 0)) |
+----------------------------------------------------+
| [1, 3]                                             |
+----------------------------------------------------+

select array_map((x,y) -> (x >= y), [10,20,30], [10,25,50]);
+---------------------------------------------------------------------------+
| array_map([x, y] -> (x(0) >= y(1)), ARRAY(10, 20, 30), ARRAY(10, 25, 50)) |
+---------------------------------------------------------------------------+
| [1, 0, 0]                                                                 |
+---------------------------------------------------------------------------+    

select array_filter(x -> (x > 0), array_map((x,y) -> (x >= y), [10,20,30], [10,25,50])) as res;
+------+
| res  |
+------+
| [1]  |
+------+


Image Added

How to implement the lambda function

  1. Need a new type for FE, when register and find functions.

  2. Need a new Expr when paser the user input lambda expression. we can separate into the lambda argument and real expr,

    Then could bind the X ,Y... to the array1,array2....., eg: the nested type of array, and id.

  3. When the function arrive to BE, first need to execute the children except children(0), because the lambda argument X,Y

    is bind to the params, need get the nested data column.

  4. Last execute the children(0) to calculate lambda expr, when get the result column, then need to warp an array column to return.

Scheduling

Now, I have a plan to implement this feature:

  1. Firstly, need to implent the array_map and array_filter lambda function, We can add them together to realize many interesting functions.

  2. Secondly, Implement various functions on a large scope, eg: arrayFirst, arrayLast, arrayFill.....

  3. Thirdly, can consider implementing lambda function other complex types, eg: map


[1]  https://clickhouse.com/docs/zh/sql-reference/functions/higher-order-functions/ 
[2]  https://prestodb.io/blog/2020/03/02/presto-lambda specific implementation steps and approximate scheduling.