Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Operator

Operand types

Description

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="04732f53e42a923f-0a493176-41074453-a0078b78-a2f965997d072a8808ee565f"><ac:plain-text-body><![CDATA[

A[n]

A is an Array and n is an int

returns the nth element in the array A. The first element has index 0 e.g. if A is an array comprising of ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar'

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="30030bbfbaaed17f-4136040d-48d249a0-93919387-592bd4d241c2d1082dcc714d"><ac:plain-text-body><![CDATA[

M[key]

M is a Map<K, V> and key has type K

returns the value corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'

]]></ac:plain-text-body></ac:structured-macro>

S.x

S is a struct

returns the x field of S e.g for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct.

...

Return Type

Aggregation Function Name (Signature)

Description

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="abd4b95cf8c40a81-6a7e80eb-46374f0d-9159ba04-5890cf026d20920281e54e90"><ac:plain-text-body><![CDATA[

BIGINT

count(*), count(expr), count(DISTINCT expr[, expr_.])

count(*) - Returns the total number of retrieved rows, including rows containing NULL values; count(expr) - Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.

]]></ac:plain-text-body></ac:structured-macro>

DOUBLE

sum(col), sum(DISTINCT col)

returns the sum of the elements in the group or the sum of the distinct values of the column in the group

DOUBLE

avg(col), avg(DISTINCT col)

returns the average of the elements in the group or the average of the distinct values of the column in the group

DOUBLE

min(col)

returns the minimum value of the column in the group

DOUBLE

max(col)

returns the maximum value of the column in the group

...

Dynamic-partition insert (or multi-partition insert) is designed to solve this problem by dynamically determining which partitions should be created and populated while scanning the input table. This is a newly added feature that is only available from version 0.6.0 (trunk now). In the dynamic partition insert, the input column values are evaluated to determine which partition this row should be inserted into. If that partition has not been created, it will create that partition automatically. Using this feature you need only one insert statement to create and populate all necessary partitions. In addition, since there is only one insert statement, there is only one corresponding MapReduce job. This significantly improves performance and reduce the Hadoop cluster workload comparing to the multiple insert case.

...