Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

Table of Contents

Lateral View Syntax

Code Block


lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

...

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in LanguageManual UDF#UDTF Built-in Table-Generating Functions, a UDTF generates one zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

Important note

...

Info
titleVersion

Prior to Hive 0.6.0, lateral view

...

did not support the predicate push-down optimization

...

. In Hive 0.5.0 and earlier, if you used a WHERE clause

...

your query may not

...

have compiled. A workaround was to add set hive.optimize.ppd=false; before your query. The fix was made in Hive 0.6.0; see https://issues.apache.org/jira/browse/HIVE-1056

...

: Predicate push down does not work with UDTF's.

Info
titleVersion

From Hive 0.12.0, column aliases can be omitted. In this case, aliases are inherited from field names of StructObjectInspector which is returned from UTDF.

...

Example

Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page):

Column name

Column type

pageid

STRING

string pageid

Array<int> adid_list

Array<int>

An example table with two rows:

pageid

adid_list

front_page

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="bfaadbee-cfa3-45dd-8bcc-ff9c98532ebc"><ac:plain-text-body><![CDATA[

"front_page"

[1, 2, 3] ]]></ac:plain-text-body></ac:structured-macro><ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="8c02669f-9549-4d02-b8dc-e1a33e43cc04"><ac:plain-text-body><![CDATA[

"contact_page"

[3, 4, 5]

]]></ac:plain-text-body></ac:structured-macro>

and the user would like to count the total number of times an ad appears across all pages.

A lateral view with explode() LanguageManual UDF#explode can be used to convert adid_list into separate rows using the query:

Code Block
sql
sql

SELECT pageid, adid 
	FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

The resulting output will be

pageid adid

pageid (string

)

adid (int

)

"front_page"

1

"front_page"

2

"front_page"

3

"contact_page"

3

"contact_page"

4

"contact_page"

5

Then in order to count the number of times a particular ad appears, count/group by can be used:

Code Block
sql
sql

SELECT adid, count(1) 
	FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;

...

For example, the following could be a valid query:

Code Block
sql
sql

SELECT * FROM exampleTable
	LATERAL VIEW explode(col1) myTable1 AS myCol1
	LATERAL VIEW explode(myCol1) myTable2 AS myCol2;

...

Array<int> col1

Array<string> col2<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="b502dd2c-95cc-4574-ad62-ec27e987c6ea"><ac:plain-text-body><![CDATA[

[1, 2]

[a", "b", "c"] ]]></ac:plain-text-body></ac:structured-macro><ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="4f58518b-d2a6-4045-be8a-f4b037c9add0"><ac:plain-text-body><![CDATA[

[3, 4]

[d", "e", "f"]

]]></ac:plain-text-body></ac:structured-macro>

The query:

Code Block
sql
sql

SELECT myCol1, col2 FROM baseTable
	LATERAL VIEW explode(col1) myTable1 AS myCol1;

...

int mycol1

Array<string> col2<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="a4ac6165-dee4-4ba1-878a-f5a37aa85ef8"><ac:plain-text-body><![CDATA[

1

[a", "b", "c"] ]]></ac:plain-text-body></ac:structured-macro><ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="6fb06c51-bc61-4503-8af3-eb5696067d1e"><ac:plain-text-body><![CDATA[

2

[a", "b", "c"] ]]></ac:plain-text-body></ac:structured-macro><ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="fc3e506a-9a2f-43b5-b2a0-81b78962ddfa"><ac:plain-text-body><![CDATA[

3

[d", "e", "f"]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="1784ca26-6990-4550-9215-7f43ff8c222e"><ac:plain-text-body><![CDATA[

4

[d", "e", "f"]

]]></ac:plain-text-body></ac:structured-macro>

A query that adds an additional LATERAL VIEW:

Code Block
sql
sql

SELECT myCol1, myCol2 FROM baseTable
	LATERAL VIEW explode(col1) myTable1 AS myCol1
	LATERAL VIEW explode(col2) myTable2 AS myCol2;

...

int myCol1

string myCol2

1

"a"

1

"b"

1

"c"

2

"a"

2

"b"

2

"c"

3

"d"

3

"e"

3

"f"

4

"d"

4

"e"

4

"f"

Outer Lateral Views

Info
titleVersion

Introduced in Hive version 0.12.0

The user can specify the optional OUTER keyword to generate rows even when a LATERAL VIEW usually would not generate a row. This happens when the UDTF used does not generate any rows which happens easily with explode when the column to explode is empty. In this case the source row would never appear in the results. OUTER can be used to prevent that and rows will be generated with NULL values in the columns coming from the UDTF.

For example, the following query returns an empty result:

Code Block
languagesql
SELEC * FROM src LATERAL VIEW explode(array()) C AS a limit 10;

But with the OUTER keyword

Code Block
languagesql
SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;

it will produce:

238 val_238 NULL
86 val_86 NULL
311 val_311 NULL
27 val_27 NULL
165 val_165 NULL
409 val_409 NULL
255 val_255 NULL
278 val_278 NULL
98 val_98 NULL
...