Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This

...

is

...

a

...

followup

...

to

...

Hive-ViewDev

...

for

...

adding

...

partition-awareness

...

to

...

views.

...

Use

...

Cases

...

  1. An

...

  1. administrator

...

  1. wants

...

  1. to

...

  1. create

...

  1. a

...

  1. set

...

  1. of

...

  1. views

...

  1. as

...

  1. a

...

  1. table/column

...

  1. renaming

...

  1. layer

...

  1. on

...

  1. top

...

  1. of

...

  1. an

...

  1. existing

...

  1. set

...

  1. of

...

  1. base

...

  1. tables,

...

  1. without

...

  1. breaking

...

  1. any

...

  1. existing

...

  1. dependencies

...

  1. on

...

  1. those

...

  1. tables.

...

  1. To

...

  1. read-only

...

  1. users,

...

  1. the

...

  1. views

...

  1. should

...

  1. behave

...

  1. exactly

...

  1. the

...

  1. same

...

  1. as

...

  1. the

...

  1. underlying

...

  1. tables

...

  1. in

...

  1. every

...

  1. way.

...

  1. Among

...

  1. other

...

  1. things,

...

  1. this

...

  1. means

...

  1. users

...

  1. should

...

  1. be

...

  1. able

...

  1. to

...

  1. browse

...

  1. available

...

  1. partitions.

...

  1. A

...

  1. base

...

  1. table

...

  1. is

...

  1. partitioned

...

  1. on

...

  1. columns

...

  1. (ds,hr)

...

  1. for

...

  1. date

...

  1. and

...

  1. hour.

...

  1. Besides

...

  1. this

...

  1. fine-grained

...

  1. partitioning,

...

  1. users

...

  1. would

...

  1. also

...

  1. like

...

  1. to

...

  1. see

...

  1. a

...

  1. virtual

...

  1. table

...

  1. of

...

  1. coarse-grained

...

  1. (date-only)

...

  1. partitioning

...

  1. in

...

  1. which

...

  1. the

...

  1. partition

...

  1. for

...

  1. a

...

  1. given

...

  1. date

...

  1. only

...

  1. appears

...

  1. after

...

  1. all

...

  1. of

...

  1. the

...

  1. hour-level

...

  1. partitions

...

  1. of

...

  1. that

...

  1. day

...

  1. have

...

  1. been

...

  1. fully

...

  1. loaded.

...

  1. A

...

  1. view

...

  1. is

...

  1. defined

...

  1. on

...

  1. a

...

  1. complex

...

  1. join+union+aggregation

...

  1. of

...

  1. a

...

  1. number

...

  1. of

...

  1. underlying

...

  1. base

...

  1. tables

...

  1. and

...

  1. other

...

  1. views,

...

  1. all

...

  1. of

...

  1. which

...

  1. are

...

  1. themselves

...

  1. partitioned.

...

  1. The

...

  1. top-level

...

  1. view

...

  1. should

...

  1. also

...

  1. be

...

  1. partitioned

...

  1. accordingly,

...

  1. with

...

  1. a

...

  1. new

...

  1. partition

...

  1. not

...

  1. appearing

...

  1. until

...

  1. corresponding

...

  1. partitions

...

  1. have

...

  1. been

...

  1. loaded

...

  1. for

...

  1. all

...

  1. of

...

  1. the

...

  1. underlying

...

  1. tables.

    Approaches

  1. One possible approach mentioned in HIVE-1079 is to infer view partitions automatically based on the partitions of the underlying tables. A command such as SHOW PARTITIONS could then synthesize virtual partition descriptors on the fly. This is fairly easy to do for use case #1, but potentially very difficult for use cases #2 and #3. So for now, we are punting on this approach.
  2. Instead, per HIVE-1941, we will require users to explicitly declare view partitioning as part of CREATE VIEW, and explicitly manage partition metadata via ALTER VIEW ADD|DROP PARTITION. This allows all of the use cases to be satisfied (while placing more burden on the user, and taking up more metastore space). With this approach, there is no real connection between view partitions and underlying table partitions; it's even possible to create a partitioned view on an unpartitioned table, or to have data in the view which is not covered by any view partition. One downside here is that a UI will not be able to show last access time and physical information such as file size when browsing available partitions. (In theory, stats could work via an explicit ANALYZE, but analyzing a view would need some work.)

    Syntax

Code Block

h1. Approaches

 # One possible approach mentioned in [HIVE-1079|https://issues.apache.org/jira/browse/HIVE-1079] is to infer view partitions automatically based on the partitions of the underlying tables.  A command such as SHOW PARTITIONS could then synthesize virtual partition descriptors on the fly.  This is fairly easy to do for use case #1, but potentially very difficult for use cases #2 and #3.  So for now, we are punting on this approach.
 # Instead, per [HIVE-1941|https://issues.apache.org/jira/browse/HIVE-1941], we will require users to explicitly declare view partitioning as part of CREATE VIEW, and explicitly manage partition metadata via ALTER VIEW {ADD|DROP} PARTITION.  This allows all of the use cases to be satisfied (while placing more burden on the user, and taking up more metastore space).  With this approach, there is no real connection between view partitions and underlying table partitions; it's even possible to create a partitioned view on an unpartitioned table, or to have data in the view which is not covered by any view partition.  One downside here is that a UI will not be able to show last access time and physical information such as file size when browsing available partitions.  (In theory, stats could work via an explicit ANALYZE, but analyzing a view would need some work.)
h1. Syntax

{code}
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT table_comment]
[PARTITIONED ON (col1, col2, ...)]
[TBLPROPERTIES ...]
AS SELECT ...

ALTER VIEW view_name ADD [IF NOT EXISTS] partition_spec partition_spec ...

ALTER VIEW view_name DROP [IF EXISTS] partition_spec, partition_spec, ...

partition_spec:
  : PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value, ...)
{code}

Notes:

...

  • Whereas

...

  • CREATE

...

  • TABLE

...

  • uses

...

  • PARTITIONED

...

  • BY,

...

  • CREATE

...

  • VIEW

...

  • uses

...

  • PARTITIONED

...

  • ON.

...

  • This

...

  • difference

...

  • is

...

  • intentional

...

  • because

...

  • in

...

  • CREATE

...

  • TABLE,

...

  • the

...

  • PARTITIONED

...

  • BY

...

  • clause

...

  • specifies

...

  • additional

...

  • column

...

  • definitions

...

  • which

...

  • are

...

  • appended

...

  • to

...

  • the

...

  • non-partitioning

...

  • columns.

...

  • With

...

  • CREATE

...

  • VIEW,

...

  • the

...

  • PARTITIONED

...

  • ON

...

  • clause

...

  • references

...

  • (by

...

  • name)

...

  • columns

...

  • already

...

  • produced

...

  • by

...

  • the

...

  • view

...

  • definition.

...

  • Only

...

  • column

...

  • names

...

  • appear

...

  • in

...

  • PARTITIONED

...

  • ON;

...

  • no

...

  • types

...

  • etc.

...

  • However,

...

  • to

...

  • match

...

  • the

...

  • CREATE

...

  • TABLE

...

  • convention

...

  • of

...

  • trailing

...

  • partitioning

...

  • columns,

...

  • the

...

  • columns

...

  • referenced

...

  • by

...

  • the

...

  • PARTITIONED

...

  • ON

...

  • clause

...

  • must

...

  • be

...

  • the

...

  • last

...

  • columns

...

  • in

...

  • the

...

  • view

...

  • definition,

...

  • and

...

  • their

...

  • order

...

  • in

...

  • the

...

  • PARTITIONED

...

  • ON

...

  • clause

...

  • must

...

  • match

...

  • their

...

  • order

...

  • in

...

  • the

...

  • view

...

  • definition.

...

  • The

...

  • ALTER

...

  • VIEW

...

  • ADD/DROP

...

  • partition

...

  • syntax

...

  • is

...

  • identical

...

  • to

...

  • ALTER

...

  • TABLE,

...

  • except

...

  • that

...

  • it

...

  • is

...

  • illegal

...

  • to

...

  • specify

...

  • a

...

  • LOCATION

...

  • clause.

...

  • Other

...

  • ALTER

...

  • TABLE

...

  • commands

...

  • which

...

  • operate

...

  • on

...

  • partitions

...

  • (e.g.

...

  • TOUCH/ARCHIVE)

...

  • are

...

  • not

...

  • supported.

...

  • (But

...

  • maybe

...

  • we

...

  • need

...

  • to

...

  • support

...

  • TOUCH?)

...

  • Metastore

When storing view partition descriptors in the metastore, Hive omits the storage descriptor entirely. This is because there is no data associated with the view partition, so there is no need to keep track of partition-level column descriptors for table schema evolution, nor a partition location.

Strict Mode

Hive strict mode (enabled with hive.mapred.mode=strict)

...

prevents

...

execution

...

of

...

queries

...

lacking

...

a

...

partition

...

predicate.

...

This

...

only

...

applies

...

to

...

base

...

table

...

partitions.

...

What

...

does

...

this

...

mean?

...

Suppose

...

you

...

have

...

table

...

T1

...

partitioned

...

on

...

C1,

...

and

...

view

...

V1

...

which

...

selects

...

FROM

...

T1

...

WHERE

...

C1=5.

...

Then

...

a

...

query

...

such

...

as

...

SELECT

...

*

...

FROM

...

V1

...

will

...

succeed

...

even

...

in

...

strict

...

mode,

...

since

...

the

...

predicate

...

inside

...

of

...

the

...

view

...

constrains

...

C1.

...

Likewise,

...

suppose

...

you

...

have

...

view

...

V2

...

which

...

selects

...

from

...

T1

...

(with

...

no

...

WHERE

...

clause)

...

and

...

is

...

partitioned

...

on

...

C2.

...

Then

...

a

...

query

...

such

...

as

...

SELECT

...

*

...

FROM

...

V2

...

WHERE

...

C2=3

...

will

...

fail;

...

even

...

though

...

the

...

view

...

partition

...

column

...

is

...

constrained,

...

there

...

is

...

no

...

predicate

...

on

...

the

...

underlying

...

T1's

...

partition

...

column

...

C1.

...

View

...

Definition

...

Changes

...

Currently,

...

changing

...

a

...

view

...

definition

...

requires

...

dropping

...

the

...

view

...

and

...

recreating

...

it.

...

This

...

implies

...

dropping

...

and

...

recreating

...

all

...

existing

...

partitions

...

as

...

well,

...

which

...

could

...

be

...

very

...

expensive.

...

This

...

implies

...

that

...

followup

...

support

...

for

...

CREATE

...

OR

...

REPLACE

...

VIEW

...

is

...

very

...

important,

...

and

...

that

...

it

...

needs

...

to

...

preserve

...

existing

...

partitions

...

(after

...

validating

...

that

...

they

...

are

...

still

...

compatible

...

with

...

the

...

new

...

view

...

definition).

...

Hook

...

Information

...

Although

...

there

...

is

...

currently

...

no

...

connection

...

between

...

the

...

view

...

partition

...

and

...

underlying

...

table

...

partitions,

...

Hive

...

does

...

provide

...

dependency

...

information

...

as

...

part

...

of

...

the

...

hook

...

invocation

...

for

...

ALTER

...

VIEW

...

ADD

...

PARTITION.

...

It

...

does

...

this

...

by

...

compiling

...

an

...

internal

...

query

...

of

...

the

...

form

{
Code Block
}
SELECT * FROM view_name
WHERE view_partition_col1 = 'val1' AND view_partition_col=2 = 'val2' ...
{code}

and

...

then

...

capturing

...

the

...

table/partition

...

inputs

...

for

...

this

...

query

...

and

...

passing

...

them

...

on

...

to

...

the

...

ALTER

...

VIEW

...

ADD

...

PARTITION

...

hook

...

results.

...

This

...

allows

...

applications

...

to

...

track

...

the

...

dependencies

...

themselves.

...

In

...

the

...

future,

...

Hive

...

will

...

automatically

...

populate

...

these

...

dependencies

...

into

...

the

...

metastore

...

as

...

part

...

of

...

HIVE-1073

...

.

...