Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

== Dependent Tables ==
Create a table which explicitly depends on another table. Consider the following scenario for the first use case mentioned: <code>create table T (key string, value string) partitioned by (ds string, hr string);
</code>

Code Block

create table T (key string, value string) partitioned by (ds string, hr string);

-- create a dependent table which specifies the dependencies explicitly
-- Tdep inherits the schema of T
-- Tdep is partitioned by a prefix of T (either ds or ds,hr)
create dependent table Tdep partitioned by (ds string) depends on table T;

In order to denote the end of a daily partition for T, the corresponding partition is added in Tdep

Code Block

-- create partition T@ds=1/hr=1 to T@ds=1/hr=24
alter table Tdep add partition (ds=1);

The metastore stores the following dependencies

  • Tdep depends on T, and T is dependent upon by Tdep
  • Tdep@ds=1 depends on T@ds=1

In some usecases, it is possible that some partitions of Tdep depends on T, whereas the newer partitions of Tdep depend on T2.

Code Block

create table T (key string, value string) partitioned by (ds string, hr string);

-- create a dependent table which specifies the dependencies explicitly
-- Tdep inherits the schema of T
-- Tdep is partitioned by a prefix of T (either ds or ds,hr)
create dependent table Tdep partitioned by (ds string) depends on table T;

-- create partition T@ds=1/hr=1 to T@ds=1/hr=24
alter table Tdep add partition (ds=1);

-- repeat this for T@ds=1 to T@ds=10

-- T is being deprecated, and T2 is the new table (with the same schema, possibly partitioned on different keys)

create table T2 (key string, value string) partitioned by (ds string, hr string, min string);

alter table Tdep depends on table T2;

-- create partition T2@ds=11/hr=1/min=1 to T2@ds=11/hr=24/min=60
alter table Tdep add partition (ds=1);

The metastore stores the following dependencies

  • Tdep depends on (T,T2), and (T,T2) are dependent upon by Tdep
  • T and T2 have the same schema
  • Tdep@ds=1..Tdep@ds=10 depends on T@ds=1..T@ds=10 respectively
  • Tdep@ds=11 depends on T2@ds=11

The query on Tdep is re-written to access the underlying tables. For eg. for the query 'select count(1) from Tdep where ds = 1', once partition pruning is done (on Tdep), the operator tree (TableScan(Tdep):ds=1 -> Select) is rewritten to (TableScan(T):ds=1/hr=1 to ds=1/hr=24 -> Select). The advantage of this over external tables is that all the table properties (bucketing/sorting/list bucketing) for the underlying tables are used. This can easily extend to multiple tables. For eg. for the query 'select count(1) from Tdep where ds = 1 or ds = 11', once partition pruning is done (on Tdep), the operator tree (TableScan(Tdep):ds=1,ds=11 -> Select) is rewritten to (TableScan(T):ds=1/hr=1 to ds=1/hr=24 -> Select -> Union) and (TableScan(T2):ds=1/hr=1 to ds=1/hr=24 -> Select -> Union).

  • If the schema of T changes, it can either be dis-allowed or propagated to Tdep.
  • desc extended will be enhanced to show all the dependent partitions.
  • The dependency of an existing partition can be changed: alter table Tdep partition (ds=10) depends on table T2