alter table Tdependent add partition (ds='1') location '/T/ds=1' dependent partitions table T partitions (ds='1');
This specifies the partial partition spec for the dependent partitions.
Note that each table can point to a different location. Hive needs to ensure that all the dependent partitions are under the location '/T/ds=1'.
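The containment check described above can be sketched as follows. This is only an illustration, not Hive code: the helper names `is_under` and `partitions_outside` and the sample locations are hypothetical, and a real check would go through the metastore and the file system.

```python
from pathlib import PurePosixPath


def is_under(location: str, root: str) -> bool:
    """Return True if `location` equals `root` or lies beneath it."""
    loc = PurePosixPath(location)
    base = PurePosixPath(root)
    # Compare whole path components, so '/T/ds=10' is not under '/T/ds=1'.
    return loc == base or base in loc.parents


def partitions_outside(root: str, partition_locations: dict) -> list:
    """Names of dependent partitions whose location is not under `root`."""
    return sorted(
        name for name, loc in partition_locations.items()
        if not is_under(loc, root)
    )


# Hypothetical locations for three partitions of T.
locations = {
    "T@ds=1/hr=1": "/T/ds=1/hr=1",
    "T@ds=1/hr=2": "/T/ds=1/hr=2",
    "T@ds=1/hr=3": "/elsewhere/ds=1/hr=3",  # stored outside '/T/ds=1'
}

# An empty result means the add-partition request could be accepted.
print(partitions_outside("/T/ds=1", locations))
```

Comparing path components rather than raw string prefixes avoids treating '/T/ds=10' as if it were inside '/T/ds=1'.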

  • Specify the location

The metastore can store the dependencies completely or partially.

    • Materialize the dependencies both-ways
      Tdependent@ds=1 depends on T@ds=1/hr=1 to T@ds=1/hr=24
      T@ds=1/hr=1 is depended upon by Tdependent@ds=1
      Advantages: if T@ds=1/hr=1 is dropped, Tdependent@ds=1 can be notified, or it can choose to disallow the drop
      Any property on Tdependent can be propagated to T
    • Is the dependency used for querying? What happens if T@ds=1/hr=25 gets added? The query 'select .. from Tdependent where ds = 1' then includes T@ds=1/hr=25, but this partition is not shown in the inputs.
    • Don't use the location for querying - then why have the location?
    • Store partial dependencies
      Tdependent@ds=1 depends on T@ds=1 (spec).
      At describe time, the spec is evaluated and all the dependent partitions are computed dynamically. At add partition time, verify that the location captures all dependent partitions.
      The partial spec is not used for querying - location is used for that. At query time, verify that the location captures all dependent partitions.
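The difference between the two storage options can be sketched in a few lines. This is a toy in-memory model, not the metastore API: `parse`, `matches`, `resolve`, `can_drop`, and the two dictionaries are hypothetical names, and partitions are represented as plain strings.

```python
from collections import defaultdict


def parse(name):
    """Turn 'ds=1/hr=2' into {'ds': '1', 'hr': '2'}."""
    return dict(kv.split("=", 1) for kv in name.split("/"))


def matches(spec, partition):
    """A partition matches a partial spec if it agrees on every key in the spec."""
    return all(partition.get(k) == v for k, v in spec.items())


def resolve(spec, base_partitions):
    """Evaluate a partial spec against the current partitions of T."""
    return [p for p in base_partitions if matches(spec, parse(p))]


# Partitions of T: ds=1/hr=1 .. ds=1/hr=24, plus one unrelated partition.
base = [f"ds=1/hr={h}" for h in range(1, 25)] + ["ds=2/hr=1"]
spec = {"ds": "1"}  # partial spec stored for Tdependent@ds=1

# Option 1: materialize the dependencies both ways at add-partition time.
depends_on = {"Tdependent@ds=1": set(resolve(spec, base))}
depended_on_by = defaultdict(set)
for dependent, parts in depends_on.items():
    for part in parts:
        depended_on_by[part].add(dependent)


def can_drop(partition):
    """With the reverse map, a drop can be refused (or the dependents notified)."""
    return not depended_on_by.get(partition)


# A new base partition arrives after Tdependent@ds=1 was added.
base.append("ds=1/hr=25")

# Option 2: store only the partial spec and evaluate it when needed.
materialized = depends_on["Tdependent@ds=1"]
dynamic = set(resolve(spec, base))

print(len(materialized), len(dynamic))               # 24 25
print(can_drop("ds=1/hr=1"), can_drop("ds=2/hr=1"))  # False True
```

The materialized set misses hr=25 until it is refreshed, while the dynamically evaluated spec picks it up. The reverse map, on the other hand, is what makes the drop check cheap.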
  • The dependent table does not have a location.
    • The list of partitions is computed at query time. Think of it like a view, where each partition has its own definition limited to 'select * from T where partial/full partition spec'. The query layer needs to change. Is it possible? Unlike a view, it is not rewritten at semantic analysis time. After partition pruning is done (on a dependent table), rewrite the tree to contain the base table T. The columns remain the same, so it should be possible.
  • The list of dependent partitions is materialized and stored in the metastore, and is used for querying.
    A query like 'select .. from Tdependent where ds = 1' gets transformed to 'select .. from (select * from T where ((ds = 1 and hr = 1) or (ds = 1 and hr = 2) .... or (ds = 1 and hr = 24)))'
    This can put a lot of load on the query layer.
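To make the cost of this rewrite concrete, the predicate generation can be sketched as below. The functions `partition_predicate` and `rewrite` and the subquery alias `t` are hypothetical. The sketch only builds the query text and says nothing about how Hive would rewrite its operator tree.

```python
def partition_predicate(partitions):
    """OR together one conjunction per materialized partition."""
    terms = []
    for spec in partitions:
        conjunction = " and ".join(f"{col} = {val}" for col, val in spec.items())
        terms.append(f"({conjunction})")
    return " or ".join(terms)


def rewrite(select_list, base_table, partitions, alias="t"):
    """Replace the dependent table with a filtered subquery on the base table."""
    predicate = partition_predicate(partitions)
    return (
        f"select {select_list} from "
        f"(select * from {base_table} where ({predicate})) {alias}"
    )


# Materialized dependencies of Tdependent@ds=1: 24 hourly partitions of T.
parts = [{"ds": 1, "hr": h} for h in range(1, 25)]
query = rewrite("*", "T", parts)
print(query)
```

The predicate grows by one term per materialized partition, so a dependent partition that covers many base partitions produces a correspondingly large filter for the query layer to handle.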