...

  • Eventually, Hive will provide utilities to copy a table or partition from the primary cluster to the secondary clusters.
    In the first cut, the user must do this outside Hive. One simple way is to distcp the partition from the
    primary to the secondary cluster and then update the metadata directly via the Thrift API.
  • This will require a change to the metastore schema. StorageDescriptor will be enhanced to add:
    PrimaryCluster - ClusterStorageDescriptor
    and SecondaryClusters - Set<ClusterStorageDescriptor>

The ClusterStorageDescriptor contains the following:
ClusterName
Location

location will be removed from the StorageDescriptor.
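
As a rough illustration of the proposed layout, the following Python sketch models the two structures in memory. It is not the metastore's Thrift definition: the snake_case field names, the location_on helper, and the example HDFS URIs are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class ClusterStorageDescriptor:
    """One copy of the data: which cluster holds it, and where."""
    cluster_name: str
    location: str


@dataclass
class StorageDescriptor:
    """Illustrative model only; the other StorageDescriptor fields are omitted."""
    primary_cluster: ClusterStorageDescriptor
    secondary_clusters: Set[ClusterStorageDescriptor] = field(default_factory=set)

    def location_on(self, cluster_name: str) -> Optional[str]:
        """Hypothetical helper: return the location on the named cluster, or None."""
        if self.primary_cluster.cluster_name == cluster_name:
            return self.primary_cluster.location
        for csd in self.secondary_clusters:
            if csd.cluster_name == cluster_name:
                return csd.location
        return None


# Example values are made up.
sd = StorageDescriptor(
    primary_cluster=ClusterStorageDescriptor(
        "primary", "hdfs://primary-nn/warehouse/t/ds=1"),
    secondary_clusters={
        ClusterStorageDescriptor("backup", "hdfs://backup-nn/warehouse/t/ds=1"),
    },
)
```

ClusterStorageDescriptor is frozen here so that instances are hashable and can be held in a set, mirroring Set<ClusterStorageDescriptor>.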

  • This will require a schema change and data migration.
  • The Thrift structure will be backward compatible.
    • New entries will be added (ClusterName, IsPrimary, etc.), but existing clients using sd.location will continue to work.
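
One possible way to keep sd.location working for existing clients is to serve it from one of the cluster entries. The Python sketch below assumes the legacy field maps to the primary cluster's location; the text above does not specify this mapping, and the class layout, field names, and the legacy_client_path function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class ClusterStorageDescriptor:
    cluster_name: str
    location: str


@dataclass
class StorageDescriptor:
    primary_cluster: ClusterStorageDescriptor
    secondary_clusters: Set[ClusterStorageDescriptor] = field(default_factory=set)

    @property
    def location(self) -> str:
        # Assumption: legacy readers see the primary cluster's location.
        return self.primary_cluster.location


def legacy_client_path(sd: StorageDescriptor) -> str:
    """Stands in for an existing client that only knows about sd.location."""
    return sd.location


# Example values are made up.
compat_sd = StorageDescriptor(
    primary_cluster=ClusterStorageDescriptor(
        "primary", "hdfs://primary-nn/warehouse/t"),
)
```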

In order to support the above, the Hive metastore needs to be enhanced to have the concept of a cluster.
The existing Thrift APIs will continue to work as if the user is trying to access the default cluster.
New APIs will be added which take the cluster as a new parameter. Almost all the existing APIs will be
enhanced to support this. The behavior will be the same as if the user issued the command 'USE CLUSTER <CLUSTERNAME>'.
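
The relationship between the existing and the new APIs can be sketched as a pair of calls in which the old one delegates to the new one with the default cluster. This is a minimal in-memory Python sketch, not the metastore's Thrift interface: the class, the method names, the "default" cluster name, and the example locations are all hypothetical.

```python
from typing import Dict, Tuple

DEFAULT_CLUSTER = "default"  # assumed name for the default cluster


class MetastoreSketch:
    """In-memory stand-in for a cluster-aware metastore."""

    def __init__(self) -> None:
        # (cluster, database, table) -> location
        self._locations: Dict[Tuple[str, str, str], str] = {}

    def register(self, cluster: str, db: str, table: str, location: str) -> None:
        self._locations[(cluster, db, table)] = location

    def get_table_location_on_cluster(self, cluster: str, db: str, table: str) -> str:
        """New-style call: the cluster is an explicit parameter."""
        return self._locations[(cluster, db, table)]

    def get_table_location(self, db: str, table: str) -> str:
        """Existing-style call: behaves as if the default cluster was requested."""
        return self.get_table_location_on_cluster(DEFAULT_CLUSTER, db, table)


# Example values are made up.
store = MetastoreSketch()
store.register("default", "sales", "orders", "hdfs://default-nn/warehouse/sales.db/orders")
store.register("backup", "sales", "orders", "hdfs://backup-nn/warehouse/sales.db/orders")
```

Existing callers keep using get_table_location unchanged, while cluster-aware callers pass the cluster name explicitly.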

...