...

  • There will be a single Hive instance, possibly spanning multiple clusters (both DFS and MapReduce).
  • There will be a single Hive metastore that tracks table/partition locations across the different clusters.
  • Each session will have a default cluster. Commands will be added to switch the session's current cluster:
    • Use cluster <ClusterName>
  • A table/partition can exist in more than one cluster. However, each table has exactly one primary cluster and may have
    multiple secondary clusters.
  • Table and partition metadata will be extended to record multiple clusters/locations for each table.
    • All the data for a table is available in the primary cluster.
    • The user can only update the table (or its partition) in the primary cluster.
    • Eventually, Hive will provide utilities to copy a table/partition from the primary cluster to the secondary clusters.
      In the first cut, the user must perform this operation outside Hive. One simple way is to distcp the partition from the
      primary cluster to the secondary cluster, and then update the metadata directly via the Thrift API.
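
The rules above (a session-level current cluster, one primary plus optional secondary locations per table, writes only in the primary) can be modelled in a few lines. The sketch below is illustrative only: TableMeta, Session, read_location and check_writable are hypothetical names, not Hive's metastore schema or API, and only the "use cluster <ClusterName>" command is parsed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical multi-cluster table metadata (not Hive's real metastore schema)."""
    name: str
    primary_cluster: str
    primary_location: str
    # cluster name -> location of the copy held by that secondary cluster
    secondary_locations: dict[str, str] = field(default_factory=dict)

    def add_secondary(self, cluster: str, location: str) -> None:
        if cluster == self.primary_cluster:
            raise ValueError("the primary cluster cannot also be a secondary")
        self.secondary_locations[cluster] = location

    def location_in(self, cluster: str) -> str | None:
        # All data lives in the primary; a secondary may hold a copy.
        if cluster == self.primary_cluster:
            return self.primary_location
        return self.secondary_locations.get(cluster)


class Session:
    """Hypothetical session that starts on a default cluster."""

    def __init__(self, default_cluster: str, known_clusters: set[str]) -> None:
        self.known_clusters = known_clusters
        self.current_cluster = default_cluster

    def run(self, command: str) -> None:
        # Only 'use cluster <ClusterName>' is modelled in this sketch.
        parts = command.strip().rstrip(";").split()
        if len(parts) != 3 or [p.lower() for p in parts[:2]] != ["use", "cluster"]:
            raise ValueError(f"unsupported command: {command!r}")
        if parts[2] not in self.known_clusters:
            raise ValueError(f"unknown cluster: {parts[2]}")
        self.current_cluster = parts[2]

    def read_location(self, table: TableMeta) -> str:
        # Reads can be served by any cluster that holds a copy.
        location = table.location_in(self.current_cluster)
        if location is None:
            raise LookupError(f"{table.name} has no copy in {self.current_cluster}")
        return location

    def check_writable(self, table: TableMeta) -> None:
        # Updates are only allowed in the table's primary cluster.
        if self.current_cluster != table.primary_cluster:
            raise PermissionError(
                f"{table.name} can only be updated in {table.primary_cluster}"
            )
```

In this model a read issued after "use cluster c2" resolves to the c2 copy, while an attempted update from c2 is rejected because only the primary cluster accepts writes.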

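The first-cut manual workflow (distcp the partition, then record the new location) can be planned as two steps. This is a sketch under stated assumptions: it assumes the copy keeps the same path on the secondary cluster and only swaps the namenode authority in the HDFS URI. It builds the standard "hadoop distcp <src> <dst>" argument list without running it. plan_copy and secondary_uri are hypothetical helpers, and the actual metastore update through the Thrift API is not shown; the returned dictionary simply stands in for the new cluster-to-location metadata.

```python
from urllib.parse import urlsplit, urlunsplit


def secondary_uri(primary_uri: str, secondary_authority: str) -> str:
    """Swap the namenode (host:port) of an HDFS URI, keeping the same path (assumed convention)."""
    parts = urlsplit(primary_uri)
    return urlunsplit(
        (parts.scheme, secondary_authority, parts.path, parts.query, parts.fragment)
    )


def plan_copy(primary_uri: str, secondary_cluster: str,
              secondary_authority: str, locations: dict) -> tuple:
    """Return (distcp argv, updated cluster -> location map) for one partition."""
    dst = secondary_uri(primary_uri, secondary_authority)
    # Step 1: copy the partition data from the primary to the secondary cluster.
    cmd = ["hadoop", "distcp", primary_uri, dst]
    # Step 2: the metadata to write back (e.g. via the Thrift API); input is not mutated.
    updated = dict(locations)
    updated[secondary_cluster] = dst
    return cmd, updated
```

The metadata step deliberately comes second: if distcp fails, the metastore never advertises a secondary copy that does not exist.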
...