The existing thrift APIs will continue to work as if the user is trying to access the default cluster.
New APIs will be added which take the cluster as a new parameter. Almost all the existing APIs will be
enhanced to support this. The behavior will be the same as if the user had issued the command 'USE CLUSTER <CLUSTERNAME>'.
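
The relationship between an existing API and its cluster-aware counterpart can be sketched as follows. This is only an illustration, not the actual thrift interface: the function names, the toy catalog, the paths, and the cluster name "default" are all assumptions made for the example.

```python
DEFAULT_CLUSTER = "default"  # hypothetical name for the default cluster

# Toy in-memory catalog keyed by (cluster, table); it stands in for the metastore.
CATALOG = {
    ("default", "page_views"): "hdfs://nn-default:8020/warehouse/page_views",
    ("archive", "page_views"): "hdfs://nn-archive:8020/warehouse/page_views",
}


def table_location_on_cluster(table, cluster):
    """New-style call: the cluster is an explicit parameter."""
    return CATALOG[(cluster, table)]


def table_location(table):
    """Existing-style call: same signature as before, resolved against the default cluster."""
    return table_location_on_cluster(table, DEFAULT_CLUSTER)
```

Under these assumptions, a caller that never mentions a cluster keeps getting the default cluster's answer, while a cluster-aware caller can pass "archive" explicitly.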

  • A new parameter will be added to hold the filesystem and jobtracker for each cluster:
    • hive.cluster.properties: This will be JSON mapping ClusterName -> <FileSystem, JobTracker>.
    • use cluster <cluster name> will fail if <cluster name> is not present in hive.cluster.properties.
    • The other option was to support create cluster <> etc., but that would have required storing the cluster information,
      including the jobtracker etc., in the metastore, which would be difficult to change per session.
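
A minimal sketch of how hive.cluster.properties could be parsed and how 'use cluster' could validate a name against it. The exact JSON layout (the "filesystem" and "jobtracker" keys), the cluster names, and the host addresses are assumptions for illustration; the proposal only fixes the mapping ClusterName -> <FileSystem, JobTracker>.

```python
import json

# Hypothetical value of hive.cluster.properties (layout and hosts are assumed).
CLUSTER_PROPERTIES = """
{
  "default": {"filesystem": "hdfs://nn-default:8020", "jobtracker": "jt-default:8021"},
  "archive": {"filesystem": "hdfs://nn-archive:8020", "jobtracker": "jt-archive:8021"}
}
"""


def parse_cluster_properties(raw):
    """Parse the JSON text into {cluster name: (filesystem, jobtracker)}."""
    parsed = json.loads(raw)
    return {
        name: (entry["filesystem"], entry["jobtracker"])
        for name, entry in parsed.items()
    }


def use_cluster(clusters, name):
    """Mimic 'use cluster <name>': fail if the name is not configured."""
    if name not in clusters:
        raise ValueError(
            f"cluster '{name}' is not present in hive.cluster.properties"
        )
    return clusters[name]


clusters = parse_cluster_properties(CLUSTER_PROPERTIES)
```

Because the mapping lives in configuration rather than in the metastore, a session can be pointed at a different jobtracker simply by overriding the parameter.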

If the user does not intend to use multiple clusters, there should be no change in the behavior of Hive commands, and all the
existing thrift APIs should continue to work. There may be some very minor changes required: the Cluster -> JobTracker mapping (with only
a single entry) must be configured, and every table's primary cluster needs to be set to some pre-defined fixed cluster. If that is a
problem, we can even add a new configuration parameter, hive.disable.clusters, which will make this whole multi-cluster business
transparent to the end users.