...

Inside Facebook, we are running out of power within a single data center (physical cluster), and we need a bigger cluster.
One option is to divide the cluster into multiple clusters: multiple Hive instances, multiple MapReduce (MR) instances, and multiple DFS instances. This puts a burden on
the user, who needs to know which cluster to use. It would also be very difficult to support joins across tables in different clusters, and
it would lead to a lot of data duplication in the long run. To get around these problems, we are planning to extend Hive to span
multiple data centers and, in the long term, to make the existence of multiple clusters transparent to end users. Note that, even
today, different partitions and tables can span multiple DFS instances, and Hive does not enforce any restrictions. Those DFS instances can also be in different
data centers. However,

We are planning to make Hive run across multiple data centers (physical clusters). We prefer to use the Hive metastore to provide a
unified namespace.

...