...
- When the warehouse reaches datacenter capacity limits, it is hard to identify self-contained pieces that can be migrated out.
- Capacity tracking and management become an issue.
Requirements
Introduce the notion of a virtual warehouse (namespace) in Hive with the below key properties:
- Can be housed in the same physical warehouse with other virtual warehouses (multi-tenancy).
- Portable (so it can be moved from one physical warehouse to another). Being self-contained is a necessary condition for portability (all queries on this namespace operate only on data available in the namespace).
- Unit of capacity tracking and capacity allocation. This is a nice side effect of creating self-contained namespaces and allows capacity planning based on the virtual warehouse growth.
...
- Provide metadata to identify tables and queries that belong to one namespace.
- Provide controls to prevent operating on tables outside the namespace.
- Provide commands to explicitly request that tables/partitions in namespace1 be made available in namespace2 (since some tables/partitions may be needed across multiple namespaces). Avoid making copies of tables/partitions for this.
Design
The proposed design is:
- Model namespaces as databases. No explicit accounting/tracking of tables/partitions/views that belong to a namespace is needed, since a database already provides that.
- Prevent access using two-part name syntax (Y.T, where Y is a different database) if the namespaces feature is “on” in a Hive instance. This ensures the database is self-contained.
- Model table/partition imports across namespaces using a new concept in Hive called Links. There will be commands to create Links to tables in other databases, and to alter and drop them. Links do not make copies of the table/partition and hence avoid data duplication in the same physical warehouse.
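The three design points above can be illustrated with a toy in-memory model. This is a minimal sketch, not Hive code: the class and method names (`Warehouse`, `create_link`, `resolve`, `ForeignAccessError`) and the storage paths are hypothetical and do not correspond to Hive's actual metastore API or Link command syntax. It also assumes that a two-part name referring to the current database itself is still allowed when the feature is on.

```python
from dataclasses import dataclass, field


class ForeignAccessError(Exception):
    """Raised when a query names a table outside the current namespace."""


@dataclass
class Table:
    name: str
    location: str  # storage path; hypothetical, for illustration only


@dataclass
class Database:
    """A namespace (virtual warehouse) modeled as a database."""
    name: str
    tables: dict = field(default_factory=dict)  # table name -> Table
    links: dict = field(default_factory=dict)   # link name -> (source db, table name)


class Warehouse:
    """One physical warehouse hosting several namespaces (multi-tenancy)."""

    def __init__(self, namespaces_enabled=True):
        self.namespaces_enabled = namespaces_enabled
        self.databases = {}

    def create_database(self, name):
        self.databases[name] = Database(name)

    def create_table(self, db, name, location):
        self.databases[db].tables[name] = Table(name, location)

    def create_link(self, target_db, source_db, table):
        # A link stores only a reference to the source table; no data is copied.
        if table not in self.databases[source_db].tables:
            raise KeyError(f"{source_db}.{table} does not exist")
        self.databases[target_db].links[table] = (source_db, table)

    def drop_link(self, target_db, table):
        del self.databases[target_db].links[table]

    def resolve(self, current_db, ref):
        """Resolve a table reference issued from within current_db."""
        if "." in ref:
            db, table = ref.split(".", 1)
            # Assumption: only two-part names that cross databases are rejected.
            if self.namespaces_enabled and db != current_db:
                raise ForeignAccessError(
                    f"{ref} is outside namespace {current_db}")
        else:
            db, table = current_db, ref
        database = self.databases[db]
        if table in database.tables:
            return database.tables[table]
        if table in database.links:
            source_db, source_table = database.links[table]
            return self.databases[source_db].tables[source_table]
        raise KeyError(f"{db}.{table} not found")
```

In this sketch, a query running in `ns2` cannot reach `ns1.t` by name, but after `create_link("ns2", "ns1", "t")` the unqualified name `t` resolves to the same underlying `Table` object, which mirrors the requirement that sharing across namespaces must not duplicate data.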
...
JIRA issues for these features:
- HIVE-3016: Allow disabling foreign table access (cross database) using hiveconf (https://issues.apache.org/jira/browse/HIVE-3016)
- HIVE-2989: Adding Table Links to Hive (https://issues.apache.org/jira/browse/HIVE-2989)
A basic tenet of our design is that a Hive instance does not operate across physical warehouses. We are building a namespace service external to Hive that has metadata on namespace location across the Hive instances, and allows importing data across Hive instances using replication.
...