Hive fundamentally knows two different types of tables:

  • Managed (Internal)
  • External

This document lists some of the differences between the two, but the fundamental one is that Hive assumes it owns the data for managed tables. That means the data, its properties, and its layout are expected to be changed only via Hive commands. The data still lives in a normal file system, and nothing stops you from changing it without telling Hive. If you do, though, you violate Hive's invariants and expectations, and you might see undefined behavior.

Another consequence is that the data is tied to the Hive entities: whenever you change an entity (e.g. drop a table), the data changes with it (in this case, the data is deleted).

This is much like a traditional RDBMS, where you would not manage the data files yourself but would go through SQL-based access instead.


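For external tables, Hive assumes that it does not manage the data. By default, dropping an external table therefore removes only the table's metadata and leaves the underlying files in place.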
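
The contrast between the two table types can be sketched in HiveQL. This is a minimal illustration, not taken from the original page: the table names, columns, and the `/data/external_events` path are hypothetical, and the comments describe default behavior as commonly documented.

```sql
-- Managed table: Hive owns both the metadata and the files
-- (table name and columns are hypothetical).
CREATE TABLE managed_events (
  id  INT,
  msg STRING
);

-- External table: Hive records only metadata; the files under LOCATION
-- are assumed to be managed outside of Hive (path is hypothetical).
CREATE EXTERNAL TABLE external_events (
  id  INT,
  msg STRING
)
LOCATION '/data/external_events';

-- The "Table Type" field in the output should read MANAGED_TABLE
-- or EXTERNAL_TABLE respectively.
DESCRIBE FORMATTED managed_events;
DESCRIBE FORMATTED external_events;

-- Dropping the managed table deletes its data along with the metadata.
DROP TABLE managed_events;

-- Dropping the external table removes the metadata only; by default the
-- files under /data/external_events are left in place.
DROP TABLE external_events;
```

Running `DESCRIBE FORMATTED` before a `DROP TABLE` is a cheap way to confirm which kind of table you are about to remove.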


This means that a number of features are available for only one of the two table types. The following list is incomplete:

  • ARCHIVE/UNARCHIVE/TRUNCATE/MERGE/CONCATENATE only work for managed tables
  • ACID/Transactional only works for managed tables
  • Query Results Caching only works for managed tables
  • Only the RELY constraint is allowed on external tables
  • Some Materialized View features only work on managed tables
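
A few of the restrictions listed above can be illustrated with a short HiveQL sketch. The table names, columns, and path are hypothetical, and the sketch assumes a Hive installation with transactions enabled; exact behavior and error messages may vary between Hive versions.

```sql
-- ACID: a transactional table is created as a managed table.
-- Full ACID support assumes ORC storage and a configured transaction manager.
CREATE TABLE orders_acid (
  order_id INT,
  status   STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Row-level changes are available because the table is transactional.
UPDATE orders_acid SET status = 'shipped' WHERE order_id = 42;

-- TRUNCATE is among the operations listed as managed-only.
TRUNCATE TABLE orders_acid;

-- On an external table, constraints are informational only, so the
-- constraint is declared with DISABLE NOVALIDATE RELY (hypothetical path).
CREATE EXTERNAL TABLE customers_ext (
  customer_id INT,
  name        STRING,
  PRIMARY KEY (customer_id) DISABLE NOVALIDATE RELY
)
LOCATION '/data/customers_ext';
```

Here `RELY` tells the optimizer that it may trust the constraint even though Hive does not enforce it, which fits the assumption that Hive does not manage the external data.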