...
As operations modify the table more and more delta files are created and need to be compacted to maintain adequate performance. There are two three types of compactions, minor, major and majorrebalance.
- Minor compaction takes a set of existing delta files and rewrites them to a single delta file per bucket.
- Major compaction takes one or more delta files and the base file for the bucket and rewrites them into a new base file per bucket. Major compaction is more expensive but is more effective.
- More information about rebalance compaction can be found here: Rebalance compaction
All compactions are done in the background. Minor and major compactions do not prevent concurrent reads and writes of the data. . Rebalance compaction uses exclusive write lock, therefore it prevents concurrent writes. After a compaction the system waits until all readers of the old files have finished and then removes the old files.
...
If the data in your system is not owned by the Hive user (i.e., the user that the Hive metastore runs as), then Hive will need permission to run as the user who owns the data in order to perform compactions. If you have already set up HiveServer2 to impersonate users, then the only additional work to do is assure that Hive has the right to impersonate users from the host running the Hive metastore. This is done by adding the hostname to hadoop.proxyuser.hive.hosts
in Hadoop's core-site.xml
file. If you have not already done this, then you will need to configure Hive to act as a proxy user. This requires you to set up keytabs for the user running the Hive metastore and add hadoop.proxyuser.hive.hosts
and hadoop.proxyuser.hive.groups
to Hadoop's core-site.xml
file. See the Hadoop documentation on secure mode for your version of Hadoop (e.g., for Hadoop 2.5.1 it is at Hadoop in Secure Mode).
...
Compaction pooling
More in formation on compaction pooling can be found here: Compaction pooling
...
Table Properties
If a table is to be used in ACID writes (insert, update, delete) then the table property "transactional=true
" must be set on that table, starting with Hive 0.14.0. Note, once a table has been defined as an ACID table via TBLPROPERTIES ("transactional"="true"), it cannot be converted back to a non-ACID table, i.e., changing TBLPROPERTIES ("transactional"="false") is not allowed. Also, hive.txn.manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager either in hive-site.xml or in the beginning of the session before any query is run. Without those, inserts will be done in the old style; updates and deletes will be prohibited prior to HIVE-11716. Since HIVE-11716 operations on ACID tables without DbTxnManager are not allowed. However, this does not apply to Hive 0.13.0.
...