ACID and Transactions in Hive

Hive 3 Warning

Any transactional tables created by a Hive version prior to Hive 3 require Major Compaction to be run on every partition before upgrading to 3.0.  More precisely, any partition that has had any UPDATE/DELETE/MERGE statements executed on it since the last Major Compaction must undergo another Major Compaction.  No further UPDATE/DELETE/MERGE statements may be run on this partition until after Hive is upgraded to Hive 3.

What is ACID and why should you use it?

ACID stands for four traits of database transactions:  Atomicity (an operation either succeeds completely or fails, it does not leave partial data), Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent operation), Isolation (an incomplete operation by one user does not cause unexpected side effects for other users), and Durability (once an operation is complete it will be preserved even in the face of machine or system failure).  These traits have long been expected of database systems as part of their transaction functionality.  

...

  1. Streaming ingest of data.  Many users have tools such as Apache Flume, Apache Storm, or Apache Kafka that they use to stream data into their Hadoop cluster.  While these tools can write data at rates of hundreds or more rows per second, Hive can only add partitions every fifteen minutes to an hour.  Adding partitions more often leads quickly to an overwhelming number of partitions in the table.  These tools could stream data into existing partitions, but this would cause readers to get dirty reads (that is, they would see data written after they had started their queries) and leave many small files in their directories that would put pressure on the NameNode.  With this new functionality this use case will be supported while allowing readers to get a consistent view of the data and avoiding too many files.
  2. Slowly changing dimensions.  In a typical star schema data warehouse, dimension tables change slowly over time.  For example, a retailer will open new stores, which need to be added to the stores table, or an existing store may change its square footage or some other tracked characteristic.  These changes lead to inserts of individual records or updates of records (depending on the strategy chosen).  Starting with 0.14, Hive is able to support this.
  3. Data restatement.  Sometimes collected data is found to be incorrect and needs correction.  Or the first instance of the data may be an approximation (90% of servers reporting) with the full data provided later.  Or business rules may require that certain transactions be restated due to subsequent transactions (e.g., after making a purchase a customer may purchase a membership and thus be entitled to discount prices, including on the previous purchase).  Or a user may be contractually required to remove their customer’s data upon termination of their relationship.  Starting with Hive 0.14 these use cases can be supported via INSERT, UPDATE, and DELETE.
  4. Bulk updates using the SQL MERGE statement.

Limitations

  • BEGIN, COMMIT, and ROLLBACK are not yet supported.  All language operations are auto-commit.  The plan is to support these in a future release.
  • Only ORC file format is supported in this first release.  The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC.
  • By default transactions are configured to be off.  See the Configuration section below for a discussion of which values need to be set to enable them.
  • Tables must be bucketed to make use of these features.  Tables in the same system not using transactions and ACID do not need to be bucketed. External tables cannot be made ACID tables since the changes on external tables are beyond the control of the compactor (HIVE-13175).
  • Reading/writing to an ACID table from a non-ACID session is not allowed. In other words, the Hive transaction manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager in order to work with ACID tables.
  • At this time only snapshot level isolation is supported.  When a given query starts it will be provided with a consistent snapshot of the data.  There is no support for dirty read, read committed, repeatable read, or serializable.  With the introduction of BEGIN the intention is to support snapshot isolation for the duration of transaction rather than just a single query.  Other isolation levels may be added depending on user requests.
  • The existing ZooKeeper and in-memory lock managers are not compatible with transactions.  There is no intention to address this issue.  See Basic Design below for a discussion of how locks are stored for transactions.
  • Schema changes using ALTER TABLE are NOT supported for ACID tables.  HIVE-11421 tracks this; it is fixed in 1.3.0/2.0.0.
  • Using Oracle as the Metastore DB and "datanucleus.connectionPoolingType=BONECP" may generate intermittent "No such lock.." and "No such transaction..." errors.  Setting "datanucleus.connectionPoolingType=DBCP" is recommended in this case. 
  • The LOAD DATA... statement is not supported with transactional tables.  (This was not properly enforced until HIVE-16732.)

Streaming APIs

Hive offers APIs for streaming data ingest and streaming mutation:

...

A comparison of these two APIs is available in the Background section of the Streaming Mutation document.

Grammar Changes

INSERT...VALUES, UPDATE, and DELETE have been added to the SQL grammar, starting in Hive 0.14.  See LanguageManual DML for details.

...

A new option has been added to ALTER TABLE to request a compaction of a table or partition.  In general users do not need to request compactions, as the system will detect the need for them and initiate the compaction.  However, if compaction is turned off for a table or a user wants to compact the table at a time the system would not choose to, ALTER TABLE can be used to initiate the compaction.  See Alter Table/Partition Compact for details.  This will enqueue a request for compaction and return.  To watch the progress of the compaction the user can use SHOW COMPACTIONS.

A new command, ABORT TRANSACTIONS, has been added; see Abort Transactions for details.

Basic Design

HDFS does not support in-place changes to files.  It also does not offer read consistency in the face of writers appending to files being read by a user.  In order to provide these features on top of HDFS we have followed the standard approach used in other data warehousing tools.  Data for the table or partition is stored in a set of base files.  New records, updates, and deletes are stored in delta files.  A new set of delta files is created for each transaction (or in the case of streaming agents such as Flume or Storm, each batch of transactions) that alters a table or partition.  At read time the reader merges the base and delta files, applying any updates and deletes as it reads.
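
The read-time merge described above can be illustrated with a small Python model.  This is only a sketch, not Hive's ORC reader: the row ids, the (operation, row id, value) event tuples, and the function name merge_on_read are assumptions made up for the illustration.

```python
# Toy model of merge-on-read; NOT Hive's actual reader.
# Assumption: every row has a unique row id, and each delta is a list of
# (operation, row_id, value) events written by one transaction.

def merge_on_read(base, deltas):
    """Return the rows a reader sees after applying deltas (oldest first) to base."""
    rows = dict(base)  # copy: the base "files" are never modified in place
    for delta in deltas:
        for op, row_id, value in delta:
            if op in ("insert", "update"):
                rows[row_id] = value
            elif op == "delete":
                rows.pop(row_id, None)
            else:
                raise ValueError(f"unknown operation: {op}")
    return rows


base = {1: "store A", 2: "store B"}
deltas = [
    [("insert", 3, "store C")],                                  # first transaction
    [("update", 1, "store A (larger)"), ("delete", 2, None)],    # second transaction
]
visible = merge_on_read(base, deltas)
print(visible)
```

The base dictionary is left untouched, which mirrors the point that files are never changed in place; a reader that ignores the second delta would simply see the older snapshot.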

Base and Delta Directories

Previously all files for a partition (or a table if the table is not partitioned) lived in a single directory.  With these changes, any partitions (or tables) written with an ACID aware writer will have a directory for the base files and a directory for each set of delta files.  Here is what this may look like for an unpartitioned table "t":

Filesystem Layout for Table "t":
hive> dfs -ls -R /user/hive/warehouse/t;
drwxr-xr-x   - ekoifman staff          0 2016-06-09 17:03 /user/hive/warehouse/t/base_0000022
-rw-r--r--   1 ekoifman staff        602 2016-06-09 17:03 /user/hive/warehouse/t/base_0000022/bucket_00000
drwxr-xr-x   - ekoifman staff          0 2016-06-09 17:06 /user/hive/warehouse/t/delta_0000023_0000023_0000
-rw-r--r--   1 ekoifman staff        611 2016-06-09 17:06 /user/hive/warehouse/t/delta_0000023_0000023_0000/bucket_00000
drwxr-xr-x   - ekoifman staff          0 2016-06-09 17:07 /user/hive/warehouse/t/delta_0000024_0000024_0000
-rw-r--r--   1 ekoifman staff        610 2016-06-09 17:07 /user/hive/warehouse/t/delta_0000024_0000024_0000/bucket_00000
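
As a rough aid for reading such a listing, the directory names can be split into their numeric fields.  The sketch below assumes the naming scheme suggested by the listing, base_<id> and delta_<min id>_<max id>_<statement id>; the field names and the helper parse_acid_dir are hypothetical and not part of Hive.

```python
import re

# Assumed naming scheme, inferred from the listing above:
#   base_<id>                          data up to and including <id>
#   delta_<min id>_<max id>[_<stmt>]   changes written by ids min..max
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def parse_acid_dir(name):
    """Parse a base/delta directory name; return None for anything else."""
    m = BASE_RE.match(name)
    if m:
        return {"kind": "base", "max_id": int(m.group(1))}
    m = DELTA_RE.match(name)
    if m:
        stmt = int(m.group(3)) if m.group(3) is not None else None
        return {
            "kind": "delta",
            "min_id": int(m.group(1)),
            "max_id": int(m.group(2)),
            "statement": stmt,
        }
    return None


for name in ["base_0000022", "delta_0000023_0000023_0000", "delta_0000024_0000024_0000"]:
    print(name, parse_acid_dir(name))
```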

Compactor

The Compactor is a set of background processes running inside the Metastore to support the ACID system.  It consists of the Initiator, Worker, Cleaner, AcidHouseKeeperService, and a few others.

Delta File Compaction

As operations modify the table, more and more delta files are created and need to be compacted to maintain adequate performance.  There are three types of compaction: minor, major, and rebalance.

...

All compactions are done in the background. Minor and major compactions do not prevent concurrent reads and writes of the data. Rebalance compaction uses an exclusive write lock and therefore prevents concurrent writes. After a compaction the system waits until all readers of the old files have finished and then removes the old files.

Initiator

This module is responsible for discovering which tables or partitions are due for compaction.  It should be enabled in a Metastore using hive.compactor.initiator.on.  There are several properties of the form *.threshold in the "New Configuration Parameters for Transactions" table below that control when a compaction task is created and which type of compaction is performed.  Each compaction task handles one partition (or the whole table if the table is unpartitioned).  If the number of consecutive compaction failures for a given partition exceeds hive.compactor.initiator.failed.compacts.threshold, automatic compaction scheduling will stop for this partition.  See the Configuration Parameters table for more info.
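
The kind of decision the Initiator makes can be sketched as follows.  This is a hypothetical rule, not Hive's implementation: the statistics fields, the parameter names, and the threshold values are placeholders chosen for illustration, and the real thresholds should be taken from the configuration table.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    base_bytes: int                # total size of the base files
    delta_bytes: int               # total size of all delta files
    delta_count: int               # number of delta directories
    consecutive_failures: int = 0  # failed compactions in a row


def choose_compaction(stats, delta_num_threshold=10, delta_pct_threshold=0.1,
                      failed_compacts_threshold=2):
    """Return 'major', 'minor', or None for one partition (illustrative rule only)."""
    if stats.consecutive_failures > failed_compacts_threshold:
        return None  # too many failures in a row: stop automatic scheduling
    if stats.base_bytes > 0 and stats.delta_bytes / stats.base_bytes > delta_pct_threshold:
        return "major"  # deltas are large relative to the base
    if stats.delta_count > delta_num_threshold:
        return "minor"  # many small deltas
    return None


print(choose_compaction(PartitionStats(1000, 50, 12)))   # many deltas, small in total
print(choose_compaction(PartitionStats(1000, 500, 3)))   # deltas half the size of base
```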

Worker

Each Worker handles a single compaction task.  A compaction is a MapReduce job with a name of the following form: <hostname>-compactor-<db>.<table>.<partition>.  Each Worker submits the job to the cluster (via hive.compactor.job.queue if defined) and waits for the job to finish.  hive.compactor.worker.threads determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions.
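
For illustration, the job name pattern and the concurrency limit can be expressed in a few lines of Python.  Both helper functions and the sample host, database, table, and partition values are hypothetical; only the name pattern and the "sum of Workers" rule come from the paragraph above.

```python
def compactor_job_name(hostname, db, table, partition):
    """Build a name of the form <hostname>-compactor-<db>.<table>.<partition>."""
    return f"{hostname}-compactor-{db}.{table}.{partition}"


def max_concurrent_compactions(worker_threads_per_metastore):
    """Total Workers across all Metastore instances (one compaction per Worker)."""
    return sum(worker_threads_per_metastore)


print(compactor_job_name("metastore-1", "sales", "orders", "ds=2016-06-09"))
print(max_concurrent_compactions([4, 4, 2]))  # three Metastores with 4, 4 and 2 Workers
```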

Cleaner

This process deletes delta files after compaction, once it determines that they are no longer needed.

AcidHouseKeeperService

This process looks for transactions that have not heartbeated in hive.txn.timeout time and aborts them.  The system assumes that a client that initiated a transaction and then stopped heartbeating has crashed, and that the resources it locked should be released.
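
A minimal sketch of the timeout check, assuming the service can see the time of the last heartbeat for each open transaction.  The dictionary layout, the function name find_timed_out, and the 300-second value are illustrative assumptions, not Hive internals.

```python
def find_timed_out(last_heartbeat, now, txn_timeout_secs):
    """Return ids of open transactions whose last heartbeat is older than the timeout."""
    return sorted(
        txn_id
        for txn_id, seen_at in last_heartbeat.items()
        if now - seen_at > txn_timeout_secs
    )


# Seconds since some arbitrary epoch; the values are made up for the example.
last_heartbeat = {101: 1000.0, 102: 1250.0, 103: 1290.0}
timed_out = find_timed_out(last_heartbeat, now=1400.0, txn_timeout_secs=300)
print(timed_out)  # these transactions would be aborted and their locks released
```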

SHOW COMPACTIONS

This command displays information about currently running compactions and the recent history (with a configurable retention period) of compactions.  This history display is available since HIVE-12353.

Also see LanguageManual DDL#ShowCompactions for more information on the output of this command, and the Compaction History properties under New Configuration Parameters for Transactions for configuration properties affecting the output of this command.  The system retains the last N entries of each type: failed, succeeded, attempted (where N is configurable for each type).
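
The "last N entries of each type" retention rule behaves like one bounded queue per final state.  The class below is a hypothetical model of that rule only; the class name, method names, and retention counts are assumptions for the example.

```python
from collections import deque


class CompactionHistory:
    """Keep only the most recent N entries for each final state."""

    def __init__(self, retain):
        # retain maps a state ("failed", "succeeded", "attempted") to its N
        self._entries = {state: deque(maxlen=n) for state, n in retain.items()}

    def record(self, state, compaction):
        # Appending to a full deque silently drops the oldest entry
        self._entries[state].append(compaction)

    def show(self):
        return {state: list(entries) for state, entries in self._entries.items()}


history = CompactionHistory({"failed": 1, "succeeded": 2, "attempted": 1})
for name in ["c1", "c2", "c3"]:
    history.record("succeeded", name)
history.record("failed", "c4")
print(history.show())
```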


Transaction/Lock Manager

A new logical entity called the "transaction manager" was added, which incorporates the previous notion of a "database/table/partition lock manager" (hive.lock.manager, with a default of org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager). The transaction manager is now additionally responsible for managing transaction locks. The default DummyTxnManager emulates the behavior of old Hive versions: it has no transactions and uses the hive.lock.manager property to create a lock manager for tables, partitions, and databases. A newly added DbTxnManager manages all locks and transactions in the Hive metastore with DbLockManager (transactions and locks are durable in the face of server failure). This means that the previous behavior of locking in ZooKeeper is no longer present when transactions are enabled. To avoid clients dying and leaving transactions or locks dangling, a heartbeat is sent from lock holders and transaction initiators to the metastore on a regular basis.  If a heartbeat is not received in the configured amount of time, the lock or transaction will be aborted.

...

Note that the lock manager used by DbTxnManager will acquire locks on all tables, even those without the "transactional=true" property.  By default, an INSERT operation into a non-transactional table will acquire an exclusive lock and thus block other inserts and reads.  While technically correct, this is a departure from how Hive traditionally worked (i.e., without a lock manager).  For backwards compatibility, hive.txn.strict.locking.mode (see table below) is provided, which will make this lock manager acquire shared locks on insert operations on non-transactional tables.  This restores the previous semantics while still providing the benefits of a lock manager, such as preventing a table from being dropped while it is being read.  Note that for transactional tables, insert always acquires shared locks, since these tables implement an MVCC architecture at the storage layer and are able to provide strong read consistency (Snapshot Isolation) even in the presence of concurrent modification operations.
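
The insert locking rule above can be summarized as a small decision function.  This is a sketch of the rule as described in this section, assuming hive.txn.strict.locking.mode is a boolean whose enabled state corresponds to the default exclusive-lock behavior; the function name and the string lock labels are hypothetical.

```python
def insert_lock_type(transactional, strict_locking_mode=True):
    """Lock taken by an INSERT under DbTxnManager, per the rule described above."""
    if transactional:
        # Transactional tables rely on snapshot isolation, so inserts share the table
        return "shared"
    # Non-transactional tables: exclusive unless strict locking is turned off
    return "exclusive" if strict_locking_mode else "shared"


for transactional in (True, False):
    for strict in (True, False):
        print(transactional, strict, insert_lock_type(transactional, strict))
```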

Configuration

Minimally, these configuration parameters must be set appropriately to turn on transaction support in Hive:

...

The following sections list all of the configuration parameters that affect Hive transactions and compaction.  Also see Limitations above and Table Properties below.

New Configuration Parameters for Transactions

A number of new configuration parameters have been added to the system to support transactions.

...

5. If the value is not the same, active transactions may be determined to be "timed out" and consequently aborted.  This will result in errors like "No such transaction...", "No such lock ..."

Configuration Values to Set for INSERT, UPDATE, DELETE

In addition to the new parameters listed above, some existing parameters need to be set to support INSERT ... VALUES, UPDATE, and DELETE.

Configuration key                   Must be set to
hive.support.concurrency            true (default is false)
hive.enforce.bucketing              true (default is false) (Not required as of Hive 2.0)
hive.exec.dynamic.partition.mode    nonstrict (default is strict)
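
A quick preflight check against the table above might look like the following.  It is only a sketch: the helper missing_settings and the way the configuration is passed in as a plain dictionary are assumptions, and it covers only the three keys listed in this table, not the new transaction parameters.

```python
# Values taken from the table above
REQUIRED = {
    "hive.support.concurrency": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
}
# Per the table, this one is not required as of Hive 2.0
REQUIRED_BEFORE_HIVE_2 = {"hive.enforce.bucketing": "true"}


def missing_settings(conf, hive_major_version):
    """Return {key: required value} for every setting that is absent or different."""
    required = dict(REQUIRED)
    if hive_major_version < 2:
        required.update(REQUIRED_BEFORE_HIVE_2)
    return {
        key: value
        for key, value in required.items()
        if conf.get(key, "").strip().lower() != value
    }


conf = {"hive.support.concurrency": "true", "hive.exec.dynamic.partition.mode": "strict"}
print(missing_settings(conf, hive_major_version=1))
print(missing_settings(conf, hive_major_version=3))
```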

Configuration Values to Set for Compaction

If the data in your system is not owned by the Hive user (i.e., the user that the Hive metastore runs as), then Hive will need permission to run as the user who owns the data in order to perform compactions.  If you have already set up HiveServer2 to impersonate users, then the only additional work is to ensure that Hive has the right to impersonate users from the host running the Hive metastore.  This is done by adding the hostname to hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file.  If you have not already set up impersonation, then you will need to configure Hive to act as a proxy user.  This requires you to set up keytabs for the user running the Hive metastore and add hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file.  See the Hadoop documentation on secure mode for your version of Hadoop (e.g., for Hadoop 2.5.1 it is at Hadoop in Secure Mode).

Compaction pooling

More information on compaction pooling can be found here: Compaction pooling

Table Properties

If a table is to be used in ACID writes (insert, update, delete) then the table property "transactional=true" must be set on that table, starting with Hive 0.14.0. Note that once a table has been defined as an ACID table via TBLPROPERTIES ("transactional"="true"), it cannot be converted back to a non-ACID table; i.e., changing to TBLPROPERTIES ("transactional"="false") is not allowed. Also, hive.txn.manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager either in hive-site.xml or at the beginning of the session before any query is run. Without those, inserts will be done in the old style; updates and deletes will be prohibited prior to HIVE-11716.  Since HIVE-11716, operations on ACID tables without DbTxnManager are not allowed.  However, this does not apply to Hive 0.13.0.

...

Example: Set compaction options in TBLPROPERTIES at request level
ALTER TABLE table_name COMPACT 'minor' 
   WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072");  -- specify compaction map job properties
ALTER TABLE table_name COMPACT 'major'
   WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192");         -- change any other Hive table properties

Talks and Presentations

Transactional Operations In Hive by Eugene Koifman at Dataworks Summit 2017, San Jose, CA, USA

...