...
- BEGIN, COMMIT, and ROLLBACK are not yet supported. All language operations are auto-commit. The plan is to support these in a future release.
- Only the ORC file format is supported in this first release. The feature has been built so that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, any format that has an explicit or implicit row id), but so far the integration work has only been done for ORC.
- By default, transactions are turned off. See the Configuration section below for a discussion of which values must be set to enable them.
- Tables must be bucketed to make use of these features. Tables in the same system not using transactions and ACID do not need to be bucketed. External tables cannot be made ACID tables since the changes on external tables are beyond the control of the compactor.
- Reading/writing to an ACID table from a non-ACID session is not allowed. In other words, the Hive transaction manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager in order to work with ACID tables.
- At this time only snapshot-level isolation is supported. When a query starts, it is provided with a consistent snapshot of the data. There is no support for dirty read, read committed, repeatable read, or serializable. With the introduction of BEGIN, the intention is to support snapshot isolation for the duration of a transaction rather than just a single query. Other isolation levels may be added depending on user requests.
- The existing ZooKeeper and in-memory lock managers are not compatible with transactions. There is no intention to address this issue. See Basic Design below for a discussion of how locks are stored for transactions.
- Schema changes using ALTER TABLE were not supported for ACID tables (tracked by HIVE-11421); this was fixed in Hive 1.3.0/2.0.0.
- Using Oracle as the Metastore DB with "datanucleus.connectionPoolingType=BONECP" may generate intermittent "No such lock..." and "No such transaction..." errors. Setting "datanucleus.connectionPoolingType=DBCP" is recommended in this case.
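The table-level restrictions in the list above (ORC only, bucketing required, no external tables) lend themselves to a pre-flight check before a table is declared transactional. The sketch below is a hypothetical helper, not part of Hive: `TableDesc` and `acid_eligibility_problems` are illustrative names, and the fields stand in for whatever you would read from the metastore.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableDesc:
    """Hypothetical, simplified view of a table's definition."""
    name: str
    storage_format: str  # e.g. "ORC", "PARQUET", "TEXTFILE"
    num_buckets: int     # 0 (or less) means the table is not bucketed
    external: bool


def acid_eligibility_problems(table: TableDesc) -> List[str]:
    """Return the reasons a table cannot be made ACID under the limitations above.

    An empty list means none of these particular restrictions is violated.
    """
    problems = []
    if table.storage_format.upper() != "ORC":
        problems.append("only the ORC file format is supported")
    if table.num_buckets <= 0:
        problems.append("table must be bucketed")
    if table.external:
        problems.append("external tables cannot be ACID tables")
    return problems


if __name__ == "__main__":
    ok = TableDesc("web_events", "ORC", num_buckets=8, external=False)
    bad = TableDesc("raw_logs", "TEXTFILE", num_buckets=0, external=True)
    print(acid_eligibility_problems(ok))  # []
    print(acid_eligibility_problems(bad))
```

Returning a list of reasons rather than a boolean lets a caller report every violated restriction at once instead of failing on the first one.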
...
| Configuration key | Values | Location | Notes |
|---|---|---|---|
| hive.txn.manager | Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager. Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/ | DummyTxnManager replicates pre-Hive 0.13 behavior and provides no transactions. |
| | Default: 300 | Client/Metastore | Time, in seconds, after which transactions are declared aborted if the client has not sent a heartbeat. It's critical that this property has the same value for all components/services. [5] |
| hive.txn.heartbeat.threadpool.size | Default: 5 | Client/HiveServer2 | The number of threads to use for heartbeating (as of Hive 1.3.0 and 2.0.0). |
| hive.timedout.txn.reaper.start | Default: 100s | Metastore | Delay between metastore startup and the first run of the reaper (the process that aborts timed-out transactions) (as of Hive 1.3.0). |
| | Default: 180s | Metastore | How often the reaper (the process that aborts timed-out transactions) runs (as of Hive 1.3.0). |
| | Default: 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns(). [1] |
| | Default: false. Value required for transactions: true (before Hive 1.3.0, on exactly one instance of the Thrift metastore service; see Notes) | Metastore | Whether to run the initiator and cleaner threads on this metastore instance. Before Hive 1.3.0, it's critical that this is enabled on exactly one metastore service instance (not enforced yet). As of Hive 1.3.0, this property may be enabled on any number of metastore instances. |
| | Default: 0. Value required for transactions: > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance. [2] |
| | Default: 86400 | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. |
| hive.compactor.cleaner.run.interval | Default: 5000 | Metastore | Time in milliseconds between runs of the cleaner thread (Hive 0.14.0 and later). |
| | Default: 300 | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted. [3] |
| | Default: 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction. |
| | Default: 0.1 | Metastore | Size of the delta files relative to the base, as a fraction, that will trigger a major compaction (1 = 100%, so the default 0.1 = 10%). |
| | Default: 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction. |
| | Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0). [4] |
| | Default: "" (empty string) | Metastore | Name of the Hadoop queue to which compaction jobs will be submitted. Set to the empty string to let Hadoop choose the queue (as of Hive 1.3.0). |
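To make the three compaction thresholds in the table concrete, here is a sketch of the decision they describe, using the table's default values (10 delta directories for a minor compaction; a delta-to-base size ratio of 0.1 or 1000 aborted transactions for a major one). This is a simplified model, not the compactor's actual code: the function and parameter names are hypothetical, and checking the major triggers first, using strict comparisons, and skipping the ratio check when no base exists are all assumptions.

```python
from enum import Enum
from typing import Optional


class CompactionType(Enum):
    MINOR = "minor"
    MAJOR = "major"


# Defaults taken from the configuration table above.
DELTA_NUM_THRESHOLD = 10      # delta directories that trigger a minor compaction
DELTA_PCT_THRESHOLD = 0.1     # delta size / base size that triggers a major compaction
ABORTED_TXN_THRESHOLD = 1000  # aborted transactions that trigger a major compaction


def choose_compaction(
    num_deltas: int,
    delta_bytes: int,
    base_bytes: int,
    num_aborted_txns: int,
    delta_num_threshold: int = DELTA_NUM_THRESHOLD,
    delta_pct_threshold: float = DELTA_PCT_THRESHOLD,
    aborted_txn_threshold: int = ABORTED_TXN_THRESHOLD,
) -> Optional[CompactionType]:
    """Return the compaction a table or partition would need, or None.

    Hypothetical model: major-compaction triggers are checked first, and a
    value must strictly exceed its threshold to trigger a compaction.
    """
    if num_aborted_txns > aborted_txn_threshold:
        return CompactionType.MAJOR
    # With no base yet there is nothing to compare against, so skip the ratio check.
    if base_bytes > 0 and delta_bytes / base_bytes > delta_pct_threshold:
        return CompactionType.MAJOR
    if num_deltas > delta_num_threshold:
        return CompactionType.MINOR
    return None


if __name__ == "__main__":
    # 12 small deltas against a large base: too many deltas, but only 2% of the base size.
    print(choose_compaction(12, 2_000_000, 100_000_000, 0))  # CompactionType.MINOR
    # Deltas are 20% of the base size, which exceeds the 0.1 ratio.
    print(choose_compaction(3, 20_000_000, 100_000_000, 0))  # CompactionType.MAJOR
    # Nothing crosses a threshold.
    print(choose_compaction(2, 1_000_000, 100_000_000, 5))   # None
```

Keeping the thresholds as overridable parameters mirrors the fact that each one is a separately configurable metastore setting.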
...
If a table is to be used in ACID writes (insert, update, delete), then the table property "transactional=true" must be set on that table, starting with Hive 0.14.0. Note that once a table has been defined as an ACID table via TBLPROPERTIES ("transactional"="true"), it cannot be converted back to a non-ACID table; that is, changing it to TBLPROPERTIES ("transactional"="false") is not allowed. Also, hive.txn.manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager either in hive-site.xml or at the beginning of the session before any query is run. Without those, inserts will be done in the old style, and updates and deletes will be prohibited. However, this does not apply to Hive 0.13.0.
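The rules in the paragraph above, together with the session requirement from the limitations list, can be sketched as a small decision function. This is an illustrative model, not Hive code: `write_mode`, `check_property_change`, and `NotAllowed` are hypothetical names, and plain dictionaries stand in for a table's TBLPROPERTIES and the session's settings. It models Hive 0.14.0 and later only.

```python
from typing import Mapping

DB_TXN_MANAGER = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"


class NotAllowed(Exception):
    """Raised when the rules described above prohibit an operation."""


def is_acid_table(tblproperties: Mapping[str, str]) -> bool:
    # Comparing the property value case-insensitively is an assumption of this sketch.
    return tblproperties.get("transactional", "false").lower() == "true"


def write_mode(operation: str, tblproperties: Mapping[str, str],
               session_conf: Mapping[str, str]) -> str:
    """Return "acid" or "legacy" for a write, or raise NotAllowed.

    Models Hive 0.14.0 and later only; Hive 0.13.0 behaves differently.
    """
    op = operation.upper()
    if op not in ("INSERT", "UPDATE", "DELETE"):
        raise ValueError(f"unsupported operation: {operation}")
    uses_dbtxn = session_conf.get("hive.txn.manager") == DB_TXN_MANAGER
    if is_acid_table(tblproperties):
        if not uses_dbtxn:
            # ACID tables cannot be read or written from a non-ACID session.
            raise NotAllowed("ACID table requires hive.txn.manager=DbTxnManager")
        return "acid"
    if op == "INSERT":
        return "legacy"  # old-style, non-transactional insert
    raise NotAllowed(f"{op} requires transactional=true and DbTxnManager")


def check_property_change(old: Mapping[str, str], new: Mapping[str, str]) -> None:
    """Reject turning an ACID table back into a non-ACID one."""
    if is_acid_table(old) and not is_acid_table(new):
        raise NotAllowed("transactional cannot be changed from true to false")


if __name__ == "__main__":
    acid_conf = {"hive.txn.manager": DB_TXN_MANAGER}
    props = {"transactional": "true"}
    print(write_mode("update", props, acid_conf))  # acid
    print(write_mode("insert", {}, {}))            # legacy
    try:
        check_property_change(props, {"transactional": "false"})
    except NotAllowed as exc:
        print("rejected:", exc)
```

Note that a transactional table accessed without DbTxnManager is rejected outright rather than falling back to an old-style insert, matching the limitation that ACID tables cannot be used from a non-ACID session.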
...