Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: edits of Alan's inserts, updates, and deletes (HIVE-7802)

...

Table of Contents

There are multiple ways of modifying to modify data in Hive:

Loading files into tables

...

  • Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.
  • Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query.
  • The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table or partition are replaced with the output of corresponding select statement.
  • The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table).
  • As of Hive 0.14, if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a transaction manager that implements ACID, then INSERT OVERWRITE will be disabled for that table.  This is to avoid users unintentionally overwriting transaction history.  The same functionality can be achieved by using TRUNCATE TABLE (for non-partitioned tables) or DROP PARTITION followed by INSERT INTO.
Notes

...

  • INSERT OVERWRITE statements to directories, local directories, and tables (or partitions) can all be used together within the same query.
  • INSERT OVERWRITE statements to HDFS filesystem directories are the best way to extract large amounts of data from Hive. Hive can write to HDFS directories in parallel from within a map-reduce job.
  • The directory is, as you would expect, OVERWRITten; in other words, if the specified path exists, it is clobbered and replaced with the output.
  • As of Hive 0.11.0 the separator used can be specified, in earlier versions it was always the ^A character (\001)
  • In Hive 0.14, inserts into ACID compliant tables will de-activate deactivate vectorization for the duration of the select and insert.  This will be done automatically.  ACID tables that have data inserted into them can still be queried using vectorization.

...

The INSERT...VALUES statement can be used to insert data into tables directly from SQL.

Note
titleVersion Information

INSERT...VALUES is available starting in Hive 0.14.

Syntax
Code Block
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] VALUES values_row [, values_row ...]
 
Where values_row is:
( value [, value ...] )
Wherewhere a value is either null or any valid SQL literal

...

  • Each row listed in the VALUES clause is inserted into table tablename.
  • Values must be provided for every column in the table.  The standard SQL syntax that allows the user to insert values into only some columns is not yet supported.  To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
  • If the table being inserted into supports ACID and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion.
Examples
Code Block
languagesql
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))

...


  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

...



INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

...




 
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING)
  PARTITIONED BY (datestamp STRING)

...

 CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;

...




INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

...




INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');

Update

Note
titleVersion Information

UPDATE is available starting in Hive 0.14.

Updates can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Code Block
Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
Synopsis
  • The referenced column must be a column of the table being updated.
  • The value assigned must be an expression that Hive supports in the select clause.  Thus arithmetic operators, UDFs, casts, literals, etc. are supported.  Subqueries are not supported.
  • Only rows that match the WHERE clause will be updated.
  • Partitioning columns cannot be updated.
  • Bucketing columns cannot be updated.
  • In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.

...

Note
titleVersion Information

DELETE is available starting in Hive 0.14.

Deletes can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Code Block
Standard Syntax:
DELETE FROM tablename [WHERE expression]

...