Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • INSERT OVERWRITE statements to directories, local directories, and tables (or partitions) can all be used together within the same query.
  • INSERT OVERWRITE statements to HDFS filesystem directories are the best way to extract large amounts of data from Hive. Hive can write to HDFS directories in parallel from within a map-reduce job.
  • The directory is, as you would expect, OVERWRITten; in other words, if the specified path exists, it is clobbered and replaced with the output.
  • As of Hive 0.11.0 the separator used can be specified, in earlier versions it was always the ^A character (\001)
  • In Hive 0.14, inserts into ACID compliant tables will de-activate vectorization for the duration of the select and insert.  This will be done automatically.  ACID tables that have data inserted into them can still be queried using vectorization.

Inserting into tables from SQL

...

  • Each row listed in the VALUES clause is inserted into table tablename.
  • Values must be provided for every column in the table.  The standard SQL syntax that allows the user to insert values into only some columns is not yet supported.  To mimic the standard SQL nulls can be provided for columns the user does not wish to assign a value to.
  • If the table being inserted into supports ACID and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion.
Examples

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2)) CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23') VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

INSERT INTO TABLE pageviews PARTITION (datestamp) VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');

Update

Note

UPDATE is available starting in Hive 0.14.

Updates can only be performed on tables that support ACID. See Hive Transactions for details.

Code Block
Standard Syntax:
UPDATE tablename SET column = value [, column = value...] [WHERE expression]
Synopsis
  • The referenced column must be a column of the table being updated.
  • The value assigned must be an expression that Hive supports in the select clause.  Thus arithmetic operators, UDFs, casts, literals, etc. supported.  Subqueries are not supported.
  • Only rows that match the WHERE clause will be updated.
  • Partitioning columns cannot be updated.
  • Bucketing columns cannot be updated.
  • In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.
Notes
  • Vectorization will be turned off for update operations.  This is automatic and requires no action on the part of the user.  Non-update operations are not affected.  Updated tables can still be queried using vectorization.