Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
Standard syntax:
INSERT (OVERWRITE | INTO) TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement 

Hive extension (multiple inserts):
FROM from_statement
INSERT (OVERWRITE | INTO) TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT (OVERWRITE | INTO) TABLE tablename2 [PARTITION ...] select_statement2] ... 

Hive extension (dynamic partition inserts):
INSERT (OVERWRITE | INTO) TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement 
Synopsis
  • INSERT OVERWRITE will overwrite any existing data in the table or partition
  • INSERT INTO will append to the table or partition keeping the existing data in tact.
  • Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.
  • Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query
  • The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table or partition are replaced with the output of corresponding select statement.
  • The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table)
  • In the dynamic partition inserts, users can give partial partition specification, which means you just specify the list of partition column names in the PARTITION clause. The column values are optional. If a partition column value is given, we call this static partition, otherwise dynamic partition. Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column.
    Notes
  • Multi Table Inserts minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once (and applying different query operators) to the input data.

    Writing data into filesystem from queries

...