Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.
  • Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query
  • The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table or partition are replaced with the output of corresponding select statement.
  • The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table)
  • In the dynamic partition inserts, users can give partial partition specification, which means you just specify the list of partition column names in the PARTITION clause. The column values are optional. If a partition column value is given, we call this static partition, otherwise dynamic partition. Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column. The dynamic partition columns must be specified last among the columns in the select statement and in the order of the directory hierarchy in which the table data are stored.
Notes
  • Multi Table Inserts minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once (and applying different query operators) to the input data.

...