Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: minor edits for "Loaading files into tables"

...

Code Block
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
Synopsis

Load operations prior to Hive 3.0 are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.

...

Additional load operations are supported by Hive 3.0 onwards as it internally rewrite it Hive internally rewrites the load into an INSERT AS SELECT.

  • If table has partitions, however, the load command does not have them, it the load would be converted into INSERT AS SELECT and assume that the last set of columns are partition columns. It will throw an error if the file does not conform to the expected schema.
  • If table is bucketed then the following rules apply,:
    • In strict mode : launches a an INSERT AS SELECT job.
    • In non-strict mode : if the file names conform to the naming convention (if the file belongs to bucket 0, it should be named 000000_0 or 000000_0_copy_1, or if it belongs to bucket 2 the names should be like , 000002_0 or 000002_0_copy_3, etc.) , then it will be a pure copy/move operation, else it will launch a an INSERT AS SELECT job.
  • filepath can contain subdirectories, provided each file conforms to the schema.
  • inputformat can be any Hive input format such as text, ORC, orc etc.
  • serde can be the associated hive Hive SERDE.
  • Both inputformat and serde are case sensitive.

Example of such a schema:

Code Block
CREATE TABLE tab1 (col1 int, col2 int) partitionedPARTITIONED byBY (col3 int) storedSTORED asAS ORC;
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE tab1;

 

Here, partition information is missing which would otherwise give an error, however, if the file(s) located at filepath conform to the table schema such that each row ends with partition column(s) then the load will rewrite into an INSERT AS SELECT job.

The uncompressed data should look like this,:

(1,2,3), (2,3,4), (4,5,3) etc. 

Notes
  • filepath cannot contain subdirectories . (not valid except for Hive 3.0 or later, as described above).
  • If the keyword LOCAL is not given, filepath must refer to files within the same filesystem as the table's (or partition's) location.
  • Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in sequencefile format, the files being loaded are also sequencefiles, and vice versa.
  • A bug that prevented loading a file when its name includes the "+" character is fixed in release 0.13.0 (HIVE-6048).
  • Please read CompressedStorage if your datafile is compressed.

...