...

Code Block
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
Synopsis

Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.
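As a rough illustration of the syntax above, the following sketch loads a tab-delimited file into an unpartitioned table. The table name `page_views`, its columns, and both file paths are hypothetical and not taken from the original text.

```sql
-- Hypothetical table and paths, used only for illustration.
CREATE TABLE page_views (user_id INT, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- With LOCAL, 'filepath' refers to the local file system and the file
-- is copied into the table's location.
LOAD DATA LOCAL INPATH '/tmp/page_views.tsv' INTO TABLE page_views;

-- Without LOCAL, the file is moved within the table's file system.
-- OVERWRITE replaces the existing contents of the table.
LOAD DATA INPATH '/user/hive/staging/page_views.tsv'
OVERWRITE INTO TABLE page_views;
```

Because the load is a copy/move, the file is not parsed or validated at load time, so it has to match the table's declared storage format already.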

...

  • If the table has partitions but the load command does not specify them, the load is converted into an INSERT AS SELECT, which assumes that the last set of columns are the partition columns. It throws an error if the file does not conform to the expected schema.
  • If the table is bucketed, the following rules apply:
    • In strict mode: launches an INSERT AS SELECT job.
    • In non-strict mode: if the file names conform to the naming convention (a file that belongs to bucket 0 should be named 000000_0 or 000000_0_copy_1, a file that belongs to bucket 2 should be named like 000002_0 or 000002_0_copy_3, etc.), the load is a pure copy/move operation; otherwise it launches an INSERT AS SELECT job.
  • filepath can contain subdirectories, provided each file conforms to the schema.
  • inputformat can be any Hive input format, such as text, ORC, etc.
  • serde can be the associated Hive SerDe.
  • Both inputformat and serde are case sensitive.
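The partition and INPUTFORMAT/SERDE rules above might look as follows in practice. This is a minimal sketch: the table `events`, its columns, and the file paths are hypothetical, and the two class names are the standard Hive text input format and SerDe, chosen here as an assumption about the file's format.

```sql
-- Hypothetical partitioned table, used only for illustration.
CREATE TABLE events (id INT, msg STRING)
PARTITIONED BY (dt STRING);

-- Explicit partition spec: the file holds only (id, msg) and is placed
-- in the dt='2018-01-01' partition.
LOAD DATA INPATH '/user/hive/staging/events/2018-01-01.txt'
INTO TABLE events PARTITION (dt='2018-01-01');

-- Hive 3.0 or later, no partition spec: the load is rewritten as an
-- INSERT AS SELECT, so the file is expected to hold (id, msg, dt), with
-- the partition column last. INPUTFORMAT and SERDE say how to read it;
-- both values are case sensitive.
LOAD DATA LOCAL INPATH '/tmp/events_with_dt.txt'
INTO TABLE events
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';
```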

Example of such a schema:

 

...