Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Added additional functionality to load data support for partitions and buckets.

...

  • filepath can be:
    • a relative path, such as project/data1
    • an absolute path, such as /user/hive/project/data1
    • a full URI with scheme and (optionally) an authority, such as hdfs://namenode:9000/user/hive/project/data1
  • The target being loaded to can be a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.
  • filepath can refer to a file (in which case Hive will move the file into the table) or it can be a directory (in which case Hive will move all the files within that directory into the table). In either case, filepath addresses a set of files.
  • If the keyword LOCAL is specified, then:
    • the load command will look for filepath in the local file system. If a relative path is specified, it will be interpreted relative to the user's current working directory. The user can specify a full URI for local files as well - for example: file:///user/hive/project/data1
    • the load command will try to copy all the files addressed by filepath to the target filesystem. The target file system is inferred by looking at the location attribute of the table. The copied data files will then be moved to the table.
  • If the keyword LOCAL is not specified, then Hive will either use the full URI of filepath, if one is specified, or will apply the following rules:
    • If scheme or authority are not specified, Hive will use the scheme and authority from the hadoop configuration variable fs.default.name that specifies the Namenode URI.
    • If the path is not absolute, then Hive will interpret it relative to /user/<username>
    • Hive will move the files addressed by filepath into the table (or partition)
  • If the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced by the files referred to by filepath; otherwise the files referred by filepath will be added to the table.

Additional load operations supported Hive 3.0 onwards as it internally rewrite it into an INSERT AS SELECT.

  • If table has partitions, however, the load command does not have them, it would be converted into INSERT AS SELECT and assume that the last set of columns are partition columns. It will throw an error if the file does not conform to the expected schema.
  • If table is bucketed then following rules apply,
    • In strict mode : launches a INSERT AS SELECT job.
    • In non-strict mode : if the file names conform to the naming convention (if the file belongs to bucket 0, it should be named 000000_0 or 000000_0_copy_1 or if it belongs to bucket 2 the names should be like, 000002_0 or 000002_0_copy_3 etc), then it will be a pure copy/move operation, else it will launch a INSERT AS SELECT job.
  • filepath can contain subdirectories, provided each file conforms to the schema.

Example of such schema,

 

Code Block
CREATE TABLE tab1 (col1 int, col2 int) partitioned by (col3 int) stored as ORC;
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE tab1;

 

Here, partition information is missing which would otherwise give error, however, if the file(s) located at filepath conform to the table schema such that each row ends with partition column(s) then the load will rewrite into INSERT AS SELECT job.

The uncompressed data should look like this,

(1,2,3), (2,3,4), (4,5,3) etc.

 

Notes
  • filepath cannot contain subdirectories. (not valid for Hive 3.0 or later)
  • If the keyword LOCAL is not given, filepath must refer to files within the same filesystem as the table's (or partition's) location.
  • Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in sequencefile format, the files being loaded are also sequencefiles, and vice versa.
  • A bug that prevented loading a file when its name includes the "+" character is fixed in release 0.13.0 (HIVE-6048).
  • Please read CompressedStorage if your datafile is compressed.

...