HDFS Component
The hdfs component enables you to read and write messages from/to an HDFS file system. HDFS is the distributed file system at the heart of Hadoop.
URI format
hdfs://hostname[:port][/path][?options]
You can append query options to the URI in the following format: ?option=value&option=value&...
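As a minimal sketch of the URI format above, the following Java snippet assembles an endpoint URI from a host, an optional port, an optional path, and a map of options. The `HdfsUriSketch` class and `buildUri` helper are hypothetical and not part of the component; `localhost`, `8020`, `/tmp/data`, `optionA`, and `optionB` are placeholders, not real option names. The sketch does no URL encoding, so it assumes the values contain no reserved characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HdfsUriSketch {

    // Hypothetical helper: builds hdfs://hostname[:port][/path][?options].
    // A port <= 0 and a null or empty path are treated as "not specified".
    public static String buildUri(String host, int port, String path, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("hdfs://").append(host);
        if (port > 0) {
            uri.append(':').append(port);
        }
        if (path != null && !path.isEmpty()) {
            uri.append(path.startsWith("/") ? path : "/" + path);
        }
        if (options != null && !options.isEmpty()) {
            // Join the options as ?option=value&option=value&...
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> entry : options.entrySet()) {
                query.add(entry.getKey() + "=" + entry.getValue());
            }
            uri.append(query.toString());
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("optionA", "value1"); // placeholder option name
        options.put("optionB", "value2"); // placeholder option name
        System.out.println(buildUri("localhost", 8020, "/tmp/data", options));
    }
}
```

Replace the placeholder names with options from the table in the Options section when building a real endpoint URI.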
The path is treated in the following way:
- as a consumer, if the path is a file, the component just reads that file; if it is a directory, the component scans all the files under the path that match the configured pattern. All the files under that directory must be of the same type.
- as a producer, if at least one split strategy is defined, the path is treated as a directory, and under that directory the producer creates a separate file per split, named seg0, seg1, seg2, etc.
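The producer naming scheme described above can be illustrated with a short Java sketch that lists the segment paths expected under a target directory. The `SegmentNamesSketch` class, the `segmentPaths` helper, and the `/tmp/out` directory are hypothetical; the sketch only mirrors the seg0, seg1, seg2 naming and does not reproduce how or when the component actually starts a new split.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentNamesSketch {

    // Hypothetical helper: returns the paths seg0..seg(n-1) under the given directory.
    public static List<String> segmentPaths(String directory, int splits) {
        // Drop a trailing slash so the separator is not doubled.
        String base = directory.endsWith("/")
                ? directory.substring(0, directory.length() - 1)
                : directory;
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            paths.add(base + "/seg" + i);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Prints [/tmp/out/seg0, /tmp/out/seg1, /tmp/out/seg2]
        System.out.println(segmentPaths("/tmp/out", 3));
    }
}
```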
Options
| Name | Default Value | Description |
|---|---|---|
| | | The file can be overwritten |
| | | The buffer size used by HDFS |
| | | The HDFS replication factor |
| | | The size of the HDFS blocks |
| | | It can be SEQUENCE_FILE, |
| | | It can be LOCAL for local filesystem |
| | | The type for the key in case of |
| | | The type for the key in case of |
| | | A string describing the strategy on |
| | | When a file is opened for reading/ |
| | | Once the file has been read is |
| | | For the consumer, how much to wait |
| | | The interval between the directory |
| | | The pattern used for scanning the |
| | | When reading a normal file, this is split |