...

Maven users will need to add the following dependency to their pom.xml for this component:

Code Block
xml

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-hdfs</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>

URI format

Code Block

hdfs://hostname[:port][/path][?options]
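To illustrate how the parts of this format line up with a concrete endpoint, the following sketch splits an endpoint string using the JDK's java.net.URI. It is only an illustration and is not how Camel itself resolves endpoints; the class name HdfsUriParts, the host localhost, the port 8020, and the sample path are assumptions chosen for the example.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of camel-hdfs.
final class HdfsUriParts {

    // Breaks an endpoint of the form hdfs://hostname[:port][/path][?options]
    // into its components using the standard java.net.URI parser.
    static String describe(String endpoint) {
        URI uri = URI.create(endpoint);
        return "host=" + uri.getHost()
                + " port=" + uri.getPort()      // -1 when no port is given
                + " path=" + uri.getPath()
                + " options=" + uri.getQuery(); // null when there are no options
    }

    public static void main(String[] args) {
        // Host, port, and path below are assumed sample values.
        System.out.println(describe("hdfs://localhost:8020/tmp/simple-file?overwrite=false"));
    }
}
```

Because the port and the options are optional in the format, java.net.URI reports a missing port as -1 and a missing query as null.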

...

  1. As a consumer, if the path is a file, the component simply reads that file; if it is a directory, the component scans all files under the path that match the configured pattern. All files in that directory must be of the same type.
  2. As a producer, if at least one split strategy is defined, the path is treated as a directory, and under that directory the producer creates a separate file for each split, named using the configured UuidGenerator.

Options

Wiki Markup
{div:class=confluenceTableSmall}
|| Name || Default Value || Description ||
| {{overwrite}} | {{true}} | The file can be overwritten |
| {{append}} | {{false}} | Append to existing file. Notice that not all HDFS file systems support the append option. |
| {{bufferSize}} | {{4096}} | The buffer size used by HDFS  |
| {{replication}} | {{3}} | The HDFS replication factor  |
| {{blockSize}} | {{67108864}} | The size of the HDFS blocks  |
| {{fileType}} | {{NORMAL_FILE}} | It can be SEQUENCE_FILE, MAP_FILE, ARRAY_FILE, or BLOOMMAP_FILE, see Hadoop |
| {{fileSystemType}} | {{HDFS}} | It can be LOCAL for local filesystem  |
| {{keyType}} | {{NULL}} | The type for the key in case of sequence or map files. See below.  |
| {{valueType}} | {{TEXT}} | The type for the value in case of sequence or map files. See below. |
| {{splitStrategy}} | | A string describing the strategy on how to split the file based on different criteria. See below.  |
| {{openedSuffix}} | {{opened}} | When a file is opened for reading or writing, it is renamed with this suffix so that it is not read during the writing phase. |
| {{readSuffix}} | {{read}} | Once the file has been read, it is renamed with this suffix so that it is not read again. |
| {{initialDelay}} | {{0}} | For the consumer, how long to wait (in milliseconds) before starting to scan the directory. |
| {{delay}} | {{0}} | The interval (milliseconds) between the directory scans. |
| {{pattern}} | {{*}} | The pattern used for scanning the directory  |
| {{chunkSize}} | {{4096}} | When reading a normal file, this is split into chunks producing a message per chunk. |
| {{connectOnStartup}} | {{true}} | *Camel 2.9.3/2.10.1:* Whether to connect to the HDFS file system when the producer/consumer starts. If {{false}}, the connection is created on demand. Notice that HDFS may take up to 15 minutes to establish a connection, as it has a hardcoded 45 x 20 sec redelivery. Setting this option to {{false}} allows your application to start up without blocking for up to 15 minutes. |
| {{owner}} | | *Camel 2.13/2.12.4:* The file owner must match this owner for the consumer to pick up the file; otherwise the file is skipped. |
{div}
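As a rough illustration of how the options above end up in an endpoint, the sketch below appends a set of option names and values to an HDFS endpoint string. The class HdfsEndpointUriSketch and its build method are hypothetical helpers written for this example, not part of camel-hdfs, and the host and path are assumed sample values; the option names and values come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of camel-hdfs.
final class HdfsEndpointUriSketch {

    // Joins options as name=value pairs separated by '&' and appends them
    // after '?', following the hdfs://hostname[:port][/path][?options] format.
    static String build(String hostAndPath, Map<String, String> options) {
        String base = "hdfs://" + hostAndPath;
        if (options.isEmpty()) {
            return base;
        }
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("fileType", "SEQUENCE_FILE");
        options.put("replication", "1");
        System.out.println(build("localhost/tmp/simple-file", options));
    }
}
```

Options left out of the map keep the defaults listed in the table.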

KeyType and ValueType

...

  • If the split strategy option has been defined, the HDFS path is used as a directory and files are created using the configured UuidGenerator.
  • Every time a splitting condition is met, a new file is created.
    The splitStrategy option is defined as a string with the following syntax:
    splitStrategy=<ST>:<value>,<ST>:<value>,*

...

Note

Note that this strategy currently requires either setting an IDLE value or setting the HdfsConstants.HDFS_CLOSE header to false in order to use the BYTES/MESSAGES configuration. Otherwise, the file will be closed with each message.

For example:

Code Block

hdfs://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5
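To make the splitStrategy syntax more concrete, the following sketch parses a value such as IDLE:1000,BYTES:5 into strategy/value pairs. It is a minimal illustration of the <ST>:<value> format only, not the parsing code used by camel-hdfs; the class name SplitStrategySketch and its parse method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of camel-hdfs.
final class SplitStrategySketch {

    // Parses "<ST>:<value>,<ST>:<value>" into an ordered map of
    // strategy name to numeric value.
    static Map<String, Long> parse(String spec) {
        Map<String, Long> strategies = new LinkedHashMap<>();
        for (String token : spec.split(",")) {
            String[] pair = token.trim().split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected <ST>:<value> but got: " + token);
            }
            strategies.put(pair[0].trim(), Long.parseLong(pair[1].trim()));
        }
        return strategies;
    }

    public static void main(String[] args) {
        // Uses the splitStrategy value from the example endpoint above.
        System.out.println(parse("IDLE:1000,BYTES:5")); // prints {IDLE=1000, BYTES=5}
    }
}
```

Each entry corresponds to one splitting condition, so the example endpoint configures two of them.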

...