HDFS Component
Available as of Camel 2.8
The hdfs component enables you to read and write messages from/to an HDFS file system. HDFS is the distributed file system at the heart of Hadoop (http://hadoop.apache.org).
Maven users will need to add the following dependency to their pom.xml for this component:
```xml
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-hdfs</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>
```
URI format
```
hdfs://hostname[:port][/path][?options]
```
You can append query options to the URI in the following format, ?option=value&option=value&...
The path is treated in the following way:
- as a consumer, if it's a file, it just reads the file; otherwise, if it represents a directory, it scans all the files under the path satisfying the configured pattern. All the files under that directory must be of the same type.
- as a producer, if at least one split strategy is defined, the path is considered a directory and under that directory the producer creates a different file per split, named using the configured UuidGenerator.
Note |
---|
When consuming from hdfs in normal mode, a file is split into chunks, producing one message per chunk. You can configure the chunk size using the chunkSize option. If you want to read from hdfs and write to a regular file using the file component, you can use the file component's fileExist=Append option to append the chunks together. |
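The chunking behaviour described in the note can be illustrated with a small, stdlib-only Java sketch. It splits a byte array into chunks of at most chunkSize bytes (one chunk per message) and then appends them back together in arrival order, which is roughly what an appending file writer does on the receiving side. The class and method names (ChunkSketch, split, appendAll) are hypothetical and are not part of camel-hdfs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSketch {

    // Split data into consecutive chunks of at most chunkSize bytes.
    public static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunkSize) {
            int to = Math.min(from + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, from, to));
        }
        return chunks;
    }

    // Append the chunks in arrival order, as an appending file writer would.
    public static byte[] appendAll(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        List<byte[]> chunks = split(data, 4);
        System.out.println(chunks.size() + " chunks");
        System.out.println(new String(appendAll(chunks), StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is only that the chunks must be appended in order to recover the original file content; a writer that overwrites the target on every message would keep just the last chunk.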
Options
KeyType and ValueType
- NULL means that the key or the value is absent
- BYTE for writing a byte; the Java Byte class is mapped to BYTE
- BYTES for writing a sequence of bytes; the Java ByteBuffer class is mapped to BYTES
- INT for writing a Java integer
- FLOAT for writing a Java float
- LONG for writing a Java long
- DOUBLE for writing a Java double
- TEXT for writing Java strings
BYTES is also used for everything else. For example, in Camel a file is sent around as an InputStream; in this case it is written to a sequence file or a map file as a sequence of bytes.
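The mapping in the list above can be sketched in plain Java as follows. This is a hypothetical helper written only to illustrate the rules as stated here (including the fallback to BYTES for anything else); the enum and method names are assumptions and do not claim to match the component's internal classes.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

public class ValueTypeSketch {

    // Hypothetical mirror of the type names listed above.
    public enum ValueType { NULL, BYTE, BYTES, INT, FLOAT, LONG, DOUBLE, TEXT }

    // Pick a type for a Java value, following the list above.
    public static ValueType typeFor(Object value) {
        if (value == null) return ValueType.NULL;
        if (value instanceof Byte) return ValueType.BYTE;
        if (value instanceof ByteBuffer) return ValueType.BYTES;
        if (value instanceof Integer) return ValueType.INT;
        if (value instanceof Float) return ValueType.FLOAT;
        if (value instanceof Long) return ValueType.LONG;
        if (value instanceof Double) return ValueType.DOUBLE;
        if (value instanceof String) return ValueType.TEXT;
        // Everything else (for example an InputStream) is written as raw bytes.
        return ValueType.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(typeFor(42));
        System.out.println(typeFor("camel"));
        System.out.println(typeFor(new ByteArrayInputStream(new byte[0])));
    }
}
```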
Splitting Strategy
In the current version of Hadoop, opening a file in append mode is disabled since it's not very reliable. So, for the moment, it's only possible to create new files. The Camel HDFS endpoint works around this limitation as follows:
- If the split strategy option has been defined, the hdfs path will be used as a directory and files will be created using the configured UuidGenerator
- Every time a splitting condition is met, a new file is created.
The splitStrategy option is defined as a string with the following syntax:
splitStrategy=<ST>:<value>,<ST>:<value>,*
where <ST> can be:
- BYTES a new file is created, and the old is closed when the number of written bytes is more than <value>
- MESSAGES a new file is created, and the old is closed when the number of written messages is more than <value>
- IDLE a new file is created, and the old is closed when no writing happened in the last <value> milliseconds
Note |
---|
Note that this strategy currently requires either setting an IDLE value or setting the HdfsConstants.HDFS_CLOSE header to false to use the BYTES/MESSAGES configuration; otherwise, the file will be closed with each message. |
For example:
```
hdfs://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5
```
It means: a new file is created either when it has been idle for more than 1 second or if more than 5 bytes have been written. So, running hadoop fs -ls /tmp/simple-file, you'll see that multiple files have been created.
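To make the splitStrategy syntax concrete, here is a minimal Java sketch that parses a value such as IDLE:1000,BYTES:5 into rules and checks whether any rule asks for a new file. It is not the component's parser: the class SplitStrategySketch, its Rule type, and the shouldSplit method are hypothetical, and the "more than <value>" comparisons simply follow the wording of the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitStrategySketch {

    public enum Type { BYTES, MESSAGES, IDLE }

    // One <ST>:<value> pair.
    public static final class Rule {
        public final Type type;
        public final long value;

        Rule(Type type, long value) {
            this.type = type;
            this.value = value;
        }
    }

    // Parse a string such as "IDLE:1000,BYTES:5" into a list of rules.
    public static List<Rule> parse(String option) {
        List<Rule> rules = new ArrayList<>();
        for (String part : option.split(",")) {
            String[] kv = part.trim().split(":");
            if (kv.length != 2) {
                throw new IllegalArgumentException("Expected <ST>:<value> but got: " + part);
            }
            rules.add(new Rule(Type.valueOf(kv[0].trim()), Long.parseLong(kv[1].trim())));
        }
        return rules;
    }

    // True when at least one rule says the current file should be closed
    // and a new one started.
    public static boolean shouldSplit(List<Rule> rules, long bytesWritten,
                                      long messagesWritten, long idleMillis) {
        for (Rule rule : rules) {
            switch (rule.type) {
                case BYTES:
                    if (bytesWritten > rule.value) return true;
                    break;
                case MESSAGES:
                    if (messagesWritten > rule.value) return true;
                    break;
                case IDLE:
                    if (idleMillis > rule.value) return true;
                    break;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("IDLE:1000,BYTES:5");
        System.out.println(shouldSplit(rules, 6, 1, 0));    // more than 5 bytes written
        System.out.println(shouldSplit(rules, 3, 1, 1500)); // idle for more than 1 second
        System.out.println(shouldSplit(rules, 3, 1, 200));  // neither condition met
    }
}
```

The rules are combined with a logical OR, matching the example above where a new file is created when either condition is met.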
Message Headers
The following headers are supported by this component:
Producer only
Controlling to close file stream
Available as of Camel 2.10.4
When using the producer without a split strategy, the file output stream is by default closed after the write. However, you may want to keep the stream open and only explicitly close it later. For that you can use the header HdfsConstants.HDFS_CLOSE (value = "CamelHdfsClose"). Setting this header to a boolean allows you to explicitly control whether the stream should be closed or not.
Notice this does not apply if you use a split strategy, as there are various strategies that can control when the stream is closed.
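The decision described above, for a producer without a split strategy, can be modelled with a few lines of plain Java. This is a sketch of the documented behaviour only, not the component's code: the class CloseDecisionSketch and the method closeAfterWrite are hypothetical, and only the header name "CamelHdfsClose" comes from the text.

```java
import java.util.HashMap;
import java.util.Map;

public class CloseDecisionSketch {

    // Header name quoted in the text above (HdfsConstants.HDFS_CLOSE).
    public static final String HDFS_CLOSE = "CamelHdfsClose";

    // Returns true if the stream should be closed after this write,
    // for a producer that has no split strategy configured.
    public static boolean closeAfterWrite(Map<String, Object> headers) {
        Object close = headers.get(HDFS_CLOSE);
        if (close instanceof Boolean) {
            return (Boolean) close; // explicit control via the header
        }
        return true; // default: close after the write
    }

    public static void main(String[] args) {
        Map<String, Object> headers = new HashMap<>();
        System.out.println(closeAfterWrite(headers)); // no header: default applies
        headers.put(HDFS_CLOSE, false);
        System.out.println(closeAfterWrite(headers)); // header asks to keep the stream open
    }
}
```

In a route, the header would be set on the message before it reaches the hdfs endpoint; a later message carrying the header set to true would then close the stream.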
Using this component in OSGi
This component is fully functional in an OSGi environment; however, it requires some actions from the user. Hadoop uses the thread context class loader to load resources. Usually, the thread context class loader will be the bundle class loader of the bundle that contains the routes, so the default configuration files need to be visible from the bundle class loader. A typical way to deal with this is to keep a copy of core-default.xml in your bundle root. That file can be found in the hadoop-common.jar.