Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Added Hadoop 2.x specific OSGi usage

...

Note

note that this strategy currently requires either setting an IDLE value or setting the HdfsConstants.HDFS_CLOSE header to false to use the BYTES/MESSAGES configuration...otherwise, the file will be closed with each message

for example:

Code Block
hdfshdfs2://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5

...

Using this component in OSGi

This component is fully functional There are some quirks when running this component in an OSGi environment related to the mechanism Hadoop 2.x uses to discover different org.apache.hadoop.fs.FileSystem implementations. Hadoop 2.x uses java.util.ServiceLoader which looks for /META-INF/services/org.apache.hadoop.fs.FileSystem files defining available filesystem types and implementations. These resources are not available when running inside OSGi.

As with camel-hdfs component, the , however, it requires some actions from the user. Hadoop uses the thread context class loader in order to load resources. Usually, the thread context classloader will be the bundle class loader of the bundle that contains the routes. So, the default configuration files need to be visible from the bundle class loader. A typical way to deal with it is to keep a copy of coreof core-default.xml (and e.g., hdfs-default.xml) in your bundle root. That file can be found in the hadoop-common.jar.

Using this component with manually defined routes

There are two options:

  1. Package /META-INF/services/org.apache.hadoop.fs.FileSystem resource with bundle that defines the routes. This resource should list all the required Hadoop 2.x filesystem implementations.
  2. Provide boilerplate initialization code which populates internal, static cache inside org.apache.hadoop.fs.FileSystem class:
Code Block
java
java
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.setClass("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class, FileSystem.class);
conf.setClass("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class, FileSystem.class);
...
FileSystem.get("file:///", conf);
FileSystem.get("hdfs://localhost:9000/", conf);
...

Using this component with Blueprint container

Two options:

  1. Package /META-INF/services/org.apache.hadoop.fs.FileSystem resource with bundle that contains blueprint definition.
  2. Add the following to the blueprint definition file:
Code Block
java
java
<bean id="hdfsOsgiHelper" class="org.apache.camel.component.hdfs2.HdfsOsgiHelper">
   <argument>
      <map>
         <entry key="file:///" value="org.apache.hadoop.fs.LocalFileSystem"  />
         <entry key="hdfs://localhost:9000/" value="org.apache.hadoop.hdfs.DistributedFileSystem" />
         ...
      </map>
   </argument>
</bean>

<bean id="hdfs2" class="org.apache.camel.component.hdfs2.HdfsComponent" depends-on="hdfsOsgiHelper" />

This way Hadoop 2.x will have correct mapping of URI schemes to filesystem implementations.