File Component - Camel 2.0 onwards
Using Camel 1.x
This documentation is only for Camel 2.0 or newer. If you are using Camel 1.x then see this [link] instead.
The File component provides access to file systems, allowing files to be processed by any other Camel Components or messages from other components to be saved to disk.
URI format
file:directoryName[?options]
or
file://directoryName[?options]
Where directoryName represents the underlying file directory.
You can append query options to the URI in the following format, ?option=value&option=value&...
Only directories
Camel 2.0 only supports endpoints configured with a starting directory, so directoryName must be a directory.
If you want to consume a single file only, you can use the fileName option, e.g. by setting fileName=thefilename.
Also, the starting directory must not contain dynamic expressions with ${ } placeholders. Again, use the fileName option to specify the dynamic part of the filename.
In Camel 1.x you could also configure a file, but this caused more harm than good as it could lead to confusing situations.
Avoid reading files currently being written by another application
Beware that the JDK File IO API is limited in its ability to detect whether another application is currently writing or copying a file, and the behavior can differ between OS platforms. As a result, Camel may conclude that a file is not locked by another process and start consuming it. You therefore have to do your own investigation into what suits your environment. To help with this, Camel provides different readLock options that you can use. See also the section Consuming files from folders where others drop files directly.
URI Options
Common
| Name | Default Value | Description |
|---|---|---|
| | | Automatically create missing directories in the file's pathname. For the file consumer, that means creating the starting directory. For the file producer, it means the directory where the files should be written. |
| | 128kb | Write buffer size in bytes. |
| | | Use Expression such as File Language to dynamically set the filename. For consumers, it's used as a filename filter. For producers, it's used to evaluate the filename to write. If an expression is set, it takes precedence over the |
| | | Flatten is used to flatten the file name path to strip any leading paths, so it's just the file name. This allows you to consume recursively into sub-directories, but when you e.g. write the files to another directory they will be written in a single directory. Setting this to |
Consumer only
| Name | Default Value | Description |
|---|---|---|
| | | Milliseconds before polling the file/directory starts. |
| | | Milliseconds before the next poll of the file/directory. |
| | | Set to |
| | | If a directory, will look for files in all the sub-directories as well. |
| | | If |
| | | If |
| | | Use Expression such as File Language to dynamically set the filename when moving it before processing. For example to move in-progress files into the |
| | | Use Expression such as File Language to dynamically set the filename when moving it after processing. To move files into a |
| | | Use Expression such as File Language to dynamically set the filename when moving failed files after processing. To move files into a |
| | | Used to include files whose filename matches the regex pattern. |
| | | Used to exclude files whose filename matches the regex pattern. |
| | | Option to use the Idempotent Consumer EIP pattern to let Camel skip already processed files. Will by default use a memory based LRUCache that holds 1000 entries. If |
| | | Pluggable repository as a org.apache.camel.processor.idempotent.MessageIdRepository class. Will by default use |
| | | Pluggable in-progress repository as a org.apache.camel.processor.idempotent.MessageIdRepository class. The in-progress repository is used to keep track of the files currently being consumed. By default a memory based repository is used. |
| | | Pluggable filter as a |
| | | Pluggable sorter as a java.util.Comparator<org.apache.camel.component.file.GenericFile> class. |
| | | Built-in sort using the File Language. Supports nested sorts, so you can have a sort by file name and as a 2nd group sort by modified date. See sorting section below for details. |
| | | Used by consumer, to only poll the files if it has exclusive read-lock on the file (i.e. the file is not in-progress or being written). Camel will wait until the file lock is granted. |
| | | Optional timeout in milliseconds for the read-lock, if supported by the read-lock. If the read-lock could not be granted and the timeout triggered, then Camel will skip the file. At the next poll, Camel will try the file again, and this time the read-lock may be granted. Use a value of 0 or lower to indicate forever. In Camel 2.0 the default value is 0. In Camel 2.1 the default value is 10000. Currently |
| | | Pluggable read-lock as a |
| | | A pluggable |
| | | An integer that defines the maximum number of messages to gather per poll. By default, no maximum is set. Can be used to set a limit of e.g. 1000 to avoid having the server read thousands of files as it starts up. Set a value of 0 or negative to disable it. |
Default behavior for file consumer
- By default the file is locked for the duration of the processing.
- After the route has completed, files are moved into the .camel subdirectory, so that they appear to be deleted.
- The File Consumer will always skip any file whose name starts with a dot, such as ., .camel, .m2 or .groovy.
- Only files (not directories) are matched for valid filename, if options such as include or exclude are used.
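The default matching rules above can be sketched in plain Java. This is an illustration only, not Camel's implementation: the class name ConsumerNameFilter is hypothetical, and two details are assumptions, namely that the regex must match the whole file name and that exclude is checked before include.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the default rules; not part of Camel.
final class ConsumerNameFilter {
    private final Pattern include; // null means "no include pattern configured"
    private final Pattern exclude; // null means "no exclude pattern configured"

    ConsumerNameFilter(String includeRegex, String excludeRegex) {
        this.include = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.exclude = excludeRegex == null ? null : Pattern.compile(excludeRegex);
    }

    /** Returns true if a file with this name would be picked up. */
    boolean accept(String fileName) {
        // Names starting with a dot (.camel, .m2, ...) are always skipped.
        if (fileName.startsWith(".")) {
            return false;
        }
        // Assumption: exclude wins over include, and the regex must match the full name.
        if (exclude != null && exclude.matcher(fileName).matches()) {
            return false;
        }
        if (include != null && !include.matcher(fileName).matches()) {
            return false;
        }
        return true;
    }
}
```

With an include pattern of `.*\.txt`, a file named report.txt is accepted while .camel and report.csv are not.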
Producer only
| Name | Default Value | Description |
|---|---|---|
| | | What to do if a file already exists with the same name. The following values can be specified: Override, Append, Fail and Ignore. |
| | | This option is used to write the file using a temporary name and then, after the write is complete, rename it to the real name. Can be used to identify files being written and also avoid consumers (not using exclusive read locks) reading in progress files. Is often used by FTP when uploading big files. |
| | | Camel 2.1: The same as }.tmp}}. |
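The write-then-rename behaviour described in the table can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not Camel's producer code: the class TempNameWriter and its prefix parameter are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: writes under a temporary name, then renames to the real name.
final class TempNameWriter {
    static Path write(Path dir, String name, String content, String prefix) throws IOException {
        Files.createDirectories(dir);
        Path temp = dir.resolve(prefix + name);   // e.g. inprogress-report.txt
        Path target = dir.resolve(name);          // e.g. report.txt
        // While this write is running, only the temporary name is visible.
        Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
        // The real name appears only once the content is complete.
        return Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A consumer that ignores names starting with the chosen prefix will therefore never see a half-written file.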
Default behavior for file producer
- By default it will override any existing file if one exists with the same name.
Override is now default
In Camel 1.x, Append is the default for the file producer. We have changed this to Override in Camel 2.0, as this is also the default file operation when using java.io.File, and also the default for the FTP library we use in the camel-ftp component.
Move and Delete operations
Any move or delete operation is executed after the routing has completed (post command), so during processing of the Exchange the file is still located in the inbox folder.
Let's illustrate this with an example:
from("file://inbox?move=.done").to("bean:handleOrder");
When a file is dropped in the inbox folder, the file consumer notices this and creates a new FileExchange that is routed to the handleOrder bean. The bean then processes the File object. At this point in time the file is still located in the inbox folder. After the bean completes, and thus the route is completed, the file consumer will perform the move operation and move the file to the .done sub-folder.
The move and preMove options should be a directory name, which can be either relative or absolute. If relative, the directory is created as a sub-folder from within the folder where the file was consumed.
By default, Camel will move consumed files to the .camel sub-folder relative to the directory where the file was consumed.
If you want to delete the file after processing, the route should be:
from("file://inbox?delete=true").to("bean:handleOrder");
We have introduced a pre move operation to move files before they are processed. This allows you to mark which files have been scanned as they are moved to this sub folder before being processed.
from("file://inbox?preMove=inprogress").to("bean:handleOrder");
You can combine the pre move and the regular move:
from("file://inbox?preMove=inprogress&move=.done").to("bean:handleOrder");
So in this situation, the file is in the inprogress folder when being processed and after it's processed, it's moved to the .done folder.
Fine grained control over Move and PreMove option
The move and preMove options are Expression-based, so we have the full power of the File Language to do advanced configuration of the directory and name pattern.
Camel will, in fact, internally convert the directory name you enter into a File Language expression. So when we enter move=.done, Camel will convert this into ${file:parent}/.done/${file:onlyname}. This is only done if Camel detects that you have not provided a ${ } in the option value yourself. If you do enter a ${ }, Camel will not convert the value, and you have full control over the expression.
So if we want to move the file into a backup folder with today's date as the pattern, we can do:
move=backup/${date:now:yyyyMMdd}/${file:name}
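The conversion rule described above is simple enough to sketch in a few lines of Java. This is only an illustration of the rule as stated in this section, not Camel's internal code; the class and method names are hypothetical.

```java
// Hypothetical illustration of how a plain directory name becomes a File Language expression.
final class MoveExpressionSketch {
    static String toExpression(String optionValue) {
        // If the user already supplied a ${ } placeholder, the value is left untouched.
        if (optionValue.contains("${")) {
            return optionValue;
        }
        // Otherwise the value is treated as a directory relative to the file's parent.
        return "${file:parent}/" + optionValue + "/${file:onlyname}";
    }
}
```

So `.done` expands to `${file:parent}/.done/${file:onlyname}`, while `backup/${date:now:yyyyMMdd}/${file:name}` is passed through unchanged.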
About moveFailed
The moveFailed option allows you to move files that could not be processed successfully to another location, such as an error folder of your choice. For example, to move the files into an error folder with a timestamp you can use moveFailed=/error/${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:name.ext}.
See more examples at File Language.
Message Headers
The following headers are supported by this component:
File producer only
| Header | Description |
|---|---|
| | Specifies the name of the file to write (relative to the endpoint directory). The name can be a |
File consumer only
| Header | Description |
|---|---|
| | Name of the consumed file as a relative file path with offset from the starting directory configured on the endpoint. |
| | Only the file name (the name with no leading paths). |
| | The actual absolute filepath (path + name) for the output file that was written. This header is set by Camel and its purpose is providing end-users with the name of the file that was written. |
| | A |
| | The absolute path to the file. For relative files this path holds the relative path instead. |
| | The file path. For relative files this is the starting directory + the relative filename. For absolute files this is the absolute path. |
| | The relative path. |
| | The parent path. |
| | A |
| | A |
Batch Consumer
This component implements the Batch Consumer.
Exchange Properties, file consumer only
As the file consumer is a BatchConsumer, it supports batching the files it polls. Batching means that Camel adds some properties to the Exchange so that you know the number of files polled and the index of the current file within that batch.
| Property | Description |
|---|---|
| | The total number of files that were polled in this batch. |
| | The current index of the batch. Starts from 0. |
| | A |
This allows you, for instance, to know how many files exist in this batch and let the Aggregator aggregate that number of files.
Common gotchas with folder and filenames
When Camel is producing files (writing files) there are a few gotchas affecting how to set a filename of your choice. By default, Camel will use the message ID as the filename, and since the message ID is normally a unique generated ID, you will end up with filenames such as: ID-MACHINENAME-2443-1211718892437-1-0. If such a filename is not desired, then you must provide a filename in the CamelFileName message header. The constant Exchange.FILE_NAME can also be used.
The sample code below produces files using the message ID as the filename:
from("direct:report").to("file:target/reports");
To use report.txt as the filename you have to do:
from("direct:report").setHeader(Exchange.FILE_NAME, constant("report.txt")).to("file:target/reports");
... the same as above, but with CamelFileName:
from("direct:report").setHeader("CamelFileName", constant("report.txt")).to("file:target/reports");
And a syntax where we set the filename on the endpoint with the fileName URI option.
from("direct:report").to("file:target/reports/?fileName=report.txt");
Filename Expression
The filename can be set either using the expression option or as a string-based File Language expression in the CamelFileName header. See the File Language for syntax and samples.
Consuming files from folders where others drop files directly
Beware if you consume files from a folder where other applications write files directly. Take a look at the different readLock options to see what suits your use case. The best approach, however, is to write to another folder and, after the write completes, move the file into the drop folder. If you do write files directly to the drop folder, then the changed option can better detect whether a file is currently being written or copied, as it uses a file-changed algorithm to see whether the file size or modification timestamp changes over a period of time. The other read lock options rely on the Java File API, which sadly is not always very good at detecting this.
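The idea behind a file-changed check can be sketched as follows. This is a rough illustration of the technique described above (compare size and modification time across a short wait), not the algorithm Camel actually ships; the class name, method name and wait interval are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a "has the file stopped changing?" check.
final class ChangedCheckSketch {
    static boolean looksStable(Path file, long waitMillis) throws IOException, InterruptedException {
        long sizeBefore = Files.size(file);
        long modifiedBefore = Files.getLastModifiedTime(file).toMillis();
        // Give a writer that is still active a chance to change the file.
        Thread.sleep(waitMillis);
        // Stable only if neither the size nor the timestamp moved during the wait.
        return sizeBefore == Files.size(file)
                && modifiedBefore == Files.getLastModifiedTime(file).toMillis();
    }
}
```

A check like this can still be fooled by a writer that pauses for longer than the wait interval, which is why writing elsewhere and moving the finished file remains the safer approach.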
Samples
Read from a directory and write to another directory
from("file://inputdir/?delete=true").to("file://outputdir")
Listen on a directory and create a message for each file dropped there. Copy the contents to the outputdir and delete the file in the inputdir.
Reading recursively from a directory and writing to another
from("file://inputdir/?recursive=true&delete=true").to("file://outputdir")
Listen on a directory and create a message for each file dropped there. Copy the contents to the outputdir and delete the file in the inputdir. Will scan recursively into sub-directories. Will lay out the files in the same directory structure in the outputdir as in the inputdir, including any sub-directories. For example, the following input layout:
inputdir/foo.txt
inputdir/sub/bar.txt
Will result in the following output layout:
outputdir/foo.txt
outputdir/sub/bar.txt
Using flatten
If you want to store the files in the outputdir directory in a single directory, disregarding the source directory layout (i.e. to flatten out the path), you just add the flatten=true option on the file producer side:
from("file://inputdir/?recursive=true&delete=true").to("file://outputdir?flatten=true")
Will result in the following output layout:
outputdir/foo.txt
outputdir/bar.txt
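The effect of flattening on the target path can be reproduced with java.nio.file.Path. The sketch below only illustrates the path arithmetic; FlattenSketch is a hypothetical name and this is not how the producer is implemented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of how the target path differs with and without flattening.
final class FlattenSketch {
    static Path target(Path outputDir, String relativeName, boolean flatten) {
        Path relative = Paths.get(relativeName);
        // Flattening keeps only the last path element, dropping any leading directories.
        return outputDir.resolve(flatten ? relative.getFileName() : relative);
    }
}
```

For the relative name sub/bar.txt this yields outputdir/bar.txt when flattening and outputdir/sub/bar.txt otherwise.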
Reading from a directory and the default move operation
Camel will by default move any processed file into a .camel subdirectory in the directory the file was consumed from.
from("file://inputdir/?recursive=true").to("file://outputdir")
Affects the layout as follows:
before
inputdir/foo.txt
inputdir/sub/bar.txt
after
inputdir/.camel/foo.txt
inputdir/sub/.camel/bar.txt
outputdir/foo.txt
outputdir/sub/bar.txt
Read from a directory and process the message in java
from("file://inputdir/").process(new Processor() {
    public void process(Exchange exchange) throws Exception {
        Object body = exchange.getIn().getBody();
        // do some business logic with the input body
    }
});
The body will be a File object that points to the file that was just dropped into the inputdir directory.
Read files from a directory and send the content to a jms queue
from("file://inputdir/").convertBodyTo(String.class).to("jms:test.queue")
By default the file endpoint sends a FileMessage which contains a File object as the body. If you send this directly to the JMS component, the JMS message will only contain the File object but not the content. By converting the File to a String, the message will contain the file contents, which is probably what you want.
The route above using Spring DSL:
<route>
  <from uri="file://inputdir/"/>
  <convertBodyTo type="java.lang.String"/>
  <to uri="jms:test.queue"/>
</route>
Writing to files
Camel is of course also able to write files, i.e. produce files. In the sample below we receive some reports on the SEDA queue that we process before they are written to a directory.
Write to subdirectory using Exchange.FILE_NAME
Using a single route, it is possible to write a file to any number of subdirectories. If you have a route set up as such:
<route>
  <from uri="bean:myBean"/>
  <to uri="file:/rootDirectory"/>
</route>
You can have myBean set the header Exchange.FILE_NAME to values such as:
Exchange.FILE_NAME = hello.txt => /rootDirectory/hello.txt
Exchange.FILE_NAME = foo/bye.txt => /rootDirectory/foo/bye.txt
This allows you to have a single route to write files to multiple destinations.
Using expression for filenames
In this sample we want to move consumed files to a backup folder using today's date as a sub-folder name:
from("file://inbox?move=backup/${date:now:yyyyMMdd}/${file:name}").to("...");
See File Language for more samples.
Avoiding reading the same file more than once (idempotent consumer)
Camel supports Idempotent Consumer directly within the component so it will skip already processed files. This feature can be enabled by setting the idempotent=true option.
from("file://inbox?idempotent=true").to("...");
By default Camel uses an in-memory store for keeping track of consumed files: a least recently used cache holding up to 1000 entries. You can plug in your own implementation of this store by using the idempotentRepository option, using the # sign in the value to indicate that it refers to a bean in the Registry with the specified id.
<!-- define our store as a plain spring bean -->
<bean id="myStore" class="com.mycompany.MyIdempotentStore"/>

<route>
  <from uri="file://inbox?idempotent=true&amp;idempotentRepository=#myStore"/>
  <to uri="bean:processInbox"/>
</route>
Camel will log at DEBUG level if it skips a file because it has been consumed before:
DEBUG FileConsumer is idempotent and the file has been consumed before. Will skip this file: target\idempotent\report.txt
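To make the default behaviour concrete, here is a minimal sketch of a bounded, least-recently-used "seen before" store built on java.util.LinkedHashMap. It is a stand-in for illustration only: it does not implement Camel's repository interface, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory store that remembers at most maxEntries keys.
final class LruIdempotentSketch {
    private final Map<String, Boolean> seen;

    LruIdempotentSketch(final int maxEntries) {
        // accessOrder=true orders entries from least to most recently used.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the least recently used key once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns true if the key is new (process the file), false if it was seen before (skip it). */
    synchronized boolean add(String key) {
        return seen.put(key, Boolean.TRUE) == null;
    }
}
```

Because the store is bounded, a file whose key has been evicted will be treated as new again, which is one reason to consider a persistent repository for long-running consumers.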
Using a file based idempotent repository
In this section we will use the file based idempotent repository org.apache.camel.processor.idempotent.FileIdempotentRepository instead of the in-memory one that is used by default.
This repository uses a 1st level cache to avoid reading the file repository. It will only use the file repository to store the content of the 1st level cache, so the repository can survive server restarts. It will load the content of the file into the 1st level cache upon startup. The file structure is very simple, as it stores each key on a separate line in the file. By default, the file store has a size limit of 1mb. When the file grows larger, Camel will truncate the file store, rebuilding the content by flushing the 1st level cache into a fresh empty file.
We configure our repository using Spring XML, creating our file idempotent repository and defining our file consumer to use it via the idempotentRepository option, with the # sign indicating a Registry lookup:
Using a JPA based idempotent repository
In this section we will use the JPA based idempotent repository instead of the in-memory one that is used by default.
First we need a persistence-unit in META-INF/persistence.xml where we need to use the class org.apache.camel.processor.idempotent.jpa.MessageProcessed as the model.
Then we need to set up a Spring jpaTemplate in the Spring XML file:
And finally we can create our JPA idempotent repository in the Spring XML file as well:
Then we just need to refer to the jpaStore bean in the file consumer endpoint, using the idempotentRepository option with the # syntax:
<route>
  <from uri="file://inbox?idempotent=true&amp;idempotentRepository=#jpaStore"/>
  <to uri="bean:processInbox"/>
</route>
Filter using org.apache.camel.component.file.GenericFileFilter
Camel supports pluggable filtering strategies. You can then configure the endpoint with such a filter to skip certain files being processed.
In the sample we have built our own filter that skips files starting with skip in the filename:
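The filtering logic itself can be sketched in plain Java. The java.util.function.Predicate below is only a stand-in for the GenericFileFilter interface, whose method signature is not shown in this section; the class name is hypothetical.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for a file filter: accepts every name that does not start with "skip".
final class SkipPrefixFilterSketch {
    static final Predicate<String> ACCEPT = fileName -> !fileName.startsWith("skip");
}
```

A real filter would apply the same test to the name of the file object it is handed by the consumer.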
And then we can configure our route using the filter attribute to reference our filter (using # notation) that we have defined in the Spring XML file:
<!-- define our filter as a plain spring bean -->
<bean id="myFilter" class="com.mycompany.MyFileFilter"/>

<route>
  <from uri="file://inbox?filter=#myFilter"/>
  <to uri="bean:processInbox"/>
</route>
Filtering using ANT path matcher
The ANT path matcher is shipped out-of-the-box in the camel-spring jar, so you need to depend on camel-spring if you are using Maven.
The reason is that we leverage Spring's AntPathMatcher to do the actual matching.
The file paths are matched with the following rules:
- ? matches one character
- * matches zero or more characters
- ** matches zero or more directories in a path
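To show what the three rules above mean in practice, the sketch below translates an ANT-style pattern into a regular expression. It is a simplified approximation written for illustration, not Spring's AntPathMatcher, and it assumes that / is the path separator; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified translation of ANT-style patterns into regular expressions.
final class AntPatternSketch {
    static Pattern toRegex(String antPattern) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < antPattern.length()) {
            char c = antPattern.charAt(i);
            if (antPattern.startsWith("**/", i)) {
                regex.append("(?:[^/]+/)*");   // zero or more whole directories
                i += 3;
            } else if (antPattern.startsWith("**", i)) {
                regex.append(".*");            // anything, including separators
                i += 2;
            } else if (c == '*') {
                regex.append("[^/]*");         // zero or more characters within one path element
                i++;
            } else if (c == '?') {
                regex.append("[^/]");          // exactly one character within one path element
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String antPattern, String path) {
        return toRegex(antPattern).matcher(path).matches();
    }
}
```

Under these simplified rules, `**/*.txt` matches both foo.txt and a/b/foo.txt, whereas `*.txt` matches foo.txt but not a/foo.txt.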
The sample below demonstrates how to use it:
Sorting using Comparator
Camel supports pluggable sorting strategies. This strategy is to use the built-in java.util.Comparator in Java. You can then configure the endpoint with such a comparator and have Camel sort the files before they are processed.
In the sample we have built our own comparator that just sorts by file name:
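A comparator of this kind could look like the sketch below. It compares java.io.File instances as a stand-in, since the exact accessors of Camel's GenericFile are not shown in this section; the class name is hypothetical.

```java
import java.io.File;
import java.util.Comparator;

// Hypothetical comparator that orders files by name only, ignoring their directories.
final class FileNameSorterSketch implements Comparator<File> {
    @Override
    public int compare(File a, File b) {
        return a.getName().compareTo(b.getName());
    }
}
```

A sorter for the file consumer would follow the same shape, with the type parameter changed to the file type the endpoint expects.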
And then we can configure our route using the sorter option to reference our sorter (mySorter) that we have defined in the Spring XML file:
<!-- define our sorter as a plain spring bean -->
<bean id="mySorter" class="com.mycompany.MyFileSorter"/>

<route>
  <from uri="file://inbox?sorter=#mySorter"/>
  <to uri="bean:processInbox"/>
</route>
URI options can reference beans using the # syntax
Sorting using sortBy
Camel supports pluggable sorting strategies. This strategy is to use the File Language to configure the sorting. The sortBy option is configured as follows:
sortBy=group 1;group 2;group 3;...
Where each group is separated by a semicolon. In simple situations you just use one group, so a simple example could be:
sortBy=file:name
This will sort by file name. You can reverse the order by prefixing reverse: to the group, so the sorting is now Z..A:
sortBy=reverse:file:name
As we have the full power of File Language we can use some of the other parameters, so if we want to sort by file size we do:
sortBy=file:size
You can configure the sort to ignore case by using ignoreCase: for string comparison, so if you want to sort by file name but ignore the case, then we do:
sortBy=ignoreCase:file:name
You can combine ignore case and reverse, however reverse must be specified first:
sortBy=reverse:ignoreCase:file:name
In the sample below we want to sort by last modified file, so we do:
sortBy=file:modified
And then we want to group by name as a 2nd option so files with the same modification timestamp are sorted by name:
sortBy=file:modified;file:name
Now there is an issue here, can you spot it? The modified timestamp of the file is too fine-grained, as it is in milliseconds. What if we want to sort by date only and then subgroup by name?
As we have the full power of File Language, we can use its date command, which supports patterns. So this can be solved as:
sortBy=date:file:yyyyMMdd;file:name
Yeah, that is pretty powerful, oh by the way you can also use reverse per group, so we could reverse the file names:
sortBy=date:file:yyyyMMdd;reverse:file:name
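The grouping semantics of that last expression correspond to a chained comparator in plain Java. The sketch below is an analogy rather than Camel's implementation: the Entry class is a hypothetical stand-in for a polled file, and the day is computed in the JVM's default time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Comparator;
import java.util.Date;

// Hypothetical illustration of "sort by day, then by name in reverse".
final class SortBySketch {
    /** Hypothetical stand-in for a polled file: a name and a last-modified timestamp. */
    static final class Entry {
        final String name;
        final long modifiedMillis;

        Entry(String name, long modifiedMillis) {
            this.name = name;
            this.modifiedMillis = modifiedMillis;
        }
    }

    static Comparator<Entry> byDayThenReverseName() {
        // Group 1: the modification date truncated to a day (yyyyMMdd).
        Comparator<Entry> byDay = Comparator.comparing(
                e -> new SimpleDateFormat("yyyyMMdd").format(new Date(e.modifiedMillis)));
        // Group 2: the file name, reversed so that it sorts Z..A within the same day.
        Comparator<Entry> byName = Comparator.comparing(e -> e.name);
        return byDay.thenComparing(byName.reversed());
    }
}
```

Files modified on the same day are treated as equal by the first group, so the second group alone decides their relative order.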
Using GenericFileProcessStrategy
The processStrategy option can be used to plug in a custom GenericFileProcessStrategy that allows you to implement your own begin, commit and rollback logic.
For instance, let's assume a system writes a file in a folder you should consume, but you should not start consuming the file before another ready file has been written as well.
So by implementing our own GenericFileProcessStrategy we can implement this as:
- In the begin() method we can test whether the special ready file exists. The begin method returns a boolean to indicate if we can consume the file or not.
- In the commit() method we can move the actual file and also delete the ready file.
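The two steps above can be sketched with java.nio.file alone. This is not an implementation of GenericFileProcessStrategy, whose method signatures are not shown in this section; the class name, the ".ready" marker naming convention and the done directory are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the begin/commit logic of a ready-file strategy.
final class ReadyFileStrategySketch {
    // Assumed convention: data.csv may be consumed once data.csv.ready exists next to it.
    static Path readyMarker(Path file) {
        return file.resolveSibling(file.getFileName() + ".ready");
    }

    /** Returns true if the file may be consumed, i.e. its ready marker exists. */
    static boolean begin(Path file) {
        return Files.exists(readyMarker(file));
    }

    /** Moves the processed file into doneDir and removes its ready marker. */
    static void commit(Path file, Path doneDir) throws IOException {
        Files.createDirectories(doneDir);
        Files.move(file, doneDir.resolve(file.getFileName()));
        Files.deleteIfExists(readyMarker(file));
    }
}
```

A rollback step is left out of the sketch; in the simplest case it would leave both the file and its marker in place so that the next poll can retry.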
Debug logging
This component has log level TRACE that can be helpful if you have problems.