...
managed by the framework.
Content Repository
The Content Repository is responsible for storing the content of FlowFiles and providing mechanisms for reading the contents
of a FlowFile. This abstraction allows the contents of FlowFiles to be stored independently and efficiently based on the underlying
storage mechanism. The default implementation is the FileSystemRepository, which persists all data to the underlying file system.
Note: While the Content Repository is pluggable, it is considered a 'private API' and its interface could potentially be changed between
minor versions of NiFi. It is, therefore, not recommended that implementations be developed outside of the NiFi codebase.
FlowFile Repository
The FlowFile Repository is responsible for storing the FlowFiles' attributes and state, such as creation time and which FlowFile Queue
the FlowFile belongs in. The default implementation is the WriteAheadFlowFileRepository, which persists the information to a write-ahead
log that is periodically "checkpointed". This allows extremely high transaction rates, as the files that it writes to are "append-only," so the
OutputStreams are able to be kept open. Periodically, the repository will checkpoint, meaning that it will begin writing to new write-ahead logs,
write out the state of all FlowFiles at that point in time, and delete the old write-ahead logs. This prevents the write-ahead logs from growing
indefinitely.
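
The checkpointing idea described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class WriteAheadLogSketch and its methods are hypothetical names, the "journal" is a list rather than files on disk, and none of it reflects the actual WriteAheadFlowFileRepository code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of an append-only write-ahead log with checkpoints (hypothetical). */
class WriteAheadLogSketch {
    // Latest checkpoint: a full snapshot of FlowFile id -> queue name.
    private Map<String, String> snapshot = new HashMap<>();
    // Append-only journal of updates recorded since the last checkpoint.
    private List<String[]> journal = new ArrayList<>();

    /** Appends an update; a null queue records that the FlowFile was removed. */
    void update(String flowFileId, String queue) {
        journal.add(new String[] {flowFileId, queue});
    }

    /** Rebuilds the current state by replaying the journal over the snapshot. */
    Map<String, String> recover() {
        Map<String, String> state = new HashMap<>(snapshot);
        for (String[] entry : journal) {
            if (entry[1] == null) {
                state.remove(entry[0]);
            } else {
                state.put(entry[0], entry[1]);
            }
        }
        return state;
    }

    /** Writes out the state of all FlowFiles and starts a new, empty journal. */
    void checkpoint() {
        snapshot = recover();
        journal = new ArrayList<>();
    }

    /** Number of journal entries that a recovery would have to replay. */
    int journalSize() {
        return journal.size();
    }

    public static void main(String[] args) {
        WriteAheadLogSketch repo = new WriteAheadLogSketch();
        repo.update("ff-1", "queue-A");
        repo.update("ff-2", "queue-A");
        repo.update("ff-1", "queue-B");
        repo.update("ff-2", null);
        System.out.println(repo.recover() + " journal=" + repo.journalSize());
        repo.checkpoint();
        System.out.println(repo.recover() + " journal=" + repo.journalSize());
    }
}
```

In this model the recovered state is the same before and after the checkpoint, but the journal that must be replayed shrinks to zero entries, which is why checkpointing keeps the log from growing indefinitely.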
Note: While the FlowFile Repository is pluggable, it is considered a 'private API' and its interface could potentially be changed between
minor versions of NiFi. It is, therefore, not recommended that implementations be developed outside of the NiFi codebase.
Provenance Repository
The Provenance Repository is responsible for storing, retrieving, and querying all Data Provenance Events. Each time that a FlowFile is
received, routed, cloned, forked, modified, sent, or dropped, a Provenance Event is generated that details this information. The event contains
information about what the Event Type was, which FlowFile(s) were involved, the FlowFile's attributes at the time of the event, details about the
event, and a pointer to the Content of the FlowFile before and after the event occurred (which allows a user to understand how that particular
event modified the FlowFile).
The Provenance Repository allows this information to be stored about each FlowFile as it traverses through the system and provides a mechanism
for assembling a "Lineage view" of a FlowFile, so that a graphical representation can be shown of exactly how the FlowFile was handled. In order
to determine which lineages to view, the repository exposes a mechanism whereby a user is able to search the events and associated FlowFile
attributes.
The default implementation is PersistentProvenanceRepository. This repository writes all data immediately to a disk-backed write-ahead log and
periodically "rolls over" the data, indexing and compressing it. The search capabilities are provided by an embedded Lucene engine.
Note: While the Provenance Repository is pluggable, it is considered a 'private API' and its interface could potentially be changed between
minor versions of NiFi. It is, therefore, not recommended that implementations be developed outside of the NiFi codebase.
Process Session
The Process Session (often referred to simply as a "session") provides Processors access to FlowFiles and provides transactional behavior across
the tasks that are performed by a Processor. The session provides get() methods for obtaining access to FlowFiles that are queued up for a Processor,
as well as methods to read from and write to the contents of a FlowFile, add FlowFiles to and remove them from the flow, add and remove attributes
on a FlowFile, and route a FlowFile to a particular relationship. Additionally, the session provides access to the ProvenanceReporter that is used by
Processors to emit Provenance Events.
Once a Processor has finished performing its task, it can either commit or roll back the session. If a Processor rolls back the session,
the FlowFiles that were accessed during that session will all be reverted to their previous states. Any FlowFile that was added to the flow will be destroyed.
Any FlowFile that was removed from the flow will be re-queued in the same queue that it was pulled from. Any FlowFile that was modified will have both its
contents and attributes reverted to their previous values and will likewise be re-queued into the FlowFile Queue that it was pulled from. Additionally,
any Provenance Events will be discarded.
If a Processor instead chooses to commit the session, the session is responsible for updating the FlowFile Repository and Provenance Repository with
the relevant information. The session will then add the FlowFiles to the Processor's outbound queues (cloning as necessary, if the FlowFile was transferred to
a relationship for which multiple connections have been established).
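
The commit and rollback behavior described above can be sketched with a toy session. This is an illustration only, assuming FlowFiles are plain strings and queues are in-memory collections; SessionSketch and its methods are hypothetical names and are not the NiFi ProcessSession API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a transactional session (hypothetical; not the NiFi ProcessSession API). */
class SessionSketch {
    private final Deque<String> inbound;    // queue the Processor pulls from
    private final List<String> outbound;    // queue filled only on commit
    private final List<String> provenance;  // event store filled only on commit

    private final List<String> pulled = new ArrayList<>();        // taken during this session
    private final List<String> transferred = new ArrayList<>();   // staged results
    private final List<String> pendingEvents = new ArrayList<>(); // staged provenance events

    SessionSketch(Deque<String> inbound, List<String> outbound, List<String> provenance) {
        this.inbound = inbound;
        this.outbound = outbound;
        this.provenance = provenance;
    }

    /** Pulls the next queued FlowFile, or returns null if the queue is empty. */
    String get() {
        String flowFile = inbound.pollFirst();
        if (flowFile != null) {
            pulled.add(flowFile);
        }
        return flowFile;
    }

    /** Stages a result and its provenance event; nothing is visible until commit. */
    void transfer(String flowFile, String event) {
        transferred.add(flowFile);
        pendingEvents.add(event);
    }

    /** Publishes staged events and results, then resets the session. */
    void commit() {
        provenance.addAll(pendingEvents);
        outbound.addAll(transferred);
        clear();
    }

    /** Re-queues pulled FlowFiles in their original order and discards staged work. */
    void rollback() {
        for (int i = pulled.size() - 1; i >= 0; i--) {
            inbound.addFirst(pulled.get(i));
        }
        clear();
    }

    private void clear() {
        pulled.clear();
        transferred.clear();
        pendingEvents.clear();
    }

    public static void main(String[] args) {
        Deque<String> in = new ArrayDeque<>(List.of("a", "b"));
        List<String> out = new ArrayList<>();
        List<String> events = new ArrayList<>();
        SessionSketch session = new SessionSketch(in, out, events);

        String flowFile = session.get();
        session.transfer(flowFile.toUpperCase(), "CONTENT_MODIFIED " + flowFile);
        session.rollback();
        System.out.println(in + " " + out + " " + events);

        flowFile = session.get();
        session.transfer(flowFile.toUpperCase(), "CONTENT_MODIFIED " + flowFile);
        session.commit();
        System.out.println(in + " " + out + " " + events);
    }
}
```

Staging all changes inside the session and publishing them only in commit() is what makes rollback cheap in this model: undoing the work amounts to re-queuing what was pulled and dropping the staged lists.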
Process Context
The Process Context provides a bridge between a Processor and its associated Processor Node. It provides information about the Processor's current
configuration, as well as the ability to "yield," or signal to the framework that it is unable to perform any work for a short period of time, so the framework should
not waste resources scheduling the Processor to run. The Process Context also provides mechanisms for accessing the Controller Services that are available,
so that Processors are able to take advantage of shared logic or shared resources.
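
One way to picture the "yield" signal is as an expiration time that a scheduler consults before running the Processor. The sketch below is an assumption made for illustration: YieldSketch, yieldFor, and isSchedulable are hypothetical names, time is passed in explicitly, and the framework's real bookkeeping may differ.

```java
/** Toy model of yield bookkeeping (hypothetical; time is supplied by the caller). */
class YieldSketch {
    // Time, in milliseconds, before which the Processor should not be scheduled.
    private long yieldExpirationMillis;

    /** Records that the Processor cannot do useful work for the given duration. */
    void yieldFor(long nowMillis, long durationMillis) {
        yieldExpirationMillis = nowMillis + durationMillis;
    }

    /** A scheduler would check this before spending resources on the Processor. */
    boolean isSchedulable(long nowMillis) {
        return nowMillis >= yieldExpirationMillis;
    }

    public static void main(String[] args) {
        YieldSketch context = new YieldSketch();
        context.yieldFor(1_000L, 500L);
        System.out.println(context.isSchedulable(1_200L)); // still yielding
        System.out.println(context.isSchedulable(1_500L)); // yield period over
    }
}
```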
Reporting Task
A Reporting Task is a NiFi extension point that is capable of reporting and analyzing NiFi's internal metrics in order to provide the information to external
resources or report status information as bulletins that appear directly in the NiFi User Interface. Unlike a Processor, a Reporting Task does not have access
to individual FlowFiles. Rather, a Reporting Task has access to all Provenance Events, bulletins, and the metrics shown for components on the graph, such
as FlowFiles In, Bytes Read, and Bytes Written.
The Reporting Task is an extension point, and its API will not change from one minor release of NiFi to another but may change with a new
major release of NiFi.
Controller Service
The Controller Service is a mechanism that allows state or resources to be shared across multiple components in the flow. The SSLContextService, for instance,
allows a user to configure SSL information only once and then configure any number of resources to use that configuration. Other Controller Services are used to
share configuration. For example, if a very large dataset needs to be loaded, it will generally make sense to use a Controller Service to load the dataset. This allows
multiple Processors to make use of this dataset without having to load the dataset multiple times.
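
The dataset example above can be sketched as follows. This is a simplified illustration under stated assumptions: DatasetService and LookupProcessor are hypothetical stand-ins, not NiFi interfaces, and the "large dataset" is just a short list supplied by the caller.

```java
import java.util.List;
import java.util.function.Supplier;

/** Toy model of several components sharing one loaded resource (hypothetical names). */
class SharedDatasetSketch {

    /** Stand-in for a Controller Service that owns a large dataset. */
    static final class DatasetService {
        private final Supplier<List<String>> loader;
        private List<String> dataset; // loaded on first use, then reused
        private int loadCount;

        DatasetService(Supplier<List<String>> loader) {
            this.loader = loader;
        }

        synchronized List<String> getDataset() {
            if (dataset == null) {
                dataset = loader.get();
                loadCount++;
            }
            return dataset;
        }

        synchronized int getLoadCount() {
            return loadCount;
        }
    }

    /** Stand-in for a Processor that is configured with a reference to the service. */
    static final class LookupProcessor {
        private final DatasetService service;

        LookupProcessor(DatasetService service) {
            this.service = service;
        }

        boolean isKnown(String value) {
            return service.getDataset().contains(value);
        }
    }

    public static void main(String[] args) {
        DatasetService service = new DatasetService(() -> List.of("alpha", "beta"));
        LookupProcessor first = new LookupProcessor(service);
        LookupProcessor second = new LookupProcessor(service);
        System.out.println(first.isKnown("alpha"));
        System.out.println(second.isKnown("gamma"));
        System.out.println("loads: " + service.getLoadCount());
    }
}
```

Both lookups are answered from the same in-memory copy, so the loader runs once no matter how many components reference the service.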
The Controller Service is an extension point, and its API will not change from one minor release of NiFi to another but may change with a new
major release of NiFi.
Process Scheduler
In order for a Processor or a Reporting Task to be invoked, it needs to be scheduled to do so. This responsibility belongs to the Process Scheduler. In addition to
...