...

managed by the framework.

 

Content Repository

The Content Repository is responsible for storing the content of FlowFiles and providing mechanisms for reading the contents

of a FlowFile. This abstraction allows the contents of FlowFiles to be stored independently and efficiently based on the underlying

storage mechanism. The default implementation is the FileSystemRepository, which persists all data to the underlying file system.
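
As a rough illustration (not part of the original page), a Processor never touches the Content Repository directly: content is streamed through the Process Session, which delegates to whichever repository implementation is configured. The sketch below uses standard nifi-api classes; the helper class and variable names are invented for the example.

    // Minimal sketch: reading a FlowFile's content through the session, which
    // delegates to the configured Content Repository. Class and variable names
    // are invented for illustration.
    import java.io.ByteArrayOutputStream;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.ProcessSession;

    public class ContentReadExample {

        byte[] readContent(final ProcessSession session, final FlowFile flowFile) {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // The session streams the content; the Processor never knows whether the
            // bytes come from the FileSystemRepository or another implementation.
            session.read(flowFile, in -> in.transferTo(bytes));
            return bytes.toByteArray();
        }
    }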

 

Note: While the Content Repository is pluggable, it is considered a 'private API' and its interface could potentially be changed between

minor versions of NiFi. It is, therefore, not recommended that implementations be developed outside of the NiFi codebase.

 

FlowFile Repository

The FlowFile Repository is responsible for storing the FlowFiles' attributes and state, such as creation time and which FlowFile Queue

the FlowFile belongs in. The default implementation is the WriteAheadFlowFileRepository, which persists the information to a write-ahead

log that is periodically "checkpointed." This allows extremely high transaction rates, because the files it writes to are "append-only," so the

OutputStreams can be kept open. Periodically, the repository will checkpoint, meaning that it will begin writing to new write-ahead logs,

write out the state of all FlowFiles at that point in time, and delete the old write-ahead logs. This prevents the write-ahead logs from growing

indefinitely.
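
To make the checkpointing idea concrete, the following is a deliberately simplified sketch of the append-only-log-plus-checkpoint pattern described above. It is not NiFi's WriteAheadFlowFileRepository; all class, method, and file names are invented for the illustration.

    // Toy write-ahead log: updates are appended to an open stream, and a periodic
    // checkpoint snapshots the full state so older logs can be deleted.
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ToyWriteAheadLog {
        private final Path directory;
        private final Map<String, String> state = new ConcurrentHashMap<>(); // id -> latest record
        private Path currentLog;
        private PrintWriter writer;
        private int generation = 0;

        public ToyWriteAheadLog(final Path directory) throws IOException {
            this.directory = directory;
            Files.createDirectories(directory);
            rollLog();
        }

        // Append an update; the stream stays open, so each write is cheap.
        public synchronized void update(final String id, final String record) {
            state.put(id, record);
            writer.println(id + "=" + record);
            writer.flush();
        }

        // Checkpoint: start a new log, write out the full state, then delete the old log.
        public synchronized void checkpoint() throws IOException {
            final Path oldLog = currentLog;
            rollLog();
            final Path snapshot = directory.resolve("snapshot-" + generation);
            try (PrintWriter snapshotWriter = new PrintWriter(Files.newBufferedWriter(snapshot))) {
                for (Map.Entry<String, String> entry : state.entrySet()) {
                    snapshotWriter.println(entry.getKey() + "=" + entry.getValue());
                }
            }
            Files.deleteIfExists(oldLog); // prevents the logs from growing indefinitely
        }

        private void rollLog() throws IOException {
            if (writer != null) {
                writer.close();
            }
            generation++;
            currentLog = directory.resolve("wal-" + generation + ".log");
            writer = new PrintWriter(Files.newBufferedWriter(currentLog));
        }
    }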

 

Note: While the FlowFile Repository is pluggable, it is considered a 'private API' and its interface could potentially be changed between

minor versions of NiFi. It is, therefore, not recommended that implementations be developed outside of the NiFi codebase.

 

Provenance Repository

The Provenance Repository is responsible for storing, retrieving, and querying all Data Provenance Events. Each time that a FlowFile is

received, routed, cloned, forked, modified, sent, or dropped, a Provenance Event is generated that details this information. The event contains

information about what the Event Type was, which FlowFile(s) were involved, the FlowFile's attributes at the time of the event, details about the

event, and a pointer to the Content of the FlowFile before and after the event occurred (which allows a user to understand how that particular

event modified the FlowFile).
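
For illustration, Processors can also record such events explicitly through the ProvenanceReporter exposed by the Process Session (described later on this page). The following is a minimal sketch; the transit URI and the choice of events are assumptions for the example.

    // Illustrative only: emitting Provenance Events from a Processor via the
    // session's ProvenanceReporter.
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.ProcessSession;

    public class ProvenanceExample {

        void recordSend(final ProcessSession session, final FlowFile flowFile) {
            // Record a SEND event, including where the data was sent (the transit URI is a placeholder).
            session.getProvenanceReporter().send(flowFile, "https://example.com/ingest");
        }

        void recordContentModification(final ProcessSession session, final FlowFile flowFile) {
            // Record that this Processor modified the FlowFile's content.
            session.getProvenanceReporter().modifyContent(flowFile);
        }
    }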

 

The Provenance Repository allows this information to be stored about each FlowFile as it traverses the system and provides a mechanism

for assembling a "Lineage view" of a FlowFile, so that a graphical representation of exactly how the FlowFile was handled can be shown. In order

to determine which lineages to view, the repository exposes a mechanism whereby a user is able to search the events and associated FlowFile

attributes.

 

The default implementation is the PersistentProvenanceRepository. This repository stores all data immediately to a disk-backed write-ahead log and

periodically "rolls over" the data, indexing and compressing it. The search capabilities are provided by an embedded Lucene engine.

 

Note: While the Provenance Repository is pluggable, it is considered a 'private API' and its interface could potentially be changed between

minor versions of NiFi. It is, therefore, not recommended that implementations be developed outside of the NiFi codebase.

 

Process Session

The Process Session (often referred to simply as a "session") provides Processors access to FlowFiles and provides transactional behavior across

the tasks that are performed by a Processor. The session provides get() methods for obtaining access to FlowFiles that are queued up for a Processor,

methods to read from and write to the contents of a FlowFile, add and remove FlowFiles from the flow, add and remove attributes from a FlowFile,

and route a FlowFile to a particular relationship. Additionally, the session provides access to the ProvenanceReporter that is used by Processors to

emit Provenance Events.

 

Once a Processor has finished performing its task, it can either commit or roll back the session. If a Processor rolls back the session,

the FlowFiles that were accessed during that session will all be reverted to their previous states. Any FlowFile that was added to the flow will be destroyed.

Any FlowFile that was removed from the flow will be re-queued in the same queue that it was pulled from. Any FlowFile that was modified will have both its

contents and attributes reverted to their previous values, and the FlowFiles will all be re-queued into the FlowFile Queue that they were pulled from. Additionally,

any Provenance Events will be discarded.

 

If a Processor instead chooses to commit the session, the session is responsible for updating the FlowFile Repository and Provenance Repository with

the relevant information. The session will then add the FlowFiles to the Processor's outbound queues (cloning as necessary, if the FlowFile was transferred to

a relationship for which multiple connections have been established). 
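
Putting this together, a minimal, illustrative sketch of a Processor's onTrigger method using the session might look like the following. The relationship name, attribute key, and content written are assumptions for the example; AbstractProcessor commits the session when onTrigger returns normally and rolls it back if an Exception is thrown.

    // Minimal sketch of onTrigger built on AbstractProcessor and the Process Session.
    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class ExampleProcessor extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .description("FlowFiles that were processed successfully")
                .build();

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();   // obtain a FlowFile from an incoming queue
            if (flowFile == null) {
                return;                          // nothing is queued for this Processor
            }

            // Replace the FlowFile's content; the framework stores it in the Content Repository.
            flowFile = session.write(flowFile, out -> out.write("hello".getBytes(StandardCharsets.UTF_8)));

            // Attribute changes are tracked by the session and persisted on commit.
            flowFile = session.putAttribute(flowFile, "example.processed", "true");

            // Route the FlowFile to a relationship; the session handles queuing (and cloning) on commit.
            session.transfer(flowFile, REL_SUCCESS);
        }
    }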

 

Process Context

The Process Context provides a bridge between a Processor and its associated Processor Node. It provides information about the Processor's current

configuration, as well as the ability to "yield," or signal to the framework that it is unable to perform any work for a short period of time so the framework should

not waste resources scheduling the Processor to run. The Process Context also provides mechanisms for accessing the Controller Services that are available,

so that Processors are able to take advantage of shared logic or shared resources.
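
As a short, hedged sketch of how a Processor might use the Process Context, the example below reads a configured Controller Service (using the SSLContextService mentioned in the Controller Service section) and yields when it cannot do useful work. The property descriptor and the yield condition are assumptions for the example.

    // Illustrative use of the Process Context: reading configuration, resolving a
    // Controller Service, and yielding when no useful work can be done.
    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.ssl.SSLContextService;

    public class ContextUsageExample {

        static final PropertyDescriptor SSL_CONTEXT_SERVICE = new PropertyDescriptor.Builder()
                .name("SSL Context Service")
                .identifiesControllerService(SSLContextService.class)
                .required(false)
                .build();

        void useContext(final ProcessContext context) {
            // Resolve the shared Controller Service from the Processor's current configuration.
            final SSLContextService sslService =
                    context.getProperty(SSL_CONTEXT_SERVICE).asControllerService(SSLContextService.class);

            if (sslService == null) {
                // Unable to do useful work right now; ask the framework not to waste
                // resources scheduling this Processor for a short period of time.
                context.yield();
                return;
            }
            // ... use sslService to create an SSLContext, open a secure connection, etc.
        }
    }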

 

Reporting Task

A Reporting Task is a NiFi extension point that is capable of reporting and analyzing NiFi's internal metrics in order to provide the information to external

resources or report status information as bulletins that appear directly in the NiFi User Interface. Unlike a Processor, a Reporting Task does not have access

to individual FlowFiles. Rather, a Reporting Task has access to all Provenance Events, bulletins, and the metrics shown for components on the graph, such

as FlowFiles In, Bytes Read, and Bytes Written.
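
A minimal sketch of what such a Reporting Task could look like follows. Reading the root Process Group's status and logging a few totals is an assumption chosen for the example, not a prescribed pattern.

    // Illustrative Reporting Task: it sees aggregate metrics, not individual FlowFiles.
    import org.apache.nifi.controller.status.ProcessGroupStatus;
    import org.apache.nifi.reporting.AbstractReportingTask;
    import org.apache.nifi.reporting.ReportingContext;

    public class ExampleReportingTask extends AbstractReportingTask {

        @Override
        public void onTrigger(final ReportingContext context) {
            // Status of the root Process Group, aggregating the metrics shown on the graph.
            final ProcessGroupStatus status = context.getEventAccess().getControllerStatus();

            getLogger().info("FlowFiles Received: {}, Bytes Read: {}, Bytes Written: {}",
                    status.getFlowFilesReceived(), status.getBytesRead(), status.getBytesWritten());
        }
    }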

 

The Reporting Task is an extension point, and its API will not change from one minor release of NiFi to another but may change with a new

major release of NiFi.

 

Controller Service

The Controller Service is a mechanism that allows state or resources to be shared across multiple components in the flow. The SSLContextService, for instance,

allows a user to configure SSL information only once and then configure any number of resources to use that configuration. Other Controller Services are used to

share resources. For example, if a very large dataset needs to be loaded, it will generally make sense to use a Controller Service to load the dataset. This allows

multiple Processors to make use of this dataset without having to load the dataset multiple times.
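
To make the dataset example concrete, the sketch below shows what such a service's interface could look like. The DatasetLookupService name and its lookup method are hypothetical, not an existing NiFi service; Processors would obtain an enabled instance through the Process Context, as shown earlier.

    // Hypothetical Controller Service for sharing a large, preloaded dataset across
    // Processors; the implementation would load the dataset once when enabled.
    import org.apache.nifi.controller.ControllerService;

    public interface DatasetLookupService extends ControllerService {

        // Looks up a record in the dataset that the service loaded once, so individual
        // Processors never need to load the dataset themselves.
        String lookup(String key);
    }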

 

The Controller Service is an extension point, and its API will not change from one minor release of NiFi to another but may change with a new major release of NiFi.

 

Process Scheduler

In order for a Processor or a Reporting Task to be invoked, it needs to be scheduled to do so. This responsibility belongs to the Process Scheduler. In addition to

...