Page History

Page properties

Target release

Epic

Document status

Status

title	DRAFT

Document owner

Mark Payne

Designer

Developers

QA

Goals

Provide the ability to replicate FlowFile content and attributes in such a way that if a node in a NiFi cluster is lost, another node or nodes can quickly and automatically resume the processing of the data

Background and strategic fit

In today's world, it is becoming increasingly important for many organizations that data processing is both timely and highly available, even in the face of failures. The current design and implementation of the Content and FlowFile Repositories is such that if a NiFi node is lost, the data will not be processed until that node is brought back online. While this is acceptable for many use cases, there are many other use cases in which this is not acceptable.

While using a RAID configuration can mitigate the fears of data loss by providing high-performance local replication, we have heard from several organizations that in order to make use of NiFi, they need the system to automatically failover to a new node if one node is lost.

Assumptions

Requirements

#	Title	User Story	Importance
1	Replicate FlowFile Content	If a particular NiFi node is lost (due to machine failure, etc.) the data needs to be made available in such a way that other nodes in the cluster can retrieve the data and process it themselves	Must Have
2	Replicate FlowFile Attributes	If a particular NiFi node is lost (due to machine failure, etc.) the FlowFile information (attributes, current queue identifier, metadata, etc.) needs to be made available in such a way that other nodes in the cluster can retrieve the information and process the FlowFiles themselves	Must Have
3	Ability to Failover	At any time, all nodes in a NiFi cluster must be able to begin processing data that previously was "owned" by another node. This must be orchestrated in such a way that the data is processed only once and if the original "owner" returns to the cluster that it does not re-process the data itself	Must Have
4	Automatic Failover	In addition to the ability of a node to begin processing data from another node, this failover must happen quickly and in an automated fashion. This should not require human intervention.	Must Have
5	Replicate "Swap Files"	If one of the NiFi queues reaches a certain (configurable) capacity, NiFi will "swap out" the FlowFiles, serializing the FlowFile info to disk, and removing them from the Java heap in order to ensure that the JVM does not run out of memory. This must also be replicated in order to ensure that another node that resumes the work of a failed node is also able to recover those FlowFiles without running out of heap space.	Must Have

User interaction and design

Replicating FlowFile Content

...

There also exists another Feature Proposal for State Management. If we implement this first, it will largely aid in the implementation of communicating with ZooKeeper.

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question	Outcome

Space shortcuts

Child pages

Versions Compared

Old Version 1

New Version Current

Key