Target release:
Epic:
Document status: DRAFT
Document owner: Mark Payne
Designer:
Developers:
QA:

Goals

  • Provide the ability to replicate FlowFile content and attributes in such a way that if a node in a NiFi cluster is lost, another node or nodes can quickly and automatically resume the processing of the data 

Background and strategic fit

In today's world, it is becoming increasingly important for many organizations that data processing be both timely and highly available, even in the face of failures. The current design and implementation of the Content and FlowFile Repositories is such that if a NiFi node is lost, its data will not be processed until that node is brought back online. While this is acceptable for many use cases, there are many others in which it is not.

While using a RAID configuration can mitigate the risk of data loss by providing high-performance local replication, we have heard from several organizations that, in order to make use of NiFi, they need the system to automatically fail over to another node if one node is lost.

Assumptions

Requirements

1. Replicate FlowFile Content (Must Have)
   If a particular NiFi node is lost (due to machine failure, etc.), the data needs to be made available in such a way that other nodes in the cluster can retrieve the data and process it themselves.

2. Replicate FlowFile Attributes (Must Have)
   If a particular NiFi node is lost (due to machine failure, etc.), the FlowFile information (attributes, current queue identifier, metadata, etc.) needs to be made available in such a way that other nodes in the cluster can retrieve the information and process the FlowFiles themselves.

3. Ability to Failover (Must Have)
   At any time, all nodes in a NiFi cluster must be able to begin processing data that previously was "owned" by another node. This must be orchestrated in such a way that the data is processed only once and, if the original "owner" returns to the cluster, it does not re-process the data itself.

4. Automatic Failover (Must Have)
   In addition to the ability of a node to begin processing data from another node, this failover must happen quickly and in an automated fashion; it should not require human intervention.

5. Replicate "Swap Files" (Must Have)
   If one of the NiFi queues reaches a certain (configurable) capacity, NiFi will "swap out" the FlowFiles, serializing the FlowFile information to disk and removing it from the Java heap in order to ensure that the JVM does not run out of memory. These swap files must also be replicated so that another node that resumes the work of a failed node can recover those FlowFiles without running out of heap space.

User interaction and design

Replicating FlowFile Content

The initial design for this idea is to build an HDFSContentRepository and a ReplicatingContentRepository. The ReplicatingContentRepository is responsible for wrapping multiple Content Repositories in such a way that any event (such as data being written) applied to one of the Repositories is applied to all of them. This allows us to maintain multiple Content Repositories that are identical to one another. It can be used in conjunction with the new HDFSContentRepository and the existing FileSystemRepository, so that any write to the local Content Repository is replicated to HDFS as well. This approach allows us to read data from the local file system, as we do now, in order to obtain the best performance (especially for random access), while also making the content available in HDFS so that other nodes can access it.
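
As a rough illustration of the wrapping idea, here is a minimal sketch. The SimpleContentRepository interface below is a simplified stand-in invented for this example, not the actual NiFi ContentRepository API, which has a much richer contract.

```java
import java.io.IOException;
import java.util.List;

// Simplified stand-in for a content repository; illustration only.
interface SimpleContentRepository {
    void write(String claimId, byte[] content) throws IOException;
    byte[] read(String claimId) throws IOException;
    void remove(String claimId) throws IOException;
}

/**
 * Wraps several repositories so that every mutation is applied to all of them,
 * while reads are served from the first (local, fastest) repository.
 */
class ReplicatingContentRepository implements SimpleContentRepository {
    private final List<SimpleContentRepository> delegates;

    ReplicatingContentRepository(List<SimpleContentRepository> delegates) {
        this.delegates = delegates;
    }

    @Override
    public void write(String claimId, byte[] content) throws IOException {
        // Apply the write to every underlying repository, e.g. local disk first, then HDFS.
        for (SimpleContentRepository repo : delegates) {
            repo.write(claimId, content);
        }
    }

    @Override
    public byte[] read(String claimId) throws IOException {
        // Reads go to the first delegate (the local file system) for best performance.
        return delegates.get(0).read(claimId);
    }

    @Override
    public void remove(String claimId) throws IOException {
        for (SimpleContentRepository repo : delegates) {
            repo.remove(claimId);
        }
    }
}
```

In this sketch, a node would construct the wrapper with its local repository as the first delegate and the HDFS-backed repository as the second, so reads stay local while every write is mirrored to HDFS.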

One of the benefits of this design is that we can also implement other Content Repositories at a later date without breaking backward compatibility. For example, we could build a Content Repository for storing data in Amazon S3 or Azure's storage mechanisms.

This also provides us with a tremendous "content archive" by allowing us to store massive amounts of data in HDFS or whatever mechanism we are using to back our repository.

 

Replicating FlowFile Attributes

With respect to the FlowFile Repository, it should not be necessary to replicate the repository to both the local file system and HDFS. This repository is written to continually, but only as a persistence mechanism for restarts; the data is never read from the repository except when NiFi restarts. Because other nodes may update the repository while the "original owner" is disconnected, the node will have to read the contents of the repository from HDFS on restart in order to verify that it has the most up-to-date copy. As a result, there is no apparent benefit to keeping a local copy, so we will not need a ReplicatingFlowFileRepository at this time; we will need only an HDFSFlowFileRepository.
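
As a rough sketch of what the persistence path of an HDFSFlowFileRepository might look like, the example below uses the standard Hadoop FileSystem client. The class name, snapshot layout, and record model are simplified assumptions for illustration, not the actual NiFi FlowFileRepository interface.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsFlowFileRepositorySketch {
    private final FileSystem fs;
    private final Path snapshotPath;

    HdfsFlowFileRepositorySketch(Configuration conf, Path snapshotPath) throws IOException {
        this.fs = FileSystem.get(conf);
        this.snapshotPath = snapshotPath;
    }

    /** Persist the current FlowFile state (attributes, queue identifiers, etc.) to HDFS. */
    void checkpoint(List<? extends Serializable> flowFileRecords) throws IOException {
        // Write a complete snapshot to a temporary file, then rename it into place so a
        // reader never sees a partially written snapshot.
        Path temp = new Path(snapshotPath.getParent(), snapshotPath.getName() + ".partial");
        try (ObjectOutputStream out = new ObjectOutputStream(fs.create(temp, true))) {
            out.writeObject(new ArrayList<>(flowFileRecords));
        }
        fs.delete(snapshotPath, false);
        fs.rename(temp, snapshotPath);
    }

    /** Read the latest snapshot back from HDFS; used only on restart or takeover. */
    @SuppressWarnings("unchecked")
    List<Serializable> restore() throws IOException, ClassNotFoundException {
        if (!fs.exists(snapshotPath)) {
            return new ArrayList<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(fs.open(snapshotPath))) {
            return (List<Serializable>) in.readObject();
        }
    }
}
```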

 

Ability to Failover

When it is decided that a new node should take over the work (or some of the work) of another node, it is important that the new node is able to update the remote Content and FlowFile Repositories. This means that any update made to a FlowFile must be written to the appropriate repository, not just to the node's usual repository.
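
As a minimal sketch of this routing idea, reusing the simplified SimpleContentRepository interface from the earlier sketch: updates for FlowFiles taken over from a failed node are directed to that node's remote repository rather than the new owner's local one. The ownership map and node identifiers here are assumptions for illustration.

```java
import java.util.Map;

class RepositoryRouter {
    private final SimpleContentRepository localRepository;
    // Remote (e.g. HDFS-backed) repositories of other nodes, keyed by node identifier.
    private final Map<String, SimpleContentRepository> remoteRepositoriesByOwner;

    RepositoryRouter(SimpleContentRepository localRepository,
                     Map<String, SimpleContentRepository> remoteRepositoriesByOwner) {
        this.localRepository = localRepository;
        this.remoteRepositoriesByOwner = remoteRepositoriesByOwner;
    }

    /** Return the repository that should receive updates for FlowFiles owned by the given node. */
    SimpleContentRepository repositoryFor(String ownerNodeId) {
        return remoteRepositoriesByOwner.getOrDefault(ownerNodeId, localRepository);
    }
}
```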

 

Automatic Failover

Currently, the NiFi Cluster Manager does not provide High Availability. If the Cluster Manager is lost, NiFi will continue to process data as it should, but the Command-and-Control functions are not available. While this is an important feature to provide (there is a separate Feature Proposal for High Availability of the Cluster Manager), we will not assume that it has been implemented for the design of this Feature Proposal. Because the Cluster Manager does not provide High Availability, it cannot be relied upon to make decisions about which nodes should resume the work of another node.

Instead, we will rely on ZooKeeper to elect a "Leader Node." When a node goes down, the Leader Node is responsible for assigning its work to another node (which could be the Leader itself or any other node). Nodes will send a "heartbeat" to ZooKeeper, as they currently do to the NCM, so that we know which nodes are alive and how busy each node is. This way, the Leader can assign the work to the most capable node. Ideally, we will have some mechanism for divvying up the work among several nodes.
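
As a minimal sketch of the leader-election piece, the example below uses Apache Curator's LeaderLatch recipe against a ZooKeeper ensemble. The connect string, latch path, and node identifier are assumptions; the actual work-reassignment logic is only hinted at in the comments.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {

    public static void main(String[] args) throws Exception {
        String nodeId = args.length > 0 ? args[0] : "node-1";

        // Connect to ZooKeeper with retries; the connect string is an assumption.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // All nodes register under the same latch path; ZooKeeper elects exactly one leader.
        LeaderLatch latch = new LeaderLatch(client, "/nifi/leader", nodeId);
        latch.addListener(new LeaderLatchListener() {
            @Override
            public void isLeader() {
                // This node is now the Leader Node: monitor heartbeats and, when a node
                // is lost, assign its FlowFiles to the most capable remaining node.
                System.out.println(nodeId + " elected leader");
            }

            @Override
            public void notLeader() {
                System.out.println(nodeId + " is no longer leader");
            }
        });
        latch.start();

        // Keep the process alive; in a real node this runs for the lifetime of the JVM.
        Thread.sleep(Long.MAX_VALUE);
    }
}
```
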

There also exists another Feature Proposal for State Management. If that is implemented first, it will greatly aid in implementing the communication with ZooKeeper.

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question | Outcome

Not Doing
