Goals
Provide effective, automated configuration management of dataflows, including merge, diff, and rollback, across both the actively executing dataflow and all templates.
Background and strategic fit
Interactive command and control is a powerful feature that helps organizations establish and improve dataflows between systems more rapidly. In some environments, though, this degree of freedom creates new risks: an operator could mistakenly alter the behavior of the flow and be unable to quickly return it to its previous ‘good state’. This is not a new problem area. Configuration management of these flows is very similar to configuration management of source code. NiFi should automatically capture and store changesets and state as entities manipulate the dataflow. Operators should be able to roll back from the current state to a previous state by selecting that state and initiating a rollback (often referred to as 'undo'). The operator should also be able to see a visual indicator of the difference between the current state and the previous state. In this sense, changes can be seen as occurring along a linear chain or stack. There may be merit in a more complex branch/merge construct, but the simpler case may be sufficient.
This same configuration management capability is also very useful for managing flow templates. Users should be able to establish versions for templates, visualize a template, and view the diffs between its versions.
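The linear chain of states, with diff and rollback, can be sketched as follows. This is a minimal illustration only: `FlowHistory`, the dictionary layout of a flow, and the example property values are hypothetical and do not correspond to any existing NiFi API. The sketch assumes a flow can be serialized to JSON and treats a rollback as a new entry in the history, so the chain itself is never rewritten.

```python
import copy
import difflib
import json


class FlowHistory:
    """Hypothetical linear history of flow snapshots (not a NiFi API)."""

    def __init__(self, initial_flow):
        self._snapshots = [copy.deepcopy(initial_flow)]

    @property
    def current(self):
        # Return a copy so callers cannot mutate stored history.
        return copy.deepcopy(self._snapshots[-1])

    def record(self, flow):
        """Append a new snapshot and return its index in the chain."""
        self._snapshots.append(copy.deepcopy(flow))
        return len(self._snapshots) - 1

    def diff(self, old_index, new_index):
        """Return a unified diff between two snapshots as a list of lines."""
        old = json.dumps(self._snapshots[old_index], indent=2, sort_keys=True)
        new = json.dumps(self._snapshots[new_index], indent=2, sort_keys=True)
        return list(difflib.unified_diff(
            old.splitlines(), new.splitlines(),
            f"v{old_index}", f"v{new_index}", lineterm=""))

    def rollback(self, index):
        """Restore an earlier snapshot by recording it as the newest state."""
        return self.record(self._snapshots[index])


history = FlowHistory({"processors": {"GetFile": {"Input Directory": "/data/in"}}})

# An operator mistakenly changes a property.
changed = history.current
changed["processors"]["GetFile"]["Input Directory"] = "/data/typo"
bad = history.record(changed)

changes = history.diff(0, bad)  # what a visual diff would be built from
history.rollback(0)             # the 'undo' back to the known good state
```

Recording the rollback as a new snapshot keeps the history append-only, which is one simple way to preserve an audit trail of the mistaken change.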
The version of a given dataflow or template also needs to take into account the availability and version of the components within the flow. That is, we must consider what to do when a processor used within a version of the flow or a template is not present when that version is applied again. Should we fail on startup? Fail to apply the template? Should we set a ‘placeholder processor’ and prompt the user to select the processor they’d like to use in its place?
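One of the options raised above, substituting a placeholder for a missing processor, might look like the sketch below. The function name `resolve_components`, the `PlaceholderProcessor` type name, the `original_type` field, and the `CustomEnrich` processor are all hypothetical; the sketch only illustrates the decision, not how NiFi loads components.

```python
PLACEHOLDER_TYPE = "PlaceholderProcessor"  # hypothetical type name


def resolve_components(flow_components, available_types):
    """Return (resolved, missing), swapping unavailable types for a placeholder.

    The original type is kept on the placeholder so a user can later be
    prompted to choose a replacement.
    """
    resolved, missing = [], []
    for component in flow_components:
        if component["type"] in available_types:
            resolved.append(dict(component))
        else:
            missing.append(component["type"])
            resolved.append({
                **component,
                "type": PLACEHOLDER_TYPE,
                "original_type": component["type"],
            })
    return resolved, missing


# A saved flow version references a processor that is no longer installed.
saved = [{"id": "1", "type": "GetFile"}, {"id": "2", "type": "CustomEnrich"}]
resolved, missing = resolve_components(saved, {"GetFile", "PutFile"})
```

A caller that prefers the stricter behavior could instead refuse to start or refuse to apply the template whenever `missing` is non-empty.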
Flow Versioning
There is an expressed need for more tooling to support the software development lifecycle (SDLC), as well as flow versioning and migration between environments. To support this, it would be valuable to introduce functionality that serves as a store of flows and integrates with other proposed features such as the Variable and Extension/Template Registries.
At its core, Flow Versioning would be provided by one or more Flow Persistence Provider (FPP) implementations. The FPP could act as an extension point to the NiFi or MiNiFi framework, with implementations for various stores that support immutability and associate metadata with each version. Together these provide confidence in the integrity of the stored flow, verifiable through a mechanism such as a signature or hash. This elevates the Flow Persistence Store (FPS) to a source of truth for the flows in use, as well as a reference for which flow brought a given piece of data to its current state, based on the flows deployed at that time.
An FPS shares properties with technologies already in wide use that have similar workflows and procedures, such as git and Maven. A database could also serve as the storage mechanism and provide similar functionality. For an initial implementation, it would likely be advantageous to use an external process or system rather than requiring functionality built directly into the NiFi codebase.
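The immutability and integrity semantics described for an FPP can be illustrated with a small content-addressed store. This is a sketch under stated assumptions: `InMemoryFlowStore`, its `save` and `load` methods, and the metadata fields are hypothetical, a Python dict stands in for a real backing store, and SHA-256 is just one possible hashing mechanism.

```python
import hashlib
import json


class InMemoryFlowStore:
    """Hypothetical Flow Persistence Provider backed by a dict."""

    def __init__(self):
        self._versions = {}

    def save(self, flow, metadata):
        """Store a flow and return a version id derived from its content."""
        payload = json.dumps(flow, sort_keys=True).encode("utf-8")
        version_id = hashlib.sha256(payload).hexdigest()
        # Immutability: an existing version is never overwritten.
        self._versions.setdefault(version_id, (payload, dict(metadata)))
        return version_id

    def load(self, version_id):
        """Return (flow, metadata) after verifying the stored content hash."""
        payload, metadata = self._versions[version_id]
        if hashlib.sha256(payload).hexdigest() != version_id:
            raise ValueError("stored flow does not match its version id")
        return json.loads(payload), dict(metadata)


store = InMemoryFlowStore()
flow = {"processors": ["GetFile", "PutFile"]}
version_id = store.save(flow, {"author": "alice", "comment": "initial flow"})
loaded_flow, loaded_metadata = store.load(version_id)
```

Deriving the version id from the content means identical flows always map to the same version and any tampering with stored bytes is detectable on load. A signature-based scheme would follow the same shape with a verification step in place of the hash comparison.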
Integration with Other Components
With the introduction of components such as the Variable Registry and the Extension/Template Registry, flows could flexibly combine different components, properties, and templates across environments. Interaction with each of these components could reduce the effort of designing, managing, and deploying flows. Additionally, a design should help support the tenets of MiNiFi Command and Control through common framework functionality and a shared data model for flows.
Sequence
The sequence for capturing flow changes maps approximately to how a user might interact with git.
Migration of Flows
Using the integration between components described above, shared persistence stores could be used to migrate flows between environments. One such case is illustrated in the accompanying image. Consider a situation where policy prohibits the development environment from interacting directly with the production environment. A flow conceived and developed in the development environment could be pushed to a shared store, tested against integration-level systems as a final QA check, and then promoted to the production environment through another store.
FlowFile Data Tagging
Providing immutably versioned flows to instances would allow FlowFile data and/or generated provenance events to be tagged with a specific flow version. This allows data to be mapped to the precise flow that produced its current form.
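As a rough sketch of this tagging, the example below attaches a flow version identifier to provenance-style events and then maps an event back to the flow that produced it. Everything here is assumed for illustration: the `flow.version` attribute name, the event dictionaries, and the use of a SHA-256 content hash as the version identifier are not defined by NiFi.

```python
import hashlib
import json


def flow_version_id(flow):
    """Derive a stable identifier from the flow content (assumed scheme)."""
    payload = json.dumps(flow, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def tag_event(event, flow):
    """Return a copy of a provenance-style event tagged with the flow version."""
    return {**event, "flow.version": flow_version_id(flow)}


flow_v1 = {"processors": ["GetFile", "PutFile"]}
flow_v2 = {"processors": ["GetFile", "CompressContent", "PutFile"]}

event_a = tag_event({"flowfile": "a.csv", "type": "SEND"}, flow_v1)
event_b = tag_event({"flowfile": "b.csv", "type": "SEND"}, flow_v2)

# Later, an event can be mapped back to the exact flow that handled the data.
by_version = {flow_version_id(f): f for f in (flow_v1, flow_v2)}
producing_flow = by_version[event_a["flow.version"]]
```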
Assumptions
Requirements
# | Title | User Story | Importance | Notes |
---|---|---|---|---|
1 | Provide a shared SDLC approach and management for both NiFi and MiNiFi environments | Users should have common infrastructure to support the management of both NiFi and MiNiFi dataflows | | |
2 |
User interaction and design
Questions
Below is a list of questions to be addressed as a result of this requirements document:
Question | Outcome |
---|---|