Goals
- Provide a feature-rich environment to aid in the design, deployment, and management of flows in MiNiFi instances
Background and strategic fit
Staying responsive to the evolving needs of data collection and aggregation requires that flows can be changed as an organization's needs for the collected information change.
The needed functionality has two facets. The first is a user experience and interface for designing and versioning flows. The second is a means of making flows available to instances, so that receiving a new flow causes updated processing to occur.
User Experience and Flow Design
This could be an extension of the core NiFi interface, but with a separate workspace and feel. At its core, a minifi-api could be introduced which functions similarly to the nifi-api: a REST API that drives the user interface and core design functionality. The reason for a separate module is to allow the MiNiFi functionality to be enabled or disabled independently in a NiFi instance. While a design experience similar to NiFi's is extremely valuable, the context and palette available are distinct. To that end, the workspace approach would give users a separate context for managing their MiNiFi flows, with tooling specific to that workflow.
Users could create flows on a per-MiNiFi-class basis, using an approach similar to that outlined in the Configuration Management of Flows. A class is defined as a group of MiNiFi instances that share a common flow.
Users would also be able to select the current, or active, flow for a given class of instances and make this available for deployment. At minimum, metadata would include a hash or signature of the flow as well as an identifier.
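As an illustration of the minimum metadata described above, the following is a minimal sketch and not a proposed format. The `FlowMetadata` record, its field names, and the choice of SHA-256 as the hash are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowMetadata:
    """Hypothetical metadata record for the active flow of a MiNiFi class."""
    flow_id: str      # identifier of this flow (format is an assumption)
    agent_class: str  # the class of MiNiFi instances the flow targets
    sha256: str       # hex digest of the serialized flow (hash choice is an assumption)


def describe_flow(flow_id: str, agent_class: str, flow_bytes: bytes) -> FlowMetadata:
    """Build the metadata published alongside a flow."""
    digest = hashlib.sha256(flow_bytes).hexdigest()
    return FlowMetadata(flow_id=flow_id, agent_class=agent_class, sha256=digest)


def is_current(active: FlowMetadata, local_flow_bytes: bytes) -> bool:
    """Return True when the locally held flow matches the active flow's digest."""
    return hashlib.sha256(local_flow_bytes).hexdigest() == active.sha256
```

With metadata like this, an instance could compare its local flow against the active flow for its class and fetch the full flow only when the digests differ.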
Command & Control - Flow Deployment/Updating
The other scenario to be supported for MiNiFi is more application focused and provides the needed infrastructural components. At its core, this introduces a Command and Control API (C&C API) which is inherently a defined set of REST endpoints and resources that could be implemented in any language of choice. An initial implementation could be created in Java in a manner analogous to that of the aforementioned nifi-api and minifi-api modules.
An important consideration is where and how the C&C API could be deployed and used. As systems extend farther from core infrastructure and networking, communication becomes more complex, with factors such as availability, bandwidth, NAT traversal, and organizational and security policies coming into play. As a result, there may be varying tiers of access and the need for a common API to be available and consumable in a distributed and possibly localized manner. In NiFi environments, the idea of the Flow Persistence Provider could provide a façade to a more canonical repository of flows or cache and provide a subset of those flows locally.
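The façade idea can be sketched as a local provider that prefers an upstream repository and falls back to its own cache when that repository is unreachable. This is only a sketch under stated assumptions: `CachingFlowProvider` and the `fetch_upstream` callable are hypothetical names, and nothing here reflects an existing Flow Persistence Provider interface.

```python
from typing import Callable, Dict, Optional


class CachingFlowProvider:
    """Hypothetical facade that serves flows locally on behalf of an upstream repository."""

    def __init__(self, fetch_upstream: Callable[[str], Optional[bytes]]) -> None:
        # fetch_upstream stands in for a call to the canonical repository;
        # it is assumed to raise OSError (e.g. ConnectionError) when unreachable.
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}

    def get_flow(self, agent_class: str) -> Optional[bytes]:
        """Return the flow for a class, using the cached copy if upstream fails."""
        try:
            flow = self._fetch_upstream(agent_class)
        except OSError:
            flow = None  # upstream unreachable; fall through to the cache
        if flow is not None:
            self._cache[agent_class] = flow  # remember the latest good copy
            return flow
        return self._cache.get(agent_class)
```

A provider of this shape only ever holds the subset of flows that local instances have asked for, which matches the "cache and provide a subset" behavior described above.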
Specific implementations of the C&C API could provide sophisticated provisioning of flows to subgroups of classes akin to split testing based upon individual MiNiFi instance metadata.
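One way such split-test style provisioning could work is to assign each instance to a stable bucket derived from its metadata. The sketch below assumes an instance identifier is the only metadata used; the function names, the hashing scheme, and the percentage-based rollout are illustrative assumptions rather than part of any defined C&C API.

```python
import hashlib


def rollout_bucket(instance_id: str, buckets: int = 100) -> int:
    """Map an instance identifier to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(instance_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def select_flow(instance_id: str, stable_flow: str, candidate_flow: str,
                candidate_percent: int) -> str:
    """Offer the candidate flow to roughly candidate_percent of a class's instances."""
    if rollout_bucket(instance_id) < candidate_percent:
        return candidate_flow
    return stable_flow
```

Because the bucket depends only on the identifier, a given instance keeps receiving the same flow across requests, and raising `candidate_percent` only ever moves instances from the stable flow to the candidate.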
Command & Control - Flow Consumption & Data Tagging
Flows could be consumed through various means, driven by the Configuration Change Notifier/Listener approach for which an initial design and implementation already exist in the MiNiFi codebase. This lets MiNiFi adapt to whatever mechanism is used to transfer flows to a given set of instances. The preferred mechanism would be to use the C&C API directly, but some cases may require a file to be delivered to a specific directory. While some paths may be advantageous as the default means of transport, the C&C API in conjunction with extensible Configuration Change Notifiers allows instances to adapt to the realities of an organization’s network and compute infrastructure.
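The separation between how a flow arrives and what happens when it changes can be sketched as follows. This is a simplified illustration, not the interfaces in the MiNiFi codebase: `ChangeNotifier`, `ingest_from_file`, and `ingest_from_response` are hypothetical names, and the digest-based duplicate check is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable, List, Optional

Listener = Callable[[bytes], None]


class ChangeNotifier:
    """Hypothetical notifier that fans a new flow out to registered listeners."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []
        self._last_digest: Optional[str] = None

    def register(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def notify(self, flow_bytes: bytes) -> bool:
        """Notify listeners only if the flow differs from the last one seen."""
        digest = hashlib.sha256(flow_bytes).hexdigest()
        if digest == self._last_digest:
            return False  # same flow delivered again; nothing to do
        self._last_digest = digest
        for listener in self._listeners:
            listener(flow_bytes)
        return True


def ingest_from_file(path: Path, notifier: ChangeNotifier) -> bool:
    """File-drop transport: read a flow delivered to a directory."""
    return notifier.notify(path.read_bytes())


def ingest_from_response(body: bytes, notifier: ChangeNotifier) -> bool:
    """C&C transport: hand over a flow body already fetched from an endpoint."""
    return notifier.notify(body)
```

Both transports end in the same `notify` call, so listeners (for example, one that restarts processing with the new flow) do not need to know how the flow reached the instance.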
Making use of immutably versioned flows provided to instances would allow the tagging of FlowFile data and/or provenance events generated, tied to a specific flow version. This empowers the destination systems of MiNiFi data to make determinations on the inherent worth of the data received. For those instances where data is collected/generated by a system that has an outdated flow, it may be of little or no value or require additional/separate processing.
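A minimal sketch of this tagging, and of the decision a destination system might make, is shown below. The attribute name `minifi.flow.version` and the three routing outcomes are assumptions for illustration; the document does not define either.

```python
from typing import Dict

FLOW_VERSION_ATTR = "minifi.flow.version"  # hypothetical attribute name


def tag_attributes(attributes: Dict[str, str], flow_version: str) -> Dict[str, str]:
    """Return a copy of the FlowFile attributes tagged with the producing flow version."""
    tagged = dict(attributes)
    tagged[FLOW_VERSION_ATTR] = flow_version
    return tagged


def classify(attributes: Dict[str, str], current_version: str) -> str:
    """Decide how a destination system might treat data based on its flow version."""
    version = attributes.get(FLOW_VERSION_ATTR)
    if version is None:
        return "untagged"
    if version == current_version:
        return "current"
    return "outdated"  # candidate for discarding or separate processing
```

Because flow versions are immutable, an exact match on the version tag is enough to tell whether data was produced under the flow the destination expects.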
Assumptions
- Coincides heavily with the Configuration Management of Flows
Requirements
# | Title | User Story | Importance | Notes |
---|---|---|---|---|
1 | | | | |
2 | | | | |
User interaction and design
Questions
Below is a list of questions to be addressed as a result of this requirements document:
Question | Outcome |
---|---|
Not Doing
Flow Authorship Details
10 Comments
Bryan Rosander
I think this writeup is a great start! I do have a few questions so far though.
1. How different will nifi-api and minifi-api be? I can see them having different processors, slightly different capabilities, and deployment models but does this warrant a completely separate api layer? Would it be possible instead to keep the nifi-api but have it use appropriate design-time features based on context?
2. Will the first iteration of the C&C api have high availability as a goal? If we have peer listing as a defined endpoint, each MiNiFi instance could curate a list of reachable peers in case of failure or a change in network topology.
3. Do we intend for the Configuration Change Notifier/Listener interfaces to be extensible by dropping in a nar? This would be more flexible for users of MiNiFi. The C&C implementation could still poll REST endpoints but a hierarchical or peer to peer push model could be desirable if near real-time updates are desired and the network topology supports it.
Aldrin Piri
Thanks!
Joseph Percivall
In regards to number 3, on the surface I think it's an interesting idea but I don't think extensibility of notifiers is worth the down-sides of the classloader isolation (ie. increased footprint). Also I don't foresee the ability to drop in new notifiers in arbitrary MiNiFi versions particularly useful (kinda like queue prioritizers in NiFi). Lastly, if we kept it how it is it would also allow us to remain flexible in regards to refactoring the notifier API between minor versions, which could be very helpful given how early we still are in terms of design.
Andre
Great start Aldrin!
One small piece of feedback:
It would be great to distinguish command and control from a management point of view. I can envision a lot of shops using something like ansible, puppet, chef, whatever to deploy MiNiFi. In such cases, users should be able to still edit the flow but not to push it, see the status but not to manipulate it and so it goes.
Such a strategy would ease managing staged rollouts in alignment with host changes (instead of solely based on flow or nar changes)
Aldrin Piri
Hey Andre,
Thanks for the feedback, and agree entirely on this front. Would also invite you to scope out Configuration Management of Flows#FlowVersioning if you have not had the opportunity to do so yet. That notion is definitely included where a flow could be designed/saved but not necessarily deployed. The point and case you call out though is certainly one to keep in mind and track.
Thanks!
Joseph Percivall
Just echoing Andre and Bryan's praise, thanks for getting this started Aldrin!
One thing that isn't mentioned but would go a long way in terms of updating agents on the edge: MiNiFi version and NAR updates. After deploying 1000s+ of agents on the edge you don't want to have to manually update each one when a new version comes out. Being able to automatically update the MiNiFi version as well as deploy new NARs would be a game-changer.
From the MiNiFi agent perspective, NAR updates probably wouldn't be that hard since they are designed to be isolated in a logical package. All it would require is receiving/pulling them (like with a new flow), adding it to the lib dir, wiping the work dir and restarting the underlying instance. Then on the centralized side, it would probably be a part of the extension registry?
Updating the MiNiFi version would be much harder though as it would require updating the bootstrap module itself (which currently handles flow changes). On the centralized side, it probably makes sense to have it in the extension registry as well.
We should focus first on Flow C&C but nar and MiNiFi version updating are good longer term goals to keep in mind.
Andre
All,
I have played with MiNiFi on a slightly more production like environment and here's my feedback regarding C&C:
I believe we should consider the following use cases:
I like the idea of giving great freedom to the DFM to define how to update the agents, preferably using the NiFi canvas itself. The "C&C canvas" could be a special instance of a process group, requiring just a way to clients to communicate. To this point, perhaps we could extend the Site to Site protocol or simply extend the HandleHttpRequest / HandleHttpResponse to provide a series of pre-canned endpoints such as:
- GET /minificc/TargetFlowVersion (to ship a template for example)
- GET /minificc/TargetMiNiFiVersion (to ship new MiNiFi version)
- GET /minificc/TargetNarVersion (to ship a particular NAR bundle)
- POST /minificc/AgentStatus (to ping with status)
- anything else (to allow MiNiFi to be extended).
And literally let the DFM use NiFi dataflows to manage the agent fleet? To be honest, I suspect I oversimplified things (the devil is always in the details) but this idea doesn't seem too far away from what I understand AWS has implemented as part of AWS IoT (MQTT backed by rules engine, Lambda and DynamoDB). I would suggest the difference seems to be that we provide users with an API but also a flexible agent.
Thoughts?
Marc Parisi
"At its core, this introduces a Command and Control API (C&C API) which is inherently a defined set of REST endpoints and resources that could be implemented in any language of choice. An initial implementation could be created in Java in a manner analogous to that of the aforementioned nifi-api and minifi-api modules. "
Implies to me a rather heavy weight protocol, but much less so than TLS.
Would it be useful to define classes of MiNiFi agents?
I imagine a situation where you have some MiNiFi agents that could support TLS, while there are others whose throughput may be impacted by the addition of TLS. Has there been discussion of how we view different classes of MiNiFi agents and when/if security can be viewed differently?
Further, C2C typically has some investment in routing protocols. I don't see much in terms of a direction for how routing will occur across the API or whether or not this is something to be left to an external agent.
Marc Parisi
With a centralized C2, would you support decentralizing distribution of C2 commands such that you limit entry points into a network, with a single node (or nodes) distributing C2 commands at the behest of the C2 central server?
Marc Parisi
Can you define "In NiFi environments, the idea of the Flow Persistence Provider could provide a façade to a more canonical repository of flows or cache and provide a subset of those flows locally." in regards to MiNiFi. It makes complete sense in relation to NiFi and installations thereof, but it begs the question of survivability of said data when evaluating the variability of agents.