Better Support for Parameterizing Flows

Target release
Epic
Document status	DRAFT
Document owner	Mark Payne
Designer

Goals

Provide the ability for users to parameterize the values of any Processor Property in the flow, including sensitive properties
Support parameterization of properties without the developer needing to enable it
Improve the User Experience of importing a flow so that all parameters can be set in one place

Background and strategic fit

In version 1.4.0 of Apache NiFi, we introduced the notion of a Variable Registry at the Process Group level. This has been extremely helpful for creating and executing flows. Later, in version 1.5.0, NiFi introduced the concept of the Flow Registry. As a result, users are now able to build a flow in one environment and then import that flow into one or more environments (and import the same flow many times into a single environment). There are, however, some shortcomings in the current process:

Sensitive properties cannot be parameterized. Generally, sensitive properties do not make use of Expression Language. Even if they do, the value is never sent to the Flow Registry, so upon import, the user is still required to find the necessary sensitive properties and update them. This is because we do not want to expose sensitive values in the Flow Registry or introduce the added complexity of sharing encryption keys.
Not all non-sensitive properties can be parameterized. A property can only be parameterized if it supports Expression Language, which means the developer of the Processor must explicitly declare that the property is to be evaluated against Expression Language. Because Expression Language can make use of FlowFile attributes, many properties do not support it: doing so is either not feasible or far more difficult to implement. This is often the case for properties such as a hostname or a Connection URL, where supporting Expression Language would mean supporting arbitrarily many "client" objects that may never get reused and therefore must be properly managed, aged off, etc.
Variables do not have access policies. The access policies for viewing and changing variables are derived from the access policies of the Process Group in which the variables are stored. However, any user can reference a variable, because variables are referenced via Expression Language in free-form text. It is not really feasible to lock this down: if we decide that User X may not access variable "abc", we can block User X from setting a property value to `${abc}`, but we cannot block a value like `${ ${literal('a')}${literal('b')}${literal('c')} }`, or any of the other possible combinations that could be used to construct the string `abc`.
Variables are not easy to reuse in multiple places throughout the flow. If a user wants to import 10 copies of the same flow, either the variables must be defined in every imported Process Group (so a single change means updating 10 variables), or they must be defined at a higher level, which means dealing with naming conflicts and structuring the flow so that the required hierarchy exists.

In order to address these shortcomings, I am proposing that we introduce a new notion of "Parameters" as an alternative and phase out use of Variables.

Terminology

A Parameter, as discussed here, is made up of four parts:

Name - A name that is used to denote the Parameter
Description - A description that explains what the Parameter is, how it is to be used, etc.
Sensitivity Flag - Whether or not the Parameter's Value should be considered sensitive. If so, the value of the Parameter will not be shown in the UI once set.
Value - The value that will be used when this Parameter is referenced.

A Parameter Context is a named set of Parameters. The Parameter Context is the unit at which Policies are applied. There will be a Policy that allows users to create Parameter Contexts. Once a context is created, policies to Read and Write that specific Parameter Context may be applied. Parameter Contexts live at the "controller level," meaning that they do not belong to any Process Group but rather are globally defined and accessible for the NiFi instance.
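To make the terminology concrete, the following is a minimal Java sketch of the data model. The type names (`ParameterModel`, `Parameter`, `ParameterContext`), the `displayValue` method, and the `********` mask are hypothetical illustrations, not NiFi's actual API. It uses records, so it assumes Java 16 or later.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model types illustrating the terminology; not NiFi's actual API.
public final class ParameterModel {

    /** A Parameter: name, description, sensitivity flag, and value. */
    public record Parameter(String name, String description, boolean sensitive, String value) {
        public Parameter {
            Objects.requireNonNull(name, "name");
        }

        /** Value as the UI would display it: sensitive values are never shown once set (mask is assumed). */
        public String displayValue() {
            return sensitive ? "********" : value;
        }
    }

    /** A named set of Parameters; READ/WRITE policies would be attached at this level. */
    public static final class ParameterContext {
        private final String name;
        private final Map<String, Parameter> parameters = new LinkedHashMap<>();

        public ParameterContext(String name) {
            this.name = Objects.requireNonNull(name, "name");
        }

        public String getName() {
            return name;
        }

        /** Adds or replaces the Parameter with the same name. */
        public void setParameter(Parameter parameter) {
            parameters.put(parameter.name(), parameter);
        }

        public Parameter getParameter(String parameterName) {
            return parameters.get(parameterName);
        }

        public Map<String, Parameter> getParameters() {
            return Collections.unmodifiableMap(parameters);
        }
    }
}
```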

Concepts / Rules

A user with appropriate permissions can create a Parameter Context and then add Parameters to it. That Parameter Context can have its Read/Write Policies updated so that only appropriate users can read/write to the context.

Any property can be configured to reference a Parameter instead of explicitly defining a value, with the following caveats:
- A sensitive property may only reference a Sensitive Parameter. This rule exists because when a sensitive property references a Parameter, we want to store that reference in the Flow Registry. If a sensitive property were allowed to reference a non-sensitive Parameter, both the reference and the Parameter's value would be stored, exposing the sensitive property's value.
- A non-sensitive property may only reference a non-Sensitive Parameter. When a developer marks a property as sensitive, it means that they are taking responsibility to treat the value of the property as such, and should not be logging the value, etc. If the property is not marked as sensitive, the developer is making no such assertion about how the value will be treated and, as such, the value of a Sensitive Parameter should not be exposed to the property.
- Properties that reference Controller Services will not be able to use Parameters.
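The three caveats can be expressed as a single check. This is a minimal sketch, assuming a hypothetical `PropertyInfo` record that describes the property being configured; the class and method names are illustrative only. The sensitivity comparison can only run once the referenced Parameter exists, since references to not-yet-defined Parameters are permitted (see below).

```java
// Hypothetical check of the reference caveats; names are illustrative, not NiFi's API.
public final class ParameterReferenceRules {

    /** Minimal description of the property being configured. */
    public record PropertyInfo(String name, boolean sensitive, boolean identifiesControllerService) { }

    /** Throws IllegalArgumentException if the property may not reference a Parameter of this sensitivity. */
    public static void checkReference(PropertyInfo property, String parameterName, boolean parameterSensitive) {
        if (property.identifiesControllerService()) {
            throw new IllegalArgumentException("Property '" + property.name()
                    + "' references a Controller Service and cannot use Parameters");
        }
        if (property.sensitive() && !parameterSensitive) {
            // Storing the reference would also expose the (non-sensitive) value in the Flow Registry.
            throw new IllegalArgumentException("Sensitive property '" + property.name()
                    + "' may only reference a Sensitive Parameter, but '" + parameterName + "' is not sensitive");
        }
        if (!property.sensitive() && parameterSensitive) {
            // The developer made no promise to protect this property's value (e.g. from logging).
            throw new IllegalArgumentException("Non-sensitive property '" + property.name()
                    + "' may not reference Sensitive Parameter '" + parameterName + "'");
        }
    }
}
```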

Parameter names may contain only the following characters: letters (A-Z, a-z), digits (0-9), hyphen (-), underscore (_), period (.), and space.

In order to reference a Parameter, a new syntax will be introduced: the `#` symbol followed by the Parameter's name enclosed in curly braces, as in `#{Parameter.Name}`. A reference can be escaped by preceding it with an additional `#` character. To illustrate this, assume that the Parameter "abc" has a value of "xxx" and Parameter "def" has a value of "yyy". Then the following user-entered property values will evaluate to these effective values:

User-Entered Literal Property Value	Effective Property Value	Explanation
#{abc}	xxx	Simple substitution
#{abc}/data	xxx/data	Simple substitution with additional literal data
#{abc}/#{def}	xxx/yyy	Multiple substitutions with additional literal data
#{abc	#{abc	No closing } so no parameter replacement
#abc	#abc	No { } so no parameter replacement
##{abc}	#{abc}	Escaped # for literal interpretation
###{abc}	#xxx	Escaped # for literal interpretation, followed by simple substitution
####{abc}	##{abc}	Escaped # for literal interpretation, twice
#####{abc}	##xxx	Escaped # for literal interpretation, twice, followed by simple substitution
#{abc/data}	Exception thrown on property set operation	`/` is not a valid parameter name character
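The substitution and escaping rules in the table can be captured by counting each run of `#` characters that precedes a `{...}`: every `##` pair yields one literal `#`, and an odd leftover `#` starts a real reference. The following is a minimal sketch of that interpretation, not NiFi's actual parser. The class name is hypothetical, and two behaviors are assumptions not specified above: a `#` run not followed by a complete `{...}` is kept literally, and a reference to an undefined Parameter throws (in the proposal, that case makes the component invalid rather than failing).

```java
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of the #{...} syntax and escaping rules; not NiFi's actual parser.
public final class ParameterSubstitution {

    // Allowed name characters per the proposal: letters, digits, '-', '_', '.', and space.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_. -]+");

    public static String substitute(String input, Map<String, String> parameters) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '#') {
                out.append(c);
                i++;
                continue;
            }

            // Measure the run of consecutive '#' characters starting at i.
            int runEnd = i;
            while (runEnd < input.length() && input.charAt(runEnd) == '#') {
                runEnd++;
            }

            // A run only matters if it is immediately followed by a complete "{...}".
            int close = (runEnd < input.length() && input.charAt(runEnd) == '{')
                    ? input.indexOf('}', runEnd)
                    : -1;
            if (close < 0) {
                out.append(input, i, runEnd); // e.g. "#abc" or "#{abc": keep literally
                i = runEnd;
                continue;
            }

            int hashCount = runEnd - i;
            // Every "##" pair is an escape that produces one literal '#'.
            out.append("#".repeat(hashCount / 2));
            String name = input.substring(runEnd + 1, close);
            if (hashCount % 2 == 0) {
                // Even count: the braces are fully escaped and kept literally.
                out.append('{').append(name).append('}');
            } else {
                // Odd count: the remaining '#' starts a real Parameter reference.
                if (!VALID_NAME.matcher(name).matches()) {
                    throw new IllegalArgumentException("Invalid Parameter name: '" + name + "'");
                }
                String value = parameters.get(name);
                if (value == null) {
                    throw new IllegalStateException("Parameter '" + name + "' is not defined");
                }
                out.append(value);
            }
            i = close + 1;
        }
        return out.toString();
    }
}
```

Running each "User-Entered Literal Property Value" from the table through `substitute` with `abc=xxx` and `def=yyy` reproduces the "Effective Property Value" column, and `#{abc/data}` raises an `IllegalArgumentException`.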

In order to reference a Parameter, the Process Group must first be assigned a Parameter Context. Processors and Controller Services within that Process Group may only reference Parameters within that Parameter Context. If the Parameter Context for a Process Group is changed, all components that reference any Parameter will be stopped, validated, and restarted (assuming they are still valid). If a Process Group's Parameter Context is unset, the group does *NOT* inherit the context of its parent. Instead, no Parameters can be referenced, and any component that already references a Parameter will become invalid. A user can only set the Parameter Context of a Process Group to one of the Parameter Contexts for which the user has the READ policy. Additionally, in order to set the Parameter Context, the user must have the WRITE policy for the Process Group.
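The permission and non-inheritance rules for assigning a context can be sketched as follows. This is a hypothetical model, not NiFi's authorizer: the `User` record, the `"ACTION:resource"` grant strings, and the resource identifiers are all assumptions made for illustration. The stop/validate/restart step is only noted in a comment.

```java
import java.util.Set;

// Hypothetical authorization/lookup sketch; types and resource names are illustrative only.
public final class ProcessGroupContextRules {

    public enum Action { READ, WRITE }

    /** Hypothetical user model: grants are stored as "ACTION:resource" strings. */
    public record User(String name, Set<String> grants) {
        public boolean can(Action action, String resourceId) {
            return grants.contains(action + ":" + resourceId);
        }
    }

    public static final class ProcessGroup {
        private final String id;
        private final ProcessGroup parent;
        private String parameterContextId; // null: no context, and nothing is inherited from the parent

        public ProcessGroup(String id, ProcessGroup parent) {
            this.id = id;
            this.parent = parent;
        }

        public String getId() { return id; }
        public ProcessGroup getParent() { return parent; }
    }

    /** Requires WRITE on the Process Group and READ on the target Parameter Context. */
    public static void setParameterContext(User user, ProcessGroup group, String contextId) {
        if (!user.can(Action.WRITE, "process-groups/" + group.id)) {
            throw new SecurityException(user.name() + " lacks WRITE on Process Group " + group.id);
        }
        if (contextId != null && !user.can(Action.READ, "parameter-contexts/" + contextId)) {
            throw new SecurityException(user.name() + " lacks READ on Parameter Context " + contextId);
        }
        group.parameterContextId = contextId;
        // A real implementation would now stop, re-validate, and restart referencing components.
    }

    /** The context used to resolve Parameters for components in this group: never the parent's. */
    public static String effectiveContext(ProcessGroup group) {
        return group.parameterContextId; // deliberately ignores group.parent
    }
}
```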

When a Parameter is referenced from within Expression Language, the Parameter reference is evaluated first. For example, to replace "xxx" with "zzz" in the value of the `abc` Parameter, a user would write `${ literal('#{abc}'):replace('xxx', 'zzz') }`. This produces exactly the same result as typing `${ literal('xxx'):replace('xxx', 'zzz') }`. We may be able to update Expression Language easily enough to avoid needing the `literal` call. However, requiring the `literal` call is the easier route: it is less error-prone and likely to allow for a quicker release, at least for the initial release.
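The evaluation order can be illustrated by treating Parameter substitution as a purely textual pass whose output is then handed to the Expression Language evaluator. In this minimal sketch the evaluator is a stand-in `UnaryOperator<String>` (an assumption; it is not NiFi's EL API), the `##` escape rule is ignored for brevity, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the evaluation order: Parameter references are replaced textually *before*
// the result is handed to Expression Language. The "##" escape is ignored here for brevity.
public final class ParameterThenExpressionLanguage {

    private static final Pattern REFERENCE = Pattern.compile("#\\{([A-Za-z0-9_. -]+)\\}");

    public static String evaluate(String propertyValue, Map<String, String> parameters,
                                  UnaryOperator<String> expressionLanguage) {
        Matcher m = REFERENCE.matcher(propertyValue);
        StringBuilder substituted = new StringBuilder();
        while (m.find()) {
            String value = parameters.get(m.group(1));
            if (value == null) {
                throw new IllegalStateException("Parameter '" + m.group(1) + "' is not defined");
            }
            // quoteReplacement keeps '$' or '\' in the Parameter value from being interpreted.
            m.appendReplacement(substituted, Matcher.quoteReplacement(value));
        }
        m.appendTail(substituted);
        // Only now does Expression Language see the value; it never sees "#{...}".
        return expressionLanguage.apply(substituted.toString());
    }
}
```

With `abc=xxx`, the evaluator receives `${ literal('xxx'):replace('xxx', 'zzz') }`, which is why the two expressions in the paragraph above behave identically.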

Because users so commonly want to parameterize property values, the User Experience must make it quick and easy for the Flow Designer to add a new Parameter. So, if a Parameter Context has already been set for a Process Group, and the user has the WRITE policy for it, the UI should give the user the ability, when editing a Property Value, to click some sort of "Create Parameter..." button and create a new Parameter in-line. This should set the value of the property to `#{<Parameter Name>}` and add the Parameter to the Parameter Context. Additionally, the UI should provide auto-complete capabilities so that the user can type something like `#{a` and then select the `abc` parameter from the suggestions. The auto-complete should show a tool-tip that indicates the Parameter's Description, Sensitivity Flag, and Value (if not sensitive).
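The auto-complete behavior reduces to finding the partial reference after the last unclosed `#{` and matching Parameter names by prefix. This is a minimal sketch of that logic only; the class, the record, and the tool-tip format are hypothetical, and the UI wiring is out of scope.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical auto-complete helper; the tool-tip format is an assumption.
public final class ParameterAutoComplete {

    public record Parameter(String name, String description, boolean sensitive, String value) { }

    /** Tool-tip text for each Parameter whose name starts with what was typed after the last open "#{". */
    public static List<String> suggest(String typed, List<Parameter> parameters) {
        int start = typed.lastIndexOf("#{");
        if (start < 0 || typed.indexOf('}', start) >= 0) {
            return List.of(); // the cursor is not inside an open reference
        }
        String prefix = typed.substring(start + 2);
        return parameters.stream()
                .filter(p -> p.name().startsWith(prefix))
                .sorted(Comparator.comparing(Parameter::name))
                // Show Description and Sensitivity Flag; show Value only if not sensitive.
                .map(p -> p.name() + ": " + p.description()
                        + (p.sensitive() ? " [sensitive]" : " = " + p.value()))
                .collect(Collectors.toList());
    }
}
```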

If a user attempts to set a property value that references a Parameter, and either no Parameter Context has been selected or the user does not have the READ policy for the selected Parameter Context, the web request to set the properties should fail with an intuitive error message indicating why it failed. This must be checked when the property is set because Processors run without any sort of "user context," so we cannot, for instance, make the Processor invalid instead. It is, however, acceptable to reference a Parameter that does not yet exist in the Parameter Context. Doing so would render the component _invalid_ but should not fail when the property is set, because permissions are granted at the Parameter Context level rather than per Parameter. That is, the existence of the specified Parameter is not checked at set time, because flow authors can specify Parameter references that are not currently present with the expectation that they will be available in the future or in a deployed flow.
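The split between set-time failures and validation-time invalidity can be sketched as two separate checks. The class and method names are hypothetical, the user's readable contexts are modeled as a plain set of IDs (an assumption), and the `##` escape rule is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch separating set-time checks (fail the web request) from
// validation-time checks (mark the component invalid). The "##" escape is ignored.
public final class PropertySetChecks {

    private static final Pattern REFERENCE = Pattern.compile("#\\{([A-Za-z0-9_. -]+)\\}");

    static List<String> referencedNames(String value) {
        List<String> names = new ArrayList<>();
        Matcher m = REFERENCE.matcher(value);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Set time: fail only on a missing context or missing READ permission, never on unknown names. */
    public static void checkOnSet(String value, String contextId, Set<String> readableContextIds) {
        if (referencedNames(value).isEmpty()) {
            return; // no Parameter references, nothing to authorize
        }
        if (contextId == null) {
            throw new IllegalArgumentException("Cannot reference a Parameter: the Process Group has no Parameter Context");
        }
        if (!readableContextIds.contains(contextId)) {
            throw new SecurityException("User lacks READ on Parameter Context " + contextId);
        }
    }

    /** Validation time: references to Parameters not (yet) in the context make the component invalid. */
    public static List<String> validationErrors(String value, Set<String> definedNames) {
        List<String> errors = new ArrayList<>();
        for (String name : referencedNames(value)) {
            if (!definedNames.contains(name)) {
                errors.add("Parameter '" + name + "' is not defined");
            }
        }
        return errors;
    }
}
```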

Sensitive Properties may reference Sensitive Parameters. However, when doing so, the value of the property must be exactly a single Parameter reference. For example, a value of `#{password}123` will not be allowed, and neither will a value of `#{password}#{suffix}`. This is enforced because we want the Flow Registry to include the value of a Sensitive Property if and only if that value is a reference to a Sensitive Parameter, so that the Registry records which Parameter the property references. Allowing a value of `#{password}123` makes it difficult to understand what will be sent to the Flow Registry. Should we send nothing? What if the user inadvertently includes white space? That would lead to significant confusion. We cannot send `#{password}123` because that would expose part of the Sensitive Property's value. Additionally, this type of pattern would encourage the templating of passwords, etc.
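The "exactly one reference" rule amounts to a whole-string match. This is a minimal sketch, with hypothetical class and method names; as a simplification, any other value containing `#{` is rejected, which would also reject escaped `##{...}` literals in a sensitive value.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "exactly one reference" rule for Sensitive Properties.
public final class SensitivePropertyRule {

    // matches() requires the WHOLE value to be one reference: no prefix, suffix, or second reference.
    private static final Pattern SINGLE_REFERENCE = Pattern.compile("#\\{([A-Za-z0-9_. -]+)\\}");

    /**
     * Returns the referenced Parameter name if the value is exactly one reference, or empty if the
     * value contains no reference (a plain sensitive value). Throws if a reference is mixed with anything else.
     */
    public static Optional<String> referencedParameter(String value) {
        Matcher m = SINGLE_REFERENCE.matcher(value);
        if (m.matches()) {
            return Optional.of(m.group(1));
        }
        if (value.contains("#{")) {
            throw new IllegalArgumentException(
                    "A Sensitive Property must be exactly one Parameter reference, e.g. #{password}");
        }
        return Optional.empty();
    }

    /** What would be sent to the Flow Registry: the reference itself, or nothing. */
    public static Optional<String> registryValue(String value) {
        return referencedParameter(value).map(name -> "#{" + name + "}");
    }
}
```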

From the Hamburger menu, a user should be able to go to a Parameter Context Manager screen where they can add a new context and change the parameters and policies of a given context (assuming appropriate permissions). Additionally, if a given Parameter is used in any Property that supports Allowable Values, the editor should show only the values that are allowed. If the Parameter is used by multiple properties, each with different Allowable Values, the UI should indicate that the value cannot be changed because no value is acceptable to all of those properties.

When modifying the values of Parameters, the user should be able to update the values of multiple Parameters and then click "Apply" to apply the changes to a Parameter Context atomically. That is, rather than stopping processors, applying the change, validating, and restarting once for each changed Parameter, this cycle should happen once for the entire change to the Parameter Context.

When the user changes the value of a Parameter, the UI should kick off a background process to validate all components that reference that Parameter. If any component changes from valid to invalid, the UI should display a warning. The UI should not allow the user to apply changes to a Parameter Context until the user has received either confirmation that the changes do not invalidate any components, or a warning that the changes will invalidate components (along with the corresponding validation errors).
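The atomic "Apply" behavior can be sketched as a single stop/apply/validate/restart cycle over every component affected by any changed Parameter. The `Component` interface is a hypothetical stand-in for Processors and Controller Services, and the choice to restart only components that were running beforehand is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: apply a batch of Parameter changes with ONE stop/apply/validate/restart cycle.
public final class AtomicParameterUpdate {

    /** Minimal stand-in for a Processor or Controller Service. */
    public interface Component {
        Set<String> referencedParameters();
        boolean isRunning();
        void stop();
        boolean validate(); // true if valid against the updated Parameter values
        void start();
    }

    public static void apply(Map<String, String> contextValues, Map<String, String> changes,
                             List<Component> components) {
        // 1. Find every component affected by ANY of the changed Parameters.
        List<Component> affected = new ArrayList<>();
        for (Component c : components) {
            if (c.referencedParameters().stream().anyMatch(changes::containsKey)) {
                affected.add(c);
            }
        }
        // 2. Stop each running affected component once, not once per changed Parameter.
        List<Component> wasRunning = new ArrayList<>();
        for (Component c : affected) {
            if (c.isRunning()) {
                c.stop();
                wasRunning.add(c);
            }
        }
        // 3. Apply all changes together.
        contextValues.putAll(changes);
        // 4. Validate every affected component; restart those that were running and are still valid.
        for (Component c : affected) {
            if (c.validate() && wasRunning.contains(c)) {
                c.start();
            }
        }
    }
}
```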

When creating a new Parameter Context, the user should be given the ability to create a new, empty Context or to copy an existing Context for which they have READ permissions. This makes it far easier to design a flow and then deploy several instances of it, where each instance uses mostly the same Parameters with only a few different values.

Assumptions

Requirements

#	Title	User Story	Importance	Notes
1
2

User interaction and design

The following explains the User Experience for each of the typical use cases.

Developing a Flow

The following story depicts the steps required to build a flow that pulls data from a database and pushes it to a Kafka topic, and then to start Version Control on the flow so that it can be imported into another environment. (Note that this flow is not intended to perform Change Data Capture or capture incremental changes, but rather to copy all data in a particular database table to Kafka once.)

Open Parameter Context Manager from the hamburger menu.
Create Parameter Context named "My Context"
Add Parameters to Context
  - db.url
  - db.table
  - db.username
  - db.password
  - kafka.brokers
  - kafka.topic
Create Process Group
Configure Process Group
  - Set Parameter Context to "My Context"
Add ExecuteSQL Processor to Process Group
Configure ExecuteSQL
  - Set Property 'Database Url' to `#{db.url}`
  - Set Property 'Database Password' to `#{db.password}`
  - etc.
Create PublishKafkaRecord Processor
Configure PublishKafkaRecord
  - Set Property 'Kafka Brokers' to `#{kafka.brokers}`
  - Set Property 'Kafka Topic' to `#{kafka.topic}`
Start Version Control on Process Group
  - Name Flow in Registry "Replicate DB to Kafka"
  - Choose which Parameter Context to Export
      - Choose existing Parameter Context (may or may not be the one selected for the Process Group to use during runtime).
      - Create new (provides user ability to populate values for all Parameters in the context and send those values to the Flow Registry)
      - Create new from existing (provides user ability to choose an existing Parameter Context, make a copy of it, and then provide different values to use for populating Flow Registry)
      - Set empty, which exports parameter metadata needed for "guided tour" experience for consumers, but not parameter values
  - Exports flow including parameter context.
  - Everything in parameter context gets written to Registry except the values of sensitive parameters. (If the Parameter Context includes Parameters that the flow does not use, those are not included.)
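The export rule in the last two steps (only Parameters the flow uses, never sensitive values, and metadata only for the "Set empty" option) can be sketched as a filter over the context. The class, record, and `includeValues` flag are hypothetical; representing a stripped value as `null` is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of what gets written to the Registry when versioning a flow.
public final class RegistryExport {

    public record Parameter(String name, String description, boolean sensitive, String value) { }

    /**
     * Keeps only the Parameters the flow references. Sensitive values are always stripped;
     * includeValues=false models the "Set empty" option (metadata only, no values).
     */
    public static Map<String, Parameter> export(Map<String, Parameter> context, Set<String> referencedByFlow,
                                                boolean includeValues) {
        Map<String, Parameter> exported = new LinkedHashMap<>();
        for (Parameter p : context.values()) {
            if (!referencedByFlow.contains(p.name())) {
                continue; // unused Parameters are not written to the Registry
            }
            String value = (includeValues && !p.sensitive()) ? p.value() : null;
            exported.put(p.name(), new Parameter(p.name(), p.description(), p.sensitive(), value));
        }
        return exported;
    }
}
```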

Import Flow

The following story depicts the steps required to import the flow that was developed above into a new environment and make use of it.

Drag Process Group onto Canvas
Choose 'Import...' and select "Replicate DB to Kafka" flow
User will be prompted to select a local Parameter Context to use for the Process Group:
  - Select an existing local context
  - Create new context, with values initially populated from the values saved in Flow Registry
  - Create new context, by copying values from an existing local context
If there are nested Process Groups in the flow, the UX will need to provide a way to specify which Parameter Context should be used for each of those, since Parameter Contexts are not inherited, and because the intent of the original flow was to use a separate context as well. This could be done either by showing a hierarchy of the Process Groups and choosing a Parameter Context for each, or by showing a listing of Parameter Context Names that are referenced in the flow pulled from Registry and allowing the user to map each of those names to a locally defined Parameter Context.
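The second option for nested Process Groups (mapping each referenced Parameter Context name to a local context) can be sketched as a completeness check run before the import proceeds. The class name and the map-based inputs are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: map each Parameter Context name referenced by the imported flow
// (root and nested Process Groups) to a locally defined context.
public final class ImportContextMapping {

    /**
     * groupToContextName: Process Group ID -> context name recorded in the Registry.
     * userMapping: Registry context name -> local context name chosen by the user.
     * Returns the referenced names still unmapped (or mapped to a non-existent local context).
     */
    public static Set<String> unmapped(Map<String, String> groupToContextName, Map<String, String> userMapping,
                                       Set<String> localContexts) {
        Set<String> missing = new TreeSet<>(); // sorted, de-duplicated for display
        for (String contextName : groupToContextName.values()) {
            String local = userMapping.get(contextName);
            if (local == null || !localContexts.contains(local)) {
                missing.add(contextName);
            }
        }
        return missing; // empty: every Process Group has a context, so the import can proceed
    }
}
```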

Note that this user experience for importing the flow addresses both of the key use cases: migrating a flow from dev to test to prod, and creating parameterized flows (i.e., importing the same flow multiple times into the same environment, each with different parameters). Each time the user imports the flow, they can choose a new set of parameters to use.

Change Flow Version

After a flow has been imported as in the story above, a user will often change the flow and save a new version to the Registry. If the new version adds a Parameter, that Parameter must be accounted for when an instance of the flow is updated to the new version. The following story depicts the steps required to change the version of the flow when new Parameters have been added to the Parameter Context, or when Parameters that were previously unreferenced become referenced.

Right-click Process Group on Canvas. Choose 'Version → Change Version'
Select the Registry, Bucket, and Flow from the list.
Choose version 2 of the flow and click Change.
User will be presented with a dialog that shows the Parameters that were added (along with their values). The user will be given the option of how to account for the new Parameters:
1. Add new Parameters to existing/selected Parameter Context (only if user has the WRITE policy for the Parameter Context). User is prompted to specify values for the new Parameters.
2. Create a new Parameter Context whose values are copied from the currently selected Parameter Context (only if user has the WRITE policy for Parameter Contexts / ability to create Parameter Contexts)
3. Create a new Parameter Context whose values are copied from the values in the Flow Registry (only if user has the WRITE policy for Parameter Contexts / ability to create Parameter Contexts)
4. Cancel 'Change Version' action (this requires no special policies / permissions for the user)
User clicks button to complete 'Change Version' action
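The dialog logic in this story has two parts: determining which Parameters are new relative to the currently selected context, and offering only the options the user's policies allow. A minimal sketch, with hypothetical class, enum, and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the 'Change Version' dialog logic.
public final class ChangeVersionParameters {

    public enum Option { ADD_TO_CURRENT_CONTEXT, COPY_CURRENT_CONTEXT, COPY_FROM_REGISTRY, CANCEL }

    /** Parameters referenced by the new version that the currently selected context does not define. */
    public static Set<String> newParameters(Set<String> referencedByNewVersion, Map<String, String> currentContext) {
        Set<String> added = new TreeSet<>(referencedByNewVersion);
        added.removeAll(currentContext.keySet());
        return added;
    }

    /** Options 1-4 from the story, filtered by the user's policies. */
    public static List<Option> availableOptions(boolean canWriteCurrentContext, boolean canCreateContexts) {
        List<Option> options = new ArrayList<>();
        if (canWriteCurrentContext) {
            options.add(Option.ADD_TO_CURRENT_CONTEXT);
        }
        if (canCreateContexts) {
            options.add(Option.COPY_CURRENT_CONTEXT);
            options.add(Option.COPY_FROM_REGISTRY);
        }
        options.add(Option.CANCEL); // requires no special policies
        return options;
    }
}
```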

Future Enhancements

The following are improvements to the feature proposed here that are not necessarily required for the initial release / MVP of the feature.

Shareable Parameter Contexts (centralized, versioned)

Users will be able to store parameter contexts in a central service
- perhaps it is an interface and provider model, where one of the providers could be a product that is designed for storing sensitive information
- perhaps NiFi Registry gets a "Parameter Context" bucket item type, and the NiFi Registry can have multiple provider impls
Users will be able to version control Parameter Contexts
Users will be able to reference local or remote Parameter Contexts by coordinates, so anywhere we allow setting/selecting a context in the MVP, users will also be able to set a URL:bucket:context:version

Parameter values can reference values from other contexts, allowing a user to create a context that references values from other contexts using context:parameter coordinates
Parameter contexts can inherit from other contexts (i.e., Context A has 10 Parameters defined and I can create Context B that inherits from it and overrides one Parameter)
We introduce the concept of _Composite Parameter Contexts_, each defined as an ordered list of other contexts so that Parameter value overrides get resolved deterministically.
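For the proposed Composite Parameter Contexts, "an ordering so that overrides get resolved deterministically" could mean that the first context in the list that defines a Parameter wins. The following is a minimal sketch of that one possible interpretation; the class is hypothetical and contexts are modeled as plain maps. The same lookup also covers the inheritance example: listing Context B (the override) before Context A gives B's value for the one overridden Parameter and A's values for the rest.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a Composite Parameter Context: an ordered list of contexts,
// where the first context that defines a Parameter wins.
public final class CompositeParameterContext {

    private final List<Map<String, String>> contexts; // highest priority first

    public CompositeParameterContext(List<Map<String, String>> contexts) {
        this.contexts = List.copyOf(contexts);
    }

    public Optional<String> getValue(String parameterName) {
        for (Map<String, String> context : contexts) {
            String value = context.get(parameterName);
            if (value != null) {
                return Optional.of(value); // deterministic: earlier contexts override later ones
            }
        }
        return Optional.empty();
    }
}
```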

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question	Outcome
Can we please reconsider the use of #{bla} to reference parameter bla in some property? I'd rather we allowed the user to select some flag to indicate the value they're entering is a parameter name. Or a drop-down is then presented... It feels like introducing a syntax mechanism to access such a thing which can be used inside and outside EL will create confusion. I think any field needing to use #{bla} such as '/some/path/${bla}' would be EL enabled too anyway.	While many properties will support EL, one of the primary motivations for parameters is to use them anywhere regardless of EL support. If we use a flag/drop-down to indicate a parameter, then non-EL properties are limited to a single parameter value. By allowing the new syntax it lets any property value combine multiple params and/or static values, such as `#{prefix}/data`, or `#{param1}-#{param2}`, which wouldn't be possible if selecting a single parameter. This also creates a consistent UX for EL and non-EL properties, meaning the syntax for using params is always the same.
I don't see discussion of how we handle allowable values. Do we have a plan for these? It seems like if the value of a given parameter matches one of the allowable values for a field then we're good. If it doesn't then the entry should be invalid. But we must support this.	Allowable value properties will need a way to select a parameter, which could be done by providing a drop-down of parameters to choose from, or possibly an option to enter a parameter using the new syntax. Once selected, then the value of the parameter would have to match one of the allowable values, otherwise a validation error would be produced.
Can we describe how upgrading a flow to a new version would work? There are a few interesting scenarios when the new version has a param context with new params, and how that is handled when bringing it into NiFi (i.e. are params added into the existing context, what if user doesn't have write perms to the context, etc).
