Target release: NIFI 1.0.0
Epic:
Document status: FINAL
Document owner: Joe Witt
Designer:
Developers:
QA:

Goals

  • Enable more powerful and flexible authorization decisions for actions within NiFi than the current role-based model allows
  • Enable support for an external and centralized user/entitlement management service rather than having those decisions embedded in NiFi

Background and strategic fit

The current authorization mechanism of NiFi provides a set of entitlements for a given user/entity.  NiFi's internal code then decides whether that entity and its entitlements fit a given command it plans to run.  This means the current authorization plugin is very limited in what it can convey and that NiFi is limited to supporting only the existing roles for existing methods.  This model is fairly easy to understand but inflexible, and it doesn't give us the capability we need as we move into Multi-Tenant authorization and perhaps even more interesting authorization schemes later (such as controlling whether a given person should even see a particular processor for adding to the flow, etc.).

Our current approach does benefit, though, from offering a pretty simple user experience whereby a user can be registered for a NiFi account and an administrator can elect to give them access to certain defined roles or not.  We can show whether they have been active recently.  This approach works well for the current inflexible role-based model we have.

A proposed approach is outlined as follows and is broken into two phases.  The first phase should align with the NiFi 1.0.0 release; the second phase should land by then as well, or can slip to a 1.x release shortly thereafter.

Phase1:

  • Drop user account request support as we'd move to a simpler scheme whereby given an established identity we ask the authorizer 'can they do this'
  • Drop UI based user role update, delete
  • Instead, all authorization decisions are delegated to the Authorizer API provider.  The API would support a simple method 'isAuthorized'.  The 'isAuthorized' call would be parameterized with the identity of the entity making the request (which could be a chain of requesters/proxies), the resource being accessed, the action being attempted, and a map of contextual information about the request.  The call would then return an indication of whether the request is allowed and, ideally, some information that may be shared with the user or written to logs explaining why not.
  • As for specific implementations of the Authorizer API we could provide a simple FileBasedAuthorizer similar to the current file based authority provider.  And we could implement against Apache Ranger or Apache Sentry which both offer pluggable authorizer modules that go with a given application against a centralized policy store.  In the case of the FileBasedAuthorizer we could still allow for runtime modifications to the accesses and users by letting someone edit the authority configuration by hand and we just automatically reload at runtime.  In the case of Ranger or Sentry module it would inherently delegate that information at runtime so we're good there and it would be both single node and cluster friendly.  FileBased would work sufficiently for single node cases but be awkward in a cluster configuration.
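The 'isAuthorized' call described in Phase 1 could look roughly like the following Java sketch. Every type and method name here (AuthorizationRequest, AuthorizationResult, Action, ProposedAuthorizer, InMemoryAuthorizer) is a hypothetical illustration of the parameters listed above, not the final NiFi API. The rule that every identity in a proxy chain must hold the grant is also an assumption made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed Authorizer API; all names are illustrative.
enum Action { READ, WRITE }

final class AuthorizationRequest {
    final List<String> identityChain;   // requester first, then any proxies
    final String resource;              // e.g. "/processors/1234"
    final Action action;                // the action being attempted
    final Map<String, String> context;  // contextual information about the request

    AuthorizationRequest(List<String> identityChain, String resource,
                         Action action, Map<String, String> context) {
        this.identityChain = identityChain;
        this.resource = resource;
        this.action = action;
        this.context = context;
    }
}

final class AuthorizationResult {
    final boolean allowed;
    final String explanation;  // may be shown to the user or written to logs

    private AuthorizationResult(boolean allowed, String explanation) {
        this.allowed = allowed;
        this.explanation = explanation;
    }

    static AuthorizationResult approved() {
        return new AuthorizationResult(true, "approved");
    }

    static AuthorizationResult denied(String why) {
        return new AuthorizationResult(false, why);
    }
}

interface ProposedAuthorizer {
    AuthorizationResult isAuthorized(AuthorizationRequest request);
}

// Toy in-memory implementation. Assumption: every identity in the chain
// (the user and each proxy) must hold the requested grant.
final class InMemoryAuthorizer implements ProposedAuthorizer {
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String identity, String resource, Action action) {
        grants.computeIfAbsent(identity, k -> new HashSet<>()).add(action + " " + resource);
    }

    @Override
    public AuthorizationResult isAuthorized(AuthorizationRequest request) {
        if (request.identityChain.isEmpty()) {
            return AuthorizationResult.denied("no identity supplied");
        }
        String needed = request.action + " " + request.resource;
        for (String identity : request.identityChain) {
            Set<String> held = grants.getOrDefault(identity, Collections.emptySet());
            if (!held.contains(needed)) {
                return AuthorizationResult.denied(identity + " lacks " + needed);
            }
        }
        return AuthorizationResult.approved();
    }
}
```

A FileBasedAuthorizer or a Ranger/Sentry-backed authorizer would sit behind the same single-method interface; only the source of the grants would differ.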

Implications of Phase1:

  • Our current Authority Provider API provides limited flexibility to support centralized policy management as it forces a given external authority provider to be mapped into a set of NiFi understood roles "DFM, Admin, Read-Only".  However, it offers a sufficient user experience for those roles and is easy to understand.
  • The proposed model offers a far more flexible authorization capability and represents a cleaner separation of concerns.  Stated another way, it offers a highly flexible delegated authorization decision model with a lower quality user experience inside NiFi.  The user experience for centralized user entitlement management may well be better, but it lives in a different application.

To improve on the UX tradeoff introduced in phase 1 we propose phase 2:

  • Update the Authorizer API to support CRUD operations for authorizers that are willing to implement them.  With Java 8 we can feasibly extend that interface in a backward-compatible way.
  • Add UI/Web API to leverage CRUD supporting Authorizers.  This will support some fixed UI concept for entitlement assignment to a given user.  This approach would be good for simple things like the FileBasedAuthorizer but unnecessary for things like Ranger/Sentry which have their own centralized entitlement management model.
  • Introduce a 'getAuthorizationHistory' method for authorizers that are willing to implement it. The idea is that we could query the Authorizer API for information it might have about recent user access: last heard, last mutable request, last rejection.  Perhaps it could also allow finer-grained information to be requested, such as why a rejection occurred and what was being requested.  With this capability the NiFi UI could expose a user table with that information.  We would not necessarily be able to show 'all' users, but we could show recent users and what we know about their types/recency of access.
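The backward-compatible extension mentioned for Phase 2 relies on Java 8 default methods. The sketch below shows the idea under assumed names: ExtensibleAuthorizer, supportsEntitlementManagement, grantEntitlement, and the String-based signatures are all hypothetical, and only 'getAuthorizationHistory' is a name taken from this proposal.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: optional Phase 2 operations added as Java 8 default
// methods so that existing Phase 1 implementations keep compiling unchanged.
interface ExtensibleAuthorizer {
    // Phase 1 contract (simplified to strings for brevity).
    boolean isAuthorized(String identity, String resource, String action);

    // Phase 2 additions; the defaults signal "not supported".
    default boolean supportsEntitlementManagement() {
        return false;
    }

    default void grantEntitlement(String identity, String resource, String action) {
        throw new UnsupportedOperationException("entitlement management not supported");
    }

    default List<String> getAuthorizationHistory(String identity) {
        return Collections.emptyList();
    }
}

// A Phase 1 style implementation that knows nothing about the new methods.
final class ReadOnlyAuthorizer implements ExtensibleAuthorizer {
    @Override
    public boolean isAuthorized(String identity, String resource, String action) {
        return "READ".equals(action);
    }
}
```

The UI could check a capability method such as the assumed supportsEntitlementManagement() before offering entitlement editing, which keeps Ranger/Sentry style authorizers free of CRUD obligations.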

Assumptions

  • Users in the 'learning NiFi', 'non-production', 'POC' phase with NiFi will want FileBasedAuthorizer and could live without a UI for managing entitlements so long as they can make changes at runtime to the config file.
  • Users in a production use mode will likely want to delegate to a centralized authorization mechanism and having a UI in NiFi to edit entitlements could be limiting for the flexibility of their centralized tool and not a good separation of concerns.
  • Regardless we should still find ways to offer a user experience that allows one to see which users have been active recently and to aid in troubleshooting access problems and understanding who has been altering or viewing the flow.
  • Formal auditing of actions shall be the job of the Authorizer implementation.  NiFi will still keep a local database of changes made so it can provide a nice user experience around which identities made which changes 'joe changed processor X property Y from A to B'

Requirements

# | Title | User Story | Importance | Notes
1 |  |  |  |
2 |  |  |  |

User interaction and design

As part of this work the REST APIs will be refactored to better align with the resources being authorized and to address confusion over the allowed content types. By mirroring the authorization resources with our REST resources we can better scale as new features and ideas are added. Taking a super granular approach to authorization could still yield a cumbersome experience for the user so we've tried to design the API and resources to best accommodate that.

Resource /flow

/flow**
/flow/about
/flow/banner
/flow/search-results**
/flow/component-listing**
/flow/status
/flow/process-groups/{id}/status
/flow/processors/{id}/status
/flow/input-ports/{id}/status
/flow/output-ports/{id}/status
/flow/remote-process-groups/{id}/status
/flow/bulletin-board**
/flow/cluster/search-results

The flow resource is the only resource that would need to be authorized for a user to load the UI.

** indicates the results will be filtered according to the permissions the user has for the resources in question. /flow will return the flow structure. If the user is authorized for a particular resource, that structure can include configuration details for that resource. This would be equivalent to calling its own endpoint defined below. If not authorized, the structure would only include UUID and position details.

Resource /system

/system/diagnostics

Explicit access to system diagnostics.

Resource /controller

/controller/config
/controller/reporting-tasks**
/controller/cluster
/controller/cluster/nodes/{id}

This approach implies that if a user has READ/WRITE to /controller then they can READ/WRITE the controller configuration and the cluster. Additionally, controller level bulletins are comprised of bulletins from reporting tasks and cluster events.

Resource /reporting-tasks/{id}

/reporting-tasks/{id}

Access to the reporting task will be handled by the controller if no access policies are explicitly defined.

Resource /process-groups/{id}

/process-groups/{id}
/process-groups/{id}/controller-services**
/process-groups/{id}/processors**
/process-groups/{id}/process-groups**
/process-groups/{id}/remote-process-groups**
/process-groups/{id}/connections**
/process-groups/{id}/input-ports**
/process-groups/{id}/output-ports**
/process-groups/{id}/funnels**
/process-groups/{id}/labels**
/process-groups/{id}/snippets
/process-groups/{id}/snippets/{id}
/process-groups/{id}/snippet-instance
/process-groups/{id}/templates**
/process-groups/{id}/template-instance

Process group level bulletins are an aggregation of the bulletins of all encapsulated components. ** indicates the results will be filtered according to the permissions the user has for the resources in question.

Access to a given process group will extend access to encapsulated components unless overridden with explicit component level access policies.
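The inheritance rule above (a component falls back to its process group unless it has an explicit policy) can be illustrated with a small lookup that walks up the parent chain. This is only a sketch: PolicyResolver and its methods are hypothetical names, only READ is modeled, and denying access when no policy is found anywhere in the chain is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of policy inheritance; names are illustrative only.
final class PolicyResolver {
    // component or group id -> id of the enclosing process group
    private final Map<String, String> parentOf = new HashMap<>();
    // id -> identities explicitly granted READ on that resource
    private final Map<String, Set<String>> explicitReaders = new HashMap<>();

    void setParent(String childId, String parentGroupId) {
        parentOf.put(childId, parentGroupId);
    }

    void allowRead(String id, String identity) {
        explicitReaders.computeIfAbsent(id, k -> new HashSet<>()).add(identity);
    }

    // Finds the closest resource (the component itself, then each ancestor
    // process group) that has an explicit policy.
    Optional<String> governingResource(String id) {
        String current = id;
        while (current != null) {
            if (explicitReaders.containsKey(current)) {
                return Optional.of(current);
            }
            current = parentOf.get(current);
        }
        return Optional.empty();
    }

    // Assumption: with no policy anywhere in the chain, access is denied.
    boolean canRead(String id, String identity) {
        return governingResource(id)
                .map(owner -> explicitReaders.get(owner).contains(identity))
                .orElse(false);
    }
}
```

With a policy on the root group granting "alice" READ, a processor nested below it is readable by "alice" until an explicit policy is placed on that processor, at which point only the explicit policy applies.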

A snippet is a section of the data flow and is created using the IDs of any number of selected components. We can then use this snippet for bulk operations like delete, move, and copy/paste, and as the source of a template. Snippets are typically short-lived in that they are deleted once the action has been completed. In order to create the snippet, the user will need to have WRITE access to this process group and READ access to every specified component. When the snippet is actually used, we will need to ensure that the user has WRITE access to the process group in question and has READ or WRITE access for every component in the snippet, depending on what it is being used for. No explicit access policies will be allowed for snippets due to their short-lived nature.
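The snippet creation rule (WRITE on the process group, READ on every selected component) reduces to a simple conjunction. The following is a minimal sketch with hypothetical names; the hasAccess predicate stands in for whatever authorization call NiFi ends up making for the current user.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of the snippet creation check; names are illustrative.
final class SnippetAccess {
    // hasAccess.test(resourceId, "READ" or "WRITE") answers for the current user.
    static boolean canCreateSnippet(String processGroupId,
                                    List<String> componentIds,
                                    BiPredicate<String, String> hasAccess) {
        // The user must be able to modify the enclosing process group...
        if (!hasAccess.test(processGroupId, "WRITE")) {
            return false;
        }
        // ...and must be able to read every component placed in the snippet.
        return componentIds.stream().allMatch(id -> hasAccess.test(id, "READ"));
    }
}
```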

Resource /templates/{id}

/templates/{id}

Access to the template will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /controller-services/{id}

/controller-services/{id}

Access to the controller service will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /input-ports/{id}

/input-ports/{id}

Access to the input port will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /output-ports/{id}

/output-ports/{id}

Access to the output port will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /funnels/{id}

/funnels/{id}

Access to the funnel will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /processors/{id}

/processors/{id}

Access to the processor will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /remote-process-groups/{id}

/remote-process-groups/{id}
/remote-process-groups/{id}/input-ports/{id}
/remote-process-groups/{id}/output-ports/{id}

Access to the remote process group will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /labels/{id}

/labels/{id}

Access to the label will be handled by the closest ancestor process group with access policies if none are explicitly defined.

Resource /connections/{id}

/connections/{id}

Access to the connection will be handled by the source component of that connection if no explicit access policies are defined.

Resource /flowfile-queue/{connection-id}

/flowfile-queue/{connection-id}/listing-request
/flowfile-queue/{connection-id}/drop-request
/flowfile-queue/{connection-id}/flowfiles/{id}
/flowfile-queue/{connection-id}/flowfiles/{id}/content

READ and WRITE access to the queue of a connection should be driven by WRITE access to that connection.

Resource /provenance

/provenance/event-search-request
/provenance/event-search-request/{id}
/provenance/lineage-request
/provenance/lineage-request/{id}
/provenance/events/{id}
/provenance/events/{id}/content/input
/provenance/events/{id}/content/output

User is allowed to access provenance. Event and lineage requests can be created if the user has access to /provenance. This is another retrieve and filter scenario.

Event access (and corresponding content) is driven by the component that generated it and the context (flowfile attributes). However, there may be a corner case here as we may want to be able to have explicit policies for the provenance events that differ from the access policies of the component that generated it.

Resource /site-to-site

/site-to-site

Endpoint for other NiFi instances to obtain details required to perform site to site data transfer.

User Identity Normalization

NiFi allows for a number of different authentication mechanisms. Once users have authenticated, they are identified by their identity. This is a String that is returned by the authentication mechanism. For certificate or LDAP based authentication this is a DN. For Kerberos it is the user principal. If the same user authenticates through different mechanisms, they may have different identities. In 0.x this was annoying but not a big deal, as it was just a matter of (re)assigning the user role. However, in 1.x this is unacceptable given the nature of fine-grained authorization and the number of access policies that would have to be duplicated.

 

Because we’ve delegated user authorization and the underlying implementation can choose to authenticate however they want, we cannot rely on our internal User/Group/Policy model to enforce any normalization.

 

After discussions with Andy, Joe, and Bryan, I think we have a proposal that will provide the flexibility needed without requiring continued maintenance by an Administrator. Additionally, it shouldn’t require too many additional development cycles.

 

The basic idea is to add configurable mappings that will run after the user identity is determined but before any authorizations. These mappings are purposefully decoupled from the authentication mechanisms to reduce duplicative configurations and support a more flexible solution. These mappings could be configured in nifi.properties as follows. Here, the last segment of the property name is an identifier used to associate the pattern with the replacement value.

 

nifi.security.identity.mapping.pattern.dn=^cn=(.*?),dc=(.*?),dc=(.*?)$
nifi.security.identity.mapping.value.dn=$1@$2.$3
nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
nifi.security.identity.mapping.value.kerb=$1@$2
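To make the effect of these mappings concrete, here is a minimal Java sketch of how they might be applied. The IdentityMapper class is hypothetical, and two behaviors are assumptions rather than part of the proposal: patterns are tried in alphabetical order of their identifier, and the first pattern that matches the whole identity wins. Identities that match no pattern pass through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of identity normalization; not the actual NiFi code.
final class IdentityMapper {
    private static final String PATTERN_PREFIX = "nifi.security.identity.mapping.pattern.";
    private static final String VALUE_PREFIX = "nifi.security.identity.mapping.value.";

    // Compiled pattern -> replacement value, in a deterministic order.
    private final Map<Pattern, String> mappings = new LinkedHashMap<>();

    IdentityMapper(Properties props) {
        // Assumption: sort property names so evaluation order is stable.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            if (!name.startsWith(PATTERN_PREFIX)) {
                continue;
            }
            // The last segment ties a pattern to its replacement value.
            String key = name.substring(PATTERN_PREFIX.length());
            String replacement = props.getProperty(VALUE_PREFIX + key);
            if (replacement != null) {
                mappings.put(Pattern.compile(props.getProperty(name)), replacement);
            }
        }
    }

    // Returns the normalized identity, or the input if no pattern matches.
    String map(String identity) {
        for (Map.Entry<Pattern, String> entry : mappings.entrySet()) {
            Matcher matcher = entry.getKey().matcher(identity);
            if (matcher.matches()) {
                // $1, $2, ... in the value refer to the pattern's capture groups.
                return matcher.replaceAll(entry.getValue());
            }
        }
        return identity;
    }
}
```

With the four properties above loaded, a DN such as cn=jdoe,dc=example,dc=com and a Kerberos principal such as jdoe/instance@example.com would both normalize to jdoe@example.com, so a single set of access policies covers both login paths.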

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question | Outcome

Not Doing