Target release:
Epic:
Document status: DRAFT
Document owner: Andy LoPresto
Designer:
Developers:
QA:

Goals

  • Document planned security feature roadmap
  • Solicit community feedback on goals, obstacles, user experience, and solutions

**Note: I wanted to get my thoughts documented here, but all of this is very early stage and I welcome lots of community feedback. What challenges do users and developers face with security? What trade-offs are people willing to make? What do they not understand? I will clean up the formatting, and as these features become better described and captured, I will break them out into individual feature proposals. I also need to correlate the existing Jiras and link them here.**

Community Security Features Roadmap

TLS
* Refactor and consolidate {{SSLContextFactory}}, {{SSLContextService}}
* Refactor internals of TLS Toolkit
- "1-click" deployment with scripts for client certificate import to browser, etc.
* Individual configurations for
- UI/API
- Cluster
- Site to Site
- Processors ({{SSLContextService}})
* Mozilla Labs integration
- Easy assessment
- Easy config
- Automatic cipher suite upgrades/deprecations

Encrypted Config
* Login Identity Provider coverage
* Integration with Ambari
- Config Encryption Tool (CET) reads password/key from a file descriptor
- Clean up original properties file after encryption
- Reversibility?
* Integration with Variable Registry
* Remove master key from {{bootstrap.conf}}
* Process monitor opacity
* Other provider integrations
- HSM
- Hadoop {{CredentialsProvider}} API
- Vault
- KeyWhiz
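To illustrate the "read password/key from file descriptor" and "remove master key from {{bootstrap.conf}}" items above, here is a minimal sketch. The class name, the fd number, and the {{/dev/fd}} convention are assumptions for illustration, not current NiFi behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SensitiveKeyReader {

    // Parse and validate a hex-encoded AES-256 key (64 hex chars -> 32 bytes).
    static byte[] parseHexKey(String hex) {
        String trimmed = hex.trim();
        if (trimmed.length() != 64 || !trimmed.matches("[0-9a-fA-F]+")) {
            throw new IllegalArgumentException("Expected 64 hex characters for an AES-256 key");
        }
        byte[] key = new byte[32];
        for (int i = 0; i < 32; i++) {
            key[i] = (byte) Integer.parseInt(trimmed.substring(2 * i, 2 * i + 2), 16);
        }
        return key;
    }

    // On Linux an inherited descriptor N is visible as /dev/fd/N, so a launcher
    // could pass the key on fd 3 instead of persisting it in bootstrap.conf.
    static byte[] readKeyFromFd(int fd) throws IOException {
        return parseHexKey(Files.readString(Path.of("/dev/fd/" + fd)));
    }
}
```

Reading from a file descriptor keeps the key out of process arguments (visible in {{ps}}) and off disk, which also speaks to the process-monitor-opacity bullet above.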

Provenance
* Signature on each event record (all metadata of event record, content claim ID/hash of content -> {{HMAC/SHA-256}} with per-node unique secret)
* Signature on chain (concatenation of all ER sigs -> {{HMAC/SHA-256}})
- CR key could be unique per-node
+ Node A -> Node B requires chain sigs on Node B to be {{S(S(A1|A2, KAC)|B1, KBC)}}; can't verify {{S(A1|A2, KAC)}} on Node B
+ {{KAC}}, {{KBC}}, etc. could be public/private key pairs derived from a constant master key across all nodes (would allow cross-node verification)
- CR key could be same across all nodes in cluster/S2S deployment
+ Verifiability extends across entire lifespan of lineage data
- Investigate Axolotl (Double Ratchet), blockchain/alternative chain structures
* UI display of provenance trust
- Per-event and per-chain
- Alerts for manipulated/modified provenance data
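A minimal sketch of the per-event and chain signatures described above, assuming HMAC/SHA-256 with a per-node secret. Class and method names are illustrative, not an existing NiFi API; {{HexFormat}} requires Java 17.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import java.util.List;

class ProvenanceSigner {

    private final Mac mac;

    ProvenanceSigner(byte[] nodeSecret) {
        try {
            mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(nodeSecret, "HmacSHA256"));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signature over one event record: its metadata plus the content claim hash.
    synchronized String signEvent(String eventMetadata, String contentClaimHash) {
        byte[] digest = mac.doFinal(
                (eventMetadata + "|" + contentClaimHash).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    // Chain signature: HMAC over the concatenation of the per-event signatures.
    // In the proposal the chain key could differ from the event key; a single
    // key is used here only to keep the sketch short.
    synchronized String signChain(List<String> eventSignatures) {
        byte[] digest = mac.doFinal(
                String.join("|", eventSignatures).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }
}
```

With per-node secrets, a receiving node can detect tampering of its own records but cannot verify the sender's inner signature, which is exactly the trade-off the key-derivation bullets above are weighing.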

Repositories
* Transparent encryption of repository before persistence to file system
- {{Provenance}}
- {{Content}}
- {{Flowfile}} (attributes)
- {{Log}}? (intercept, filtering, hash/obfuscation?)
- {{Bulletin}}? (currently volatile impl only)
- {{ComponentStatus}}? (currently volatile impl only)
- {{Counter}}? (currently volatile impl only)
* Balance performance with security (retrieval has high cost)
* At which layer does the encryption/decryption occur (closest to file system/actual implementation of {{*Repository}} interface/AOP?)
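Whatever layer is chosen, the encrypt-before-write/decrypt-on-read wrapper itself could look like the following AES-GCM sketch (class name and framing format are assumptions for illustration):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

class EncryptedRecordCodec {

    private static final int IV_LEN = 12;    // bytes, standard GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    EncryptedRecordCodec(byte[] rawAesKey) {
        key = new SecretKeySpec(rawAesKey, "AES");
    }

    // Encrypt a serialized record just before it is written to the file system;
    // the fresh IV is prepended so each record is independently decryptable.
    byte[] encrypt(byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_LEN + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ciphertext, 0, out, IV_LEN, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypt on retrieval; this per-record work is the performance cost noted above.
    byte[] decrypt(byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(stored, 0, IV_LEN)));
            return cipher.doFinal(Arrays.copyOfRange(stored, IV_LEN, stored.length));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```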

Sensitive Attributes
* Mark attributes as sensitive (i.e. masked in UI, restricted to specific user access policy)
- Per-processor (e.g. all attributes originating from any {{EncryptContent}} processor)
- Per-instance (e.g. attributes originating from a specific {{UpdateAttribute}} processor which is handling PII)
* Encrypt before persisting even if not using {{EncryptedProvenanceRepository}}/{{EncryptedFlowfileRepository}}
* Existence of attribute could even be sensitive (e.g. SSN attribute, "security level" attribute)
* Cannot be modified/updated/removed by other processors?
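As a sketch of the UI-masking half of this idea (names are illustrative, and how attributes get flagged as sensitive is exactly the open design question above):

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class SensitiveAttributeMasker {

    static final String MASK = "********";

    // Produce a display copy of the attributes with sensitive values masked;
    // users holding the appropriate access policy would see the raw map instead.
    static Map<String, String> maskForDisplay(Map<String, String> attributes,
                                              Set<String> sensitiveNames) {
        return attributes.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> sensitiveNames.contains(e.getKey()) ? MASK : e.getValue()));
    }
}
```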

Sensitive Content
* Data that enters system and needs to be immediately encrypted/anonymized/filtered
* Provenance provides access to raw input (e.g. before/after {{EncryptContent}})
* Ability to mark on processor to restrict/mask input of any flowfile that enters processor and provide access only to users with restricted access control policy
- Extends to provenance/content repositories

Dangerous Processors
* Processors which can directly affect behavior/configuration of NiFi/other services
- {{GetFile}}
- {{PutFile}}
- {{ListFile}}
- {{FetchFile}}
- {{ExecuteScript}}
- {{InvokeScriptedProcessor}}
- {{ExecuteProcess}}
- {{ExecuteStreamCommand}}
* These processors should only be creatable/editable by users with special access control policy
* Marked by {{@Restricted}} annotation on processor class
* All flowfiles originating/passing through these processors have special attribute/protection
* Perhaps {{*File}} processors can access a certain location by default but cannot access the root filesystem without special user permission?
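A sketch of how the {{@Restricted}} marking could work mechanically. This is a local stand-in annotation and a hypothetical processor class, not NiFi's framework API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for the proposed framework annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Restricted {
    String value() default "";
}

// Hypothetical processor class carrying the marker.
@Restricted("Provides operator the ability to execute arbitrary processes")
class SketchExecuteProcess {
}

class RestrictedComponentChecker {
    // The framework would call this before allowing a user to create or edit
    // the component, and require the special access policy if it returns true.
    static boolean requiresRestrictedPolicy(Class<?> componentClass) {
        return componentClass.isAnnotationPresent(Restricted.class);
    }
}
```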

Flow Sensitivity Analysis
* Application-level intelligence to analyze flows (based on flow graph or flowfile provenance lineage) and determine existence of "dangerous processors" or "security processors" and proactively enable encrypted repositories/sensitive attributes for data traversing that flow
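One way such analysis could work is a simple reachability pass over the flow graph: flag every component downstream of a dangerous/security processor, then enable encrypted repositories/sensitive attributes for the flagged set. A minimal sketch (the graph model and names are assumptions):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FlowSensitivityAnalyzer {

    // flow: adjacency list of processor name -> downstream processor names.
    // Returns the dangerous processors plus everything reachable from them.
    static Set<String> componentsDownstreamOf(Map<String, List<String>> flow,
                                              Set<String> dangerous) {
        Set<String> flagged = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(dangerous);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!flagged.add(current)) {
                continue; // already visited; avoids cycles
            }
            for (String next : flow.getOrDefault(current, List.of())) {
                queue.add(next);
            }
        }
        return flagged;
    }
}
```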

Visual indicators of security state
* UI panel displaying current server security state ("dashboard"/"quick view")
- TLS config for UI/API
+ Authentication mechanisms in place (client certificate, Kerberos, LDAP)
- TLS config for cluster
- TLS config for S2S
- Encryption status of repositories
- Users with admin/DFM/"dangerous" access

Extension Repository
* Cryptographic signatures of extensions verified by application to ensure no malicious interception/replacement of installed packages

Schema Repository
* Cryptographic signatures of schema handlers verified by application to ensure no malicious interception/replacement of installed packages

Variable Registry
* Detection of VR EL before template export (e.g. distinguishing a variable token {{${mysql_password}}} from a literal password value embedded in the property)
* Prevent enumeration of available values by unauthorized processors (e.g. {{RouteOnAttribute}} processor does not need access to {{mysql_password}})
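Distinguishing a variable reference from a literal could start with a scan of property values for Expression Language tokens before export. A simplified sketch; the real EL grammar allows quoting and nesting that this pattern ignores:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElTokenDetector {

    // Simplified: matches ${name} references only.
    private static final Pattern EL_TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Return the referenced variable names so the exporter can confirm each one
    // resolves through the Variable Registry rather than shipping a literal.
    static List<String> findTokens(String propertyValue) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = EL_TOKEN.matcher(propertyValue);
        while (matcher.find()) {
            tokens.add(matcher.group(1));
        }
        return tokens;
    }
}
```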

2 Comments

  1. Thanks for getting this going, Andy. I look forward to collaborating on this. 

    You've covered a lot of ground here. I'll start with where my concerns lie now with respect to NiFi's viability within our business environment.

    There's a difference between security concerns based on compliance and security concerns based on vulnerability. I currently am more concerned about the former. Ensuring that data is always encrypted at rest is the big one. Because there is no provenance encryption, I cannot enable provenance on disk at all. With provenance encryption (and nothing else), I could turn provenance on and trust that our NiFi server is not compromising our compliance requirements.

    A further improvement would be enabling sensitive attributes and content. My NiFi workflow often involves pulling some pieces of the content up into attributes, then working with the attributes before rewriting them to the content or creating a new flowfile from the special attributes. The suggested feature to mask these attributes in the provenance is nice, but the major win would be automatically encrypting them wherever they persist (in addition to the basic provenance encryption). This would provide another layer of security when we take sensitive content and bring it up into attributes. I would not want a feature which disables any changes to the sensitive attributes. If I'm creating an attribute (sensitive data or not), then I will likely need to manipulate said attribute. I would not want to be prevented from altering the sensitive data. I mainly would like it to be encrypted and restricted for provenance viewing (for certain users).

    Sensitive content is another great suggestion that would make NiFi more viable for those in compliance settings. Across-the-board provenance encryption would already solve this problem for me; however, an alternative, more configurable approach such as this would be welcomed. This feature would enable me to use the content repository. If I could flag certain get processors as generating sensitive content and have them encrypt in the content repo, then I could utilize the content repo along with the provenance repo as tools to track NiFi processes and develop new flows, etc.

    With these new features being prioritized mainly to ensure compliance requirements are met, the UI should give some feedback when those features are in use. I do not have personal preferences to how this appears in the interface, but I can give feedback as options are presented.

    When I look at vulnerabilities of NiFi, the concerns fall at the user level, server level, and network level. On the user level, if we have restricted users who can access sensitive content, sensitive attributes, and sensitive processors, then we're much better off. I would love the ability to limit dangerous processors to certain users. This would provide me the ability to utilize LDAP logins as another layer of security on the content NiFi has access to. One of the challenges I'm running into with NiFi is that the DevOps director does not enjoy giving one environment network access to all of our sensitive content with no restrictions. I empathize, and see further user-level security as a mitigation of the risk presented by opening up NiFi to so much content. The LDAP security helps, and it would be even better if we could limit who can get things and put them in other places. This would also give me the ability to control what certain users are able to do with NiFi to manage resources.

    On the server level, config encryption (along with the provenance and content repo encryption already discussed) would be nice. We have not added NiFi to Ambari yet. Any level of additional security for the config files would be appreciated, but this is secondary to the previously mentioned items.

    One consideration when implementing these security features will always be the tradeoff between usability/performance and security. Ideally NiFi can be left to work in an "insecure" way and any additional security feature is implemented as optional.

    We do not use the TLS toolkit. I do not see this feature as a major need when looking at implementing NiFi in production because we will use our own generated certs.

    There are other good things mentioned in your draft, but I'm going to stop here after discussing my top issues and anticipate further discussion and input from others.

  2. Andy LoPresto - what do you think about allowing trust-based access using X.509v3 certificates? It would work similarly to what we currently do, but rather than authenticating against the Subject of the certificate, we would assign a unique identifier to the issuer of the certificate (i.e. the trusted CA).


    This could be particularly handy when authenticating MiNiFi agents connecting to secure site-to-site:

    1. client generates CSR and is granted a certificate by CA_X (external to NiFi)
    2. client tries to connect to NiFi cluster using secure site-to-site
    3. NiFi checks certificate validity using expiration, CRLs, OCSP, etc.
    4. If the certificate is still valid, grant semi-anonymous access to the MiNiFi agent using CA_X as the de facto identity for authorization purposes.
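The authorization step (4) above could reduce to a lookup from the validated issuer DN to a group identity. A sketch, under the assumption that chain validation has already succeeded (class and names are illustrative):

```java
import java.util.Map;
import java.util.Optional;

class IssuerTrustMapper {

    // Configured mapping from a trusted CA's issuer DN to a NiFi group identity.
    private final Map<String, String> trustedIssuers;

    IssuerTrustMapper(Map<String, String> trustedIssuers) {
        this.trustedIssuers = trustedIssuers;
    }

    // After standard path validation (expiration, CRLs, OCSP), the issuer DN
    // would come from cert.getIssuerX500Principal().getName(); authorization
    // then keys off the issuer rather than the individual subject.
    Optional<String> groupForIssuer(String issuerDn) {
        return Optional.ofNullable(trustedIssuers.get(issuerDn));
    }
}
```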

    The idea is to allow authorizing via trusted path without having to map the subject down to an individual entity:

    Suppose I issued thousands of certificates to manage my puppet agents (puppet uses mutual TLS for comms). Suppose, to save hassle, I decide to share the certificate between MiNiFi and puppet-agent (on the assumption that puppet-agent already runs as root and has vast powers over a system; a compromise of that certificate would very likely indicate the server is no longer trustworthy anyhow, so using it against NiFi would be safe enough for NiFi).

    Now, instead of having to individually import each of my agents into NiFi (and clutter my users.xml), I would simply instruct the NiFi cluster to trust all puppet-issued certificates and treat them as, let's say, the NiFi group PuppetAgents.


    I then give PuppetAgents access to send data into Input Ports and voilà: I have just provisioned hundreds of MiNiFi clients without having to fiddle with individual agent entities. My users.xml stays clean by de facto delegating the authentication to the puppet CA.

    Replace puppet with Red Hat's FreeIPA and the same benefit arises.


    What do you think?