Security Feature Roadmap

Target release
Epic
Document status	DRAFT
Document owner	Andy LoPresto
Designer
Developers
QA

Goals

* Document the planned security feature roadmap
* Solicit community feedback on goals, obstacles, user experience, and solutions

**Note: I wanted to document my thoughts here, but all of this is at a very early stage and I welcome community feedback. What challenges do users and developers face with security, what trade-offs are people willing to make, what do they not understand, etc.? I will clean up the formatting, and as these features become better described and captured, I will break them out into individual feature proposals. I also need to correlate the existing Jiras and link them here.**

Community Security Features Roadmap

TLS
* Refactor and consolidate {{SSLContextFactory}}, {{SSLContextService}}
* Refactor internals of TLS Toolkit
- "1-click" deployment with scripts for client certificate import to browser, etc.
* Individual configurations for
- UI/API
- Cluster
- Site to Site
- Processors ({{SSLContextService}})
* Mozilla Labs integration
- Easy assessment
- Easy config
- Automatic cipher suite upgrades/deprecations

Encrypted Config
* Login Identity Provider coverage
* Integration with Ambari
- CET read password/key from file descriptor
- Clean up original properties file after encryption
- Reversibility?
* Integration with Variable Registry
* Remove master key from {{bootstrap.conf}}
* Process monitor opacity
* Other provider integrations
- HSM
- Hadoop {{CredentialsProvider}} API
- Vault
- KeyWhiz
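
As an illustration of the "read password/key from file descriptor" item, here is a minimal Python sketch of how a tool could accept key material over an inherited descriptor rather than a command-line argument or environment variable. The helper name {{read_key_from_fd}}, the hex encoding of the key, and the size limit are assumptions made for this example, not the behavior of any existing NiFi tool.

```python
import os


def read_key_from_fd(fd: int, max_len: int = 4096) -> bytes:
    """Read a hex-encoded key from an already-open file descriptor.

    Hypothetical helper: the hex encoding and max_len are assumptions.
    The descriptor is closed once the key has been read.
    """
    chunks = []
    remaining = max_len
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF: the writer closed its end
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    os.close(fd)
    text = b"".join(chunks).decode("ascii").strip()
    return bytes.fromhex(text)
```

A parent process (for example a management agent) would write the key to the write end of a pipe and pass the read end to the tool, so the key never appears in the process arguments.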

Provenance
* Signature on each event record (all metadata of event record, content claim ID/hash of content -> {{HMAC/SHA-256}} with per-node unique secret)
* Signature on chain (concatenation of all ER sigs -> {{HMAC/SHA-256}})
- CR key could be unique per-node
+ Node A -> Node B requires chain sigs on Node B to be {{S(S(A1|A2, KAC)|B1, KBC)}}; can't verify {{S(A1|A2, KAC)}} on Node B
+ {{KAC}}, {{KBC}}, etc. could be public/private key pairs derived from a constant master key shared across all nodes (would allow cross-node verification)
- CR key could be same across all nodes in cluster/S2S deployment
+ Verifiability extends across entire lifespan of lineage data
- Investigate Axolotl (Double Ratchet), blockchain/alternative chain
* UI display of provenance trust
- Per-event and per-chain
- Alerts for manipulated/modified provenance data
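
The per-event and per-chain signatures described above can be sketched with Python's standard {{hmac}} module. This is only an illustration under stated assumptions: the canonical JSON serialization of an event record, the function names, and the way the previous chain signature is prepended are choices made for this example. The nesting mirrors the {{S(S(A1|A2, KAC)|B1, KBC)}} notation.

```python
import hashlib
import hmac
import json


def sign_event(event: dict, node_key: bytes) -> bytes:
    """HMAC/SHA-256 over a canonical serialization of one event record.

    Assumption: sorted-key JSON stands in for "all metadata of the event record".
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(node_key, canonical, hashlib.sha256).digest()


def sign_chain(previous_chain_sig: bytes, event_sigs: list, chain_key: bytes) -> bytes:
    """HMAC/SHA-256 over the previous chain signature plus this node's event signatures.

    Every signature is a fixed 32 bytes, so plain concatenation is unambiguous.
    Pass b"" as previous_chain_sig on the first node in the lineage.
    """
    message = previous_chain_sig + b"".join(event_sigs)
    return hmac.new(chain_key, message, hashlib.sha256).digest()


def verify_chain(expected: bytes, previous_chain_sig: bytes,
                 event_sigs: list, chain_key: bytes) -> bool:
    """Recompute the chain signature and compare in constant time."""
    actual = sign_chain(previous_chain_sig, event_sigs, chain_key)
    return hmac.compare_digest(expected, actual)
```

With per-node chain keys, Node B can extend Node A's chain signature but cannot recompute it, which is the verification gap noted in the list above. A shared or derivable key removes that gap at the cost of a wider blast radius if the key leaks.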

Repositories
* Transparent encryption of repository before persistence to file system
- {{Provenance}}
- {{Content}}
- {{Flowfile}} (attributes)
- {{Log}}? (intercept, filtering, hash/obfuscation?)
- {{Bulletin}}? (currently volatile impl only)
- {{ComponentStatus}}? (currently volatile impl only)
- {{Counter}}? (currently volatile impl only)
* Balance performance with security (retrieval has high cost)
* At which layer does the encryption/decryption occur (closest to file system/actual implementation of {{*Repository}} interface/AOP)?

Sensitive Attributes
* Mark attributes as sensitive (i.e. masked in UI, restricted to specific user access policy)
- Per-processor (e.g. all attributes originating from any {{EncryptContent}} processor)
- Per-instance (e.g. attributes originating from a specific {{UpdateAttribute}} processor which is handling PII)
* Encrypt before persisting even if not using {{EncryptedProvenanceRepository}}/{{EncryptedFlowfileRepository}}
* Existence of attribute could even be sensitive (e.g. SSN attribute, "security level" attribute)
* Cannot be modified/updated/removed by other processors?
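
To make the masking and "existence is sensitive" ideas concrete, here is a small Python sketch of how a view of flowfile attributes could be filtered for a given user. The function name, the mask string, and the {{hide_existence}} flag are hypothetical and exist only for this example.

```python
from typing import Dict, Set

MASK = "********"  # assumed placeholder shown in place of a sensitive value


def visible_attributes(attributes: Dict[str, str], sensitive: Set[str],
                       can_view_sensitive: bool,
                       hide_existence: bool = False) -> Dict[str, str]:
    """Return the attribute view a user is allowed to see.

    Users holding the sensitive-access policy see everything. Other users see
    masked values, or no entry at all when even the attribute's existence
    should not be revealed.
    """
    if can_view_sensitive:
        return dict(attributes)
    view = {}
    for name, value in attributes.items():
        if name not in sensitive:
            view[name] = value
        elif not hide_existence:
            view[name] = MASK
    return view
```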

Sensitive Content
* Data that enters system and needs to be immediately encrypted/anonymized/filtered
* Provenance provides access to raw input (e.g. before/after {{EncryptContent}})
* Ability to mark a processor so that the input of any flowfile entering it is restricted/masked and accessible only to users with a restricted access control policy
- Extends to provenance/content repositories

Dangerous Processors
* Processors which can directly affect behavior/configuration of NiFi/other services
- {{GetFile}}
- {{PutFile}}
- {{ExecuteScript}}
- {{InvokeScriptedProcessor}}
- {{ExecuteProcess}}
- {{ExecuteStreamCommand}}
* These processors should only be creatable/editable by users with special access control policy
* Marked by {{@Restricted}} annotation on processor class
* All flowfiles originating/passing through these processors have special attribute/protection
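
The marker-plus-policy check could work roughly as in the following Python sketch. It uses a class decorator as a stand-in for the Java {{@Restricted}} annotation. The policy name {{restricted-components}}, the {{can_create}} function, and the empty processor classes are all hypothetical placeholders for illustration.

```python
def restricted(reason: str):
    """Class decorator standing in for the @Restricted annotation."""
    def mark(cls):
        cls.restricted_reason = reason
        return cls
    return mark


@restricted("Reads arbitrary files with the privileges of the NiFi process")
class GetFile:
    """Placeholder for a dangerous processor."""


class RouteOnAttribute:
    """Placeholder for an ordinary processor."""


def can_create(processor_class, user_policies: set) -> bool:
    """Allow unrestricted processors for everyone; restricted ones need the policy.

    "restricted-components" is an assumed policy name for this sketch.
    """
    if getattr(processor_class, "restricted_reason", None) is None:
        return True
    return "restricted-components" in user_policies
```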

Flow Sensitivity Analysis
* Application-level intelligence to analyze flows (based on flow graph or flowfile provenance lineage) and determine existence of "dangerous processors" or "security processors" and proactively enable encrypted repositories/sensitive attributes for data traversing that flow
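
One possible shape for the graph-based variant is a breadth-first walk from every "dangerous" or "security" processor, collecting everything downstream. The Python sketch below assumes the flow is available as a simple adjacency map. The function name and data layout are hypothetical, and it only looks downstream, whereas a real analysis would probably also need to look upstream of processors such as {{EncryptContent}}.

```python
from collections import deque


def sensitive_components(flow: dict, component_types: dict, trigger_types: set) -> set:
    """Return every component that is a trigger processor or lies downstream of one.

    flow:            component name -> list of directly connected downstream names
    component_types: component name -> processor type (e.g. "ExecuteScript")
    trigger_types:   processor types treated as dangerous/security processors
    """
    queue = deque(name for name, ptype in component_types.items()
                  if ptype in trigger_types)
    seen = set(queue)
    while queue:
        current = queue.popleft()
        for downstream in flow.get(current, []):
            if downstream not in seen:  # also guards against loops in the flow
                seen.add(downstream)
                queue.append(downstream)
    return seen
```

The resulting set could then drive decisions such as enabling encrypted repositories or sensitive attributes for data traversing those components.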

Visual indicators of security state
* UI panel displaying current server security state ("dashboard"/"quick view")
- TLS config for UI/API
+ Authentication mechanisms in place (client certificate, Kerberos, LDAP)
- TLS config for cluster
- TLS config for S2S
- Encryption status of repositories
- Users with admin/DFM/"dangerous" access

Extension Repository
* Cryptographic signatures of extensions verified by application to ensure no malicious interception/replacement of installed packages

Schema Repository
* Cryptographic signatures of schema handlers verified by application to ensure no malicious interception/replacement of installed packages

Variable Registry
* Detection of VR EL before template export (e.g. the variable token {{${mysql_password}}} vs. the literal password it resolves to)
* Prevent enumeration of available values by unauthorized processors (e.g. {{RouteOnAttribute}} processor does not need access to {{mysql_password}})
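
The pre-export check could be as simple as the following Python sketch, which flags sensitive properties whose value is a literal rather than a single Expression Language reference. The function name, the property layout, and the simplified regular expression (it does not parse full Expression Language) are assumptions for this example.

```python
import re

# Simplified: one ${...} reference spanning the whole value, with no nesting.
EL_REFERENCE = re.compile(r"^\$\{[^{}]+\}$")


def literal_sensitive_values(properties: dict, sensitive_names: set) -> list:
    """Return the sensitive property names whose value is a non-empty literal."""
    return sorted(
        name for name in sensitive_names
        if properties.get(name) and not EL_REFERENCE.match(properties[name])
    )
```

A template export could refuse to proceed, or strip the values, whenever this list is non-empty.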
