Status

State: Completed
Discussion Thread:
Vote Thread: https://lists.apache.org/thread/xdj33nr47ggw9rn0gwmj6xw03bqv202b
Vote Result Thread: https://lists.apache.org/thread/2wqwvp6d2pyjbsgvzw6t68mh9f0x2bw9
Progress Tracking (PR/GitHub Project/Issue Label): https://github.com/orgs/apache/projects/276
Created:
Version Released:
Authors: vincbeck



Motivation

Today, user management is part of core Airflow. Users, roles and permissions are stored in the Airflow metastore and managed through Flask-AppBuilder (FAB). Any additional user management feature means modifying core Airflow and, more importantly, verifying that it fits everyone's needs, from individuals to teams within enterprises.

For context, this was brought up in a discussion about multi-tenancy. It was suggested that, instead of adding new features (such as tenants) to the user management part of Airflow, this part be extracted from core Airflow and moved to a new component: the auth manager. The goal is, as with executors, to have a generic interface defining the common API that all auth managers need to implement. This way, Airflow would offer a pluggable and extensible way to define and use the auth manager that suits user needs. Unlike the generic interface, the different auth manager implementations are not part of core Airflow. Depending on the underlying service, each one resides in its respective provider (e.g. AWS, Google) if that provider exists, or in a new provider if it does not.

Proposal

The proposal is to extract the whole user management part of Airflow from core Airflow and introduce the auth manager. The goal of the auth manager is to manage all features and resources related to users, roles and permissions. This way, users could simply choose between a very minimalist auth manager and a more advanced one with a notion of groups or tenants. Everything under the FAB security manager as it exists today is extracted from core Airflow and handled by the auth manager.

[Diagram]

The auth manager interface (or base auth manager) is an interface each auth manager needs to inherit from. This interface defines the common API of an auth manager and is the only integration point with core Airflow. In other words, any action related to user management is done through classes inheriting from this interface.
Since it is impossible to forecast which features and views each auth manager is going to offer, the “Security” tab in the nav bar will be configured by each auth manager.
Auth managers are “pluggable”, meaning you can swap them based on your installation needs. Airflow can only have one auth manager configured at a time; this is set by the auth_manager option in the [core] section of the configuration file.
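
As a rough illustration of the pluggable design, the sketch below reads the auth_manager option from the [core] section of an INI-style configuration and resolves the dotted path to a class. The names BaseAuthManager, read_auth_manager_path, resolve_class and load_auth_manager are assumptions made for this example; this is not Airflow's actual loading code.

```python
import configparser
import importlib
from abc import ABC, abstractmethod


class BaseAuthManager(ABC):
    """Hypothetical stand-in for the auth manager interface."""

    @abstractmethod
    def is_logged_in(self) -> bool:
        """Return True if the current user is logged in."""


def read_auth_manager_path(config_text: str) -> str:
    """Read the auth_manager option from the [core] section of an INI config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("core", "auth_manager")


def resolve_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def load_auth_manager(dotted_path: str) -> BaseAuthManager:
    """Instantiate the configured class, rejecting classes that do not
    inherit from the interface."""
    cls = resolve_class(dotted_path)
    if not issubclass(cls, BaseAuthManager):
        raise TypeError(f"{dotted_path} does not inherit from BaseAuthManager")
    return cls()
```

With such a loader, swapping auth managers would only require changing the dotted path in the configuration file. Checking the interface at load time is one possible design choice: it makes a misconfiguration fail at startup rather than on the first request.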

Implementations

To explain in more detail how auth managers work, I decided to describe two different implementations of auth manager:

  • FAB auth manager. This auth manager offers the exact same features and experience as the current user management in Airflow. The implementation of this auth manager is part of this AIP.
  • KeyCloak auth manager. This auth manager leverages KeyCloak to manage users and roles. The implementation of this auth manager is not part of this AIP. I still decided to include diagrams and explanations about it to make the range of possible auth manager implementations easier to understand.

FAB auth manager (backward compatible)

The goal of the FAB auth manager is to offer a backward compatible experience to users. Put simply, it moves the FAB security manager out of core Airflow to a new provider: the FAB provider. All the different pages are still served through the web server. The “Security” tab is configured to be as it is today. End users should see no difference before and after the introduction of auth managers.

[Diagram]

KeyCloak auth manager

The goal of the KeyCloak auth manager is to delegate user management to KeyCloak. Admins have to configure roles and permissions in KeyCloak directly. A new KeyCloak provider needs to be created, containing only the KeyCloak auth manager. This auth manager will not be part of this AIP.

[Diagram]

Common API

All auth managers have a common API defined in the auth manager interface. The table below lists the common API required of all auth managers.

Category | Name | Description
Nav bar | get_tab_title() | Returns the tab title in the nav bar. Currently "Security"
Nav bar | get_tab_menu() | Returns the different items shown when hovering over the tab
URLs | get_url_login() | Returns URL to sign in
URLs | get_url_logout() | Returns URL to sign out
URLs | get_url_account() | Returns URL to access my account/profile
APIs | is_logged_in() | Returns true if the current user is logged in
APIs | get_user_name() | Returns the user name
APIs | post_login() | Post-login operations needed depending on the auth manager used, e.g. storing the access token
APIs | is_authorized() | Is the user authorized to perform an action on a given resource? See section "Authorization flow" for more details

Authentication flow

The authentication flow allows a user to log in to Airflow. The flow follows the OAuth 2.0 protocol.

When authentication succeeds, a Flask session is created, as it is today. This Flask session stores information about the logged-in user. The kind of information stored can vary depending on the auth manager. For example, if the service used by the auth manager follows OpenID Connect (OIDC), the OIDC access token is stored in the Flask session so that it can be used at any time during the user session.
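
The session handling described above could look roughly like the following sketch. Flask's session object behaves like a dictionary, so a plain dict stands in for it here to keep the example free of dependencies. The key names and the three helper functions are assumptions made for illustration.

```python
from typing import Any, MutableMapping, Optional

# Hypothetical session keys chosen for this example.
USER_KEY = "user_name"
TOKEN_KEY = "oidc_access_token"


def login_callback(
    session: MutableMapping[str, Any], user_name: str, access_token: str
) -> None:
    """Store what later requests need once authentication has succeeded."""
    session[USER_KEY] = user_name
    session[TOKEN_KEY] = access_token


def is_logged_in(session: MutableMapping[str, Any]) -> bool:
    """A user counts as logged in once their name is in the session."""
    return USER_KEY in session


def get_access_token(session: MutableMapping[str, Any]) -> Optional[str]:
    """Return the stored token, or None if no one is logged in."""
    return session.get(TOKEN_KEY)
```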

To simplify the example diagrams below, we assume the user is not logged in and authentication on the backend side succeeds.

FAB auth manager

The FAB auth manager differs from the other auth managers. Instead of delegating the login experience to an external service, it defines the login page within the manager itself. The page is still served through the web server. The goal is to keep the login page as it is today.

[Diagram: authentication flow with the FAB auth manager]

KeyCloak auth manager

[Diagram: authentication flow with the KeyCloak auth manager]

Authorization flow

The is_authorized API is the API each auth manager needs to implement to check whether the current user has permission to perform a specific action. For example:

  • Does the current user have permission to list variables? is_authorized([(permissions.ACTION_CAN_READ, permissions.RESOURCE_VARIABLE)])

This API also provides some context about the action being performed and the resource being accessed, so that the auth manager can base authorization decisions on multiple parameters. This context needs to be extensible so that new information is easy to add. The schema of this context is shown below. It is represented as JSON for readability, but since is_authorized is a regular Python method, the fields will be passed as regular Python parameters.

    {
        "action": "POST|GET|PUT|DELETE",
        "resource-type": "<resource-type>",
        "resource-details": {
            "id": "<resource-id>",
            "tags": ["<resource-tag-1>", "<resource-tag-2>"],
            <other-resource-specific-information>
        }
    }

    • action. The operation being performed: Create (POST), Read (GET), Update (PUT) or Delete (DELETE).
    • resource-type. The type of resource being accessed. The possible values are the resources already defined in Airflow today (see Security → Resources in the Airflow UI).
    • resource-details (optional). An extensible object providing more metadata about the resource being accessed.
      • id (optional). The resource ID. If the resource is "DAG", this is the DAG ID; if the resource is "Variable", it is the Variable ID; and so on.
      • tags (optional). List of tags associated with the resource. Only available for resources that can be tagged, e.g. DAGs.
      • Because this object is extensible, other parameters can be added to provide more information about the resource, e.g. "dag-folder", which, if the resource is "DAG", specifies the folder where the DAG is defined.
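
One way the schema above could map to regular Python parameters is sketched here. The Action enum, the ResourceDetails dataclass (with an extra dictionary for the extensible part) and the build_context helper are hypothetical names chosen for illustration, not the types defined by this AIP.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Action(Enum):
    """The four operations listed in the schema."""

    POST = "POST"
    GET = "GET"
    PUT = "PUT"
    DELETE = "DELETE"


@dataclass
class ResourceDetails:
    """Optional metadata about the resource; `extra` holds extensions."""

    id: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    extra: Dict[str, Any] = field(default_factory=dict)


def build_context(
    action: Action,
    resource_type: str,
    resource_details: Optional[ResourceDetails] = None,
) -> Dict[str, Any]:
    """Render the parameters in the JSON-like shape used by the schema."""
    context: Dict[str, Any] = {
        "action": action.value,
        "resource-type": resource_type,
    }
    if resource_details is not None:
        details: Dict[str, Any] = dict(resource_details.extra)
        if resource_details.id is not None:
            details["id"] = resource_details.id
        if resource_details.tags:
            details["tags"] = list(resource_details.tags)
        context["resource-details"] = details
    return context
```

Keeping the extensible fields in a separate dictionary is only one option; it lets new information such as "dag-folder" be added without changing the signature.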

    Examples:

    Is the user authorized to create a new variable?


    {
    	"action": "POST",
    	"resource-type": "Variable"
    }


    Is the user authorized to read/list variables?


    {
    	"action": "GET",
    	"resource-type": "Variable"
    }


    Is the user authorized to read the variable "my-var-id"?


    {
        "action": "GET",
        "resource-type": "Variable",
        "resource-details": {
            "id": "my-var-id"
        }
    }


    Is the user authorized to delete the specific DAG "my-dag-id"?


    {
        "action": "DELETE",
        "resource-type": "DAG",
        "resource-details": {
            "id": "my-dag-id",
            "tags": ["example1", "example2"],
            "dag-folder": "/dags/marketing"
        }
    }

    Does the current user have permission to read a specific DAG? is_authorized([(permissions.ACTION_CAN_READ, permissions.RESOURCE_DAG)], "dag_id")
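
To illustrate how an auth manager might base a decision on several context parameters at once, here is a deliberately small sketch. The policy (anyone may read; other actions are allowed only on DAGs carrying a tag the user owns), the ALLOWED_TAGS_BY_USER table and the explicit user_name parameter are all assumptions invented for this example.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical policy data: the tags each user is allowed to modify.
ALLOWED_TAGS_BY_USER: Dict[str, Set[str]] = {
    "alice": {"marketing"},
    "bob": {"finance"},
}


def is_authorized(
    user_name: str,
    action: str,
    resource_type: str,
    resource_details: Optional[Dict[str, Any]] = None,
) -> bool:
    """Toy rule: reads are always allowed; any other action is allowed
    only on a DAG that has at least one tag assigned to the user."""
    if action == "GET":
        return True
    if resource_type != "DAG":
        return False
    tags = set((resource_details or {}).get("tags", []))
    return bool(tags & ALLOWED_TAGS_BY_USER.get(user_name, set()))
```

A real auth manager would obtain the current user from the session rather than from a parameter; it is passed explicitly here only to keep the sketch self-contained.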


    To understand how this API is implemented in different auth managers, let’s take the use case of “User clicks on Variables in the Admin menu”.

    FAB auth manager

    [Diagram: authorization flow with the FAB auth manager]

    The is_authorized API in the FAB auth manager checks if the current user has the specified permissions. The implementation is very close to check_authorization in the security manager.
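
A role-based permission check of this kind can be sketched as follows. This is not the FAB security manager's code: the role table, the constant values and the explicit user_roles parameter are assumptions made to keep the example self-contained.

```python
from typing import Dict, Iterable, List, Set, Tuple

Permission = Tuple[str, str]  # (action, resource)

# Illustrative constants; the values are assumptions for this sketch.
ACTION_CAN_READ = "can_read"
ACTION_CAN_CREATE = "can_create"
RESOURCE_VARIABLE = "Variables"

# Hypothetical role definitions.
ROLE_PERMISSIONS: Dict[str, Set[Permission]] = {
    "Viewer": {(ACTION_CAN_READ, RESOURCE_VARIABLE)},
    "Admin": {
        (ACTION_CAN_READ, RESOURCE_VARIABLE),
        (ACTION_CAN_CREATE, RESOURCE_VARIABLE),
    },
}


def is_authorized(user_roles: Iterable[str], required: List[Permission]) -> bool:
    """Return True only if the user's roles grant every required permission."""
    granted: Set[Permission] = set()
    for role in user_roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return all(permission in granted for permission in required)
```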

    KeyCloak auth manager

    When logging in using KeyCloak, users are issued an access token stored in the metastore. This access token is used by KeyCloak to figure out if the current user has permissions to access a given resource.

    [Diagram: authorization flow with the KeyCloak auth manager]

    Rest API and CLI

    Airflow Rest API

    As part of the Rest API, some resources are no longer managed by core Airflow but by auth managers: roles and users. Therefore, these APIs will be removed:

    However, some auth managers might need to define additional Rest APIs for their own needs. The FAB auth manager is an example: to remain backward compatible, the APIs removed from core Airflow listed above need to be moved to the FAB auth manager. By default, no additional Rest API is defined in the base auth manager.

    Airflow CLI

    Among the sub-commands exposed by the Airflow CLI, roles and users need to be removed from core Airflow, as with the Rest API. Likewise, some auth managers (e.g. the FAB auth manager) might need to define additional CLI commands.
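
The idea of auth managers contributing their own CLI commands can be sketched with the standard argparse module. The CliCommand tuple, the two example auth manager classes and the build_parser helper are hypothetical; this is not how the Airflow CLI is actually assembled.

```python
import argparse
from typing import Callable, List, NamedTuple


class CliCommand(NamedTuple):
    """Hypothetical description of one extra sub-command."""

    name: str
    help: str
    func: Callable[[argparse.Namespace], str]


class FabLikeAuthManager:
    """Example auth manager that brings back `users` and `roles`."""

    def cli_commands(self) -> List[CliCommand]:
        return [
            CliCommand("users", "Manage users", lambda args: "listing users"),
            CliCommand("roles", "Manage roles", lambda args: "listing roles"),
        ]


class MinimalAuthManager:
    """Example auth manager that adds no CLI commands."""

    def cli_commands(self) -> List[CliCommand]:
        return []


def build_parser(auth_manager) -> argparse.ArgumentParser:
    """Register whatever sub-commands the configured auth manager defines."""
    parser = argparse.ArgumentParser(prog="airflow")
    subparsers = parser.add_subparsers(dest="command")
    for command in auth_manager.cli_commands():
        sub = subparsers.add_parser(command.name, help=command.help)
        sub.set_defaults(func=command.func)
    return parser
```

With this shape, core code never needs to know which commands exist: it only iterates over what the configured auth manager returns.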

    UI

    The different UI pages used to manage users and roles are no longer part of core Airflow and are moved to auth managers. Depending on the auth manager and the service or tool it uses underneath, two options are possible:

    • Use the UI provided by the service/tool directly to manage users and roles. This is the preferred option.
    • Create UI pages in the auth manager to manage users and roles. This is the option chosen for the FAB auth manager.

    Even though the preferred option is to delegate user management entirely to auth managers, the second option is necessary to implement the FAB auth manager.

    Auth manager API

    All auth managers have a common API defined in the auth manager interface. You can find in the table below the common API needed from all auth managers. The different categories are just for documentation and grouping purposes but might not be reflected in the architecture/code.

    Category | Name | Description
    UI | get_url_user_profile() | Returns URL to access user profile
    UI | get_user_name() | Returns the user name
    Core | get_url_login() | Returns URL to sign in
    Core | get_url_logout() | Returns URL to sign out
    Core | is_logged_in() | Returns true if the current user is logged in
    Core | login_callback() | Callback called after login. It might be needed depending on the auth manager used, e.g. to store the OIDC access token
    Core | is_authorized() | Is the user authorized to perform an action on a given resource? See section "Authorization flow" for more details
    Core | get_security_manager_override_class() | Specifies an override for the security manager. Depending on the auth manager, you might need to override the security manager to add custom logic (e.g. register specific views)
    Additional resources | rest_apis() | Defines additional Rest APIs
    Additional resources | cli_commands() | Defines additional CLI commands
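
The table above could translate into an abstract base class along these lines. The method names come from the table, but the signatures, the choice of which methods are abstract and which have defaults, and the ReadOnlyAuthManager example with its URLs are assumptions made for this sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseAuthManager(ABC):
    """Hypothetical sketch of the auth manager interface."""

    # UI
    @abstractmethod
    def get_url_user_profile(self) -> str: ...

    @abstractmethod
    def get_user_name(self) -> str: ...

    # Core
    @abstractmethod
    def get_url_login(self) -> str: ...

    @abstractmethod
    def get_url_logout(self) -> str: ...

    @abstractmethod
    def is_logged_in(self) -> bool: ...

    @abstractmethod
    def is_authorized(
        self,
        action: str,
        resource_type: str,
        resource_details: Optional[Dict[str, Any]] = None,
    ) -> bool: ...

    def login_callback(self) -> None:
        """Optional hook called after login; a no-op by default."""

    def get_security_manager_override_class(self) -> Optional[type]:
        """No security manager override by default."""
        return None

    # Additional resources: none by default.
    def rest_apis(self) -> List[Any]:
        return []

    def cli_commands(self) -> List[Any]:
        return []


class ReadOnlyAuthManager(BaseAuthManager):
    """Example implementation: one anonymous user who may only read."""

    def get_url_user_profile(self) -> str:
        return "/profile"  # hypothetical URL

    def get_user_name(self) -> str:
        return "anonymous"

    def get_url_login(self) -> str:
        return "/login"  # hypothetical URL

    def get_url_logout(self) -> str:
        return "/logout"  # hypothetical URL

    def is_logged_in(self) -> bool:
        return True

    def is_authorized(self, action, resource_type, resource_details=None) -> bool:
        return action == "GET"
```

Giving rest_apis() and cli_commands() empty defaults matches the statement that the base auth manager defines no additional Rest API by default, so simple auth managers only implement the core methods.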

    Future work

    Here are some examples of tasks that are not part of this AIP but can be done as follow-ups once the AIP is completed.

    • Create a KeyCloak provider with a KeyCloak auth manager in it
    • Additional auth managers in other providers (e.g. AWS auth manager, Google auth manager)

    Considerations

    What problem does it solve?

    It extracts user management from core Airflow and follows the "Airflow as a platform" approach. User management becomes pluggable and extensible through an auth manager interface in core Airflow that can be extended by any provider package that wants to support user management natively. An extensible and pluggable user management would open up the potential for more advanced user management features than exist today in Airflow, such as groups of users (or tenants).

    Why is it needed?

    Having a single user management implementation that fits everyone's needs (from individuals to teams within enterprises) is impossible. Users need an extensible and pluggable way to define and use the user management they want.

    Native user management support in cloud providers means that roles can be mapped directly to the identity provider. Currently, Airflow operators have to work around this and are unable to provide seamless RBAC in Airflow when running it on different cloud platforms.

    Which users are affected by the change?

    All users are impacted by the change. However, by default Airflow would use the FAB auth manager, which is backward compatible, so users should not see any difference. Of course, if an admin decides to switch to another auth manager, the whole user management experience of the environment would change.

    What defines this AIP as "done"?

    • The auth manager interface is defined
    • A new provider, the FAB provider, is created
    • The FAB auth manager, inheriting from the auth manager interface, is defined. This FAB auth manager is part of the new FAB provider. By default, Airflow uses this auth manager