Hive Authorization

Introduction

Note that this documentation covers authorization (verifying whether a user has permission to perform a certain action), not authentication (verifying the identity of the user). Strong authentication for tools like the Hive command line is provided through the use of Kerberos. There are additional authentication options for users of HiveServer2.

Hive Authorization

...

Options

Three modes of Hive authorization are available to satisfy different use cases.

Use Cases

It is useful to think of authorization in terms of two primary use cases of Hive.

  1. Hive as a table storage layer. This is the use case for Hive's HCatalog API users such as Apache Pig, MapReduce and some Massively Parallel Processing databases (Cloudera Impala, Facebook Presto, Spark SQL etc). In this case, Hive provides a table abstraction and metadata for files on storage (typically HDFS). These users have direct access to HDFS and the metastore server (which provides an API for metadata access). HDFS access is authorized through the use of HDFS permissions. Metadata access needs to be authorized using Hive configuration.
  2. Hive as a SQL query engine. This is one of the most common use cases of Hive. This is the 'Hive view' of SQL users and BI tools. This use case has the following two subcategories:
    a. Hive command line users. These users have direct access to HDFS and the Hive metastore, which makes this use case similar to use case 1. Note that usage of Hive CLI will be officially deprecated soon in favor of Beeline.
    b. ODBC/JDBC and other HiveServer2 API users (Beeline CLI is an example). These users have all data/metadata access happening through HiveServer2. They don't have direct access to HDFS or the metastore.

Overview of Authorization Modes

1 Storage Based Authorization in the Metastore

...

Server

In use cases 1 and 2a, the users have direct access to the data. Hive configurations don't control the data access. The HDFS permissions act as one source of truth for the table storage access. By enabling Storage Based Authorization in the Metastore Server, you can use this single source of truth and have a consistent data and metadata authorization policy. To control metadata access on metadata objects such as Databases, Tables and Partitions, it checks whether you have permission on the corresponding directories on the file system. You can also protect access through HiveServer2 (use case 2b above) by ensuring that the queries run as the end user (the hive.server2.enable.doAs option should be "true" in the HiveServer2 configuration; this is the default value).
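As a minimal sketch of the HiveServer2 side of this setup, the fragment below sets hive.server2.enable.doAs explicitly so that queries run as the end user. Placing it in hiveserver2-site.xml is an assumption; any configuration file that HiveServer2 reads should serve the same purpose. Since "true" is the default, the entry mainly documents the intent.

```xml
<configuration>
  <!-- Run queries as the connecting end user, so that HDFS permissions
       (and therefore Storage Based Authorization) apply to that user. -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
</configuration>
```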

Note that through the use of HDFS ACLs (available in Apache Hadoop 2.4 onwards), you have a lot of flexibility in controlling access to the file system, which in turn provides more flexibility with Storage Based Authorization. This functionality is available as of Hive 0.14 (HIVE-7583).

While relying on Storage based authorization for restricting access, you still need to enable one of the security options 2 or 3 listed below or use FallbackHiveAuthorizer to protect actions within the HiveServer2 instance.

Fall Back Authorizer

You need to use Hive 2.3.4 or 3.1.1 or later to use Fall Back Authorizer.

The admin needs to specify the following entries in hiveserver2-site.xml:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.fallback.FallbackHiveAuthorizerFactory</value>
</property>

FallbackHiveAuthorizerFactory does the following to mitigate the above-mentioned threat:

  1. Disallow local file locations in SQL statements, except for admins
  2. Allow "set" only for selected whitelisted parameters
  3. Disallow dfs commands, except for admins
  4. Disallow the "ADD JAR" statement
  5. Disallow the "COMPILE" statement
  6. Disallow the "TRANSFORM" statement


2 SQL Standards Based Authorization in HiveServer2

Although Storage Based Authorization can provide access control at the level of Databases, Tables and Partitions, it cannot control authorization at finer levels such as columns and views, because the access control provided by the file system is at the level of directories and files. A prerequisite for fine grained access control is a data server that is able to provide just the columns and rows that a user needs (or has) access to. In the case of file system access, the whole file is served to the user. HiveServer2 satisfies this condition, as it has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for.

SQL Standards Based Authorization (introduced in Hive 0.13.0, HIVE-5837) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration.
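The sketch below shows one way the HiveServer2 configuration for this mode might look. The two class names, the doAs setting and the hive.users.in.admin.role property are not taken from this page; they are recalled from the separate SQL Standard Based Hive Authorization documentation and should be checked against your Hive version. The user name hive_admin is purely hypothetical.

```xml
<configuration>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <!-- Assumed authorizer factory for SQL Standards Based Authorization. -->
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
  </property>
  <!-- Assumed authenticator that takes the user name from the HiveServer2 session. -->
  <property>
    <name>hive.security.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
  </property>
  <!-- Assumption: in this mode queries run as the HiveServer2 service user. -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <!-- Hypothetical user that is added to the admin role at startup. -->
  <property>
    <name>hive.users.in.admin.role</name>
    <value>hive_admin</value>
  </property>
</configuration>
```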

Note that for use case 2a (Hive command line) SQL Standards Based Authorization is disabled. This is because secure access control is not possible for the Hive command line using an access control policy in Hive: users have direct access to HDFS, so they can easily bypass the SQL standards based authorization checks or even disable them altogether. Disabling this avoids giving a false sense of security to users.

3 Authorization using Apache Ranger & Sentry

Apache Ranger and Apache Sentry are Apache projects that use plugins provided by Hive to do authorization.

The policies are maintained in repositories under those projects.

You also get many advanced features using them. For example, with Ranger you can view and manage policies through a web interface, view auditing information, and have dynamic row and column level access control (including column masking) based on runtime attributes.

4 Old default Hive Authorization (Legacy Mode)

Hive Old Default Authorization (was default before Hive 2.0.0) is the authorization mode that has been available in earlier versions of Hive. However, this mode does not have a complete access control model, leaving many security gaps unaddressed. For example, the permissions needed to grant privileges for a user are not defined, and any user can grant themselves access to a table or database.

This model is similar to the SQL standards based authorization mode, in that it provides grant/revoke statement-based access control. However, the access control policy is different from SQL standards based authorization, and they are not compatible. Use of this mode is also supported for Hive command line users. However, for reasons mentioned under the discussion of SQL standards based authorization (above), it is not a secure mode of authorization for the Hive command line.

Addressing

...

Authorization Needs of Multiple Use Cases

Storage based authorization provides a simple way to address all the use cases described above. However, if you also need finer grained access control for SQL users, you can also enable SQL standards based authorization mode in HiveServer2.

That is, you can have storage based authorization enabled for the metastore API calls (in the Hive metastore) and have SQL standards based authorization enabled in HiveServer2 at the same time.

Hive Default Authorization

This section describes Hive security using the basic authorization scheme, which regulates access to Hive metadata on the client side. Starting with Hive release 0.10, additional security measures can be enabled to regulate access on the metastore side, as described in Metastore Server Security below.

Disclaimer

Hive authorization is not completely secure. The basic authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things. Starting in Hive release 0.10, however, metastore-side security can be enabled to prevent malicious access to metadata in a metastore server configuration.

Prerequisites

In order to use Hive authorization, there are two parameters that should be set in hive-site.xml:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
  <description>enable or disable the hive client authorization</description>
</property>

<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
  <description>the privileges automatically granted to the owner whenever a table gets created. 
   An example like "select,drop" will grant select and drop privilege to the owner of the table</description>
</property>

Note that, by default, hive.security.authorization.createtable.owner.grants is set to null, which would result in the creator of a table having no access to the table.
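For illustration, a more restrictive variant that follows the example given in the property description might look like the sketch below. The choice of select,drop is only an example, not a recommendation.

```xml
<configuration>
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <!-- The table owner receives only the SELECT and DROP privileges. -->
    <value>select,drop</value>
  </property>
</configuration>
```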

Users, Groups, and Roles

At the core of Hive's authorization system are users, groups, and roles. Roles allow administrators to give a name to a set of grants which can be easily reused. A role may be assigned to users, groups, and other roles. For example, consider a system with the following users and groups:

  • <User>: <Groups>
  • user_all_dbs: group_db1, group_db2
  • user_db1: group_db1
  • user_db2: group_db2

If we wanted to restrict each user to a specific set of databases, we could use roles to build the authorization mechanism. The administrator would create two roles, called role_db1 and role_db2. The role_db1 role would provide privileges just for the first database, and the role_db2 role would provide privileges just for the second database. The administrator could then grant the role_db1 role to group_db1, or explicitly for the users in the group, and do the same for role_db2 with the users of the second database. In order to allow users who need to see all databases to get their appropriate privileges, a third role could be created called role_all_dbs, which would be granted role_db1 and role_db2. When user_all_dbs is granted the role_all_dbs role, the user implicitly is granted all the privileges of role_db1 and role_db2.

Hive roles must be created manually before being used, unlike users and groups. Users and groups are managed by the hive.security.authenticator.manager. When a user connects to a Metastore Server and issues a query, the Metastore will determine the username of the connecting user, and the groups associated with that username. That information is then used to determine if the user should have access to the metadata being requested, by comparing the required privileges of the Hive operation to the user privileges using the following rules:

  • User privileges (Has the privilege been granted to the user)
  • Group privileges (Does the user belong to any groups that the privilege has been granted to)
  • Role privileges (Does the user or any of the groups that the user belongs to have a role that grants the privilege)

By default, the Metastore uses the HadoopDefaultAuthenticator for determining user -> group mappings, which determines authorization by using the Unix usernames and groups on the machine where the Metastore is running. To make this more clear, consider a scenario where a user foo is a member of group bar on the machine running the Hive CLI, and connects to a Metastore running on a separate server that also has a user named foo, but on the Metastore Server, foo is a member of group baz. When an operation is executed, the Metastore will determine foo to be in the group baz.

Taking this a step further, it is also possible that the groups a user belongs to on the Metastore Server differ from the groups that the same user belongs to, as determined by HDFS. This could be the case if Hive or HDFS are configured to use non-default user -> group mappers, or if the Metastore and the Namenode both use the defaults, but the processes are running on different machines, and the user -> group mappings are not the same on each machine.

It is important to realize that Hive Metastore only controls authorization for metadata, and the underlying data is controlled by HDFS, so if permissions and privileges between the two systems are not in sync, users may have access to metadata, but not the physical data. If the user -> group mappings across the Metastore and Namenode are not in sync, as in the scenarios above, a user may have the privileges required to access a table according to the Metastore, but may not have permission to access the underlying files according to the Namenode. This could also happen due to administrator intervention, if permissions on the files were changed by hand, but Metastore grants had not been updated.

Names of Users and Roles

Role names are case insensitive. That is, “marketing” and “MarkEting” refer to the same role.

User names are case sensitive. This is because, unlike role names, user names are not managed within Hive.

Quoted Identifiers in Version 0.13.0+

As of Hive 0.13.0, user and role names may optionally be surrounded by backtick characters (`) when the configuration parameter hive.support.quoted.identifiers is set to column (default value). All Unicode characters are permitted in the quoted identifiers, with double backticks (``) representing a backtick character. However, when hive.support.quoted.identifiers is set to none, or in Hive 0.12.0 and earlier, only alphanumeric and underscore characters are permitted in user names and role names.

For details, see HIVE-6013 and Supporting Quoted Identifiers in Column Names.
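As a small sketch of the setting discussed here, the fragment below spells out the default value explicitly. Putting it in hive-site.xml is an assumption; the property name and its two values come from the text above.

```xml
<configuration>
  <!-- "column" (the default) allows backtick-quoted user and role names;
       "none" restricts names to alphanumeric and underscore characters. -->
  <property>
    <name>hive.support.quoted.identifiers</name>
    <value>column</value>
  </property>
</configuration>
```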

Creating/Dropping/Using Roles

Create/Drop Role

CREATE ROLE role_name

DROP ROLE role_name

Grant/Revoke Roles

GRANT ROLE role_name [, role_name] ...
TO principal_specification [, principal_specification] ...
[WITH ADMIN OPTION]

REVOKE [ADMIN OPTION FOR] ROLE role_name [, role_name] ...
FROM principal_specification [, principal_specification] ...

principal_specification:
    USER user
  | GROUP group
  | ROLE role

Version

GRANT ROLE added the optional WITH ADMIN OPTION clause in Hive 0.13.0 (HIVE-5923).

REVOKE ROLE added the optional ADMIN OPTION FOR clause in Hive 0.14.0 (HIVE-6252).

 

Viewing Granted Roles

SHOW ROLE GRANT principal_specification
 
principal_specification:
    USER user
  | GROUP group
  | ROLE role

Version

The output of SHOW ROLE GRANT is in tabular format starting with Hive 0.13.0 (HIVE-6204).

Privileges

The following privileges are supported in Hive:

  • ALL - Gives users all privileges
  • ALTER - Allows users to modify the metadata of an object
  • UPDATE - Allows users to modify the physical data of an object
  • CREATE - Allows users to create objects. For a database, this means users can create tables, and for a table, this means users can create partitions
  • DROP - Allows users to drop objects
  • INDEX - Allows users to create indexes on an object (Note: this is not currently implemented)
  • LOCK - Allows users to lock or unlock tables when concurrency is enabled
  • SELECT - Allows users to access data for objects
  • SHOW_DATABASE - Allows users to view available databases

Grant/Revoke Privileges

GRANT
    priv_type [(column_list)]
      [, priv_type [(column_list)]] ...
    [ON object_type priv_level]
    TO principal_specification [, principal_specification] ...
    [WITH GRANT OPTION]

REVOKE [GRANT OPTION FOR]
    priv_type [(column_list)]
      [, priv_type [(column_list)]] ...
    [ON object_type priv_level]
    FROM principal_specification [, principal_specification] ...

REVOKE ALL PRIVILEGES, GRANT OPTION
    FROM user [, user] ...

priv_type:
    ALL | ALTER | UPDATE | CREATE | DROP
  | INDEX | LOCK | SELECT | SHOW_DATABASE 
 
object_type:
    TABLE
  | DATABASE

priv_level:
    db_name
  | tbl_name
 
principal_specification:
    USER user
  | GROUP group
  | ROLE role

Version

REVOKE priv_type added the optional GRANT OPTION FOR clause in Hive 0.14.0 (HIVE-7404).

 

Viewing Granted Privileges

SHOW GRANT principal_specification
[ON object_type priv_level [(column_list)]]
 
principal_specification:
    USER user
  | GROUP group
  | ROLE role
 
object_type:
    TABLE
  | DATABASE

priv_level:
    db_name
  | tbl_name

Version

The output of SHOW GRANT is in tabular format starting with Hive 0.13.0 (HIVE-6204).

 

Hive Operations and Required Privileges

As of the release of Hive 0.7, only these operations require permissions, according to org.apache.hadoop.hive.ql.plan.HiveOperation:

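(X = privilege required, - = privilege not required)

Operation                        ALTER  UPDATE  CREATE  DROP  INDEX  LOCK  SELECT  SHOW_DATABASE
LOAD                             -      X       -       -     -      -     -       -
EXPORT                           -      -       -       -     -      -     X       -
IMPORT                           X      X       -       -     -      -     -       -
CREATE TABLE                     -      -       X       -     -      -     -       -
CREATE TABLE AS SELECT           -      -       X       -     -      -     X       -
DROP TABLE                       -      -       -       X     -      -     -       -
SELECT                           -      -       -       -     -      -     X       -
ALTER TABLE ADD COLUMN           X      -       -       -     -      -     -       -
ALTER TABLE REPLACE COLUMN       X      -       -       -     -      -     -       -
ALTER TABLE RENAME               X      -       -       -     -      -     -       -
ALTER TABLE ADD PARTITION        -      -       X       -     -      -     -       -
ALTER TABLE DROP PARTITION       -      -       -       X     -      -     -       -
ALTER TABLE ARCHIVE              -      X       -       -     -      -     -       -
ALTER TABLE UNARCHIVE            -      X       -       -     -      -     -       -
ALTER TABLE SET PROPERTIES       X      -       -       -     -      -     -       -
ALTER TABLE SET SERDE            X      -       -       -     -      -     -       -
ALTER TABLE SET SERDEPROPERTIES  X      -       -       -     -      -     -       -
ALTER TABLE CLUSTER BY           X      -       -       -     -      -     -       -
ALTER TABLE PROTECT MODE         X      -       -       -     -      -     -       -
ALTER PARTITION PROTECT MODE     X      -       -       -     -      -     -       -
ALTER TABLE SET FILEFORMAT       X      -       -       -     -      -     -       -
ALTER PARTITION SET FILEFORMAT   X      -       -       -     -      -     -       -
ALTER TABLE SET LOCATION         -      X       -       -     -      -     -       -
ALTER PARTITION SET LOCATION     -      X       -       -     -      -     -       -
ALTER TABLE CONCATENATE          -      X       -       -     -      -     -       -
ALTER PARTITION CONCATENATE      -      X       -       -     -      -     -       -
SHOW DATABASES                   -      -       -       -     -      -     -       X
LOCK TABLE                       -      -       -       -     -      X     -       -
UNLOCK TABLE                     -      -       -       -     -      X     -       -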

Metastore Server Security

This section describes the metastore server security feature added to Hive in release 0.10. This feature was introduced previously in HCatalog (see Storage Based Authorization).

The Need for Metastore Server Security

When multiple clients access the same metastore in a backing database, such as MySQL, the database connection credentials may be visible in the hive-site.xml configuration file. A malicious or incompetent user could cause serious damage to metadata even though the underlying data is protected by HDFS access controls.

Also, when a Hive metastore server uses Thrift to communicate with clients and has a backing database for metadata storage and persistence, the authentication and authorization done on the client side cannot guarantee security on the metastore side.

To provide security for metadata, release 0.10 adds authorization capability to the metastore. (See HIVE-3705.) 

 

Storage Based Authorization

When metastore server security is configured to use Storage Based Authorization, it uses the file system permissions for folders corresponding to the different metadata objects as the source of truth for the authorization policy. Use of Storage Based Authorization in the metastore is recommended.

See details in the Storage Based Authorization document.

Configuration Parameters for Metastore Security

To enable Hive metastore server security, set these parameters in hive-site.xml:

  • hive.metastore.pre.event.listeners
    Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.
    This turns on metastore-side security.
  • hive.security.metastore.authorization.manager
    Set to org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider.
    This tells Hive which metastore-side authorization provider to use. The default setting uses DefaultHiveMetastoreAuthorizationProvider, which implements the standard Hive grant/revoke model. To use an HDFS permission-based model (recommended) to do your authorization, you can use org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider instead.

    Version

    The StorageBasedAuthorizationProvider was introduced in Hive 0.10.0, running on the metastore side only (HIVE-3705). Starting in Hive 0.12.0 it also runs on the client side (HIVE-5048 and HIVE-5402).

  • hive.security.metastore.authenticator.manager
    Set to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator.

The snippet below shows these keys in their default state in hive-site.xml (metastore-side security is set up to use the default authorization and authentication providers, but is disabled). Edit the values as described above to get the desired authorization behavior:

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
  <description>authorization manager class name to be used in the metastore for authorization.
  The user defined authorization class should implement interface
  org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
  </description>
 </property>

<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  <description>authenticator manager class name to be used in the metastore for authentication.
  The user defined authenticator should implement interface 
  org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
  </description>
</property>

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value> </value>
  <description>pre-event listener classes to be loaded on the metastore side to run code
  whenever databases, tables, and partitions are created, altered, or dropped.
  Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
  if metastore-side authorization is desired.
  </description>
</property>
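For comparison with the default state, here is a sketch of the same three keys edited to turn on metastore-side security with the recommended HDFS permission-based provider. All class names are the ones listed under the configuration parameters above; wrapping the properties in a configuration element is an assumption about the surrounding hive-site.xml.

```xml
<configuration>
  <!-- Turns on metastore-side security. -->
  <property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
  </property>
  <!-- HDFS permission-based provider, used instead of the default grant/revoke provider. -->
  <property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  </property>
  <property>
    <name>hive.security.metastore.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  </property>
</configuration>
```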

SQL Standards Based Authorization

Explain Authorization

Version 0.14: EXPLAIN AUTHORIZATION

Starting in Hive 0.14.0, the HiveQL command EXPLAIN AUTHORIZATION shows all entities that need to be authorized to execute a query, as well as any authorization failures.

More Information

For detailed information about the Hive authorization modes, see:

...