Hive Authorization

Introduction

This document is about authorization (verifying whether a user has permission to perform a certain action), not authentication (verifying the identity of the user). Strong authentication for tools like the Hive command line is provided through the use of Kerberos. There are additional authentication options for users of HiveServer2.

Hive Authorization options

It is useful to think of authorization in terms of two primary use cases of Hive.

  1. Hive as a table storage layer. This is the use case for users of Hive's HCatalog API, such as Apache Pig, MapReduce and some MPP databases. In this case, Hive provides a table abstraction and metadata for files on storage (typically HDFS). These users have direct access to HDFS and the metastore server (which provides an API for metadata access). HDFS access is authorized through the use of HDFS permissions. Metadata access needs to be authorized using Hive configuration.
  2. Hive as a SQL query engine. This is one of the most common use cases of Hive. This is the 'Hive view' of SQL users and BI tools. This use case has the following two subcategories:
    1. Hive command line users. These users have direct access to HDFS and the Hive metastore, which makes this use case similar to use case 1.
    2. ODBC/JDBC and other HiveServer2 API users. All data and metadata access for these users happens through HiveServer2. They don't have direct access to HDFS or the metastore.

1 Storage Based Authorization in the Metastore server

Note that in use cases 1 and 2a, the users have direct access to the data, so Hive configurations don't control data access. The HDFS permissions act as one source of truth for table storage access. By enabling Storage Based Authorization in the metastore server, you can use this single source of truth and have a consistent data and metadata authorization policy. To control metadata access on metadata objects such as databases, tables and partitions, it checks whether you have permission on the corresponding directories on the file system. You can also protect access through HiveServer2 (use case 2b above) by ensuring that queries run as the end user: set hive.server2.enable.doAs=true in the HiveServer2 configuration (this is the default).
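
As a minimal sketch of the HiveServer2 setting named above, the property could appear in HiveServer2's hive-site.xml as follows. The property name and value come from the paragraph above. Because true is described as the default, spelling it out is only needed if it has been overridden elsewhere.

```xml
<!-- HiveServer2 hive-site.xml (sketch): run queries as the connecting end user,
     so that file system permissions are checked against that user. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```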

Note that through the use of HDFS ACLs (available in Apache Hadoop 2.4 onwards), you have a lot of flexibility in controlling access to the file system, which in turn provides more flexibility with Storage Based Authorization. Also note that you need the upcoming Hive 0.14 release to make use of the flexibility provided by HDFS ACLs (HIVE-7583).

2 SQL Standards based authorization in HiveServer2

Storage based authorization can provide access control only at the level of databases, tables and partitions. It cannot control authorization at finer levels such as columns and views, because the access control provided by the file system is at the level of directories and files. A prerequisite for fine grained access control is a data server that is able to provide just the columns and rows that a user needs (or has) access to. In the case of a file system, the whole file is served to the user. HiveServer2 satisfies this condition, as it has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for.

SQL standards based authorization (introduced in Hive 0.13.0) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration. 

Note that its use for use case 2a (Hive command line) is disabled. Secure access control through an access control policy in Hive is not possible for command line users: they have direct access to HDFS, so they can easily bypass the SQL standards based authorization checks or even disable them. Disabling this mode for the command line avoids giving users a false sense of security.
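
The HiveServer2 configuration that enables this mode is not listed on this page. The fragment below is a hedged sketch based on the settings described in the separate SQL Standard Based Hive Authorization document. The property names and class names are assumptions taken from that document, not from this page, and should be checked against the Hive release in use. That document also describes additional settings (for example, for admin role membership) that are omitted here.

```xml
<!-- HiveServer2 hive-site.xml (sketch; values assumed from the SQL Standard
     Based Hive Authorization document, verify against your Hive version). -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>

<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>

<!-- Assumption: in this mode queries run as the HiveServer2 service user,
     so impersonation of the end user is turned off. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```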

3 Default Hive Authorization mode (legacy mode)

Hive Default Authorization is the authorization mode that has been available in earlier versions of Hive. However, this mode does not have a complete access control model, leaving many security gaps unaddressed. For example, the permissions needed to grant privileges to a user are not defined, and any user can grant themselves access to a table or database.

This model is similar to the SQL standards based authorization mode, in that it provides grant/revoke statement based access control. However, its access control policy is different from that of SQL standards based authorization, and the two are not compatible. Use of this mode is also supported for Hive command line users. However, for the reasons mentioned in the discussion of SQL standards based authorization (above), it is not a secure mode of authorization for the Hive command line.

Addressing authorization needs of multiple use cases

Storage based authorization provides a simple way to address all the use cases described above. However, if you need finer grained access control for SQL users, you can also enable SQL standards based authorization mode in HiveServer2.

That is, you can have storage based authorization enabled for the metastore API calls (in the Hive metastore), and have SQL standards based authorization enabled in HiveServer2 at the same time.

Storage Based Authorization/Metastore Server Security

This section describes the metastore server security feature added to Hive in release 0.10. This feature was introduced previously in HCatalog (see Storage Based Authorization).

The Need for Metastore Server Security

When multiple clients access the same metastore in a backing database, such as MySQL, the database connection credentials may be visible in the hive-site.xml configuration file. A malicious or incompetent user could cause serious damage to metadata even though the underlying data is protected by HDFS access controls.

Also, when a Hive metastore server uses Thrift to communicate with clients and has a backing database for metadata storage and persistence, the authentication and authorization done on the client side cannot guarantee security on the metastore side. To provide security for metadata, release 0.10 adds authorization capability to the metastore. (See HIVE-3705.)

Storage Based Authorization

When metastore server security is configured to use Storage Based Authorization, it uses the file system permissions for the folders corresponding to the different metadata objects as the source of truth for the authorization policy. Use of Storage Based Authorization in the metastore is recommended.

See details in the Storage Based Authorization document.

Configuration Parameters for Metastore Security

To enable Hive metastore server security, set these parameters in hive-site.xml:

  • hive.metastore.pre.event.listeners
    Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.
    This turns on metastore-side security.

  • hive.security.metastore.authorization.manager
    Set to org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider.
    This tells Hive which metastore-side authorization provider to use. The default setting, DefaultHiveMetastoreAuthorizationProvider, implements the standard Hive grant/revoke model. To use an HDFS permission-based model (recommended) for authorization, use org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider instead.
    Version: The StorageBasedAuthorizationProvider was introduced in Hive 0.10.0, running on the metastore side only (HIVE-3705). Starting in Hive 0.12.0 it also runs on the client side (HIVE-5048 and HIVE-5402).

  • hive.security.metastore.authenticator.manager
    Set to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator.

The snippet below shows the keys as they appear in their default state in hive-site.xml (metastore-side security set up to use the default authorization and authentication classes, but disabled). Edit the values as described above to get the desired authorization behaviour:

 

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
  <description>authorization manager class name to be used in the metastore for authorization.
  The user defined authorization class should implement interface
  org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
  </description>
</property>

<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  <description>authenticator manager class name to be used in the metastore for authentication.
  The user defined authenticator should implement interface 
  org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
  </description>
</property>

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value> </value>
  <description>pre-event listener classes to be loaded on the metastore side to run code
  whenever databases, tables, and partitions are created, altered, or dropped.
  Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
  if metastore-side authorization is desired.
  </description>
</property>
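
As a sketch of the edits described above, the following hive-site.xml fragment for the metastore server turns on metastore-side security with the recommended Storage Based Authorization provider. All property names and class names are the ones listed in this section. Nothing else is assumed, and the descriptions are omitted for brevity.

```xml
<!-- Turn on metastore-side security by registering the authorization listener. -->
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>

<!-- Authorize metadata operations against file system permissions
     instead of the default grant/revoke model. -->
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>

<!-- Authenticator used on the metastore side (same as the default shown above). -->
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
```

Compared with the default snippet, only two values change: the pre-event listener is filled in, and the authorization manager is switched to StorageBasedAuthorizationProvider.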

SQL Standards Based Authorization

Hive release 0.13.0 introduced authorization based on SQL standards. See SQL Standard Based Hive Authorization for details.

Hive Default Authorization (Legacy mode)

See details of the legacy mode for Hive authorization.
