
Storage Based Authorization

Default Authorization Model of Hive

The default authorization model of Hive supports a traditional RDBMS style of authorization based on users, groups, and roles, and on granting them permissions to perform operations on a database or table. It is described in more detail in Hive Authorization and Hive deprecated authorization mode / Legacy Mode.

This RDBMS style of authorization is not well suited to the typical use cases in Hadoop because of the following differences in implementation:

...

In the HCatalog package, we have introduced an implementation of an authorization interface that uses the permissions of the underlying file system (or, in general, the storage backend) as the basis of permissions on each database, table, or partition.

Note: This feature is also available in Hive on the metastore side, starting with release 0.10.0 (see Storage Based Authorization in the Metastore Server in the Hive documentation). Starting in Hive 0.12.0 it also runs on the client side (HIVE-5048 and HIVE-5402).

In Hive, when a file system is used for storage, there is a directory corresponding to each database or table. With this authorization model, the read/write permissions a user or group has for this directory determine the permissions the user has on the database or table. For other storage systems such as HBase, the storage system’s own authorization mechanism for the equivalent entities determines the permissions in Hive.

...

When the database or table is backed by a file system that has a Unix/POSIX-style permissions model (like HDFS), there are read(r) and write(w) permissions you can set for the owner user, group and ‘other’. The file system’s logic for determining if a user has permission on the directory or file will be used by Hive.
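The owner/group/other check described above can be illustrated with a minimal sketch. It is not Hive or HDFS code: the function name has_permission, the users, and the group names are hypothetical, and superuser handling and ACLs are deliberately left out.

```python
import stat

# Permission bits for each class of user, keyed by the requested access.
_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
}


def has_permission(mode, owner, group, user, user_groups, want):
    """Return True if `user` has `want` ('r' or 'w') access.

    Simplified POSIX-style check (an illustration, not Hive's code):
    exactly one permission class applies, chosen in the order
    owner, then group, then other.
    """
    owner_bit, group_bit, other_bit = _BITS[want]
    if user == owner:
        return bool(mode & owner_bit)
    if group in user_groups:
        return bool(mode & group_bit)
    return bool(mode & other_bit)


# Hypothetical table directory: rwxr-x---, owned by alice, group analysts.
mode, owner, group = 0o750, "alice", "analysts"
print(has_permission(mode, owner, group, "alice", {"staff"}, "w"))     # owner bits apply
print(has_permission(mode, owner, group, "bob", {"analysts"}, "w"))    # group bits apply
print(has_permission(mode, owner, group, "carol", {"staff"}, "r"))     # other bits apply
```

Under this model, a user who can read the table directory can read the table, and a user who can write to it can modify the table.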

Details of HDFS permissions are given at http://hadoop.apache.org/docs/rx.x.x/hdfs_permissions_guide.html, where rx.x.x stands for the Hadoop release.

Note: Support for HDFS ACLs (introduced in Apache Hadoop 2.4) is not available in the released versions of Hive. This means that Hive checks only the traditional rwx-style permissions to determine whether a user can write to the file system. ACL support has been added to Hive trunk (HIVE-7583) and will be available in Hive 0.14.

Links to documentation for different releases of Hadoop can be found here: http://hadoop.apache.org/docs/.

Note: If hive.warehouse.subdir.inherit.perms is enabled, permissions and ACLs for Hive-created files and directories will be set according to the permission inheritance rules.

Minimum Permissions

The following table shows the minimum permissions required for Hive operations under this authorization model:

...

The implementation of the file-system based authorization model is available through an authorization provider called StorageBasedAuthorizationProvider that is part of Hive. (Support for this was added to the Hive package in release 0.10.0; see HIVE-3705 and Storage Based Authorization in the Metastore Server.) So using this implementation requires installing the HCatalog package along with Hive.

The HCatalog jar needs to be added to the Hive classpath. You can add the following to hive-env.sh to ensure that it gets added:

No Format
export HIVE_AUX_JARS_PATH=<path to hcatalog jar>

Version: An earlier implementation of this, called HdfsAuthorizationProvider, used to exist in the HCatalog package but was deprecated and removed as of Hive 0.14 trunk. If your configuration uses HdfsAuthorizationProvider, please update to the StorageBasedAuthorizationProvider configuration below instead.

The following entries need to be added to hive-site.xml to enable authorization:

No Format

  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
    <description>enable or disable the hive client authorization</description>
  </property>

  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
    <description>the hive client authorization manager class name.
    The user defined authorization class should implement interface
    org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider.
    </description>
  </property>
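
As a quick sanity check of the entries above, the following sketch parses a hive-site.xml document and reports which of the two settings are missing or different. It is illustrative only: the helper names read_properties and missing_settings and the SAMPLE text are hypothetical, and the sketch assumes the usual hive-site.xml layout of property elements under a configuration root, with StorageBasedAuthorizationProvider as the intended manager class.

```python
import xml.etree.ElementTree as ET

# Settings this page asks for (the class name is assumed to be the
# storage-based provider that ships with Hive).
REQUIRED = {
    "hive.security.authorization.enabled": "true",
    "hive.security.authorization.manager":
        "org.apache.hadoop.hive.ql.security.authorization."
        "StorageBasedAuthorizationProvider",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def missing_settings(xml_text):
    """Return the required settings that are absent or set differently."""
    props = read_properties(xml_text)
    return {k: v for k, v in REQUIRED.items() if props.get(k) != v}


# Hypothetical hive-site.xml content used only for this illustration.
SAMPLE = """<configuration>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  </property>
</configuration>"""

print(missing_settings(SAMPLE))
```

A configuration that still names HdfsAuthorizationProvider would show up in the returned dictionary under hive.security.authorization.manager.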

...

  1. Some metadata operations (mostly read operations) do not check for authorization. See https://issues.apache.org/jira/browse/HIVE-3009.
  2. The current implementation of Hive performs the authorization checks in the client. This means that malicious users can circumvent these checks.
  3. A different authorization provider (StorageDelegationAuthorizationProvider) needs to be used for working with HBase tables as well, but it is not well tested.
  4. Partition files and directories added by a Hive query don’t inherit permissions from the table. This means that even if you grant permissions for a group to access a table, new partitions will have read permissions only for the owner if the default umask for the cluster is configured that way. See https://issues.apache.org/jira/browse/HIVE-3094. A separate HDFS chmod command (for example, hdfs dfs -chmod) will be necessary to modify the permissions.
  5. Although DDL statements for managing permissions have no effect in storage-based authorization, currently they do not return error messages. See https://issues.apache.org/jira/browse/HIVE-3010.
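
The effect of the umask mentioned in item 4 can be shown with a small arithmetic sketch. It assumes the usual POSIX-style rule that a new directory is requested with mode 777 and the umask bits are then cleared; the helper names effective_mode and describe and the umask values are illustrative, not taken from any particular cluster.

```python
import stat


def effective_mode(requested, umask):
    """Clear every permission bit that is set in the umask."""
    return requested & ~umask


def describe(mode):
    """Render a directory mode in ls-style notation, e.g. drwx------."""
    return stat.filemode(stat.S_IFDIR | mode)


# A restrictive umask leaves the new partition directory owner-only,
# regardless of the group permissions granted on the table directory.
print(describe(effective_mode(0o777, 0o077)))  # drwx------

# A more permissive umask keeps read/execute for group and other.
print(describe(effective_mode(0o777, 0o022)))  # drwxr-xr-x
```

With the restrictive umask, members of the table's group cannot read the new partition until its permissions are changed explicitly.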

 
