TL;DR

If all services in the minicluster are set up correctly and running, run this to restart the Impala cluster with Ranger authorization:

bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"

If you want to investigate how Hive behaves under Ranger authorization, run this to restart Hive:

testdata/bin/run-hive-server.sh -with_ranger

Details

When an Impala minicluster is started on a local machine using '$IMPALA_HOME/bin/start-impala-cluster.py', no authorization service (e.g., Sentry or Ranger) is enabled on the cluster by default. We can enable the Ranger service, initialized with the default policies, on an Impala minicluster with the following steps (assuming that we have already executed '$IMPALA_HOME/testdata/bin/run-all.sh').

  1. Execute '$IMPALA_HOME/testdata/bin/kill-ranger-server.sh'
  2. Execute 'source $IMPALA_HOME/bin/impala-config.sh'
  3. Execute '$IMPALA_HOME/buildall.sh -noclean -notests -ninja'
  4. Execute '$IMPALA_HOME/bin/create-test-configuration.sh -create_ranger_policy_db'
  5. Execute '$IMPALA_HOME/testdata/bin/run-ranger-server.sh'
  6. Execute '$IMPALA_HOME/testdata/bin/create-load-data.sh'
  7. Execute the following command. The arguments passed to 'start-impala-cluster.py' can also be found in '$IMPALA_HOME/tests/authorization/test_ranger.py'. The first 4 arguments are options of 'start-impala-cluster.py' itself; the three arguments surrounded by single quotation marks are forwarded to the statestore, impalad, and catalogd, respectively.

$IMPALA_HOME/bin/start-impala-cluster.py \
--cluster_size=3 \
--num_coordinators=3 \
--log_dir=/tmp/ \
--log_level=1 \
'--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' \
'--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger' \
'--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger'
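Steps 1 to 6 above can be chained so that the sequence stops at the first failing step. The following is a minimal sketch, not a script from the Impala repository: the names run_step and enable_ranger_steps and the DRY_RUN variable are assumptions made for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run setup steps 1-6 in order, stopping at the first failure.
# DRY_RUN is a hypothetical variable of this sketch (not an Impala setting).
# It defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
IMPALA_HOME="${IMPALA_HOME:-/path/to/Impala}"   # placeholder when unset
DRY_RUN="${DRY_RUN:-1}"

run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    # Execute the step in the current shell so that 'source' takes effect.
    "$@" || { echo "step failed: $*" >&2; return 1; }
  fi
}

enable_ranger_steps() {
  run_step "$IMPALA_HOME/testdata/bin/kill-ranger-server.sh" &&
  run_step source "$IMPALA_HOME/bin/impala-config.sh" &&
  run_step "$IMPALA_HOME/buildall.sh" -noclean -notests -ninja &&
  run_step "$IMPALA_HOME/bin/create-test-configuration.sh" -create_ranger_policy_db &&
  run_step "$IMPALA_HOME/testdata/bin/run-ranger-server.sh" &&
  run_step "$IMPALA_HOME/testdata/bin/create-load-data.sh"
}

enable_ranger_steps
```

Once the printed commands look right, rerun with DRY_RUN=0 and then start the cluster with the 'start-impala-cluster.py' command shown above.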

Once the minicluster has started, we can log into the Impala shell as the administrator 'admin' by executing '$IMPALA_HOME/bin/impala-shell -u admin'. The 'admin' account is added by the function 'setup-ranger' in 'create-load-data.sh' above. To tell whether Impala is Ranger-enabled, execute 'refresh authorization' in the Impala shell. If Ranger is enabled in Impala, the output looks similar to the following.

[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:17:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
Query progress can be monitored at: http://fangyurao-OptiPlex-9020:25000/query_plan?query_id=374567f0bf4ca48b:0769d23700000000
Fetched 0 row(s) in 0.02s

If the Ranger service is not correctly enabled, then executing '$IMPALA_HOME/bin/impala-shell -u admin' followed by 'refresh authorization' in the Impala shell may produce the following error message.

[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:23:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
ERROR: AnalysisException: Authorization is not enabled. To enable authorization restart Impala with the --server_name=<name> flag.
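The two outcomes above can also be told apart non-interactively. The following is a rough sketch: check_authz_output is a hypothetical helper name, and it only looks for the error text and the "Fetched" line shown in the sample outputs, so anything else (for example a connection failure) is reported as "unknown".

```shell
# Sketch: classify the output of 'refresh authorization'.
# check_authz_output is a hypothetical helper, not part of Impala.
# It reads impala-shell output on stdin and prints one word.
check_authz_output() {
  out=$(cat)
  case "$out" in
    *"Authorization is not enabled"*) echo "disabled" ;;
    *"Fetched "*)                     echo "enabled" ;;
    *)                                echo "unknown" ;;
  esac
}

# Usage (assumes a running minicluster; the launcher path and the use of
# -q to pass the statement are assumptions about your checkout):
#   "$IMPALA_HOME/bin/impala-shell" -u admin -q 'refresh authorization' 2>&1 \
#     | check_authz_output
```

Redirecting stderr into the pipe matters here, since the shell may not print its status and error lines on stdout.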

We also note that, currently, the only way to create the 'admin' account in the Ranger service is to run '$IMPALA_HOME/testdata/bin/create-load-data.sh' (the 6th step above), which does much more than needed: the script also loads all the test datasets, which is time-consuming. A better approach would be to call only the function 'setup-ranger' in 'create-load-data.sh'. To achieve this, we may consider moving the function 'setup-ranger' out of 'create-load-data.sh' and making 'create-load-data.sh' call it.

Troubleshooting

1.  Errors like "AuthorizationException: User 'admin' does not have privileges to execute ..." in tests

Ranger may not be configured correctly. Check the log in ${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log; it may contain errors like

2019-10-10 02:29:41,007 [http-bio-6080-exec-2] ERROR org.apache.ranger.common.ServiceUtil (ServiceUtil.java:1359) - Requested Service not found. serviceName=test_impala
2019-10-10 02:29:41,008 [http-bio-6080-exec-2] INFO  org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:345) - Request failed. loginId=null, logMessage="RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=test_impala"
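To spot this condition quickly, the log can be searched for the error code shown above. This is a minimal sketch; find_missing_ranger_services is a hypothetical helper name, and it assumes the log line format matches the excerpt above.

```shell
# Sketch: list the service names that Ranger Admin reported as missing.
# find_missing_ranger_services is a hypothetical helper, not an Impala script.
# $1 is the path to the Ranger Admin log file.
find_missing_ranger_services() {
  log_file="$1"
  # Keep only the matching fragment, strip the prefix, and deduplicate.
  grep -o 'RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=[A-Za-z0-9_.-]*' "$log_file" |
    sed 's/.*ServiceName=//' |
    sort -u
}

# Usage:
#   find_missing_ranger_services \
#     "${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log"
```

An empty result means this particular error was not logged; it does not prove that Ranger is otherwise configured correctly.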

If so, the fastest fix is to create the missing service manually. Run these commands (taken from setup-ranger() in testdata/bin/create-load-data.sh) in your shell:

RANGER_SETUP_DIR="${IMPALA_HOME}/testdata/cluster/ranger/setup"

# Expand ${VAR} placeholders in the template with values from the environment.
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_group_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_group_owner.json"

# Create the owner group and remember its id.
export GROUP_ID_OWNER=$(wget -qO - --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_group_owner.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

# Create the non-owner group and remember its id.
export GROUP_ID_NON_OWNER=$(wget -qO - --auth-no-challenge --user=admin \
  --password=admin --post-file="${RANGER_SETUP_DIR}/impala_group_non_owner.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

# Generate the user definitions (they reference the group ids exported above).
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user_owner.json"

perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user_non_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user_non_owner.json"

# Create the two users.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user_owner.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user_non_owner.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

# Create the missing service.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_service.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/public/v2/api/service

# Update policy 4 with the revised definition.
curl -u admin:admin -H "Accept: application/json" -H "Content-Type: application/json" \
  -X PUT http://localhost:6080/service/public/v2/api/policy/4 \
  -d @"${RANGER_SETUP_DIR}/policy_4_revised.json"

Then you should be able to see the "test_impala" service in your Ranger portal (http://localhost:6080 by default).
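The same check can be done from the command line instead of the portal. The sketch below assumes Ranger's public v2 REST API exposes a lookup-by-name endpoint at /service/public/v2/api/service/name/<name> and reuses the admin:admin credentials from the commands above; ranger_service_url, ranger_service_exists, and RANGER_URL are hypothetical names introduced here.

```shell
# Sketch: check from the shell whether a Ranger service is registered.
# RANGER_URL and both helper names are assumptions of this sketch.
RANGER_URL="${RANGER_URL:-http://localhost:6080}"

# Print the lookup URL for the service named $1.
ranger_service_url() {
  echo "${RANGER_URL}/service/public/v2/api/service/name/$1"
}

# Succeed only if Ranger Admin answers the lookup with a 2xx status.
ranger_service_exists() {
  curl -s -f -o /dev/null -u admin:admin \
    -H "Accept: application/json" \
    "$(ranger_service_url "$1")"
}

# Usage (requires a running Ranger Admin):
#   ranger_service_exists test_impala && echo "test_impala is registered"
```

A non-zero exit status covers both a missing service and an unreachable Ranger Admin, so check that Ranger is running before concluding that the service is absent.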

If you encounter errors when executing the wget commands, try restarting Ranger with testdata/bin/run-ranger-server.sh. If Ranger fails to start, try reconfiguring the Ranger database with "bin/create-test-configuration.sh -create_ranger_policy_db".

2.  Ranger Admin server fails to start due to "Permission denied"

This error usually happens when you first set up the Ranger service, or after the CDP build number is bumped (in which case the Ranger service needs to be set up again). The errors look like this:

find: ‘/home/quanlong/workspace/Impala/toolchain/cdp_components-7049391/ranger-2.1.0.7.2.7.0-44-admin/ews/webapp/WEB-INF/classes/conf/’: No such file or directory
mkdir: cannot create directory ‘/var/run/ranger’: Permission denied
chmod: cannot access '/var/run/ranger': No such file or directory
Restarting Apache Ranger Admin
Apache Ranger Admin Service is not running
Starting Apache Ranger Admin Service
Apache Ranger Admin Service failed to start!

The script should not be creating '/var/run/ranger'. The attempt reveals that RANGER_PID_DIR_PATH is not set correctly; its default value is '/var/run/ranger'.

Run bin/create-test-configuration.sh to set up the Ranger service. It copies some scripts to the Ranger directory so that variables such as RANGER_PID_DIR_PATH are set correctly. After that, testdata/bin/run-ranger-server.sh will succeed.

Remember to run bin/create-test-configuration.sh whenever you bump the CDP build number, because a new Ranger directory is created in the toolchain and it needs to be set up as well.
