TL;DR
If all services in the minicluster are set up correctly and are up, run this to restart the Impala cluster with Ranger authorization:
bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
If you want to investigate Hive behaviors in Ranger authorization, run this to restart Hive:
testdata/bin/run-hive-server.sh -with_ranger
Details
When an Impala minicluster is started on a local machine using '$IMPALA_HOME/bin/start-impala-cluster.py', no authorization service (e.g., Sentry or Ranger) is enabled on the cluster by default. To enable the Ranger service, initialized with the default policies, on an Impala minicluster, follow these steps (assuming that '$IMPALA_HOME/testdata/bin/run-all.sh' has already been executed).
- Execute '$IMPALA_HOME/testdata/bin/kill-ranger-server.sh'
- Execute 'source $IMPALA_HOME/bin/impala-config.sh'
- Execute '$IMPALA_HOME/buildall.sh -noclean -notests -ninja'
- Execute '$IMPALA_HOME/bin/create-test-configuration.sh -create_ranger_policy_db'
- Execute '$IMPALA_HOME/testdata/bin/run-ranger-server.sh'
- Execute '$IMPALA_HOME/testdata/bin/create-load-data.sh'
- Execute the following command. We note that the arguments passed into 'start-impala-cluster.py' could also be found at '$IMPALA_HOME/tests/authorization/test_ranger.py'. The first 4 arguments are passed to 'start-impala-cluster.py' directly and those surrounded by a pair of single quotation marks are passed to statestore, impalad, and catalogd, respectively.
$IMPALA_HOME/bin/start-impala-cluster.py \
--cluster_size=3 \
--num_coordinators=3 \
--log_dir=/tmp/ \
--log_level=1 \
'--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' \
'--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger' \
'--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger'
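The steps above can be sketched as a single shell function. This is only an illustration, assuming IMPALA_HOME is set and 'run-all.sh' has already been executed. The names 'run_step' and 'enable_ranger_minicluster' and the DRY_RUN switch are hypothetical helpers introduced for this example, and the final step uses the shorter start command from the TL;DR rather than the full command above.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the steps listed above, in the same order.
enable_ranger_minicluster() {
  ranger_args="--server-name=server1 --ranger_service_type=hive"
  ranger_args="${ranger_args} --ranger_app_id=impala --authorization_provider=ranger"

  run_step "${IMPALA_HOME}/testdata/bin/kill-ranger-server.sh" || return 1

  # impala-config.sh has to be sourced into the current shell, so it is
  # handled separately from run_step ('.' is the portable form of 'source').
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ source ${IMPALA_HOME}/bin/impala-config.sh"
  else
    . "${IMPALA_HOME}/bin/impala-config.sh" || return 1
  fi

  run_step "${IMPALA_HOME}/buildall.sh" -noclean -notests -ninja || return 1
  run_step "${IMPALA_HOME}/bin/create-test-configuration.sh" \
    -create_ranger_policy_db || return 1
  run_step "${IMPALA_HOME}/testdata/bin/run-ranger-server.sh" || return 1
  run_step "${IMPALA_HOME}/testdata/bin/create-load-data.sh" || return 1
  run_step "${IMPALA_HOME}/bin/start-impala-cluster.py" \
    "--impalad_args=${ranger_args}" \
    "--catalogd_args=${ranger_args}"
}
```

Running 'DRY_RUN=1 enable_ranger_minicluster' prints the seven commands without executing them, which is a cheap way to review the sequence before committing to the long build and data-load steps.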
Once the minicluster has been started, we can log into the Impala shell as an administrator with username 'admin' by executing '$IMPALA_HOME/bin/impala-shell -u admin'. The 'admin' account was added by the function 'setup-ranger' in 'create-load-data.sh' above. To check whether Impala is Ranger-enabled, execute 'refresh authorization' in the Impala shell. If Ranger is enabled in Impala, the output will be similar to the following.
[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:17:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
Query progress can be monitored at: http://fangyurao-OptiPlex-9020:25000/query_plan?query_id=374567f0bf4ca48b:0769d23700000000
Fetched 0 row(s) in 0.02s
If the Ranger service is not correctly enabled, then after executing '$IMPALA_HOME/bin/impala-shell -u admin' followed by 'refresh authorization' in the Impala shell, we may see the following error message.
[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:23:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
ERROR: AnalysisException: Authorization is not enabled. To enable authorization restart Impala with the --server_name=<name> flag.
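The same check can be scripted instead of read by eye. The sketch below classifies the text printed by 'refresh authorization' using only the two outputs shown above. The function name 'classify_refresh_output' is hypothetical, and any output that matches neither sample is reported as "unknown" rather than guessed at.

```shell
# Hypothetical helper: read impala-shell output on stdin and print
# "enabled", "disabled", or "unknown".
classify_refresh_output() {
  local output
  output="$(cat)"
  case "$output" in
    # Error text shown when authorization is not enabled.
    *"Authorization is not enabled"*) echo "disabled" ;;
    # A successful 'refresh authorization' ends with "Fetched 0 row(s) ...".
    *"Fetched "*) echo "enabled" ;;
    # Anything else, e.g. a connection failure or empty output.
    *) echo "unknown" ;;
  esac
}
```

A possible invocation, assuming the shell is launched as described above and accepts its standard '-q' option to run a single query: '$IMPALA_HOME/bin/impala-shell -u admin -q "refresh authorization" 2>&1 | classify_refresh_output'. The '2>&1' is there because the shell may print its status and error messages to stderr.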
Note that, currently, the 'admin' account exists in the Ranger service only after '$IMPALA_HOME/testdata/bin/create-load-data.sh' has been run (the 6th step above to start a Ranger-enabled Impala minicluster). This script does much more than needed, because it also loads the entire test datasets, which is time-consuming. A better approach would be to call only the function 'setup-ranger' in 'create-load-data.sh'. To achieve this, we may consider moving 'setup-ranger' out of 'create-load-data.sh' and having 'create-load-data.sh' call it.
Troubleshooting
1. Errors like "AuthorizationException: User 'admin' does not have privileges to execute ..." in tests
Ranger may not be configured correctly. Check the log at ${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log; it may contain errors like:
2019-10-10 02:29:41,007 [http-bio-6080-exec-2] ERROR org.apache.ranger.common.ServiceUtil (ServiceUtil.java:1359) - Requested Service not found. serviceName=test_impala
2019-10-10 02:29:41,008 [http-bio-6080-exec-2] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:345) - Request failed. loginId=null, logMessage="RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=test_impala"
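To look for these entries without scrolling through the log, something like the following can be used. It is a minimal sketch that relies only on the "RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=..." text shown above; the function name 'missing_ranger_services' is hypothetical, and the pattern assumes service names consist of letters, digits, underscores, and hyphens.

```shell
# Hypothetical helper: print each distinct service name that the given
# Ranger admin log reports as not found, one per line.
missing_ranger_services() {
  grep -o 'RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=[A-Za-z0-9_-]*' "$1" \
    | sed 's/.*ServiceName=//' \
    | sort -u
}
```

For example: 'missing_ranger_services "${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log"'. For the log excerpt above it prints test_impala, meaning that Ranger could not find that service.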
If the log shows such errors, the fastest fix is to create the missing service manually. Run these commands (taken from setup-ranger() in testdata/bin/create-load-data.sh) in your shell:
RANGER_SETUP_DIR="${IMPALA_HOME}/testdata/cluster/ranger/setup"

# Fill in the environment variables referenced by the owner group template.
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_group_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_group_owner.json"

# Create the two groups and remember their IDs; the user templates use them.
export GROUP_ID_OWNER=$(wget -qO - --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_group_owner.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

export GROUP_ID_NON_OWNER=$(wget -qO - --auth-no-challenge --user=admin \
  --password=admin --post-file="${RANGER_SETUP_DIR}/impala_group_non_owner.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

# Generate the user definitions from their templates.
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user_owner.json"

perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user_non_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user_non_owner.json"

# Create the two users.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user_owner.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user_non_owner.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

# Create the missing service.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_service.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/public/v2/api/service

# Replace policy 4 with the revised policy.
curl -u admin:admin -H "Accept: application/json" -H "Content-Type: application/json" \
  -X PUT http://localhost:6080/service/public/v2/api/policy/4 \
  -d @"${RANGER_SETUP_DIR}/policy_4_revised.json"
Then you should be able to see the "test_impala" service in your Ranger portal (http://localhost:6080 by default).
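The presence of the service can also be checked from the command line. The sketch below assumes that Ranger's public v2 REST API exposes a lookup by service name at /service/public/v2/api/service/name/<name> and that the admin:admin credentials used above are valid; verify both against your Ranger version. The function name 'ranger_service_exists' and the RANGER_URL override are hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if Ranger answers HTTP 200
# for the named service, fail otherwise.
ranger_service_exists() {
  local name="$1"
  # RANGER_URL is an assumed override; the default matches the portal URL.
  local base_url="${RANGER_URL:-http://localhost:6080}"
  local status
  status="$(curl -s -o /dev/null -w '%{http_code}' -u admin:admin \
    -H 'Accept: application/json' \
    "${base_url}/service/public/v2/api/service/name/${name}")"
  [ "$status" = "200" ]
}
```

For example: 'ranger_service_exists test_impala && echo "service found"'.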
If you encounter errors when executing the wget commands, try restarting Ranger with testdata/bin/run-ranger-server.sh. If Ranger fails to start, try reconfiguring the Ranger database with "bin/create-test-configuration.sh -create_ranger_policy_db".