When an Impala minicluster is started on a local machine with '$IMPALA_HOME/bin/start-impala-cluster.py', no authorization service (e.g., Sentry or Ranger) is enabled by default. The following steps enable the Ranger service on an Impala minicluster.
- Execute '$IMPALA_HOME/testdata/bin/kill-all.sh'
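- Source '$IMPALA_HOME/bin/impala-config.sh' (i.e., run 'source $IMPALA_HOME/bin/impala-config.sh') so that the environment variables it sets take effect in the current shell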
- Execute '$IMPALA_HOME/buildall.sh -noclean -notests -ninja'
- Execute '$IMPALA_HOME/bin/create-test-configuration.sh -create_ranger_policy_db'
- Execute '$IMPALA_HOME/testdata/bin/run-all.sh'
- Execute '$IMPALA_HOME/testdata/bin/create-load-data.sh'
- Execute the following command. The arguments passed to 'start-impala-cluster.py' can also be found in '$IMPALA_HOME/tests/authorization/test_ranger.py'. The first 4 arguments are consumed by 'start-impala-cluster.py' itself, whereas the three arguments enclosed in single quotation marks are forwarded to the statestore, impalad, and catalogd, respectively.
$IMPALA_HOME/bin/start-impala-cluster.py \
--cluster_size=3 \
--num_coordinators=3 \
--log_dir=/tmp/ \
--log_level=1 \
'--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' \
'--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger' \
'--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger'
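The steps above can be collected into a small shell function, as in the following sketch. It is not part of the Impala source tree: the names 'run_step' and 'enable_ranger_minicluster' and the 'DRY_RUN' switch are hypothetical and only serve the illustration. 'impala-config.sh' is sourced rather than executed, on the assumption that it is meant to set environment variables in the current shell.

```shell
# Sketch only: run_step, enable_ranger_minicluster and DRY_RUN are
# hypothetical names, not part of Impala.

# Print a command, then run it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Run the steps from the list above in order, stopping at the first failure.
enable_ranger_minicluster() {
  run_step "$IMPALA_HOME/testdata/bin/kill-all.sh" || return 1
  # Assumption: impala-config.sh must be sourced so its variables persist.
  echo "+ . $IMPALA_HOME/bin/impala-config.sh"
  [ "${DRY_RUN:-0}" = "1" ] || . "$IMPALA_HOME/bin/impala-config.sh" || return 1
  run_step "$IMPALA_HOME/buildall.sh" -noclean -notests -ninja || return 1
  run_step "$IMPALA_HOME/bin/create-test-configuration.sh" \
    -create_ranger_policy_db || return 1
  run_step "$IMPALA_HOME/testdata/bin/run-all.sh" || return 1
  run_step "$IMPALA_HOME/testdata/bin/create-load-data.sh" || return 1
  # The same Ranger flags go to both impalad and catalogd.
  ranger_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
  run_step "$IMPALA_HOME/bin/start-impala-cluster.py" \
    --cluster_size=3 --num_coordinators=3 --log_dir=/tmp/ --log_level=1 \
    "--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50" \
    "--impalad_args=$ranger_args" \
    "--catalogd_args=$ranger_args"
}
```

Setting 'DRY_RUN=1' before calling 'enable_ranger_minicluster' prints the commands without running them, which is a cheap way to review the sequence before committing to a full build and data load.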
Once the minicluster has started, we can log into the Impala shell as an administrator with the username 'admin' by executing '$IMPALA_HOME/bin/impala-shell -u admin'. The 'admin' account was added by the function 'setup-ranger' in 'create-load-data.sh' above. To tell whether Impala is Ranger-enabled, execute 'refresh authorization' in the Impala shell. If Ranger is enabled in Impala, the output should be similar to the following.
[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:17:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
Query progress can be monitored at: http://fangyurao-OptiPlex-9020:25000/query_plan?query_id=374567f0bf4ca48b:0769d23700000000
Fetched 0 row(s) in 0.02s
If the Ranger service is not correctly enabled, then after executing '$IMPALA_HOME/bin/impala-shell -u admin' followed by 'refresh authorization' in the Impala shell, we may see the following error message.
[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:23:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
ERROR: AnalysisException: Authorization is not enabled. To enable authorization restart Impala with the --server_name=<name> flag.
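The same check can be scripted by classifying the text that the Impala shell prints. The sketch below assumes only the two transcripts shown above; the function name 'authorization_status' is hypothetical, and any output that matches neither transcript is reported as "unknown" rather than guessed at.

```shell
# Sketch only: authorization_status is a hypothetical helper.
# Argument: the captured output of running "refresh authorization".
authorization_status() {
  case "$1" in
    # Error text from the second transcript above.
    *"Authorization is not enabled"*) echo "disabled" ;;
    # Success line from the first transcript above ("Fetched 0 row(s) ...").
    *"Fetched "*) echo "enabled" ;;
    # Anything else, e.g. the shell could not connect to the cluster.
    *) echo "unknown" ;;
  esac
}

# Possible usage, with the launcher path as given in the text:
#   output=$("$IMPALA_HOME/bin/impala-shell" -u admin -q 'refresh authorization' 2>&1)
#   authorization_status "$output"
```

Capturing both stdout and stderr with '2>&1' is deliberate here, so that the classification does not depend on which stream the shell uses for status and error messages.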
We also note that, currently, the only way to create the 'admin' account in the Ranger service is to run '$IMPALA_HOME/testdata/bin/create-load-data.sh' (the 6th step above to start a Ranger-enabled Impala minicluster). This does much more than needed, because the script also loads the entire test datasets, which is time-consuming. A better approach would be to call only the function 'setup-ranger' in 'create-load-data.sh'. To achieve this, we may consider moving 'setup-ranger' out of 'create-load-data.sh' and having 'create-load-data.sh' call it.
Troubleshooting
1. Encounter errors like "AuthorizationException: User 'admin' does not have privileges to execute ..." in tests
Ranger may not be configured correctly. Check the log at ${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log; it may contain errors like
2019-10-10 02:29:41,007 [http-bio-6080-exec-2] ERROR org.apache.ranger.common.ServiceUtil (ServiceUtil.java:1359) - Requested Service not found. serviceName=test_impala
2019-10-10 02:29:41,008 [http-bio-6080-exec-2] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:345) - Request failed. loginId=null, logMessage="RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=test_impala"
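Rather than reading the log by eye, one can search it for the error code shown above. This is a minimal sketch: the function name 'ranger_service_missing' is hypothetical, and the pattern is simply the 'RANGER_ERROR_SERVICE_NOT_FOUND' message copied from the log excerpt.

```shell
# Sketch only: ranger_service_missing is a hypothetical helper.
# Arguments: <log file> <service name>
# Returns success (0) if the log contains the "service not found" message
# for exactly that service name.
ranger_service_missing() {
  grep -qF "RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=$2\"" "$1"
}

# Possible usage:
#   log="${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log"
#   if ranger_service_missing "$log" test_impala; then
#     echo "Ranger does not know the test_impala service"
#   fi
```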
If so, the fastest fix is to create the missing service manually. Run these commands (taken from setup-ranger() in testdata/bin/create-load-data.sh) in your shell:
RANGER_SETUP_DIR="${IMPALA_HOME}/testdata/cluster/ranger/setup"

perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_group.json.template" > \
  "${RANGER_SETUP_DIR}/impala_group.json"

export GROUP_ID=$(wget -qO - --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_group.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user.json"

wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_service.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/public/v2/api/service
Once the wget requests above succeed, the "test_impala" service should be visible in your Ranger portal (http://localhost:6080 by default).
If you encounter errors when executing the wget commands, try restarting Ranger with testdata/bin/run-ranger-server.sh. If Ranger fails to start, try reconfiguring the Ranger database with "bin/create-test-configuration.sh -create_ranger_policy_db".
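To confirm from the command line that the "test_impala" service now exists, one option is to ask Ranger for the service by name. The sketch below rests on an assumption that does not come from this page: that the Ranger public v2 REST API exposes a lookup at '/service/public/v2/api/service/name/<name>'. The function names are hypothetical, and the admin/admin credentials are the ones used in the wget commands earlier.

```shell
# Sketch only: both helpers are hypothetical, and the lookup path
# ".../api/service/name/<name>" is an assumption about the Ranger REST API.

# Build the lookup URL. Arguments: <service name> [<Ranger base URL>]
ranger_service_url() {
  echo "${2:-http://localhost:6080}/service/public/v2/api/service/name/$1"
}

# Return success (0) if Ranger answers the lookup without an HTTP error.
# wget exits with a non-zero status on error responses such as 404.
ranger_service_exists() {
  wget -qO /dev/null --auth-no-challenge --user=admin --password=admin \
    "$(ranger_service_url "$@")"
}

# Possible usage:
#   if ranger_service_exists test_impala; then
#     echo "test_impala is registered in Ranger"
#   fi
```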