TL;DR

If all services in the minicluster are set up correctly and running, run the following from $IMPALA_HOME to restart the Impala cluster with Ranger authorization enabled:

Code Block
bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
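impalad and catalogd take exactly the same Ranger flags, so one way to avoid typing them twice is to keep them in a variable. The following is a minimal sketch, assuming IMPALA_HOME is set and the minicluster services are already up. RANGER_FLAGS and restart_with_ranger are hypothetical names and are not part of the Impala repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the Impala minicluster with Ranger authorization.
# Assumes IMPALA_HOME is set and all minicluster services (including Ranger) are up.

# impalad and catalogd need identical Ranger flags, so define them once.
RANGER_FLAGS="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"

restart_with_ranger() {
  # Any extra arguments (e.g. --cluster_size=3) are passed through unchanged.
  "${IMPALA_HOME:?IMPALA_HOME must be set}/bin/start-impala-cluster.py" \
    --impalad_args="${RANGER_FLAGS}" \
    --catalogd_args="${RANGER_FLAGS}" \
    "$@"
}
```

For example, `restart_with_ranger --cluster_size=3` forwards the extra option to start-impala-cluster.py together with the two Ranger argument strings.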

Details

When an Impala minicluster is started on a local machine with '$IMPALA_HOME/bin/start-impala-cluster.py', no authorization service (e.g., Sentry or Ranger) is enabled by default. We can enable Ranger, initialized with the default policies, on an Impala minicluster with the following steps (assuming that '$IMPALA_HOME/testdata/bin/run-all.sh' has already been executed).

  1. Execute '$IMPALA_HOME/testdata/bin/kill-ranger-server.sh'
  2. Execute 'source $IMPALA_HOME/bin/impala-config.sh'
  3. Execute '$IMPALA_HOME/buildall.sh -noclean -notests -ninja'
  4. Execute '$IMPALA_HOME/bin/create-test-configuration.sh -create_ranger_policy_db'
  5. Execute '$IMPALA_HOME/testdata/bin/run-ranger-server.sh'
  6. Execute '$IMPALA_HOME/testdata/bin/create-load-data.sh'
  7. Execute the following command. The same arguments can also be found in '$IMPALA_HOME/tests/authorization/test_ranger.py'. The first 4 arguments are options of 'start-impala-cluster.py' itself, while the 3 arguments enclosed in single quotation marks are passed through to the statestore, impalad, and catalogd, respectively.

$IMPALA_HOME/bin/start-impala-cluster.py \
--cluster_size=3 \
--num_coordinators=3 \
--log_dir=/tmp/ \
--log_level=1 \
'--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' \
'--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger' \
'--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger'
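Steps 1 to 6 can be scripted so that the sequence stops at the first failing command. This is a sketch assuming IMPALA_HOME is set and run-all.sh has already been run; prepare_ranger_minicluster is a hypothetical helper. It also assumes that a failure of kill-ranger-server.sh (for example, when Ranger is not running) can be ignored. The function does not use a subshell, so the environment set up by impala-config.sh in step 2 stays available in your shell for step 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run steps 1-6 above in order, stopping at the first failure.
# Assumes IMPALA_HOME is set and testdata/bin/run-all.sh has already been run.
# Not run in a subshell, so the environment from impala-config.sh persists.
prepare_ranger_minicluster() {
  local home="${IMPALA_HOME:?IMPALA_HOME must be set}"
  # Step 1: stop Ranger; assumed safe to ignore a failure if it is not running.
  "${home}/testdata/bin/kill-ranger-server.sh" || true
  source "${home}/bin/impala-config.sh" &&                               # step 2
    "${home}/buildall.sh" -noclean -notests -ninja &&                    # step 3
    "${home}/bin/create-test-configuration.sh" -create_ranger_policy_db && # step 4
    "${home}/testdata/bin/run-ranger-server.sh" &&                       # step 5
    "${home}/testdata/bin/create-load-data.sh"                           # step 6
}
```

Once it returns successfully, run the command from step 7.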

Once the minicluster has started, we can log into the Impala shell as the administrator 'admin' by executing '$IMPALA_HOME/bin/impala-shell -u admin'. The 'admin' account was added by the function 'setup-ranger' in 'create-load-data.sh' (step 6 above). To tell whether Impala is Ranger-enabled, we can execute 'refresh authorization' in the Impala shell. If Ranger is enabled, the output should be similar to the following.

Code Block
[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:17:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
Query progress can be monitored at: http://fangyurao-OptiPlex-9020:25000/query_plan?query_id=374567f0bf4ca48b:0769d23700000000
Fetched 0 row(s) in 0.02s

If Ranger is not enabled correctly, executing 'refresh authorization' after '$IMPALA_HOME/bin/impala-shell -u admin' may instead produce the following error.

Code Block
[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:23:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
ERROR: AnalysisException: Authorization is not enabled. To enable authorization restart Impala with the --server_name=<name> flag.
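This check can also be scripted. The sketch below assumes IMPALA_HOME is set and uses impala-shell's -q option to run a single statement non-interactively; ranger_enabled is a hypothetical helper name. It returns 0 when the statement succeeds (the "Fetched ..." line appears), 1 when the "Authorization is not enabled" error appears, and 2 for any other output, for example when the shell cannot connect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether Ranger authorization is enabled by running
# "refresh authorization" as 'admin' through impala-shell.
# Returns 0 if enabled, 1 if authorization is disabled, 2 if undetermined.
ranger_enabled() {
  local out
  # -u sets the user, -q runs one statement and exits. stderr is merged in,
  # since impala-shell may print status lines and errors there.
  out=$("${IMPALA_HOME:?IMPALA_HOME must be set}/bin/impala-shell" \
          -u admin -q "refresh authorization" 2>&1)
  if grep -q "Authorization is not enabled" <<< "${out}"; then
    echo "Ranger authorization is NOT enabled" >&2
    return 1
  elif grep -q "^Fetched " <<< "${out}"; then
    echo "Ranger authorization is enabled"
    return 0
  fi
  echo "Could not determine the authorization status:" >&2
  echo "${out}" >&2
  return 2
}
```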

We also note that, currently, to create 'admin' in the Ranger service we have to run '$IMPALA_HOME/testdata/bin/create-load-data.sh' (step 6 above). This script does much more than needed: it also loads all of the test datasets, which is time-consuming. A better approach would be to call only the function 'setup-ranger' in 'create-load-data.sh'. To achieve this, we may consider moving 'setup-ranger' out of 'create-load-data.sh' and then making 'create-load-data.sh' call 'setup-ranger'.

Troubleshooting

1. Errors like "AuthorizationException: User 'admin' does not have privileges to execute ..." in tests

Ranger may not be configured correctly. Check the log at ${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log for errors like the following:

Code Block
2019-10-10 02:29:41,007 [http-bio-6080-exec-2] ERROR org.apache.ranger.common.ServiceUtil (ServiceUtil.java:1359) - Requested Service not found. serviceName=test_impala
2019-10-10 02:29:41,008 [http-bio-6080-exec-2] INFO  org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:345) - Request failed. loginId=null, logMessage="RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=test_impala"
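To look for this error without reading the whole log, the missing service names can be extracted with grep and sed. This is a sketch; missing_ranger_services is a hypothetical helper, and it defaults to the log path given above when no file is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of Ranger services that the Ranger admin
# server reported as missing (RANGER_ERROR_SERVICE_NOT_FOUND), one per line.
missing_ranger_services() {
  # Default to the Ranger admin log location used by the minicluster.
  local log="${1:-${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log}"
  # Keep only "...ServiceName=<name>" (up to the closing quote), strip the prefix,
  # and de-duplicate repeated failures.
  grep -o 'RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=[^"]*' "${log}" |
    sed 's/.*ServiceName=//' |
    sort -u
}
```

With the sample lines above in the log, it would print `test_impala`.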

If these errors are present, the fastest fix is to create the missing service manually by running the following commands (taken from setup-ranger() in testdata/bin/create-load-data.sh) in your shell:

Code Block
RANGER_SETUP_DIR="${IMPALA_HOME}/testdata/cluster/ranger/setup"

# Fill the ${VAR} placeholders in the group template from the environment.
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_group.json.template" > \
  "${RANGER_SETUP_DIR}/impala_group.json"

# Create the group in Ranger and extract its id from the JSON response.
# GROUP_ID is exported so that perl can see it when rendering the user template.
export GROUP_ID=$(wget -qO - --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_group.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

# Fill the placeholders in the user template...
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user.json"

# ...and create the user in Ranger.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

# Create the service defined in impala_service.json (the missing test_impala service).
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_service.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/public/v2/api/service

Once these commands succeed, you should see the "test_impala" service in your Ranger portal (http://localhost:6080 by default).

If the wget commands fail, try restarting Ranger with testdata/bin/run-ranger-server.sh. If Ranger fails to start, try reconfiguring the Ranger database with "bin/create-test-configuration.sh -create_ranger_policy_db".
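These recovery steps can be combined into one function. This is a sketch under stated assumptions: IMPALA_HOME is set, impala-config.sh has already been sourced, and run-ranger-server.sh exits with a non-zero status when Ranger fails to start (please verify this before relying on it). Stopping Ranger first with kill-ranger-server.sh mirrors step 1 of the setup; restart_ranger is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart Ranger, recreating its policy DB if it fails to start.
# Assumes IMPALA_HOME is set, impala-config.sh has been sourced, and that
# run-ranger-server.sh returns non-zero on failure (an assumption).
restart_ranger() {
  local home="${IMPALA_HOME:?IMPALA_HOME must be set}"
  # Stop any running Ranger admin server; ignore failure if none is running.
  "${home}/testdata/bin/kill-ranger-server.sh" || true
  if "${home}/testdata/bin/run-ranger-server.sh"; then
    return 0
  fi
  # Ranger did not start: reconfigure the Ranger policy DB, then try once more.
  echo "Ranger failed to start; recreating the Ranger policy DB" >&2
  "${home}/bin/create-test-configuration.sh" -create_ranger_policy_db &&
    "${home}/testdata/bin/run-ranger-server.sh"
}
```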