
TL;DR

If all services in the minicluster are set up correctly and running, run this to restart the Impala cluster with Ranger authorization:

bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
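In the command above, impalad and catalogd receive the same four Ranger flags. A minimal bash sketch that defines them once so the two daemons cannot drift apart; `ranger_flags` and `restart_with_ranger` are hypothetical helpers, not part of Impala's scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Impala's scripts): keep the Ranger flags in one
# place so that impalad and catalogd always receive the same values.

# Prints the four Ranger-related flags from the TL;DR command, space-separated.
ranger_flags() {
  local flags=(
    --server-name=server1
    --ranger_service_type=hive
    --ranger_app_id=impala
    --authorization_provider=ranger
  )
  printf '%s\n' "${flags[*]}"
}

# Restarts the minicluster with the same Ranger flags on impalad and catalogd.
restart_with_ranger() {
  local flags
  flags="$(ranger_flags)"
  "${IMPALA_HOME:?IMPALA_HOME is not set}/bin/start-impala-cluster.py" \
    --impalad_args="${flags}" \
    --catalogd_args="${flags}"
}
```

Running `restart_with_ranger` from an Impala development shell should be equivalent to the TL;DR command.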

If you want to investigate Hive's behavior under Ranger authorization, run this to restart Hive:

testdata/bin/run-hive-server.sh -with_ranger

Details

When an Impala minicluster is started on a local machine with '$IMPALA_HOME/bin/start-impala-cluster.py', no authorization service (e.g., Sentry or Ranger) is enabled by default. Assuming '$IMPALA_HOME/testdata/bin/run-all.sh' has already been executed, we can enable the Ranger service, initialized with the default policies, with the following steps.

  1. Execute '$IMPALA_HOME/testdata/bin/kill-ranger-server.sh'
  2. Execute 'source $IMPALA_HOME/bin/impala-config.sh'
  3. Execute '$IMPALA_HOME/buildall.sh -noclean -notests -ninja'
  4. Execute '$IMPALA_HOME/bin/create-test-configuration.sh -create_ranger_policy_db'
  5. Execute '$IMPALA_HOME/testdata/bin/run-ranger-server.sh'
  6. Execute '$IMPALA_HOME/testdata/bin/create-load-data.sh'
  7. Execute the following command. The same arguments can also be found in '$IMPALA_HOME/tests/authorization/test_ranger.py'. The first 4 arguments are passed to 'start-impala-cluster.py' itself, while the 3 single-quoted arguments are passed to statestore, impalad, and catalogd, respectively.

$IMPALA_HOME/bin/start-impala-cluster.py \
--cluster_size=3 \
--num_coordinators=3 \
--log_dir=/tmp/ \
--log_level=1 \
'--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' \
'--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger' \
'--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger'
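The seven steps above can be chained into one function that stops at the first failing step. A sketch, assuming bash; `run`, `start_ranger_minicluster`, and the `DRY_RUN` variable are hypothetical names, and the commands and arguments are copied from the steps above. With `DRY_RUN=1` it only prints what it would execute.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around steps 1-7 (not part of Impala's scripts).

# Executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs steps 1-7 in order and returns non-zero at the first failing step.
start_ranger_minicluster() {
  local h="${IMPALA_HOME:?IMPALA_HOME is not set}"
  local ranger_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
  run "$h/testdata/bin/kill-ranger-server.sh" || return 1
  run source "$h/bin/impala-config.sh" || return 1   # sourced into the current shell
  run "$h/buildall.sh" -noclean -notests -ninja || return 1
  run "$h/bin/create-test-configuration.sh" -create_ranger_policy_db || return 1
  run "$h/testdata/bin/run-ranger-server.sh" || return 1
  run "$h/testdata/bin/create-load-data.sh" || return 1
  run "$h/bin/start-impala-cluster.py" \
    --cluster_size=3 --num_coordinators=3 --log_dir=/tmp/ --log_level=1 \
    "--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50" \
    "--impalad_args=${ranger_args}" \
    "--catalogd_args=${ranger_args}"
}
```

`DRY_RUN=1 start_ranger_minicluster` is a cheap way to review the sequence before running it for real.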

Once the minicluster has started, we can log into the Impala shell as the administrator 'admin' by executing '$IMPALA_HOME/bin/impala-shell -u admin'. The 'admin' account was added by the function 'setup-ranger' in 'create-load-data.sh' above. To check whether Impala is Ranger-enabled, execute 'refresh authorization' in the Impala shell. If Ranger is enabled in Impala, the output is similar to the following.

[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:17:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
Query progress can be monitored at: http://fangyurao-OptiPlex-9020:25000/query_plan?query_id=374567f0bf4ca48b:0769d23700000000
Fetched 0 row(s) in 0.02s

If the Ranger service is not enabled correctly, executing '$IMPALA_HOME/bin/impala-shell -u admin' followed by 'refresh authorization' may instead produce the following error message.

[localhost:21000] default> refresh authorization;
Query: refresh authorization
Query submitted at: 2019-08-29 15:23:37 (Coordinator: http://fangyurao-OptiPlex-9020:25000)
ERROR: AnalysisException: Authorization is not enabled. To enable authorization restart Impala with the --server_name=<name> flag.
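The two outputs above can also be told apart from a script. A sketch, assuming bash and impala-shell's `-q` option for running a single statement non-interactively; `classify_refresh_output` and `check_ranger_enabled` are hypothetical names, and the matched strings are taken from the outputs shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: classify the output of 'refresh authorization'.

# Prints "enabled", "disabled", or "unknown" for the captured shell output in $1.
classify_refresh_output() {
  if grep -qF "Authorization is not enabled" <<<"$1"; then
    echo disabled
  elif grep -qF "Fetched 0 row(s)" <<<"$1"; then
    echo enabled
  else
    echo unknown   # e.g. the shell could not reach the coordinator
  fi
}

# Runs the statement non-interactively as 'admin' and classifies the result.
check_ranger_enabled() {
  local out
  out="$("${IMPALA_HOME:?IMPALA_HOME is not set}/bin/impala-shell" -u admin \
    -q "refresh authorization" 2>&1)"
  classify_refresh_output "$out"
}
```

The "unknown" case is kept separate so that a connection failure is not mistaken for an enabled Ranger.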

Note that currently, to create 'admin' in the Ranger service, we have to run '$IMPALA_HOME/testdata/bin/create-load-data.sh' (the 6th step above for starting a Ranger-enabled Impala minicluster). This does much more than needed, because the script also loads all of the test datasets, which is time-consuming. A better approach is to call only the function 'setup-ranger' from 'create-load-data.sh'. To achieve this, we could move 'setup-ranger' out of 'create-load-data.sh' and have 'create-load-data.sh' call it from its new location.
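One way that refactoring could look, sketched under the assumption of a new file named `testdata/bin/setup-ranger.sh` (a hypothetical name). The file can be executed directly, or sourced so the caller decides when to run `setup-ranger`; the `echo` is only a stand-in for the existing function body.

```shell
#!/usr/bin/env bash
# Hypothetical testdata/bin/setup-ranger.sh: holds setup-ranger on its own so it can be
# run directly, or sourced by create-load-data.sh, which then calls setup-ranger.

setup-ranger() {
  # Stand-in: the existing body of setup-ranger() from create-load-data.sh moves here.
  echo "setting up Ranger users, groups and the test_impala service"
}

# Run setup-ranger only when this file is executed, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  setup-ranger
fi
```

create-load-data.sh would then `source "${IMPALA_HOME}/testdata/bin/setup-ranger.sh"` and call `setup-ranger` where it does today, while a developer who only needs the Ranger accounts runs the script directly without loading the test datasets.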

Troubleshooting

1. Errors such as "AuthorizationException: User 'admin' does not have privileges to execute ..." in tests

Ranger may not be configured correctly. Check the log at ${IMPALA_HOME}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log; it may contain errors like

2019-10-10 02:29:41,007 [http-bio-6080-exec-2] ERROR org.apache.ranger.common.ServiceUtil (ServiceUtil.java:1359) - Requested Service not found. serviceName=test_impala
2019-10-10 02:29:41,008 [http-bio-6080-exec-2] INFO  org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:345) - Request failed. loginId=null, logMessage="RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=test_impala"
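Checking for this symptom can be scripted. A sketch, assuming bash; `ranger_admin_log` and `ranger_service_missing` are hypothetical names, and the log path and the searched message are copied from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: look for the "service not found" error in the Ranger admin log.

# Prints the path of the Ranger admin log described above.
ranger_admin_log() {
  echo "${IMPALA_HOME:?IMPALA_HOME is not set}/logs/cluster/ranger/ranger-admin-${HOSTNAME}-${USER}.log"
}

# Succeeds (status 0) if log file $1 reports that service $2 (default test_impala)
# was not found.
ranger_service_missing() {
  local log="$1" service="${2:-test_impala}"
  grep -qF "RANGER_ERROR_SERVICE_NOT_FOUND: ServiceName=${service}" "$log"
}
```

For example, `ranger_service_missing "$(ranger_admin_log)" && echo "test_impala is missing"` tells you whether the manual fix below applies.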

If so, it is faster to create the missing service manually than to rerun create-load-data.sh. Run these commands (taken from setup-ranger() in testdata/bin/create-load-data.sh) in your shell:

RANGER_SETUP_DIR="${IMPALA_HOME}/testdata/cluster/ranger/setup"

# Expand ${VAR} placeholders in the template with values from the environment.
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_group_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_group_owner.json"

# Create the owner and non-owner groups and export the ids Ranger assigns to them.
export GROUP_ID_OWNER=$(wget -qO - --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_group_owner.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

export GROUP_ID_NON_OWNER=$(wget -qO - --auth-no-challenge --user=admin \
  --password=admin --post-file="${RANGER_SETUP_DIR}/impala_group_non_owner.json" \
  --header="accept:application/json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/groups |
  python -c "import sys, json; print(json.load(sys.stdin)['id'])")

# Render the user templates; the group ids exported above are now in the environment.
perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user_owner.json"

perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \
  "${RANGER_SETUP_DIR}/impala_user_non_owner.json.template" > \
  "${RANGER_SETUP_DIR}/impala_user_non_owner.json"

# Create the owner and non-owner users.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user_owner.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_user_non_owner.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/xusers/secure/users

# Create the missing service.
wget -O /dev/null --auth-no-challenge --user=admin --password=admin \
  --post-file="${RANGER_SETUP_DIR}/impala_service.json" \
  --header="Content-Type:application/json" \
  http://localhost:6080/service/public/v2/api/service

# Replace policy 4 with the revised version.
curl -u admin:admin -H "Accept: application/json" -H "Content-Type: application/json" \
  -X PUT http://localhost:6080/service/public/v2/api/policy/4 \
  -d @"${RANGER_SETUP_DIR}/policy_4_revised.json"

You should then see the "test_impala" service in your Ranger portal (http://localhost:6080 by default).
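The same check can be done without the portal. A sketch, assuming bash, curl, and that Ranger's public v2 REST API returns a service definition for `GET .../service/public/v2/api/service/name/<name>`; `ranger_service_url`, `ranger_service_exists` and `RANGER_URL` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check from the command line that a Ranger service exists.

# Prints the assumed REST URL for looking up service $1 by name.
ranger_service_url() {
  echo "${RANGER_URL:-http://localhost:6080}/service/public/v2/api/service/name/$1"
}

# Succeeds if Ranger returns the service (default test_impala); -f makes curl
# return non-zero on HTTP errors such as 404.
ranger_service_exists() {
  curl -fsS -u admin:admin -H "Accept: application/json" \
    "$(ranger_service_url "${1:-test_impala}")" > /dev/null
}
```

For example, `ranger_service_exists test_impala && echo "test_impala is registered"`.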

If the wget commands fail, try restarting Ranger with testdata/bin/run-ranger-server.sh. If Ranger fails to start, try reconfiguring the Ranger DB with "bin/create-test-configuration.sh -create_ranger_policy_db".
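The recovery order above can be scripted as well. A sketch, assuming bash and that a non-zero exit status from run-ranger-server.sh means Ranger failed to start; `recover_ranger` is a hypothetical name, and the three scripts are the ones named in this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart Ranger, and recreate its policy DB if it fails to start.
# Assumes a non-zero exit status from run-ranger-server.sh means the start failed.

recover_ranger() {
  local h="${IMPALA_HOME:?IMPALA_HOME is not set}"
  # Restart: stop any running instance (its status is ignored), then start again.
  "$h/testdata/bin/kill-ranger-server.sh"
  if "$h/testdata/bin/run-ranger-server.sh"; then
    return 0
  fi
  # Start failed: recreate the Ranger policy DB and try once more.
  echo "Ranger failed to start; recreating its policy DB" >&2
  "$h/bin/create-test-configuration.sh" -create_ranger_policy_db || return 1
  "$h/testdata/bin/run-ranger-server.sh"
}
```

After a successful `recover_ranger`, rerun the wget commands above.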


