You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

sad

Executing statements periodically can be usefull in

  • Pulling informations from external systems
  • Periodically updating column statistics
  • Rebuilding materialized views


Overview

  • The metastore maintains the scheduled queries in the metastore database
  • The Hiveserver periodically polls the metastore for a scheduled query to be executed
    • During execution informations about ongoing/finished executions are kept in the metastore to help  


Scheduled queries were added in Hive 4.0 (HIVE-21884)



Hive has it’s scheduled query interface built into the language itself for easy access:

Maintaining scheduled queries

Create Scheduled query syntax

CREATE SCHEDULED QUERY <scheduled_query_name>
     <scheduleSpec>
     [<executedAsSpec>]
     [<enableSpecification>]
     <definedAsSpec>

Alter Scheduled query syntax

ALTER SCHEDULED QUERY <scheduled_query_name>
     (<scheduleSpec>|<executedAsSpec>|<enableSpecification>|<definedAsSpec>);


Drop syntax


DROP SCHEDULED QUERY <scheduled_query_name>;


D;

scheduleSpec syntax


CRON quartz_schedule_expression

quartz_schedule_expression

The schedule have to be specified in the Quartz format

https://www.freeformatter.com/cron-expression-generator-quartz.html


executedAsSpec syntax



EXECUTED AS user_name

Scheduled queries are executed as the declaring user by default; but people with admin privileges might be able to change the executing user.

enableSpecification syntax

(ENABLE[D] | DISABLE[D])

Can be used to disable a schedule.

definedAsSpec syntax


[DEFINED] AS query

The “query” is a single statement expression to be scheduled for execution.

System Tables

Informations about scheduled queries/executions can be obtain by using the information_schema or the sysdb - recommended way is to use the information_schema; sysdb is tables are there to build the information_schema level views - and for debugging.

information_schema.scheduled_queries


Suppose we have a scheduled query defined by:

create scheduled query sc1 cron '0 */10 * * * ? *' as select 1;

Let’s take a look at it in the information_schema.scheduled_queries table by using

select * from information_schema.scheduled_queries;

I will transpose the resultset to describe each column

scheduled_query_id

1

Internally, every scheduled query also has a numeric id

schedule_name

sc1

The name of the schedule

enabled

true

True if the schedule is enabled

cluster_namespace

hive

The namespace thes scheduled query belongs to

schedule

0 */10 * * * ? *

The schedule described in QUARTZ cron format

user

dev

The owner/executor of the query

query

select 1

The query being scheduled

next_execution

2020-01-29 16:50:00

Technical column; shows when the next execution should happen


Note: (schedule_name,cluster_namespace) is unique


information_schema.scheduled_executions


select * from information_schema.scheduled_executions;

FIXME




Configuration

Hive metastore related configuration


    • metastore.scheduled.queries.enabled (default: true)
      Controls the metastore side support for scheduled queries; forces all HMS scheduled query related endpoints to return with an error
    • metastore.scheduled.queries.execution.timeout (default: 2 minutes)
      In case a scheduled execution is not updated for at least this amount of time; it’s state will be changed to TIMED_OUT by the cleaner task


  • metastore.scheduled.queries.execution.maint.task.frequency (default: 1 minute)
    Interval of scheduled query maintenance task. Which removes executions above max age; and marks executions as TIMED_OUT if the condition is met
  • metastore.scheduled.queries.execution.max.age (default: 30 days)
    Maximal age of a scheduled query execution entry before it is removed.



HiveServer2 related configuration


  • hive.scheduled.queries.executor.enabled (default: true)
    Controls whether HS2 will run scheduled query executor.
  • hive.scheduled.queries.namespace (default: "hive")
    Sets the scheduled query namespace to be used. New scheduled queries are created in this namespace; and execution is also bound to the namespace
  • hive.scheduled.queries.executor.idle.sleep.time (default: 1 minute)
    Time to sleep between querying for the presence of a scheduled query.
  • hive.scheduled.queries.executor.progress.report.interval (default: 1 minute)
    While scheduled queries are in flight; a background update happens periodically to report the actual state of the query.
  • hive.scheduled.queries.create.as.enabled (default: true)
    This option sets the default behaviour of newly created scheduled queries.
  • hive.security.authorization.scheduled.queries.supported (default: false)
    Enable this if the configured authorizer is able to handle scheduled query related calls.



Example


create table t (a integer);

create scheduled query sc1 cron ‘0 */10 * * * ? *’ as insert into t values (1);




Example 1 - general usage of schedules

create table t (a integer);

-- create scheduled queries

create scheduled query sc1 cron ‘0 */10 * * * ? *’ as insert into t values (1);

-- depending on hive.scheduled.queries.create.as.enabled the query might get create in disabled mode

-- it can be enabled using:

alter scheduled query sc1 enabled;

-- inspect scheduled queries using the information_schema


create scheduled query sc2 cron ‘0 */10 * * * ? *’ as analyze table t compute statistics for columns;



  • No labels