Intro

Executing statements periodically can be usefull in

Pulling informations from external systems
Periodically updating column statistics
Rebuilding materialized views

Overview

The metastore maintains the scheduled queries in the metastore database
Hiveserver(s) periodically polls the metastore for a scheduled query to be executed

During execution informations about ongoing/finished executions are kept in the metastore

Scheduled queries were added in Hive 4.0 (HIVE-21884)

Hive has it’s scheduled query interface built into the language itself for easy access:

Maintaining scheduled queries

Create Scheduled query syntax

CREATE SCHEDULED QUERY <scheduled_query_name>
<scheduleSpecification>
     [<executedAsSpec> ]
     [<enableSpecification>]
     <definedAsSpec>

Alter Scheduled query syntax

ALTER SCHEDULED QUERY <scheduled_query_name>
(<scheduleSpec>|<executedAsSpec>|<enableSpecification>|<definedAsSpec>);

Drop syntax

DROP SCHEDULED QUERY <scheduled_query_name>;

scheduleSpecification syntax

Schedules can be specified using CRON expressions or for common cases there is a simpler form; in any case the schedule is stored as Quartz cron expression.

CRON based schedule syntax

CRON <quartz_schedule_expression>

where quartz_schedule_expression is quoted schedule in the Quartz format

https://www.freeformatter.com/cron-expression-generator-quartz.html

For example the CRON '0 */10 * * * ? *' expression will fire every 10 minutes.

EVERY based schedule syntax

To give a more readable way to declare schedules EVERY can be used.

EVERY [<integer>] (SECOND|MINUTE|HOUR) [(OFFSET BY|AT) <timeOrDate>]

the format makes it possible to declare schedules in a more readable way:

EVERY 2 MINUTES
EVERY HOUR AT '0:07:30'
EVERY DAY AT '11:35:30'

ExecutedAs syntax

EXECUTED AS <user_name>

Scheduled queries are executed as the declaring user by default; but people with admin privileges might be able to change the executing user.

enableSpecification syntax

(ENABLE[D] | DISABLE[D])

Can be used to enable/disable a schedule.

For CREATE SCHEDULED QUERY statements the default behaviour is set by the configuration key hive.scheduled.queries.create.as.enabled

Defined AS syntax

[DEFINED] AS <hiveQuery>

The “query” is a single statement expression to be scheduled for execution.

System tables/views

Informations about scheduled queries/executions can be obtain by using the information_schema or the sysdb - recommended way is to use the information_schema; sysdb is tables are there to build the information_schema level views - and for debugging.

information_schema.scheduled_queries

Suppose we have a scheduled query defined by:

create scheduled query sc1 cron '0 */10 * * * ? *' as select 1;

Let’s take a look at it in the information_schema.scheduled_queries table by using

select * from information_schema.scheduled_queries;

I will transpose the resultset to describe each column

scheduled_query_id	1	Internally, every scheduled query also has a numeric id
schedule_name	sc1	The name of the schedule
enabled	true	True if the schedule is enabled
cluster_namespace	hive	The namespace thes scheduled query belongs to
schedule	0 /10 * * ? *	The schedule described in QUARTZ cron format
user	dev	The owner/executor of the query
query	select 1	The query being scheduled
next_execution	2020-01-29 16:50:00	Technical column; shows when the next execution should happen

(schedule_name,cluster_namespace) is unique

information_schema.scheduled_executions

This view can be used to get information about recent scheduled query executions.

select * from information_schema.scheduled_executions;

One record in this view has the following informations:

scheduled_execution_id	13	Every scheduled query execution has a unique numeric id
schedule_name	sc1	The schedule name to which this execution belongs
executor_query_id	dev_20200131103008_c9a39b8d-e26b-44cd-b8ae-9d054204dc07	The query id assigned by the execution engine for the given scheduled execution
state	FINISHED	State of the execution; can be
start_time	2020-01-31 10:30:06	Start time of execution
end_time	2020-01-31 10:30:08	End time of execution
elapsed	2	(computed) end_time-start_time
error_message	NULL	In case the query is FAILED the error message is shown here
last_update_time	NULL	During execution the last update time the executor provided informations about the state

Execution states

INITED	The scheduled execution record is created at the time an executor is assigned to run it; The INITED state is retained until the first update from the executor comes in.
EXECUTING	Queries in executing state are being processed by the executor; during this phase the executor reports the progress of the query in intervals defined by: hive.scheduled.queries.executor.progress.report.interval
FAILED	The query execution stoped by an error code(or an exception) when this state is set the error_message is also filled.
FINISHED	The query finished without problems
TIMED_OUT	An execution is considered timed out when it’s being executed for more than metastore.scheduled.queries.execution.timeout. The scheduled queries maintenance task checks for any timed out executions.

How long are execution informations are retained?

The scheduled query maintenance task removes older than metastore.scheduled.queries.execution.max.age entries.

Configuration

Hive metastore related configuration

metastore.scheduled.queries.enabled (default: true)
Controls the metastore side support for scheduled queries; forces all HMS scheduled query related endpoints to return with an error
metastore.scheduled.queries.execution.timeout (default: 2 minutes)
In case a scheduled execution is not updated for at least this amount of time; it’s state will be changed to TIMED_OUT by the cleaner task
metastore.scheduled.queries.execution.maint.task.frequency (default: 1 minute)
Interval of scheduled query maintenance task. Which removes executions above max age; and marks executions as TIMED_OUT if the condition is met
metastore.scheduled.queries.execution.max.age (default: 30 days)
Maximal age of a scheduled query execution entry before it is removed.

HiveServer2 related configuration

hive.scheduled.queries.executor.enabled (default: true)
Controls whether HS2 will run scheduled query executor.
hive.scheduled.queries.namespace (default: "hive")
Sets the scheduled query namespace to be used. New scheduled queries are created in this namespace; and execution is also bound to the namespace
hive.scheduled.queries.executor.idle.sleep.time (default: 1 minute)
Time to sleep between querying for the presence of a scheduled query.
hive.scheduled.queries.executor.progress.report.interval (default: 1 minute)
While scheduled queries are in flight; a background update happens periodically to report the actual state of the query.
hive.scheduled.queries.create.as.enabled (default: true)
This option sets the default behaviour of newly created scheduled queries.
hive.security.authorization.scheduled.queries.supported (default: false)
Enable this if the configured authorizer is able to handle scheduled query related calls.

Examples

create table t (a integer);

create scheduled query sc1 cron ‘0 */10 * * * ? *’ as insert into t values (1);

Example 1 - general usage of schedules

create table t (a integer);

-- create scheduled queries

create scheduled query sc1 cron ‘0 */10 * * * ? *’ as insert into t values (1);

-- depending on hive.scheduled.queries.create.as.enabled the query might get create in disabled mode

-- it can be enabled using:

alter scheduled query sc1 enabled;

-- inspect scheduled queries using the information_schema

create scheduled query sc2 cron ‘0 */10 * * * ? *’ as analyze table t compute statistics for columns;

Space shortcuts

Child pages

Maintaining scheduled queries

Create Scheduled query syntax

Alter Scheduled query syntax

Drop syntax

scheduleSpecification syntax

CRON based schedule syntax

EVERY based schedule syntax

ExecutedAs syntax

enableSpecification syntax

Defined AS syntax

System tables/views

information_schema.scheduled_queries

information_schema.scheduled_executions

Execution states

Configuration

Hive metastore related configuration

HiveServer2 related configuration

Examples

Space shortcuts

Child pages

Scheduled Queries

Maintaining scheduled queries

Create Scheduled query syntax

Alter Scheduled query syntax

Drop syntax

scheduleSpecification syntax

CRON based schedule syntax

EVERY based schedule syntax

ExecutedAs syntax

enableSpecification syntax

Defined AS syntax

System tables/views

information_schema.scheduled_queries

information_schema.scheduled_executions

Execution states

Configuration

Hive metastore related configuration

HiveServer2 related configuration

Examples