...

Furthermore, we can also infer overlapping intervals. Finally, filters on columns other than the time dimension are translated into valid Druid filters and included in the query through its filter property.
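The exact translation rules are not spelled out here, so the following is only a minimal Python sketch under stated assumptions. It reads "infer overlapping intervals" as combining overlapping time ranges into a minimal set, and it assumes time predicates have already been reduced to half-open [start, end) ranges and that the remaining predicates are simple column = value equalities. The helper names (`merge_intervals`, `to_druid_intervals`, `to_druid_filter`) and the columns `robot` and `country` are hypothetical. The output uses Druid's ISO-8601 `start/end` interval strings and Druid's `selector` and `and` filter types.

```python
from datetime import datetime


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals (hypothetical helper)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_druid_intervals(intervals):
    """Render merged intervals as Druid ISO-8601 'start/end' strings."""
    return [f"{s.isoformat()}/{e.isoformat()}" for s, e in merge_intervals(intervals)]


def to_druid_filter(equalities):
    """Translate non-time 'column = value' predicates into a Druid filter.

    Assumption: only equalities are handled; each becomes a 'selector' filter,
    and several are combined with an 'and' filter.
    """
    fields = [{"type": "selector", "dimension": col, "value": val} for col, val in equalities]
    if len(fields) == 1:
        return fields[0]
    return {"type": "and", "fields": fields}


# Usage: two overlapping time ranges plus two non-time predicates (hypothetical columns).
time_ranges = [
    (datetime(2010, 1, 1), datetime(2011, 1, 1)),
    (datetime(2010, 6, 1), datetime(2012, 1, 1)),
]
query_fragment = {
    "intervals": to_druid_intervals(time_ranges),
    "filter": to_druid_filter([("robot", "false"), ("country", "US")]),
}
print(query_fragment["intervals"])  # ['2010-01-01T00:00:00/2012-01-01T00:00:00']
```

Merging before emitting the `intervals` property keeps the list free of redundant, overlapping ranges; disjoint ranges are left as separate entries.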

Partitioning select queries

We can partition Druid select queries that return large results into multiple subqueries that are executed in parallel against Druid. The degree of parallelism depends on the value of the hive.druid.select.threshold configuration parameter.

In particular, we obtain the number of rows in the result using a segment metadata query. The number of splits for the select query is the number of rows divided by hive.druid.select.threshold. We split the query along the time dimension, assuming that records are uniformly distributed across time (we plan to extend this logic in the future). Thus, we use the time boundaries of the query to determine how to split it; if the query is not time-bounded, we first submit a time boundary query to Druid to obtain them.
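The splitting arithmetic can be illustrated with a minimal Python sketch. It is not Hive's implementation: the function names are hypothetical, rounding the split count up (with a minimum of one split) is an assumption, and the row count and threshold are example values. `time_boundary_query` only builds the JSON body of a Druid `timeBoundary` query, which is what would supply `start` and `end` when the original query has no time bounds.

```python
import math
from datetime import datetime


def time_boundary_query(datasource):
    """JSON body of a Druid timeBoundary query, used when the query is not time-bounded."""
    return {"queryType": "timeBoundary", "dataSource": datasource}


def num_splits(num_rows, select_threshold):
    """number of rows / hive.druid.select.threshold; rounding up is an assumption."""
    return max(1, math.ceil(num_rows / select_threshold))


def split_interval(start, end, n):
    """Split [start, end) into n equal-width sub-intervals.

    Equal widths reflect the assumption that records are uniformly
    distributed across time.
    """
    step = (end - start) / n  # timedelta / int -> timedelta
    # Use `end` explicitly as the last bound so no time is lost to rounding.
    bounds = [start + step * i for i in range(n)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


# Usage with example values: 35,000 rows and a threshold of 10,000.
start, end = datetime(2015, 1, 1), datetime(2015, 1, 5)
n = num_splits(num_rows=35_000, select_threshold=10_000)  # 4
splits = split_interval(start, end, n)
for s, e in splits:
    print(s.date(), e.date())
```

Each `(s, e)` pair would become the interval of one select subquery, and the subqueries can then be submitted to Druid in parallel.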

Timeseries queries

Timeseries is one of the types of queries that Druid can execute very efficiently. The following SQL query translates directly into a Druid timeseries query:

...