You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 4 Next »

IDIEP-33
Author Vladimir Ozerov Ozerov
Sponsor
Created  13 Mar 2018
StatusDRAFT


Motivation

Current SQL engine has a number of critical limitations:

  1. No data co-location control, i.e. arbitrary data can be returned silently
  2. Limited execution modes: either two-phase execution (default) or "distributed joins" which adds one more phase
  3. Lack of proper planner which will take in count both data distribution and data statistics

This document is concerned only with engine's infrastructure. Query planning and statistics are non goals.

Description

What to take in count:

  • Partition reservation - all necessary partition should be reserved and released properly for all stages
  • Cancellation - it should be possible to cancel any query type
  • Streaming - it should be possible to send many packets to destination nodes in advance because 
  • Reactivity - all query stages should operate in non-blocking fashing, no waits should occur inside query pool threads
  • Manageability - it should be possible to get clear picture of query plan and influence some stages with query hints
  • Pipeline-agnostic - design should allow us to have both classical ("volcano") and code-generated pipelines in future
  • Result-agnostic - it should be possible to work with both heap, offheap and out-process intermediate results
  • Abstract out source nature - for future extensibility (e.g. "EXTERNAL TABLE", foreign data wrappers)

"Source" - fundamental concept. Types:

  • Local - getting local node's data
  • Ephemeral - any kind of temporal data retrieved from anywhere. This might be: intermediate query results, results of non-correlated subquery, results of REPLICATED LEFT JOIN PARTITIONED)


Message model
class QueryRequest  {
    QueryId queryId;                          // Global query ID
	Collection<UUID> cacheDescriptorIds;      // IDs of the caches being queried
	Collection<UUID> nodeIds;                 // IDs of all nodes participating in the query
	AffinityTopologyVersion topVer;           // Topology version
	SqlSchemaVersion schemaVer;               // Expected schema version
	MvccSnapshot mvccSnapshot;                // MVCC snapshot
}

class QueryFragment {
    QueryId queryId;                          // Global query ID
    FragmentId fragmentId;                    // Unique fragment ID within a query
	FragmentId sinkId;                        // Who will consume results of this fragment
	String sql;                               // SQL to be executed
	Map<UUID, QueryFragmentPartitions> part;  // Involved partitions for every real cache participating in query
}

class QueryFragmentPartitions {
    Map<UUID, Integer> partCnts;              // Number of expected partitions to be reserved on nodes
	Map<UUID, Collection<Integer>> parts;     // Explicit partitions
}

class QueryBatch {
    QueryId queryId;                          // Global query ID
	FragmentId sourceId;                      // Source ID (producer)
	FragmentId sinkId;                        // Sink ID (consumer)	
	UUID sourceNodeId;                        // Source node ID
    int sourceParallelism;                    // Parallelism of a fragment on a given node
	long batchId;                             // ID of the batch for the given source/sink pair (used for congestion control)
	List<Row> rows;                           // Data rows
	boolean last;                             // Whether this is the last batch
}

class QueryBatchAck {
    QueryId queryId;                          // Global query ID 
    FragmentId sourceId                       // Source ID
	int lastBatchId;                          // ID of the last acked batch
}


Risks and Assumptions

// TODO: Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

// TODO: Links to discussions on the devlist, if applicable.

Reference Links

// TODO: Links to various reference documents, if applicable.

Tickets

// TODO: Links or report with relevant JIRA tickets.

  • No labels