Apache Kylin : Analytical Data Warehouse for Big Data


Definition

Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions.

The main abstraction in Catalyst is TreeNode, which is used to build trees of Expressions or QueryPlans.
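To make the tree idea concrete, here is a minimal sketch of a recursive node type with a bottom-up rewrite. The names `Node`, `Lit`, `Plus` and `foldConstants` are hypothetical stand-ins written for this page, not Spark's actual classes; the real TreeNode offers a much richer API. The sketch only assumes that a node knows its children and can be rebuilt with new ones.

```scala
object TreeNodeSketch {
  // Hypothetical, simplified stand-in for a Catalyst-style tree node.
  trait Node[T <: Node[T]] {
    def children: Seq[T]
    def withNewChildren(newChildren: Seq[T]): T

    // Rewrite the children first, then try the rule on the rebuilt node.
    def transformUp(rule: PartialFunction[T, T]): T = {
      val rebuilt = withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rebuilt, (n: T) => n)
    }
  }

  sealed trait Expr extends Node[Expr]

  // Leaf node: no children.
  case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
    def withNewChildren(newChildren: Seq[Expr]): Expr = this
  }

  // Binary node: exactly two children.
  case class Plus(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def withNewChildren(newChildren: Seq[Expr]): Expr =
      Plus(newChildren(0), newChildren(1))
  }

  // Illustrative rule: fold the addition of two literals into one literal.
  val foldConstants: PartialFunction[Expr, Expr] = {
    case Plus(Lit(a), Lit(b)) => Lit(a + b)
  }

  def main(args: Array[String]): Unit = {
    val tree = Plus(Lit(1), Plus(Lit(2), Lit(3)))
    println(tree.transformUp(foldConstants)) // prints Lit(6)
  }
}
```

Because the rule is applied bottom-up, the inner `Plus(Lit(2), Lit(3))` is folded first, which lets the outer node match the same rule afterwards.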


Core Contract

Name | Role | Comment
SparkSession | |
Dataset | Structured Query with Data Encoder | Dataset is a strongly-typed data structure in Spark SQL that represents a structured query.
Catalyst | Tree Manipulation Framework | Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions.
TreeNode | Node in Catalyst Tree | TreeNode is a recursive data structure that can have zero or more children that are again TreeNodes.
Expression | Executable Node in Catalyst Tree | Expression is an executable node (in a Catalyst tree) that can evaluate a result value given input values, i.e. can produce a JVM object per InternalRow.
QueryPlan | Structured Query Plan | QueryPlan is part of Catalyst to build a tree of relational operators of a structured query. In Scala, QueryPlan is an abstract class that is the base class of LogicalPlan and SparkPlan (for logical and physical plans, respectively).
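The Expression contract described above (evaluate a result value for a given input row) can be illustrated with a small, self-contained sketch. The types below borrow names from their Catalyst counterparts but are simplified stand-ins written for this page: `Row` is a hypothetical replacement for InternalRow, and `Add` here only handles `Int` values.

```scala
object ExpressionSketch {
  // Hypothetical stand-in for InternalRow: column values addressed by ordinal.
  type Row = IndexedSeq[Any]

  // Simplified version of the contract: children, nullability, evaluation.
  sealed trait Expression {
    def children: Seq[Expression]
    def nullable: Boolean
    def eval(input: Row): Any
  }

  // A constant; ignores the input row.
  case class Literal(value: Any) extends Expression {
    def children: Seq[Expression] = Nil
    def nullable: Boolean = value == null
    def eval(input: Row): Any = value
  }

  // Reads the column at a fixed ordinal of the input row.
  case class BoundReference(ordinal: Int, nullable: Boolean) extends Expression {
    def children: Seq[Expression] = Nil
    def eval(input: Row): Any = input(ordinal)
  }

  // Integer addition that returns null if either side evaluates to null.
  case class Add(left: Expression, right: Expression) extends Expression {
    def children: Seq[Expression] = Seq(left, right)
    def nullable: Boolean = left.nullable || right.nullable
    def eval(input: Row): Any = {
      val l = left.eval(input)
      if (l == null) null
      else {
        val r = right.eval(input)
        if (r == null) null else l.asInstanceOf[Int] + r.asInstanceOf[Int]
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val expr = Add(BoundReference(0, nullable = false), Literal(10))
    println(expr.eval(Vector(5))) // prints 15
  }
}
```

The same expression tree can be evaluated against any number of rows, which is why evaluation takes the row as an argument rather than storing data in the nodes.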


Core Framework Diagram


Core Contract

PlantUML
' interface Interface

class TreeNode << BASIC >> {
   // TreeNode is a recursive data structure that can have zero or more children that are again TreeNodes.
  
  -children : Seq[BaseType]
  -verboseString: String
}

abstract class Expression  {
  // only required methods that have no implementation
  + dataType: DataType
  + doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
  + eval(input: InternalRow = EmptyRow): Any
  + nullable: Boolean
}

'Interface <|.. TreeNode : implements

abstract class QueryPlan {
  def output: Seq[Attribute]
  def validConstraints: Set[Expression]
}

TreeNode <|-- Expression

...


TreeNode <|-- QueryPlan


Credit

All rights reserved to jaceklaskowski.

...