Apache Kylin : Analytical Data Warehouse for Big Data
...
Background
Because Kylin 4 depends heavily on Spark SQL, it is worth building a deeper understanding of how Spark SQL works.
Definition
Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions.
The main abstraction in Catalyst is TreeNode that is then used to build trees of Expressions or QueryPlans.
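To make the tree abstraction concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's code: `Expr`, `Literal`, `Column`, `Add`, `withNewChildren`, and `foldConstants` are hypothetical stand-ins written for this page, and the real `TreeNode` in `org.apache.spark.sql.catalyst.trees` is considerably richer. The sketch only shows the general idea of a recursive node type plus a bottom-up rewrite driven by a partial function.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
object CatalystSketch {

  // A recursive tree node: each node exposes children that are nodes too.
  sealed trait Expr {
    def children: Seq[Expr]

    // Rebuild this node with new children, given in the same order as `children`.
    def withNewChildren(newChildren: Seq[Expr]): Expr

    // Bottom-up rewrite: transform the children first, then apply `rule`
    // to the rebuilt node if the partial function is defined for it.
    def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
      val rebuilt = withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rebuilt, identity[Expr])
    }
  }

  // Leaf node holding a constant.
  final case class Literal(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
    def withNewChildren(newChildren: Seq[Expr]): Expr = this
  }

  // Leaf node referring to a column by name.
  final case class Column(name: String) extends Expr {
    def children: Seq[Expr] = Nil
    def withNewChildren(newChildren: Seq[Expr]): Expr = this
  }

  // Binary node with exactly two children.
  final case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def withNewChildren(newChildren: Seq[Expr]): Expr =
      Add(newChildren(0), newChildren(1))
  }

  // A rewrite rule in the spirit of constant folding.
  val foldConstants: PartialFunction[Expr, Expr] = {
    case Add(Literal(a), Literal(b)) => Literal(a + b)
  }

  def main(args: Array[String]): Unit = {
    val tree = Add(Column("x"), Add(Literal(1), Literal(2)))
    // The inner Add of two literals is folded; the outer Add is left alone.
    println(tree.transformUp(foldConstants))
  }
}
```

Expressing the rule as a partial function keeps the traversal logic in one place: nodes the rule does not match are returned unchanged, so each rule only has to describe the pattern it cares about.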
Core Contract
Name | Role | Comment |
---|---|---|
SparkSession | Entry Point to Spark SQL | As a Spark developer, you create a SparkSession as the entry point to Spark SQL. |
Dataset | Structured Query with Data Encoder | Dataset is a strongly-typed data structure in Spark SQL that represents a structured query. |
Catalyst | Tree Manipulation Framework | Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions. |
TreeNode | Node in Catalyst Tree | |
Expression | Executable Node in Catalyst Tree | |
QueryPlan | Structured Query Plan | In Scala, QueryPlan is an abstract class. |
Catalog | Metastore Management Interface | |
Attribute | Base of leaf named expressions | |
Core Diagram
Framework UML Diagram
PlantUML

    abstract class TreeNode << BASIC >> {
      // TreeNode is a recursive data structure that can have one or many children that are again TreeNodes.
      -children : Seq[BaseType]
      -verboseString: String
    }

    abstract class Expression {
      // only required methods that have no implementation
      + dataType: DataType
      + doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
      + eval(input: InternalRow = EmptyRow): Any
      + nullable: Boolean
    }

    abstract class QueryPlan {
      def output: Seq[Attribute]
      def validConstraints: Set[Expression]
    }

    trait NamedExpression extends Expression {

      /** We should never fold named expressions in order to not remove the alias. */
      override def foldable: Boolean = false

      def name: String
      def exprId: ExprId

      /**
       * Returns a dot separated fully qualified name for this attribute. Given that there can be
       * multiple qualifiers, it is possible that there are other possible way to refer to this
       * attribute.
       */
      def qualifiedName: String = (qualifier :+ name).mkString(".")

      /**
       * Optional qualifier for the expression.
       * Qualifier can also contain the fully qualified information, for e.g, Sequence of string
       * containing the database and the table name
       *
       * For now, since we do not allow using original table name to qualify a column name once the
       * table is aliased, this can only be:
       *
       * 1. Empty Seq: when an attribute doesn't have a qualifier,
       *    e.g. top level attributes aliased in the SELECT clause, or column from a LocalRelation.
       * 2. Seq with a Single element: either the table name or the alias name of the table.
       * 3. Seq with 2 elements: database name and table name
       */
      def qualifier: Seq[String]

      def toAttribute: Attribute

      /** Returns the metadata when an expression is a reference to another expression with metadata. */
      def metadata: Metadata = Metadata.empty

      /** Returns a copy of this expression with a new `exprId`. */
      def newInstance(): NamedExpression

      protected def typeSuffix =
        if (resolved) {
          dataType match {
            case LongType => "L"
            case _ => ""
          }
        } else {
          ""
        }
    }

    TreeNode <|-- Expression
    TreeNode <|-- QueryPlan
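The qualifier rules documented on `NamedExpression` can be tried out in isolation. The sketch below is illustrative only: `Attr` is a hypothetical case class invented for this page, and it copies just the `qualifiedName` expression quoted in the diagram, `(qualifier :+ name).mkString(".")`, to show what the three allowed qualifier shapes produce. The table and database names are made-up sample values.

```scala
// Toy model only: `Attr` is a hypothetical stand-in, not Spark's Attribute.
object QualifierSketch {

  // Carries a column name plus an optional qualifier, as described in the
  // NamedExpression comments: empty, Seq(table), or Seq(database, table).
  final case class Attr(name: String, qualifier: Seq[String] = Seq.empty) {
    // Same expression as in the diagram: join qualifier parts and name with dots.
    def qualifiedName: String = (qualifier :+ name).mkString(".")
  }

  def main(args: Array[String]): Unit = {
    // 1. No qualifier, e.g. a top-level attribute aliased in the SELECT clause.
    println(Attr("id").qualifiedName)
    // 2. Single element: the table name or the table alias.
    println(Attr("id", Seq("orders")).qualifiedName)
    // 3. Two elements: database name and table name.
    println(Attr("id", Seq("sales", "orders")).qualifiedName)
  }
}
```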
Credit
All rights reserved to jaceklaskowski.
...