Apache Kylin : Analytical Data Warehouse for Big Data
...
Collect and dump the following three pieces of source info (a detection sketch follows this list):
- Whether the cube contains a COUNT_DISTINCT measure (Boolean)
- Resource paths (Array); with ResourceDetectUtils we can get source table info (such as source size)
- Table RDD leaf task numbers (Map), used by the next step, "Adaptively adjust spark parameters"
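To make the detection concrete, here is a minimal sketch of the kind of work the second item involves: summing the bytes under each source table path via the Hadoop FileSystem API. The class and method names below are assumptions for illustration, not Kylin's actual ResourceDetectUtils implementation.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: sum the size of every source table path so the
// next step can size executors from the total input volume.
public class SourceSizeSketch {
    public static long totalSourceBytes(String[] tablePaths) throws IOException {
        Configuration conf = new Configuration();
        long total = 0L;
        for (String p : tablePaths) {
            Path path = new Path(p);
            FileSystem fs = path.getFileSystem(conf);
            ContentSummary summary = fs.getContentSummary(path);
            total += summary.getLength(); // bytes under this table's directory
        }
        return total;
    }
}
```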
Adaptively adjust spark parameters
- Turned on by default
- Cluster mode only
- Affects the following Spark configuration properties:
```
kylin.engine.spark-conf.spark.executor.instances
kylin.engine.spark-conf.spark.executor.cores
kylin.engine.spark-conf.spark.executor.memory
kylin.engine.spark-conf.spark.executor.memoryOverhead
kylin.engine.spark-conf.spark.sql.shuffle.partitions
kylin.engine.spark-conf.spark.driver.memory
kylin.engine.spark-conf.spark.driver.memoryOverhead
kylin.engine.spark-conf.spark.driver.cores
```
The driver memory base is 1024 MB, and it is adjusted according to the number of cuboids. The adjustment strategy is defined in KylinConfigBase.java.
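As a rough illustration of such a strategy, the sketch below steps the driver memory up from the 1024 MB base as the cuboid count crosses a set of thresholds. The threshold values and the cap are invented for illustration; the authoritative strategy is the one in KylinConfigBase.java.

```java
// Hypothetical sketch of a tiered driver-memory strategy. The base matches
// the 1024 MB mentioned above; the tiers and the cap are assumptions.
public class DriverMemorySketch {
    static final int BASE_MB = 1024;
    static final int[] CUBOID_TIERS = {2, 20, 100}; // assumed cuboid-count thresholds
    static final int MAX_MB = 4096;                 // assumed upper bound

    static int driverMemoryMb(int cuboidCount) {
        int multiplier = 1;
        for (int tier : CUBOID_TIERS) {
            if (cuboidCount > tier) {
                multiplier++;
            }
        }
        // e.g. 5 cuboids -> 2048 MB, 500 cuboids -> capped at 4096 MB
        return Math.min(BASE_MB * multiplier, MAX_MB);
    }
}
```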
Cubing Step: Build by layer
- Reduced build steps
  - From ten-twenty steps to only two steps
- Build Engine
  - Simple and clear architecture
  - Spark as the only build engine
  - All builds are done via Spark (see the build-by-layer sketch after this list)
  - Adaptively adjust Spark parameters
  - Dimension dictionaries are no longer needed
- Supported measures
  - Sum
  - Count
  - Min
  - Max
  - TopN
  - CountDistinct (Bitmap, HyperLogLog)
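To illustrate what building by layer means, the sketch below derives a child cuboid from its parent cuboid with one Spark aggregation: SUM rolls up as SUM, and COUNT rolls up as the SUM of the parent's partial counts. The column names and parquet paths are assumptions for illustration, not Kylin's actual code.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch: build child cuboid [id, name] from parent cuboid
// [id, name, price] by re-aggregating the parent's measure columns.
public class LayerBuildSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("layer-build-sketch")
                .master("local[*]")
                .getOrCreate();

        // Parent cuboid with measures COUNT and SUM, read from an assumed path.
        Dataset<Row> parent = spark.read().parquet("/tmp/parent-cuboid");

        Dataset<Row> child = parent.groupBy(col("id"), col("name"))
                .agg(sum("count_m").as("count_m"), // COUNT rolls up as a SUM of partial counts
                     sum("sum_m").as("sum_m"));    // SUM rolls up as a SUM

        child.write().mode("overwrite").parquet("/tmp/child-cuboid");
        spark.stop();
    }
}
```

Note that measures such as TopN and CountDistinct cannot be rolled up with a plain SUM; they rely on mergeable structures (Bitmap, HyperLogLog sketches), which is why they are listed separately above.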
Cuboid Storage
Cuboids are saved into a parquet storage directory tree in the file system: each cuboid's path is spliced from the cube name, segment name and cuboid id, and this path handling is implemented in PathManager.java.
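A minimal sketch of such path splicing, assuming a hypothetical directory layout (the authoritative logic is in PathManager.java):

```java
// Hypothetical sketch: splice a cuboid's storage path from cube name,
// segment name and cuboid id. The "/parquet/" layout is an assumption.
public class PathSketch {
    static String cuboidPath(String workingDir, String cubeName,
                             String segmentName, long cuboidId) {
        return workingDir + "/parquet/" + cubeName + "/"
                + segmentName + "/" + cuboidId;
    }
}
```

For example, cuboidPath("/kylin", "sales_cube", "seg_1", 7L) yields "/kylin/parquet/sales_cube/seg_1/7".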
Parquet file schema
If there is a dimension combination of columns [id, name, price] and measures [COUNT, SUM], then a parquet file will be generated:
Columns [id, name, price] correspond to dimension ids [2, 1, 0], and measures [COUNT, SUM] correspond to ids [3, 4].
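Expressed as a Spark schema, that example file could look like the sketch below, with each column stored under its ordinal id; the data types are assumptions for illustration.

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical sketch of the generated file's schema: dimensions [id, name,
// price] stored as columns "2", "1", "0"; measures COUNT and SUM as "3", "4".
public class CuboidSchemaSketch {
    static StructType cuboidSchema() {
        return new StructType()
                .add("2", DataTypes.IntegerType) // dimension: id
                .add("1", DataTypes.StringType)  // dimension: name
                .add("0", DataTypes.LongType)    // dimension: price
                .add("3", DataTypes.LongType)    // measure: COUNT
                .add("4", DataTypes.LongType);   // measure: SUM
    }
}
```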
Part III. Reference
...
...