Apache Kylin : Analytical Data Warehouse for Big Data

Part I . Why Kylin on Parquet

...

          Metadata can still be saved in HBase or JDBC. Its format differs slightly from the original Kylin metadata; see MetadataConverter.scala for details.

  • Storage

          Cuboids are saved as Parquet files in HDFS (or another file system); HBase is no longer needed.

...

Part II . How Kylin on Parquet

...

Cubing Step : Resource detection

Collect and dump the following three pieces of source information:

Whether the cube contains a COUNT_DISTINCT measure (Boolean)

...

Number of leaf tasks of each table RDD (Map). It is used in the next step, Adaptively adjust spark parameters.

Adaptively adjust spark parameters

Turned on by default

Cluster mode only

...

The base driver memory is 1024M, and it is adjusted according to the number of cuboids. The adjustment strategy is defined in KylinConfigBase.java.
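
As a rough illustration of how driver memory could scale with the number of cuboids, here is a minimal sketch. Only the 1024M base comes from the text above. The cuboid-count thresholds, the 4096M cap, the step-wise rule, and the class and method names are all assumptions made for illustration; the actual strategy is the one defined in KylinConfigBase.java and may differ.

```java
public class DriverMemorySketch {
    // Base driver memory in MB, as stated in the text.
    static final int BASE_MB = 1024;
    // Hypothetical upper bound in MB (assumption, not taken from the source).
    static final int MAX_MB = 4096;
    // Hypothetical cuboid-count thresholds (assumption, not taken from the source).
    static final int[] THRESHOLDS = {2, 20, 100};

    // Adds one extra BASE_MB for every threshold the cuboid count reaches,
    // then caps the result at MAX_MB.
    static int driverMemoryMb(int cuboidCount) {
        int steps = 0;
        for (int threshold : THRESHOLDS) {
            if (cuboidCount >= threshold) {
                steps++;
            }
        }
        return Math.min(BASE_MB + steps * BASE_MB, MAX_MB);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 50, 500}) {
            System.out.println(n + " cuboids -> " + driverMemoryMb(n) + "M");
        }
    }
}
```

With these assumed thresholds, a cube with a single cuboid keeps the 1024M base, and larger cubes step up until they reach the cap.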

...

Cubing Step : Build by layer

  • Reduced build steps
    • From ten to twenty steps down to only two
  • Build Engine
    • Simple and clear architecture
    • Spark as the only build engine
    • All builds are done via spark
    • Adaptively adjust spark parameters
    • Dictionary of dimensions no longer needed
    • Supported measures
      • Sum
      • Count
      • Min
      • Max
      • TopN
      • CountDistinct (Bitmap, HyperLogLog)

Cuboid Storage

The following is the directory tree of the Parquet storage in the file system. As shown, each cuboid is saved under a path composed of the Cube Name, Segment Name and Cuboid Id; this path is built by PathManager.java.

...
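
The path composition described above (Cube Name, then Segment Name, then Cuboid Id) can be sketched as follows. This is not the PathManager.java implementation: the class name, the method name, the root directory and the sample values are hypothetical, and the real layout may contain additional directory levels.

```java
public class CuboidPathSketch {
    // Joins the three components named in the text under a storage root.
    // The root directory is a hypothetical placeholder.
    static String cuboidPath(String root, String cubeName, String segmentName, long cuboidId) {
        return String.join("/", root, cubeName, segmentName, Long.toString(cuboidId));
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        System.out.println(cuboidPath("/kylin/parquet", "sales_cube", "20200101_20200201", 255L));
    }
}
```

Because each cuboid has its own directory, the files of a single cuboid can be located from these three identifiers alone.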