Apache Kylin : Analytical Data Warehouse for Big Data

Background

Kylin generates cuboid statistics for the segments of a cube while the cube is being built. The statistics are also updated when cubes are optimized or merged.

Cuboid statistics let users see the precise or estimated row counts and sizes of the cuboids in each segment, which helps them design the cube.


The tree structure of cuboid statistics

Code Block
Statistics of {cube name}[{segment name}]

Total cuboids: {num}
Total precise rows: {num}
Total precise size(MB): {num}
Sampling percentage:  {num}
Mapper overlap ratio: {num}
Mapper number: {num}
Length of dimension {dimension name} is {num}
...

|---- Cuboid 111111111111111111, [precise or est] row: {num}, precise MB: {num}
    |---- Cuboid 000111111111111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
        |---- Cuboid 000101111111111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
            |---- Cuboid 000101101111111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
                ...
        |---- Cuboid 000111110011111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
            |---- Cuboid 000101110011111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
                ...
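To make the layout concrete, here is a minimal Python sketch of how one tree line could be assembled. It assumes, based on the template above, that a cuboid ID is a bitmask with one bit per dimension printed as a fixed-width bit string, that each nesting level adds four spaces of indentation, that a child cuboid only keeps dimensions its parent already has, and that "shrink" is the child's row count as a percentage of its parent's row count. The helper names (`display_name`, `is_subset_of`, `shrink_percent`, `tree_line`) are hypothetical and not part of Kylin; the two-decimal formatting is also an assumption, and parents with 0 rows are not handled.

```python
def display_name(cuboid_id: int, dimension_count: int) -> str:
    # One bit per dimension, left-padded with zeros to the full width.
    return format(cuboid_id, "b").zfill(dimension_count)


def is_subset_of(child_id: int, parent_id: int) -> bool:
    # A child cuboid may only keep dimensions (1-bits) that its parent has.
    return (child_id & ~parent_id) == 0


def shrink_percent(child_rows: int, parent_rows: int) -> float:
    # Assumed definition: child rows as a percentage of parent rows.
    return 100.0 * child_rows / parent_rows


def tree_line(cuboid_id, dimension_count, kind, rows, mb, depth, parent_rows=None):
    # Four spaces per tree level, then the fields in the template's order.
    line = ("    " * depth
            + f"|---- Cuboid {display_name(cuboid_id, dimension_count)}, "
            + f"{kind} row: {rows}, precise MB: {mb:.2f}")
    # The root (base cuboid) has no parent, so it gets no shrink value.
    if parent_rows is not None:
        line += f", shrink: {shrink_percent(rows, parent_rows):.2f}%"
    return line
```

For example, `tree_line(2**15 - 1, 18, "est", 25, 1.5, depth=1, parent_rows=100)` renders a child of the 18-dimension base cuboid with a shrink of 25.00%.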


Summary

In the tree structure above, CubeStatsReader prints the statistics of the cuboids in each segment of the cube. If a segment has not been built and therefore lacks precise statistics, but estimated statistics have already been generated with the HyperLogLog algorithm, the estimated statistics are shown instead.
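The fallback rule described above can be sketched as follows. This is an illustration, not CubeStatsReader's code: the function name `choose_statistics` and the representation of statistics as a mapping from cuboid ID to row count are assumptions.

```python
from typing import Dict, Optional, Tuple

RowStats = Dict[int, int]  # hypothetical representation: cuboid ID -> row count


def choose_statistics(precise: Optional[RowStats],
                      estimated: Optional[RowStats]) -> Tuple[str, RowStats]:
    """Return the prefix used to label the values and the statistics to display."""
    if precise is not None:
        # Precise statistics exist (e.g. the segment has been built): use them.
        return "precise", precise
    if estimated is not None:
        # Otherwise fall back to the HyperLogLog-based estimates.
        return "est", estimated
    raise ValueError("segment has neither precise nor estimated statistics")
```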

...

Each value is labeled with the prefix "est" or "precise" to show whether it is an estimate or a precise figure.
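Because every cuboid line carries the "est" or "precise" prefix, the output can be post-processed by a script. The following Python sketch parses one tree line into its fields. The regular expression is written against the template on this page and is an assumption; adjust it if the actual output of your Kylin version differs. It accepts both "%" and " %" after the shrink value, since both spellings appear on this page. `CuboidStat` and `parse_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the template on this page.
LINE_RE = re.compile(
    r"^(?P<indent>\s*)\|---- Cuboid (?P<bits>[01]+), "
    r"(?P<kind>precise|est) row: (?P<rows>\d+), "
    r"(?:precise|est) MB: (?P<mb>[\d.]+)"
    r"(?:, shrink: (?P<shrink>-?[\d.]+) ?%)?$"
)


class CuboidStat(NamedTuple):
    depth: int               # nesting level in the tree (4 spaces per level)
    bits: str                # cuboid ID as a bit string
    kind: str                # "precise" or "est"
    rows: int
    mb: float
    shrink: Optional[float]  # None for the root (base cuboid)


def parse_line(line: str) -> Optional[CuboidStat]:
    """Parse one cuboid line; return None for header or other lines."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    shrink = m.group("shrink")
    return CuboidStat(
        depth=len(m.group("indent").expandtabs(4)) // 4,
        bits=m.group("bits"),
        kind=m.group("kind"),
        rows=int(m.group("rows")),
        mb=float(m.group("mb")),
        shrink=float(shrink) if shrink is not None else None,
    )
```

Filtering the parsed lines on `kind == "est"` is one way to spot which values are only estimates.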


How to use

Check whether the cube contains precise statistics

Check the cube.

Check that the segments contain the fields "cuboid_statics_rows_bytes" and "cuboid_statics_size_bytes" and that neither is null.


Not only build jobs but also merge and optimize jobs update the "cuboid_statics_rows_bytes" and "cuboid_statics_size_bytes" fields of the segments.
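If you export the cube metadata as JSON, the null check above can also be scripted. This sketch assumes, hypothetically, that the cube JSON has a top-level "segments" array whose entries carry a "name" and the two statistics fields; verify the actual layout against your own metadata before relying on it. `segments_without_precise_stats` is a hypothetical helper name.

```python
import json
from typing import List

# Field names as given on this page.
STAT_FIELDS = ("cuboid_statics_rows_bytes", "cuboid_statics_size_bytes")


def segments_without_precise_stats(cube_json: str) -> List[str]:
    """Return names of segments where either statistics field is missing or null."""
    cube = json.loads(cube_json)
    missing = []
    # Assumed layout: {"segments": [{"name": ..., <stat fields>...}, ...]}
    for seg in cube.get("segments", []):
        if any(seg.get(field) is None for field in STAT_FIELDS):
            missing.append(seg.get("name", "<unnamed>"))
    return missing
```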

Show statistics of segments in a cube

Code Block
bin/kylin.sh org.apache.kylin.engine.mr.common.CubeStatsReader {cube name}
  • If a cube in Kylin 4 does not have a base cuboid, the base cuboid "1111...111" is shown with 0 rows and a size of 0.0 MB.
  • If a cuboid does not exist, its children show a shrink percentage of "-0.0 %".
  • The command checks only one cube per run.
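Since the command handles only one cube per run, checking several cubes means invoking it once per cube. The Python sketch below builds those invocations and flags a report whose base cuboid (the all-ones bit string) shows 0 rows. The helper names (`stats_commands`, `run_all`, `has_empty_base_cuboid`) and the `kylin_home` parameter are hypothetical; only the script path and class name come from the command above.

```python
import re
import subprocess
from typing import Dict, List, Sequence

READER_CLASS = "org.apache.kylin.engine.mr.common.CubeStatsReader"

# Matches a cuboid whose bit string is all ones, i.e. the base cuboid.
BASE_RE = re.compile(r"\|---- Cuboid (1+), (?:precise|est) row: (\d+)")


def stats_commands(cube_names: Sequence[str], kylin_home: str = ".") -> List[List[str]]:
    """One bin/kylin.sh invocation per cube, since each run checks a single cube."""
    script = f"{kylin_home.rstrip('/')}/bin/kylin.sh"
    return [[script, READER_CLASS, name] for name in cube_names]


def run_all(cube_names: Sequence[str], kylin_home: str = ".") -> Dict[str, str]:
    """Run the reader for each cube and collect its standard output by cube name."""
    outputs = {}
    for cmd in stats_commands(cube_names, kylin_home):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        outputs[cmd[-1]] = result.stdout
    return outputs


def has_empty_base_cuboid(report: str) -> bool:
    """True if any base cuboid line in the report shows 0 rows."""
    return any(int(m.group(2)) == 0 for m in BASE_RE.finditer(report))
```

For example, `run_all(["cube_a", "cube_b"], "/opt/kylin")` would run the reader twice, once per cube, and the reports could then be passed to `has_empty_base_cuboid`.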

...