Apache Kylin : Analytical Data Warehouse for Big Data
Background
Kylin generates cuboid statistics for each segment of a cube while the cube is being built. The statistics are also updated when a cube is optimized or merged.
Cuboid statistics show users the precise or estimated row count and size of each cuboid in a segment, which helps them design the cube.
The tree structure of cuboid statistics
Code Block |
---|
Statistics of {cube name}[{segment name}]
Total cuboids: {num}
Total precise rows: {num}
Total precise size(MB): {num}
Sampling percentage: {num}
Mapper overlap ratio: {num}
Mapper number: {num}
Length of dimension {dimension name} is {num}
...
|---- Cuboid 111111111111111111, [precise or est] row: {num}, precise MB: {num}
    |---- Cuboid 000111111111111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
        |---- Cuboid 000101111111111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
            |---- Cuboid 000101101111111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
...
        |---- Cuboid 000111110011111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
            |---- Cuboid 000101110011111111, [precise or est] row: {num}, precise MB: {num}, shrink: {num}%
... |
Summary
The tree above shows the statistics that CubeStatsReader prints for the cuboids in each segment. If a segment has not been built in the cube but estimated statistics have already been generated for it with the "HyperLogLog" algorithm, the segment's estimated statistics are shown instead.
...
Each value carries the prefix "est" or "precise" to show whether it comes from estimated or precise statistics.
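The cuboid lines in the template above are regular enough to be read by a script. The following is a minimal sketch in Python, assuming the real output matches the template closely. The regular expression, the function name `parse_cuboid_lines`, and the sample values are illustrative assumptions, not part of Kylin.

```python
import re

# Pattern derived from the template above; the real output may differ in detail.
CUBOID_LINE = re.compile(
    r"Cuboid\s+(?P<id>[01]+),\s+"
    r"(?P<kind>precise|est) row:\s+(?P<rows>\d+),\s+"
    r"(?:precise|est) MB:\s+(?P<mb>-?\d+(?:\.\d+)?)"
    r"(?:,\s+shrink:\s+(?P<shrink>-?\d+(?:\.\d+)?)\s*%)?"
)


def parse_cuboid_lines(text):
    """Return one dict per 'Cuboid ...' line found in text."""
    cuboids = []
    for line in text.splitlines():
        match = CUBOID_LINE.search(line)
        if match is None:
            continue  # header lines, '...' and blank lines are skipped
        shrink = match.group("shrink")
        cuboids.append({
            "id": match.group("id"),
            "precise": match.group("kind") == "precise",
            "rows": int(match.group("rows")),
            "size_mb": float(match.group("mb")),
            # The base cuboid has no parent, so it has no shrink value.
            "shrink_pct": float(shrink) if shrink is not None else None,
        })
    return cuboids


# Hypothetical sample with made-up numbers, shaped like the template.
SAMPLE = """\
|---- Cuboid 111, precise row: 1000, precise MB: 2.5
    |---- Cuboid 011, est row: 400, est MB: 1.0, shrink: 40.0%
"""

if __name__ == "__main__":
    for cuboid in parse_cuboid_lines(SAMPLE):
        print(cuboid)
```

The `precise` flag makes it easy to spot which cuboids are still reported from estimated statistics.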
How to use
Check whether the cube contains precise statistics
Check the cube.
Check that its segments contain "cuboid_statics_rows_bytes" and "cuboid_statics_size_bytes" values that are not null.
Besides build jobs, merging and optimizing a cube also update "cuboid_statics_rows_bytes" and "cuboid_statics_size_bytes" in the segments.
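The null check on the two fields can be scripted. The sketch below assumes the cube metadata is available as JSON with a top-level "segments" array whose entries carry a "name" and the two fields named above. That layout, the function name, and the sample data are assumptions made for illustration, so verify them against your own cube metadata.

```python
import json

# Field names as given in this page.
REQUIRED_FIELDS = ("cuboid_statics_rows_bytes", "cuboid_statics_size_bytes")


def segments_with_precise_stats(cube_desc):
    """Split segment names by whether both statistics fields are non-null."""
    with_stats, without_stats = [], []
    for segment in cube_desc.get("segments", []):
        has_all = all(segment.get(field) is not None for field in REQUIRED_FIELDS)
        (with_stats if has_all else without_stats).append(segment.get("name"))
    return with_stats, without_stats


# Hypothetical cube metadata; names and values are made up.
SAMPLE_JSON = """
{
  "name": "example_cube",
  "segments": [
    {"name": "seg_a",
     "cuboid_statics_rows_bytes": "AAEC",
     "cuboid_statics_size_bytes": "AwQF"},
    {"name": "seg_b",
     "cuboid_statics_rows_bytes": null,
     "cuboid_statics_size_bytes": null}
  ]
}
"""

if __name__ == "__main__":
    ready, missing = segments_with_precise_stats(json.loads(SAMPLE_JSON))
    print("segments with precise statistics:", ready)
    print("segments without precise statistics:", missing)
```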
Show statistics of segments in a cube
Code Block |
---|
bin/kylin.sh org.apache.kylin.engine.mr.common.CubeStatsReader {cube name} |
- If Kylin 4 does not have a base cuboid for a cube, the base cuboid "1111...111" is shown with 0 rows and a size of 0.0 MB.
- If a cuboid does not exist, its children show a shrink percentage of "-0.0 %".
- The command can check only one cube at a time.
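Because the command accepts one cube per invocation, checking several cubes means running it once per cube. The following is a minimal sketch of such a loop. The helper names and cube names are hypothetical, and it assumes the script is started from the Kylin home directory and that the tool writes its report to standard output.

```python
import subprocess

# Class name taken from the command shown above.
READER_CLASS = "org.apache.kylin.engine.mr.common.CubeStatsReader"


def build_commands(cube_names, kylin_sh="bin/kylin.sh"):
    """Build one argument list per cube, since the tool takes a single cube."""
    return [[kylin_sh, READER_CLASS, name] for name in cube_names]


def run_all(cube_names, kylin_sh="bin/kylin.sh"):
    """Run the reader once per cube and collect whatever it prints to stdout."""
    outputs = {}
    for name, command in zip(cube_names, build_commands(cube_names, kylin_sh)):
        completed = subprocess.run(
            command, capture_output=True, text=True, check=False
        )
        outputs[name] = completed.stdout
    return outputs


if __name__ == "__main__":
    # Hypothetical cube names; replace them with cubes that exist in your project.
    for cube, report in run_all(["cube_a", "cube_b"]).items():
        print(f"===== {cube} =====")
        print(report)
```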
...