Apache Kylin : Analytical Data Warehouse for Big Data

1. Background

When a large amount of data is written to the HBase cluster at once, the cluster load becomes very high, which degrades query performance.

To keep query performance from being affected, KYLIN-4833 adds an optional step called "HFile Distcp To HBase" to the build job, between "Convert Cuboid Data to HFile" and "Load HFile to HBase Table".

With this step enabled, the HFiles produced by "Convert Cuboid Data to HFile" are first written to HDFS on the Hadoop cluster, and the "HFile Distcp To HBase" step then transfers them to the HBase cluster with DistCp. DistCp can throttle the write speed, which reduces the pressure on the HBase cluster.

By default, the build job does not include the "HFile Distcp To HBase" step. You can enable it by setting kylin.storage.hfile-distcp-enable=true.

2. Configuration

Three configuration items relate to this step:

Configuration                         Default value   Description
kylin.storage.hfile-distcp-enable     false           Whether to enable this step in the build job
kylin.storage.distcp-map-bandwidth    20              Bandwidth of each map, in MB
kylin.storage.distcp-max-map-num      50              Maximum number of maps
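
As a minimal sketch, the three settings could be combined in kylin.properties as shown below. The property names and defaults come from the table above. The file location ($KYLIN_HOME/conf/kylin.properties) and the non-default values 10 and 20 are assumptions chosen only for illustration, not recommendations.

```properties
# Assumed location: $KYLIN_HOME/conf/kylin.properties

# Add the optional "HFile Distcp To HBase" step to build jobs (default: false).
kylin.storage.hfile-distcp-enable=true

# Bandwidth of each DistCp map, in MB (default: 20). The value 10 is illustrative.
kylin.storage.distcp-map-bandwidth=10

# Maximum number of DistCp maps (default: 50). The value 20 is illustrative.
kylin.storage.distcp-max-map-num=20
```

If the bandwidth is interpreted per second, as it is for DistCp's own -bandwidth option, the upper bound on the aggregate transfer rate is roughly the product of the two values: about 10 × 20 = 200 MB/s with the illustrative values above, compared with 20 × 50 = 1000 MB/s with the defaults. Lowering either value trades a longer transfer for less load on the HBase cluster.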