Apache Kylin : Analytical Data Warehouse for Big Data

As we all know, Kylin needs to build cube data before querying. If build jobs and query jobs both run in one cluster, the service may become unstable because of resource preemption.

Kylin 4.0 supports running build jobs and query jobs on different Hadoop clusters, which we call the build cluster and the query cluster. The build cluster handles many write operations, while the query cluster serves only read operations. Build jobs are sent to the build cluster. When a build job finishes, the cube data is written to HDFS on the query cluster, so that queries can read the cube data from the query cluster.

With a read-write separation deployment, build and query workloads are completely isolated from each other.

Architecture

Prepare

  1. Make sure the Hadoop version (HDP or CDH) is supported by Kylin.
  2. Check that commands such as hdfs and hive work properly and can access cluster resources.
  3. If both clusters have HDFS NameNode HA enabled, make sure their HDFS nameservice names are different. If they are the same, rename one of them to avoid a conflict.
  4. Make sure the network latency between the two clusters is low enough, as a large amount of data is moved between them during the model build process.
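Item 3 can be checked mechanically before deployment. The following is a minimal Python sketch, assuming you have local copies of each cluster's hdfs-site.xml (for example, copied from each cluster's Hadoop configuration directory). It reads the standard HDFS property dfs.nameservices from both files and reports any nameservice ID that appears in both. The function names and the idea of running it as a standalone pre-check are hypothetical illustrations, not part of Kylin.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_source):
    """Parse a Hadoop *-site.xml document (str or bytes) into a {name: value} dict."""
    root = ET.fromstring(xml_source)
    props = {}
    # Hadoop config files are <configuration><property><name/><value/></property>...</configuration>
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def nameservices(xml_source):
    """Return the set of nameservice IDs declared in dfs.nameservices.

    The property may hold a comma-separated list; if HA is not enabled it is
    usually absent, in which case an empty set is returned.
    """
    raw = read_hadoop_properties(xml_source).get("dfs.nameservices", "")
    return {ns.strip() for ns in raw.split(",") if ns.strip()}


def conflicting_nameservices(build_hdfs_site, query_hdfs_site):
    """Return nameservice IDs that are declared by both clusters (should be empty)."""
    return nameservices(build_hdfs_site) & nameservices(query_hdfs_site)


def check_files(build_path, query_path):
    """Hypothetical helper: read both hdfs-site.xml files from disk and compare them."""
    # Read as bytes so ElementTree honours any encoding declaration in the file.
    with open(build_path, "rb") as f:
        build = f.read()
    with open(query_path, "rb") as f:
        query = f.read()
    return conflicting_nameservices(build, query)
```

For example, `check_files("build/hdfs-site.xml", "query/hdfs-site.xml")` (hypothetical paths) returns an empty set when the two clusters use distinct nameservice names, and otherwise returns the clashing IDs that need to be renamed on one cluster.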

...