
Apache Kylin : Analytical Data Warehouse for Big Data


Kylin 4.0 supports running build jobs and query jobs on different Hadoop clusters, which we call the build cluster and the query cluster. Build jobs are sent to the build cluster to build cube data. When a build job finishes, the cube data is written directly to HDFS on the query cluster, so that queries can read the cube data from the query cluster.

With a read-write separation deployment, we can completely isolate build and query workloads.

Read-Write Separation Architecture

[Image: read-write separation architecture diagram]

Prepare

  1. Make sure the Hadoop version (HDP or CDH) is supported by Kylin.
  2. Check that commands such as hdfs and hive work properly and can access cluster resources.
  3. If both clusters have HDFS NameNode HA enabled, make sure their HDFS nameservice names are different. If they are the same, change one of them to avoid conflicts.
  4. Make sure the network latency between the two clusters is low enough, as a large amount of data is moved back and forth during the model build process.
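
The nameservice check in step 3 can be sketched as a small shell helper. This is only an illustration: the function name nameservices_differ and the configuration directory paths in the comments are hypothetical, and the commented wiring assumes the hdfs CLI is installed on the machine where the check runs.

```shell
#!/usr/bin/env bash
# Compare the HDFS nameservice names of the two clusters.
# Returns 0 when they differ (safe) and 1 when they collide.
nameservices_differ() {
  local build_ns="$1" query_ns="$2"
  if [ "$build_ns" = "$query_ns" ]; then
    echo "conflict: both clusters use nameservice '$build_ns'" >&2
    return 1
  fi
  echo "ok: build=$build_ns query=$query_ns"
}

# Example wiring (hypothetical directories; requires the hdfs CLI):
#   build_ns=$(hdfs --config /path/to/build-conf getconf -confKey dfs.nameservices)
#   query_ns=$(hdfs --config /path/to/query-conf getconf -confKey dfs.nameservices)
#   nameservices_differ "$build_ns" "$query_ns"
```

Keeping the comparison separate from the hdfs calls makes it possible to try the helper with literal names before pointing it at real clusters.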


Configuration

  1. Install Kylin 4.0 on the Kylin server by following the installation guide.
  2. Prepare the Hadoop configuration files of the two clusters and put them on the Kylin server.
    1. Open $KYLIN_HOME/conf/kylin.properties
    2. Set kylin.env.hadoop-conf-dir to the path of the directory that contains the query cluster's Hadoop configuration files.
    3. Set kylin.engine.submit-hadoop-conf-dir to the path of the directory that contains the build cluster's Hadoop configuration files.
  3. Put the hive-site.xml of the build cluster into the directory that contains the query cluster's Hadoop configuration files.

The read-write separation deployment is now configured.
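
Steps 2 and 3 above could be scripted roughly as follows. This is a minimal sketch, not an official Kylin tool: the helper names set_property and configure_rw_separation are hypothetical, and it assumes kylin.properties uses plain key=value lines without spaces around the equals sign.

```shell
#!/usr/bin/env bash
# Set key=value in a properties file: drop any existing assignment
# of the key, then append the new one (comment lines are kept).
set_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"   # overwrite in place to keep the file's permissions
  rm -f "$tmp"
}

# Point Kylin at both clusters and share the build cluster's hive-site.xml.
#   $1: path to kylin.properties
#   $2: directory with the query cluster's Hadoop configuration files
#   $3: directory with the build cluster's Hadoop configuration files
configure_rw_separation() {
  local props="$1" query_conf="$2" build_conf="$3"
  set_property "$props" kylin.env.hadoop-conf-dir "$query_conf"
  set_property "$props" kylin.engine.submit-hadoop-conf-dir "$build_conf"
  cp "$build_conf/hive-site.xml" "$query_conf/"
}

# Example (hypothetical directories):
#   configure_rw_separation "$KYLIN_HOME/conf/kylin.properties" \
#     /path/to/query-conf /path/to/build-conf
```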


Note

  • $KYLIN_HOME/bin/check-env.sh and $KYLIN_HOME/bin/sample.sh are not available in this deployment mode.

  • In this mode, kylin.engine.spark-conf.spark.yarn.queue in kylin.properties should be set to a YARN queue on the build cluster.
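
To confirm which queue is currently configured, a helper along these lines may be handy. It is a hedged sketch: get_property is a hypothetical name, and it assumes plain key=value lines in kylin.properties. Whether the printed queue actually exists on the build cluster still has to be checked against that cluster's YARN configuration.

```shell
#!/usr/bin/env bash
# Print the last value assigned to a key in a properties file.
# Prints an empty line when the key is not set.
get_property() {
  local key="$1" file="$2"
  awk -v k="$key" '
    index($0, k "=") == 1 { v = substr($0, length(k) + 2) }
    END { print v }
  ' "$file"
}

# Example:
#   get_property kylin.engine.spark-conf.spark.yarn.queue \
#     "$KYLIN_HOME/conf/kylin.properties"
```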
