...

Part-I Why we need a read-write separation deployment

...

Background

As you may know, Kylin 4.0 uses Spark both to build cube data and to answer queries. If the build (Spark) jobs and the query (Spark) jobs run in the same Hadoop cluster, the performance of both builds and queries suffers because of resource competition.

Kylin 4.0 now supports running build jobs and query jobs on two different Hadoop clusters, which we call the build cluster and the query cluster. Build (Spark) jobs are submitted to the build cluster to build cube data, and the resulting cube/cuboid data is written directly to the HDFS of the query cluster, so that the query-related workload moves to the query cluster.

With a read-write separation deployment, we can completely separate the build and query computation workloads (mostly the Yarn resources). In the current implementation, the HDFS of the query cluster is not used exclusively for queries, because the build cluster reads data from the query cluster's HDFS when merging segments.

Architecture of Read-Write Separation

...

Notes:

  1. Kylin 4.0 uses the Yarn resources of the build cluster to build cube data and then writes the cube data back to the HDFS of the query cluster directly (see the sketch after this list).
  2. Build jobs read the Hive data sources that live on the build cluster.
  3. When executing a pushdown query, Kylin 4.0 reads the Hive data from the build cluster.
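
For note 1, the destination of the cube data is Kylin's HDFS working directory, which in this deployment lives on the query cluster. A minimal sketch of the relevant kylin.properties entry, assuming a hypothetical namenode address 'query-cluster-nn' for the query cluster:

    # $KYLIN_HOME/conf/kylin.properties
    # 'query-cluster-nn:8020' is a hypothetical HDFS address; use your query cluster's namenode
    kylin.env.hdfs-working-dir=hdfs://query-cluster-nn:8020/kylin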

...

How to set up

  1. Install Kylin 4.0 by following the installation guide on a node which contains the Hadoop client configuration of the query cluster.
  2. Create a directory called 'build_hadoop_conf' in $KYLIN_HOME and then copy the Hadoop configuration files of the build cluster into this directory (Note: make sure to copy the real configuration files, not the symbolic links; see the sketch after this list).
  3. Set the value of the configuration 'kylin.engine.submit-hadoop-conf-dir' in '$KYLIN_HOME/conf/kylin.properties' to the directory created in step 2.
  4. Copy the hive-site.xml of the build cluster into the Hive configuration directory on the query cluster (for example: /etc/hive/conf).
  5. Set the value of the configuration 'kylin.engine.spark-conf.spark.yarn.queue' in '$KYLIN_HOME/conf/kylin.properties' to a Yarn queue of the build cluster.
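
Putting steps 2, 3, and 5 together, the commands and properties might look like the sketch below. The hostname 'build-cluster-node', the source path, and the queue name 'build_queue' are assumptions for illustration, not values prescribed by this guide:

    # Step 2: copy the build cluster's Hadoop client configuration
    # (scp copies the files that symlinks point to, satisfying the "real files" note)
    mkdir -p $KYLIN_HOME/build_hadoop_conf
    scp 'build-cluster-node:/etc/hadoop/conf/*' $KYLIN_HOME/build_hadoop_conf/

    # Steps 3 and 5: entries in $KYLIN_HOME/conf/kylin.properties
    # (use the absolute path of the directory created in step 2)
    kylin.engine.submit-hadoop-conf-dir=/path/to/kylin/build_hadoop_conf
    kylin.engine.spark-conf.spark.yarn.queue=build_queue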



...

Part-III FAQ

...

  • $KYLIN_HOME/bin/sample.sh is not supported in this deployment mode.

...