Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
Source code
git clone https://github.com/apache/kylin.git -b kylin-on-parquet-v2
# Compile
mvn clean install -DskipTests
Environment on the dev machine
Install Maven
The latest Maven release can be found at http://maven.apache.org/download.cgi. Create a symbolic link so that mvn
can be run from anywhere.
cd ~
wget http://xenia.sote.hu/ftp/mirrors/www.apache.org/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz
tar -xzvf apache-maven-3.2.5-bin.tar.gz
ln -s /root/apache-maven-3.2.5/bin/mvn /usr/bin/mvn
Install Spark
Manually install the Spark binary in a local folder such as /usr/local/spark. Kylin does not currently support the community version of Spark. Download Spark with the following command:
# spark version is spark-2.4.1-os-kylin-r3
wget https://download-resource.s3.cn-north-1.amazonaws.com.cn/osspark/spark-2.4.1-os-kylin-r3.tgz
How to Debug
There are two ways to debug the source code: debug with local metadata (recommended) or debug with a Hadoop sandbox.
Configuration
Debug with local metadata
Edit the properties of $KYLIN_SOURCE_DIR/examples/test_case_data/sandbox/kylin.properties
# Need to use absolute path
kylin.metadata.url=${KYLIN_SOURCE_DIR}/examples/test_case_data/sample_local
kylin.storage.url=${KYLIN_SOURCE_DIR}/examples/test_case_data/sample_local
kylin.env.zookeeper-is-local=true
kylin.env.hdfs-working-dir=file://${KYLIN_SOURCE_DIR}/examples/test_case_data/sample_local
kylin.engine.spark-conf.spark.master=local
kylin.engine.spark-conf.spark.eventLog.dir=/path/to/local/dir
kylin.engine.spark-conf.spark.sql.shuffle.partitions=1
kylin.env=LOCAL
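Because these properties need absolute paths, it can help to generate them rather than type them by hand. The following is a minimal sketch, not part of Kylin: render_local_props is a hypothetical helper, and both arguments (the source checkout and the Spark event log directory) are values you supply. It only prints the lines; review them before merging them into kylin.properties.

```shell
# Print the local-debug overrides for kylin.properties with absolute paths.
# render_local_props is a hypothetical helper, not part of Kylin.
render_local_props() (
    src_dir=$1      # absolute path of the Kylin source checkout
    event_dir=$2    # local directory for Spark event logs
    case "$src_dir" in
        /*) ;;  # absolute, as the properties file requires
        *) echo "source dir must be an absolute path: $src_dir" >&2; exit 1 ;;
    esac
    data_dir=$src_dir/examples/test_case_data/sample_local
    cat <<EOF
kylin.metadata.url=$data_dir
kylin.storage.url=$data_dir
kylin.env.zookeeper-is-local=true
kylin.env.hdfs-working-dir=file://$data_dir
kylin.engine.spark-conf.spark.master=local
kylin.engine.spark-conf.spark.eventLog.dir=$event_dir
kylin.engine.spark-conf.spark.sql.shuffle.partitions=1
kylin.env=LOCAL
EOF
)
```

For example, run render_local_props "$(pwd)" /tmp/spark-events from the root of the checkout; /tmp/spark-events is only a placeholder directory.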
Debug with Hadoop sandbox
Local configuration must be modified to point to your hadoop sandbox (or CLI) machine.
- In examples/test_case_data/sandbox/kylin.properties
  - Find sandbox and replace it with your Hadoop hosts (if you’re using HDP sandbox, this can be skipped)
  - Find kylin.job.use-remote-cli and change it to “true” (in the code repository the default is false, which assumes you are running on the Hadoop CLI)
  - Find kylin.job.remote.cli.username and kylin.job.remote.cli.password, and fill in the user name and password used to log in to the Hadoop cluster for Hadoop command execution; if you’re using HDP sandbox, the default username is root and the password is hadoop.
- In examples/test_case_data/sandbox
  - For each configuration XML file, find all occurrences of sandbox and sandbox.hortonworks.com, and replace them with your Hadoop hosts (if you’re using HDP sandbox, this can be skipped)
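The host replacement in the XML files can be scripted. The sketch below is one possible approach, not a Kylin tool: replace_sandbox_host is a hypothetical helper, and it assumes the new host name contains no characters that are special to sed (such as / or &). It rewrites both sandbox and sandbox.hortonworks.com in a single pass, so a new host name that itself contains the word sandbox is not rewritten twice. A bare match on sandbox can also hit unrelated strings, so check the result with git diff.

```shell
# Replace the sandbox host names in every XML file under a config directory.
# replace_sandbox_host is a hypothetical helper, not part of Kylin.
replace_sandbox_host() (
    conf_dir=$1   # e.g. examples/test_case_data/sandbox
    new_host=$2   # host name of your Hadoop machine
    for f in "$conf_dir"/*.xml; do
        [ -f "$f" ] || continue
        # One substitution covers "sandbox" with or without the
        # ".hortonworks.com" suffix; write to a temp file, then move it back.
        sed "s/sandbox\(\.hortonworks\.com\)\{0,1\}/$new_host/g" "$f" > "$f.tmp" \
            && mv "$f.tmp" "$f"
    done
)
```

A call such as replace_sandbox_host examples/test_case_data/sandbox hadoop01.example.com would then update every XML file in that directory; the host name here is a placeholder.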
An alternative to the host replacement is updating your hosts file to resolve sandbox and sandbox.hortonworks.com to the IP of your sandbox machine.
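As a sketch of the hosts-file alternative, the entry can be produced like this. hosts_entry is a hypothetical helper, and the IP address in the comment is a placeholder for the address of your own sandbox machine.

```shell
# Print a hosts-file line mapping both sandbox names to one IP address.
# hosts_entry is a hypothetical helper, not part of Kylin.
hosts_entry() {
    printf '%s\tsandbox sandbox.hortonworks.com\n' "$1"
}
# Append the line to the hosts file (requires root), for example:
#   hosts_entry 192.168.56.101 | sudo tee -a /etc/hosts
```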
Launch Kylin Web Server
Copy server/src/main/webapp/WEB-INF to webapp/app/WEB-INF
cp -r server/src/main/webapp/WEB-INF webapp/app/WEB-INF
Download the JavaScript dependencies for the Kylin web GUI. npm is part of Node.js; check the Node.js documentation for how to install it on your OS.
cd webapp
npm install -g bower
bower --allow-root install
If you encounter network problems when running “bower install”, you may try:
git config --global url."git://".insteadOf https://
Note: on Windows, after installing bower, add the path of “bower.cmd” to the system environment variable ‘PATH’, and then run:
bower.cmd --allow-root install
In your IDE, launch org.apache.kylin.rest.DebugTomcat. Set the path of the “server” module as the “Working directory”, set “kylin-server” for “Use classpath of module”, and check the “Include dependencies with ‘Provided’ scope” option in IntelliJ IDEA 2018. If you’re using IntelliJ IDEA 2017 or older, you need to modify the “server/kylin-server.iml” file and replace every “PROVIDED” with “COMPILE”; otherwise a “java.lang.NoClassDefFoundError: org/apache/catalina/LifecycleListener” error may be thrown.
You may also need to tune the VM options:
-Dhdp.version=2.4.0.0-169 -DSPARK_HOME=/usr/local/spark -Dkylin.hadoop.conf.dir=/workspace/kylin/examples/test_case_data/sandbox -Xms800m -Xmx800m -XX:PermSize=64M -XX:MaxNewSize=256m -XX:MaxPermSize=128m
Also remember that if you debug in local mode, you should add this VM option for the query engine:
-Dspark.local=true
If you work with a Kerberized Hadoop cluster, these additional VM options should be set:
-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.krb5.principal=kylin -Djava.security.krb5.keytab=/path/to/kylin/keytab
Also set the following Hadoop environment variable:
HADOOP_USER_NAME=root
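The VM options above depend on the debug mode and on whether the cluster is Kerberized. The following sketch collects them into one string that can be pasted into the IDE run configuration. kylin_vm_options and its two arguments are hypothetical; the option values are the sample values shown above and need to be adjusted to your machine.

```shell
# Assemble the DebugTomcat VM options described above.
# kylin_vm_options and its arguments are hypothetical, not part of Kylin.
kylin_vm_options() (
    mode=$1        # "local" or "sandbox"
    kerberos=$2    # "yes" to append the Kerberos options
    opts="-Dhdp.version=2.4.0.0-169 -DSPARK_HOME=/usr/local/spark"
    opts="$opts -Dkylin.hadoop.conf.dir=/workspace/kylin/examples/test_case_data/sandbox"
    opts="$opts -Xms800m -Xmx800m -XX:PermSize=64M -XX:MaxNewSize=256m -XX:MaxPermSize=128m"
    if [ "$mode" = "local" ]; then
        # Local mode needs the extra query engine option.
        opts="$opts -Dspark.local=true"
    fi
    if [ "$kerberos" = "yes" ]; then
        opts="$opts -Djava.security.krb5.conf=/etc/krb5.conf"
        opts="$opts -Djava.security.krb5.principal=kylin"
        opts="$opts -Djava.security.krb5.keytab=/path/to/kylin/keytab"
    fi
    printf '%s\n' "$opts"
)
```

For example, kylin_vm_options local no prints the base options plus -Dspark.local=true.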
By default the Kylin server listens on port 7070. To use another port, specify it as a parameter when running DebugTomcat.
Check Kylin Web at http://localhost:7070/kylin (user: ADMIN, password: KYLIN)
Setup IDE code formatter
If you’re writing code for Kylin, make sure that your code follows the expected format.
For Eclipse users, just format the code before committing the code.
For IntelliJ IDEA users, a few more steps are needed:
- Install “Eclipse Code Formatter” and use “org.eclipse.jdt.core.prefs” and “org.eclipse.jdt.ui.prefs” in core-common/.settings to configure “Eclipse Java Formatter config file” and “Import order”.
- Go to Preferences => Code Style => Java, set “Scheme” to Default, and set both “Class count to use import with ‘*’” and “Names count to use static import with ‘*’” to 99.
- Disable IntelliJ IDEA’s “Optimize imports on the fly”.
- Format the code before committing it.
Setup IDE license header template
Each source file should include the following Apache License header:
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The checkstyle plugin also checks the header rule when packaging. The license file is located at dev-support/checkstyle-apache-header.txt. To make this easier, add the header as a Copyright Profile in your IDE and set it as the default for the Kylin project.
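Before packaging, a quick local scan can catch Java files that are missing the header. This is only a rough sketch that assumes matching the first phrase of the header is sufficient; find_missing_header is a hypothetical helper, and the checkstyle plugin remains the authoritative check.

```shell
# List Java files under a directory that lack the Apache License header.
# find_missing_header is a hypothetical helper, not part of Kylin.
find_missing_header() (
    root=$1
    find "$root" -type f -name '*.java' | sort | while IFS= read -r f; do
        # Print the file name when the opening phrase of the header is absent.
        grep -qF 'Licensed to the Apache Software Foundation (ASF)' "$f" \
            || printf '%s\n' "$f"
    done
)
```

Running find_missing_header core-common/src prints one path per file without the header and prints nothing when every file has it.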
How to Package and Deploy
cd ${KYLIN_SOURCE_CODE}
# For HDP2.x
./build/script/package.sh
# For CDH5.7
./build/script/package.sh -P cdh5.7
# When the build finishes, the package is available in the directory ${KYLIN_SOURCE_CODE}/dist/
# If running on HDP, you need to uncomment the following properties in kylin.properties
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
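Uncommenting the three HDP properties can also be scripted. The sketch below assumes they appear in kylin.properties commented out with a leading #, optionally followed by whitespace; that layout is an assumption, so check your copy of the file first. uncomment_hdp_version is a hypothetical helper, not a Kylin script.

```shell
# Uncomment the hdp.version extraJavaOptions lines in a kylin.properties file.
# uncomment_hdp_version is a hypothetical helper; it assumes the properties
# are commented out with a leading "#".
uncomment_hdp_version() (
    props=$1
    # Strip the leading "#" only from the driver, yarn.am and executor
    # extraJavaOptions lines; every other line is left untouched.
    sed 's/^#[[:space:]]*\(kylin\.engine\.spark-conf\.spark\.[a-z.]*\.extraJavaOptions=-Dhdp\.version=current\)/\1/' \
        "$props" > "$props.tmp" && mv "$props.tmp" "$props"
)
```

Call it with the path to the kylin.properties file of the unpacked package.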