# Setup Development Env
Following this tutorial, you will be able to build a Griffin dev environment and go through the whole Griffin data quality process:
- explore data assets
- create measures
- schedule measures
- execute measures in compute clusters and emit metrics
- navigate metrics in the dashboard
## Dev dependencies
- Java: we prefer Java 8, but Java 7 is fine.
- Maven: prerequisite version is 3.2.5.
- Scala: prerequisite version is 2.10.
- Angular: we are using 1.5.8.
## Env dependencies
- Hadoop: prerequisite version is 2.6.0.
- Hive: prerequisite version is 1.2.1.
- Spark: prerequisite version is 1.6.x.
- MySQL
- Elasticsearch: prerequisite version is 5.x.x. Make sure you can access your Elasticsearch instance over HTTP.
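A quick way to verify HTTP access is a plain `curl` against the cluster root; `ES_HOST` below is an assumption, so point it at your actual Elasticsearch node:

```shell
# ES_HOST is an assumption; replace with your Elasticsearch node.
ES_HOST="${ES_HOST:-localhost}"
ES_URL="http://${ES_HOST}:9200"
# A reachable instance answers with cluster info in JSON.
curl -s -m 5 "${ES_URL}" || echo "WARN: cannot reach ${ES_URL}"
```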
- Livy: Griffin submits jobs to Spark through Livy (http://livy.io/quickstart.html). Livy has a known bug (https://issues.cloudera.org/browse/LIVY-94), so the following three jars need to be on the Spark classpath:
  - datanucleus-api-jdo-3.2.6.jar
  - datanucleus-core-3.2.10.jar
  - datanucleus-rdbms-3.2.9.jar
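One way to put these jars on the Spark driver classpath is via `conf/spark-defaults.conf`; the `/path/to` locations below are placeholders for wherever the jars live on your cluster:

```
spark.driver.extraClassPath /path/to/datanucleus-api-jdo-3.2.6.jar:/path/to/datanucleus-core-3.2.10.jar:/path/to/datanucleus-rdbms-3.2.9.jar
```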
## Setup Dev Env
### Git clone

```shell
git clone https://github.com/apache/incubator-griffin.git
```
### Project layout

There are three modules in Griffin:

- measure: core algorithms that calculate metrics along different measure dimensions. The batch entry point is `org.apache.griffin.measure.batch.Application`.
- service: web service for data assets, measure metadata, and job schedulers. It is a Spring Boot app whose main class is `org.apache.griffin.core.GriffinWebApplication`.
- ui: the front end.
### Update several files to reflect your dev env
Create a Griffin working directory in HDFS:

```shell
hdfs dfs -mkdir -p <griffin working dir>
```
Initialize the Quartz tables with service/src/main/resources/Init_quartz.sql:

```shell
mysql -u <username> -p quartz < service/src/main/resources/Init_quartz.sql
```
Update service/src/main/resources/application.properties:

```
spring.datasource.url = jdbc:mysql://<MYSQL-IP>:3306/quartz?autoReconnect=true&useSSL=false
spring.datasource.username = <user name>
spring.datasource.password = <password>
hive.metastore.uris = thrift://<HIVE-IP>:9083
# default is "default"
hive.metastore.dbname = <hive database name>
```
Update measure/src/main/resources/env.json with your Elasticsearch instance, and copy env.json to the Griffin working directory in HDFS:

```
/* Please update to match your Elasticsearch instance */
"api": "http://<ES-IP>:9200/griffin/accuracy"
```
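The copy step can look like the sketch below; `GRIFFIN_DIR` is an assumption standing in for the HDFS working directory you created earlier:

```shell
# GRIFFIN_DIR is an assumption; use your own HDFS griffin working dir.
GRIFFIN_DIR="${GRIFFIN_DIR:-/griffin}"
ENV_JSON="measure/src/main/resources/env.json"
if command -v hdfs >/dev/null 2>&1; then
  # -f overwrites an existing env.json in the working directory
  hdfs dfs -put -f "${ENV_JSON}" "${GRIFFIN_DIR}/env.json"
else
  echo "hdfs client not found on PATH; run this on a Hadoop client node"
fi
```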
Update service/src/main/resources/sparkJob.properties:

```
sparkJob.file = hdfs://<griffin working directory>/griffin-measure.jar
sparkJob.args_1 = hdfs://<griffin working directory>/env.json
sparkJob.jars_1 = hdfs://<pathTo>/datanucleus-api-jdo-3.2.6.jar
sparkJob.jars_2 = hdfs://<pathTo>/datanucleus-core-3.2.10.jar
sparkJob.jars_3 = hdfs://<pathTo>/datanucleus-rdbms-3.2.9.jar
sparkJob.uri = http://<LIVY-IP>:8998/batches
```
Update ui/js/services/service.js:

```
// make sure you can access Elasticsearch over HTTP
ES_SERVER = "http://<ES-IP>:9200"
```
## Build

```shell
cd incubator-griffin
mvn clean install -DskipTests

# copy the measure jar into the HDFS griffin working dir
cp measure/target/measure-0.1.3-incubating-SNAPSHOT.jar measure/target/griffin-measure.jar
hdfs dfs -put measure/target/griffin-measure.jar <griffin working dir>
```
## Run

```shell
java -jar service/target/service.jar
```

Then open http://<YOUR-IP>:8080 in your browser.
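Once the service is up, a quick smoke test from the command line can confirm it is listening; `SERVICE_URL` is an assumption, so adjust the host to where you started the service:

```shell
# SERVICE_URL is an assumption; adjust host/port as needed.
SERVICE_URL="${SERVICE_URL:-http://localhost:8080}"
if curl -s -m 5 "${SERVICE_URL}" >/dev/null; then
  echo "service is up at ${SERVICE_URL}"
else
  echo "service not reachable at ${SERVICE_URL}"
fi
```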
## License Header File

Each source file should include the following Apache License header:

```
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```