# Setup Development Env
Following this tutorial, you will be able to build a Griffin dev environment and go through the whole Griffin data quality process:
- explore data assets
- create measures
- schedule measures
- execute measures in compute clusters and emit metrics
- navigate metrics in the dashboard
## Dev dependencies
- Java: we prefer Java 8, but Java 7 is fine.
- Maven: prerequisite version is 3.2.5.
- Scala: prerequisite version is 2.10.
- Angular: we are using 1.5.8.
## Env dependencies
- Hadoop: prerequisite version is 2.6.0.
- Hive: prerequisite version is 1.2.1.
- Spark: prerequisite version is 1.6.x.
- MySQL
- Elasticsearch: prerequisite version is 5.x.x. Make sure you can access your Elasticsearch instance over HTTP.
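A quick way to verify HTTP access is a plain `curl` against the cluster root; `ES_HOST` below is an assumption, so point it at your actual Elasticsearch node:

```shell
# ES_HOST is an assumption; replace with your Elasticsearch node.
ES_HOST="${ES_HOST:-localhost}"
ES_URL="http://${ES_HOST}:9200"
# A reachable instance answers with cluster info in JSON.
curl -s -m 5 "${ES_URL}" || echo "WARN: cannot reach ${ES_URL}"
```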
- Livy: Griffin submits jobs to Spark through Livy (http://livy.io/quickstart.html). Livy has a known bug (https://issues.cloudera.org/browse/LIVY-94), so the following three jars need to be on the Spark classpath:
  - datanucleus-api-jdo-3.2.6.jar
  - datanucleus-core-3.2.10.jar
  - datanucleus-rdbms-3.2.9.jar
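One way to put these jars on the Spark driver classpath is via `conf/spark-defaults.conf`; the `/path/to` locations below are placeholders for wherever the jars live on your cluster:

```
spark.driver.extraClassPath /path/to/datanucleus-api-jdo-3.2.6.jar:/path/to/datanucleus-core-3.2.10.jar:/path/to/datanucleus-rdbms-3.2.9.jar
```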
## Setup Dev Env
### Git clone

```shell
git clone https://github.com/apache/incubator-griffin.git
```
### Project layout

There are three modules in Griffin:

- measure: core algorithms that calculate metrics along different measure dimensions. The batch entry point is `org.apache.griffin.measure.batch.Application`.
- service: web service for data assets, measure metadata, and job schedulers. It is a Spring Boot app whose main class is `org.apache.griffin.core.GriffinWebApplication`.
- ui: the front end.
### Update several files to reflect your dev env
Create a Griffin working directory in HDFS:

```shell
hdfs dfs -mkdir -p <griffin working dir>
```
Initialize the Quartz tables with service/src/main/resources/Init_quartz.sql:

```shell
mysql -u <username> -p quartz < service/src/main/resources/Init_quartz.sql
```
Update service/src/main/resources/application.properties:

```
spring.datasource.url = jdbc:mysql://<MYSQL-IP>:3306/quartz?autoReconnect=true&useSSL=false
spring.datasource.username = <user name>
spring.datasource.password = <password>
hive.metastore.uris = thrift://<HIVE-IP>:9083
# default is "default"
hive.metastore.dbname = <hive database name>
```
Update measure/src/main/resources/env.json with your Elasticsearch instance, and copy env.json to the Griffin working directory in HDFS:

```
/* Please update to match your Elasticsearch instance */
"api": "http://<ES-IP>:9200/griffin/accuracy"
```
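The copy step can look like the sketch below; `GRIFFIN_DIR` is an assumption standing in for the HDFS working directory you created earlier:

```shell
# GRIFFIN_DIR is an assumption; use your own HDFS griffin working dir.
GRIFFIN_DIR="${GRIFFIN_DIR:-/griffin}"
ENV_JSON="measure/src/main/resources/env.json"
if command -v hdfs >/dev/null 2>&1; then
  # -f overwrites an existing env.json in the working directory
  hdfs dfs -put -f "${ENV_JSON}" "${GRIFFIN_DIR}/env.json"
else
  echo "hdfs client not found on PATH; run this on a Hadoop client node"
fi
```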
Update service/src/main/resources/sparkJob.properties:

```
sparkJob.file = hdfs://<griffin working directory>/griffin-measure.jar
sparkJob.args_1 = hdfs://<griffin working directory>/env.json
sparkJob.jars_1 = hdfs://<pathTo>/datanucleus-api-jdo-3.2.6.jar
sparkJob.jars_2 = hdfs://<pathTo>/datanucleus-core-3.2.10.jar
sparkJob.jars_3 = hdfs://<pathTo>/datanucleus-rdbms-3.2.9.jar
sparkJob.uri = http://<LIVY-IP>:8998/batches
```
Update ui/js/services/service.js:

```
// make sure you can access Elasticsearch over HTTP
ES_SERVER = "http://<ES-IP>:9200"
```
## Build

```shell
cd incubator-griffin
mvn clean install -DskipTests

# copy the measure jar into the HDFS griffin working dir
cp measure/target/measure-0.1.3-incubating-SNAPSHOT.jar measure/target/griffin-measure.jar
hdfs dfs -put measure/target/griffin-measure.jar <griffin working dir>
```
## Run

```shell
java -jar service/target/service.jar
```

Then open http://<YOUR-IP>:8080 in your browser.
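Once the service is up, a quick smoke test from the command line can confirm it is listening; `SERVICE_URL` is an assumption, so adjust the host to where you started the service:

```shell
# SERVICE_URL is an assumption; adjust host/port as needed.
SERVICE_URL="${SERVICE_URL:-http://localhost:8080}"
if curl -s -m 5 "${SERVICE_URL}" >/dev/null; then
  echo "service is up at ${SERVICE_URL}"
else
  echo "service not reachable at ${SERVICE_URL}"
fi
```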
## License Header File

Each source file should include the following Apache License header:

```
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```