# Setup Development Environment
By following this tutorial, you will be able to build a Griffin development environment and go through the whole Griffin data quality process:
- explore data assets
- create measures
- schedule measures
- execute measures in compute clusters and emit metrics
- navigate metrics in the dashboard
## Dev dependencies
- Java: we prefer Java 8, but Java 7 works as well.
- Maven: prerequisite version is 3.2.5.
- Scala: prerequisite version is 2.10.
- Angular: we are using 1.5.8.
## Env dependencies
- Hadoop: prerequisite version is 2.6.0.
- Hive: prerequisite version is 1.2.1.
- Spark: prerequisite version is 1.6.x.
- MySQL: required by the service module.
- Elasticsearch: prerequisite version is 5.x.x. Make sure you can access your Elasticsearch instance over HTTP.
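A quick sanity check that Elasticsearch answers over HTTP (the host below is a placeholder for your own instance):

```
curl http://<your ES host>:9200
```

A healthy instance responds with a JSON document describing the cluster.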
Livy: Griffin submits jobs to Spark through Livy (http://livy.io/quickstart.html). Livy has a known bug (https://issues.cloudera.org/browse/LIVY-94), so the following three jars must be on the Spark classpath: datanucleus-api-jdo-3.2.6.jar, datanucleus-core-3.2.10.jar, datanucleus-rdbms-3.2.9.jar.
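One way to put these jars on the Spark classpath is through spark-defaults.conf; a sketch, assuming the jars were downloaded to /path/to (adjust the paths to your environment):

```
spark.driver.extraClassPath    /path/to/datanucleus-api-jdo-3.2.6.jar:/path/to/datanucleus-core-3.2.10.jar:/path/to/datanucleus-rdbms-3.2.9.jar
spark.executor.extraClassPath  /path/to/datanucleus-api-jdo-3.2.6.jar:/path/to/datanucleus-core-3.2.10.jar:/path/to/datanucleus-rdbms-3.2.9.jar
```

Alternatively, copying the jars into Spark's jars (or lib) directory has the same effect.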
## Setup Dev Env
### git clone

```
git clone https://github.com/apache/incubator-griffin.git
```
### build

```
cd incubator-griffin
mvn clean install -DskipTests
```
### dev

There are three modules in Griffin:

- measure: core algorithms that calculate metrics along different measure dimensions. Entry point: org.apache.griffin.measure.batch.Application
- service: web service for data assets, measure metadata, and job schedulers. Spring Boot application: org.apache.griffin.core.GriffinWebApplication
- ui: the front end
Configure the following files to reflect your dev environment.
Update service/src/main/resources/application.properties:

```
spring.datasource.url = jdbc:mysql://<your IP>:3306/quartz?autoReconnect=true&useSSL=false
spring.datasource.username = <user name>
spring.datasource.password = <password>
hive.metastore.uris = thrift://<your IP>:9083
# default is "default"
hive.metastore.dbname = <hive database name>
```
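The spring.datasource.url above assumes a MySQL database named quartz already exists; a minimal sketch to create it (the user name is a placeholder):

```
mysql -u <user name> -p -e "CREATE DATABASE IF NOT EXISTS quartz;"
```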
Create a Griffin working directory in HDFS:

```
hdfs dfs -mkdir -p <griffin working dir>
```
Update measure/src/main/resources/env.json with your Elasticsearch instance, and copy env.json to the Griffin working directory in HDFS.

```
/* Please update to point at your Elasticsearch instance */
"api": "http://HOSTNAME:9200/griffin/accuracy"
```
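The copy step can be done with hdfs dfs -put, where <griffin working dir> is the directory created earlier:

```
hdfs dfs -put measure/src/main/resources/env.json <griffin working dir>/env.json
```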
Update service/src/main/resources/sparkJob.properties:

```
sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar
sparkJob.args_1 = hdfs://<griffin working directory>/env.json
sparkJob.jars_1 = hdfs://<pathTo>/datanucleus-api-jdo-3.2.6.jar
sparkJob.jars_2 = hdfs://<pathTo>/datanucleus-core-3.2.10.jar
sparkJob.jars_3 = hdfs://<pathTo>/datanucleus-rdbms-3.2.9.jar
sparkJob.uri = http://<your IP>:8998/batches
```
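sparkJob.file expects the measure jar to be in HDFS under the name griffin-measure.jar; a sketch of uploading the artifact built by mvn clean install (the local jar name depends on the version you built, so it is an assumption here):

```
hdfs dfs -put measure/target/measure-<version>.jar hdfs://<griffin measure path>/griffin-measure.jar
```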
Update ui/js/services/service.js:

```
// make sure you can access Elasticsearch over HTTP
ES_SERVER = "http://<your IP>:9200"
```
## License Header File
Each source file should include the following Apache License header:

```
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
```