Objective
Demonstrate how to configure gaianDB to connect to a relational database source
Provide an example of using a reasonable size data set
Act as a baseline for experimenting with VDC, Atlas & ranger plugins
Excluded
This page doesn't cover setup/install of the ranger plugin, install of Mariadb/linux or setup of reporting tools or other data oriented UIs that use the data source
Topology
MacOS client running dbvisualizer
Ubuntu 17.10 server running MariaDB
- An Azure B series 4GB ram/2 cpu environment is sufficient
Data Source
Install mariadb database - a more open fork of MySQL that is compatible. Follow the instructions at get-started to get mariadb installed
My versions for this test are:
jonesn@worklinux:~$ dpkg -l | grep mariadb
ii mariadb-client 10.1.25-1 all MariaDB database client (metapackage depending on the latest version)
ii mariadb-client-10.1 10.1.25-1 amd64 MariaDB database client binaries
ii mariadb-client-core-10.1 10.1.25-1 amd64 MariaDB database core client binaries
ii mariadb-common 10.1.25-1 all MariaDB common metapackage
ii mariadb-server 10.1.25-1 all MariaDB database server (metapackage depending on the latest version)
ii mariadb-server-10.1 10.1.25-1 amd64 MariaDB database server binaries
ii mariadb-server-core-10.1 10.1.25-1 amd64 MariaDB database core server files
Download MySQL employee sample database by cloning the repo at https://github.com/datacharmer/test_db
- 160MB
- 4 million records (300k employees, 2.8 million salary entries)
- 6 tables
- open license (please read the details)
Follow the instructions to load this data into mysql
Client
One good tool for checking the database is dbvisualizer but any JDBC client will do
Check one can connect to the above database - using something like jdbc:mysql://12.34.56.78:3306/employees
GaianDB
Download gaiandb from github
Unpack the prebuilt binary from the build subdirectory (gaiandb is tricky to build currently)
- I am using GAIANDB_V2.1.8_20160523.zip
Download the current mariadb jdbc driver (or copy/link from local system)
- I am using MariaDB Connector/J 2.1.2
Copy the .jar into a convenient location, such as /home/jonesn/mysql
In theory gaianDB should pick up jdbc drivers from the 'ext' subdirectory in the unpacked directory - however I couldn't get this to work, so update launchGaianServer.sh with the following line under 'static jars setting'
export CLASSPATH="$CLASSPATH:/home/jonesn/mysql/mariadb-java-client-2.1.2.jar"
Also modify the java heap size in this script by searching for JAVA_OPTS and using something like (the default is 256 MB)
[[ -z "$JAVA_OPTS" ]] && JAVA_OPTS=-Xmx1024m
Finally configure the mysql data source by modifying gaiandb_config.properties
VEMPLOYEE_DEF=employee_number integer, birth_date date, firstname varchar(14), lastname varchar(14), gender char, hiring_date date
# Example of an RDB data source, this one points to the 'DB2' connection property below (specifying DRIVER, URL, USR, PWD).
MYSQL_DB_DRIVER=com.mysql.jdbc.Driver
MYSQL_DB_URL=jdbc:mysql://localhost:3306/employees
MYSQL_USR=gaiandb
MYSQL_PWD=PUT YOUR PASSWORD HERE
VEMPLOYEE_DS0_CONNECTION=MYSQL employees
VEMPLOYEE_DS0_URL=jdbc:mysql://localhost:3306/employees
VEMPLOYEE_DS0_C1=emp_no
VEMPLOYEE_DS0_C2=birth_date
VEMPLOYEE_DS0_C3=first_name
VEMPLOYEE_DS0_C4=last_name
VEMPLOYEE_DS0_C5=gender
VEMPLOYEE_DS0_C6=hire_date
Note that the columns are named slightly differently to the source. For our VDC project this is done automatically by the virtualizer component using metadata from Atlas
Now you can launch gaiandb with
./launchGaianServer.sh
There's also documentation on a larger data set backed by postgresSQL - UK Land Registry Data