Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Overview

Engineers with systems operating on and Using this connector application developers caching flat Java domain objects want to can easily persist those objects to a relational database table.  Ex: put Employee object into a Region, read/write Employee object to/from Employee table

Goals

Goals

Using only gfsh commands you can configure simple in-line caching between Apache Geode and a relational database (e.g. MySQL) communicating through the database’s JDBC driver.

  1. Users User can easily map a region to a JDBC database table with minimal configuration.
  2. User Users can persist entity objects put into the region for reading and writingto an external database table, synchronously or asynchronously.
  3. Users can read database table rows into a region.

Approach

  1. Expose new jdbccreate data-mapping source and create jdbc-connection mapping commands.

  2. Implement JdbcReader, JdbcWriter, and JdbcAsyncWriter to be added to a region on region createand automatically add them to the region specified in create jdbc-mapping.

Out of Scope

This

won’t

JDBC connector is not intended to solve complex use cases involving complex entity objects, nested structures

, etc.  This won’t solve

nor every inline caching use case.

 It

It is designed to be a general fit and

so won’t solve unique use cases.  Users are still free to code their own   Normalized

will not support special database features (such as auto-incremented id fields). Users should implement normalized flat tables and corresponding domain objects covering only one topic

are encouraged

.  


 
 GFSH ChangesCommands


Use help <command> to show the help 

create

jdbc

data-

mapping

source

alter jdbc

list data-

mapping

source

describe

jdbc

data-

mapping

source

destroy

jdbc

data-

mapping

source

list jdbc-mappings

create jdbc-

connection

mapping

alter jdbc-connection

describe jdbc-

connection

mapping

destroy jdbc-

connection

mapping

list jdbc-

connections

Notable New Classes

JdbcReader

JdbcWriter

JdbcAsyncWriter

 

mapping


Out-Of-Box implementation

 

JDBC Connector is designed to be usable out of the box.   It  It is designed to plug into a pattern where you have a specific table name that (probably) wont won’t change.  

1) Just create Create a region with the same name , add on a JdbcWriter (for write through) or JdbcAsyncWriter (for write behind) or JdbcReader (for read through)as the table

2) Create a jdbc-connection to your backend DB, create a jdbc-mapping to match the region to the connection.

3) Begin putting entity objects into the region the default behavior will attempt to write to a table matching the name of the region, into columns matching the field names on your object.

If your names don't match you can also configure specific region name to table names and field names to columns names in the jdbc-mapping command.

 

create data-source defines the URL for Geode to communicate with the external database through its JDBC driver.

3) create jdbc-mapping will map fields in your domain object to columns in the table of the external database. This command automatically configures the JdbcReader and JdbcAsyncWriter (default) or JdbcWriter (synchronous) that use the JDBC protocol to support inline caching between the Geode region and the external table specified.

4) Once the data source and region-table mapping (JDBC Mapping) are set up, the user can perform Region.put and Region.get operations to persist data or load data from the external data source. Region.put adds or updates an entry in the region as well as updating the table in the external data source. If Region.get results in a cache miss, the JDBC cache loader will automatically load the entry into the region from the database table, and return the entry to the caller of Region.get.

Getting Started

JDBC connector works best when starting from an empty region and empty table (thus synced).  And using only one application--one region--one table tenancy.


  Setup
  1. You’ll need an existing jdbc datasource JDBC data source with a table you want to inline (map) a region with and an entity object that implements PdxSerializable.

    ex: a MySQL RDS instance running in AWS

2. In gfsh you’ll need to add the jdbc jar to the server classpath when you start the server.

gfsh>start server --name=s1 --classpath="

/Users/fkrone/workspace

/path/to/mysql-connector-java-5.1.45-bin.jar"

  OR use deploy --jar gfsh command to deploy the JDBC driver and add it to the Geode classpath.

gfsh>deploy --jar="/path/to/mysql-connector-java-5.1.45-bin.jar"

3. Create a region with the same name as the table you want to persist to (optionally you can use a table/region name mismatch and configure the mapping with the jdbc-mapping command)

 

gfsh>create region --name="employee" --type=REPLICATE

--cache-loader="org.apache.geode.connectors.jdbc.JdbcLoader"

4. Create a

jdbc-connection with DB url and credentialsgfsh>create jdbc-connection

data source with JDBC connection URL and credentials. Please refer to your database documentation for details about JDBC connection URL. Here is an example:

gfsh>create data-source --name=

aws

mysql_data_

mysql

source --url="jdbc:mysql://localhost/

rds-mysql-test.123.us-west-2.rds.amazonaws.com:3306/test?user=emma&password=iamthegreatest"

geode_db" --username=mysql_user

5. Map a table in external data source

5. map the new connection

to a region using the

jdbc

create  jdbc-mapping command.

 

gfsh>create jdbc-mapping --data-

connection

source="

aws

mysql_data_

mysql

source" --region="employee" --catalog=geode_db --table=employee --id=id --pdx-name=com.example.Employee

NOTE: For some database vendors, for example MySQL, you may need to specify --catalog for your database name. Other vendors, for example PostgreSQL, you may need to specify --schema for your database name. The --schema and --catalog options are used by the JDBC driver supplied with the vendor specific database. Please refer to your database documents for more details about catalog and schema.

 

Risks and Mitigations

 

Mitigations 

It is strongly recommended that you do your own use-case specific testing when using JDBC Connector.   Test an accurate  Test the actual read/write/load you would expect to requires the Geode region to be connected to the DB and make sure JDBC Connector performs according to your SLAs.   Make  Make sure you understand the number of connections made to your DB and review against your licensing agreement where applicable. It is also recommended that you not read or write directly to a DB which is your system of record.   If  If possible, use a DB that is designed to be a backup of your cache and DB for ETL(Extract-Transform-Load) from there. 

Region to Database table mapping

By default the connector uses region name as table name and PDX Java class field name names as column name. If the region or field name differs from the database, then a jdbc-mapping can be used to specify the mappingnames.

The create jdbc-mapping gfsh command requires that you specify --pdx-name. The Java class specified by --pdx-name should pre-exist in classpath or you may add the --pdx-class-file option to register a PDX type for your domain class.  For example, ---pdx-class-file=/path/to/com/example/Employee.class.

Default Mapping and Case Sensitivity

Domain object field names must match RDBMS table column names. The default mapping of fields to columns first looks for an exact, case-sensitive name match. If not found it then looks for a case insensitive match. If multiple the jdbc-mapping command cannot find case insensitive matches are found for all domain object fields and DB columns, an exception is thrown.

Write

Field name Mapping

Every PDX field Java class field name (The fields in the Java class specified by --pdx-name) must map to an existing column name. If a column does not exist then , an exception will be thrown, when a create or update is done.

Read Mapping

If creating the jdbc-mapping has pdx-class-name then every table column requires an existing PDX field name. If not the read throws an exception.  If the jdbc-mapping does not have a pdx-class-name then a PDX field name is created for every table column.

Type

Field type mapping

The connector tries to map the java PDX Java class field type to an appropriate sql SQL type. The user needs to make sure that the java PDX Java class field type is compatible with the database column type.

Type mapping while writing

The describe jdbc-mapping command will display the field and column type mapping.

The JDBC Connector relies on the JDBC driver’s ability to map a PDX Java type to the column an SQL type. The JDBC driver will throw an exception if the types are not compatible. If the PDX data Java field is null, then the JDBC PareparedStatementPreparedStatement.setNull method is called. For non-null PDX data Java field, the JDBC PreparedStatement.setObject method is called with the PDX data. The PDX data is passed to PreparedStatement.setObject unchanged with the following two exceptions:Java field data.

Writing a PDX field that is a Character or char

For characters whose value is zero, JDBC PreparedStatement.setNull will be called. All non-zero characters are changed to a String.  We advise against using Character, as when it will use PareparedStatementuses PreparedStatement.setObject, JDBC will convert it for the DB table column on insert, and on a read will use ResultSet.getObject and won’t convert back to Character (more under type mapping below). Note that if an object field originally contained an instance of java.lang.Character, then it is written to database as an instance of String using PreparedStatement.setObject.  So when it is read using ResultSet.getObject, it will not be an instance of java.lang.Character.

Writing a PDX field that is a java.util.Date

java.util.Date instances are changed to a sql date type (java.sql.Date, java.sql.Time, or java.sql.Timestamp) based on the column type.

Type mapping while reading

with a pdx-class-name

When data is read from JDBC, and the region mapping has a pdx-class-namethe database, the data is converted to the PDX Java field type by calling getXXX on the resultsetResultSet. For example, if the PDX Java field type is short, then ResultSet.getShort is called.

The JDBC getXXX method will throw an exception if its it is unable to convert the column data to a Java type.

The following special cases exist when reading with a pdx-class-name:

Reading a char field

ResultSet.getString is called and the first char of the String is put in the PDX field. If the String is null or empty then zero is put in the PDX field.

Reading a date field

The column type is used to call one of the following: ResultSet.getDate, ResultSet.getTime, or ResultSet.getTimestamp. The resulting object is put in the

PDX

Java field as a java.util.Date.

Reading a byte array field

If the column type is BLOB, then ResultSet.getBlob is called and the entire blob is read into a byte array. Otherwise ResultSet.getBytes is called.

Reading an array field other than byte array

ResultSet.getObject() is called and the output is cast to the field’s array type.

In most, if not all, cases this will result in a class cast exception.

Changes to the Java domain class

If the developer changes the Java domain class (adding or removing fields or changing field data type) after the JDBC mapping has been created, the database table should be updated and the JDBC mapping should be destroyed and re-created. Otherwise data from Geode will not be persisted to the database table and exceptions may be thrown from the JDBC driver

Reading an object field

If the column type is BLOB then ResultSet.getBlob is called and the entire BLOB is read into a byte array, otherwise ResultSet.getObject is called. If it returns a java.sql.Date, java.sql.Time, or java.sql.Timestamp then its converted to a java.util.Date.

Note that if an object field originally contained an instance of java.lang.Character, then it is written to JDBC as an instance of String using PreparedStatement.setObject.  So when it is read using ResultSet.getObject, it will not be an instance of java.lang.Character.

When data is read from JDBC and the region mapping has no pdx-class-name, then the type of every field is object.

Registering PDX Metadata

When a write is done on a region, PDX metadata is registered that describes the class. If an application reads before doing any writes, it can register the PDX metadata using the cache api Cache.registerPdxMetaData(). This can also happen if the PDX registry is not persisted and the cache server is restarted.

Entity Object Structure

Nested object structures in the Geode domain class are not supported.   Flat  Flat table structures with unique key columns must be used. This JDBC connector is designed for persisting flat domain objects to a table .  Pull from those flat tables for more complex transformations, etc.

Your entity object needs to implement PdxSerializable.

public class Employee implements PdxSerializable {

Implement the toData, fromData methods from the interface.  Jdbc returns objects in results which the JdbcLoader will try to convert.  Implement toData and fromData to cast object returns to specific fields.

and/or reading data from external RDBMS into a Geode region for further processing.

The JDBC connector supports both single and composite keys (when more than one table column is needed to create a unique key).

If you don’t want to use reflection-based PDX serialization on your client application, your domain class has to implement one of the following interfaces for serialization: java.io.Serializable, DataSerializable or PdxSerializable.