Overview

Using this connector, application developers caching flat Java domain objects can easily persist those objects to a relational database table.

Goals

Using only gfsh commands, you can configure simple in-line caching between Apache Geode and a relational database (e.g., MySQL), communicating through the database’s JDBC driver.

  1. Users can easily map a region to a database table with minimal configuration.
  2. Users can persist entity objects to an external database table, synchronously or asynchronously.
  3. Users can read database table rows into a region.

Approach

  1. Expose new create data-source and create jdbc-mapping commands.

  2. Implement JdbcReader, JdbcWriter, and JdbcAsyncWriter and automatically add them to the region specified in create jdbc-mapping.

Out of Scope

This JDBC connector is not intended to solve complex use cases involving complex entity objects or nested structures, nor every inline caching use case. It is designed to be a general-purpose fit and will not support special database features (such as auto-incremented ID fields). Users should implement normalized flat tables and corresponding domain objects covering only one topic.


 
GFSH Commands


Use help <command> to show the help for any of the following commands:

create data-source

list data-source

describe data-source

destroy data-source

create jdbc-mapping

describe jdbc-mapping

destroy jdbc-mapping

list jdbc-mapping
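
For example, to show the help for the create jdbc-mapping command:

gfsh>help create jdbc-mapping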


Out-of-the-Box Implementation

 

The JDBC Connector is designed to be usable out of the box. It plugs into a pattern where you have a table name that won’t change.

1) Create a region with the same name as the table. 

2) create data-source defines the URL for Geode to communicate with the external database through its JDBC driver.

3) create jdbc-mapping maps fields in your domain object to columns in the table of the external database. This command automatically configures the JdbcReader and either the JdbcAsyncWriter (default) or the JdbcWriter (synchronous), which use the JDBC protocol to support inline caching between the Geode region and the specified external table.

4) Once the data source and region-table mapping (JDBC mapping) are set up, the user can perform Region.put and Region.get operations to persist data to, or load data from, the external data source. Region.put adds or updates an entry in the region and updates the table in the external data source. If Region.get results in a cache miss, the JDBC cache loader automatically loads the entry into the region from the database table and returns it to the caller (see the client-side sketch below).
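
The following is a minimal client-side sketch of step 4, assuming the employee region and mapping from the Getting Started section below and a hypothetical com.example.Employee domain class (sketched later in this document); the locator host and port are placeholders for your environment.

import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class EmployeeExample {
  public static void main(String[] args) {
    // Connect to the cluster; locator host/port are placeholders.
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)
        .create();

    Region<Integer, Employee> employees = cache
        .<Integer, Employee>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("employee");

    // put() adds/updates the region entry and, through the configured
    // JdbcWriter or JdbcAsyncWriter, updates the mapped database table.
    employees.put(1, new Employee(1, "Alice"));

    // On a cache miss, get() triggers the JdbcReader, which loads the row
    // from the database table into the region and returns it.
    Employee loaded = employees.get(1);

    cache.close();
  }
}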

Getting Started

The JDBC connector works best with one-application, one-region, one-table tenancy.


Setup

1. You’ll need an existing JDBC data source with a table that you want to map to a region.

    ex: a MySQL RDS instance running in AWS

2. In gfsh, you’ll need to add the JDBC driver jar to the server classpath when you start the server:

gfsh>start server --name=s1 --classpath="/path/to/mysql-connector-java-5.1.45-bin.jar"

  OR use the gfsh deploy --jar command to deploy the JDBC driver and add it to the Geode classpath:

gfsh>deploy --jar="/path/to/mysql-connector-java-5.1.45-bin.jar"

3. Create a region with the same name as the table you want to persist to. (Optionally, the region and table names can differ; configure the mapping with the create jdbc-mapping command’s --table option.)

gfsh>create region --name="employee" --type=REPLICATE

4. Create a data source with the JDBC connection URL and credentials. Refer to your database documentation for details about the JDBC connection URL. Here is an example:

gfsh>create data-source --name=mysql_data_source --url="jdbc:mysql://localhost/geode_db" --username=mysql_user

5. Map a table in the external data source to a region using the create jdbc-mapping command.

gfsh>create jdbc-mapping --data-source="mysql_data_source" --region="employee" --catalog=geode_db --table=employee --id=id --pdx-name=com.example.Employee

NOTE: For some database vendors, for example MySQL, you may need to specify --catalog for your database name. For other vendors, for example PostgreSQL, you may need to specify --schema for your database name. The --schema and --catalog options are used by the vendor-specific JDBC driver. Refer to your database documentation for more details about catalogs and schemas.
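
For reference, a hypothetical com.example.Employee domain class matching the mapping above could look like the following; the field names match the employee table’s column names (the name column is an assumption for illustration), and id is the key column.

package com.example;

public class Employee {
  private int id;      // maps to the "id" key column
  private String name; // maps to a "name" column (assumed for this example)

  public Employee() {} // no-arg constructor for reflection-based PDX serialization

  public Employee(int id, String name) {
    this.id = id;
    this.name = name;
  }

  public int getId() { return id; }
  public String getName() { return name; }
}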

 

Risks and Mitigations 

It is strongly recommended that you do your own use-case-specific testing when using the JDBC Connector. Testing actual read/write/load behavior requires the Geode region to be connected to the database, so verify that the JDBC Connector performs according to your SLAs. Make sure you understand the number of connections made to your database, and review it against your licensing agreement where applicable. It is also recommended that you not read or write directly to a database that is your system of record. If possible, use a database that is designed to be a backup of your cache, and ETL (Extract-Transform-Load) from there into your system of record.

Region to Database table mapping

By default, the connector uses the region name as the table name and the Java class field names as the column names.

The create jdbc-mapping gfsh command requires that you specify --pdx-name. The Java class specified by --pdx-name should already exist on the classpath, or you may add the --pdx-class-file option to register a PDX type for your domain class. For example, --pdx-class-file=/path/to/com/example/Employee.class.
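
For example, combining the Getting Started mapping with --pdx-class-file:

gfsh>create jdbc-mapping --data-source=mysql_data_source --region=employee --catalog=geode_db --table=employee --id=id --pdx-name=com.example.Employee --pdx-class-file=/path/to/com/example/Employee.class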

Default Mapping and Case Sensitivity

Domain object field names must match the RDBMS table column names. The default mapping of fields to columns first looks for an exact, case-sensitive name match. If none is found, it then looks for a case-insensitive match. If the create jdbc-mapping command cannot find a case-insensitive match for every domain object field and database column, an exception is thrown.

Field name Mapping

Every field of the Java class specified by --pdx-name must map to an existing column name. If a column does not exist, an exception will be thrown when creating the JDBC mapping.

Field type mapping

The connector tries to map the Java class field type to an appropriate SQL type. The user needs to make sure that the Java class field type is compatible with the database column type. The describe jdbc-mapping command will display the field and column type mapping.
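
For example, to inspect the mapping created in Getting Started (assuming describe jdbc-mapping selects a mapping by its region name):

gfsh>describe jdbc-mapping --region=employee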

The JDBC Connector relies on the JDBC driver’s ability to map a Java type to an SQL type. The JDBC driver will throw an exception if the types are not compatible. If the Java field is null, the JDBC PreparedStatement.setNull method is called. For a non-null Java field, the JDBC PreparedStatement.setObject method is called with the field’s value.
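
The write path described above behaves roughly like the following sketch. This is an illustration of the rule, not the connector’s actual source; fieldValue, parameterIndex, and columnSqlType are hypothetical names.

import java.sql.PreparedStatement;
import java.sql.SQLException;

public class WriteSketch {
  static void bind(PreparedStatement statement, int parameterIndex,
                   Object fieldValue, int columnSqlType) throws SQLException {
    if (fieldValue == null) {
      // Null Java field: pass the column's SQL type to the driver explicitly.
      statement.setNull(parameterIndex, columnSqlType);
    } else {
      // Non-null Java field: let the driver map the Java type to the SQL type.
      statement.setObject(parameterIndex, fieldValue);
    }
  }
}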

Writing a PDX field that is a Character or char

For characters whose value is zero, JDBC PreparedStatement.setNull is called. All non-zero characters are converted to a String and written using PreparedStatement.setObject, so the JDBC driver converts the value to the table’s column type on insert. On a read, ResultSet.getObject is used and the value is not converted back. Consequently, if an object field originally contained an instance of java.lang.Character, it will not contain one after a round trip through the database; for this reason, we advise against using Character.

Writing a PDX field that is a java.util.Date

java.util.Date instances are converted to an SQL date type (java.sql.Date, java.sql.Time, or java.sql.Timestamp) based on the column type.

Type mapping while reading

When data is read from the database, the data is converted to the Java field type by calling getXXX on the ResultSet. For example, if the Java field type is short, then ResultSet.getShort is called.

The JDBC getXXX method will throw an exception if it is unable to convert the column data to a Java type.

The following special cases exist:

Reading a char field

ResultSet.getString is called and the first char of the String is put in the PDX field. If the String is null or empty then zero is put in the PDX field.

Reading a date field

The column type is used to call one of the following: ResultSet.getDate, ResultSet.getTime, or ResultSet.getTimestamp. The resulting object is put in the Java field as a java.util.Date.

Reading a byte array field

If the column type is BLOB, then ResultSet.getBlob is called and the entire blob is read into a byte array. Otherwise ResultSet.getBytes is called.

Reading an array field other than byte array

ResultSet.getObject() is called and the output is cast to the field’s array type.
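
The char and byte array special cases behave roughly like the following sketch. This is an illustration of the rules described above, not the connector’s actual source.

import java.sql.Blob;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ReadSketch {
  // Reading a char field: the first char of the String, or zero if null/empty.
  static char readChar(ResultSet rs, String column) throws SQLException {
    String s = rs.getString(column);
    return (s == null || s.isEmpty()) ? (char) 0 : s.charAt(0);
  }

  // Reading a byte array field: read the whole BLOB if the column type is
  // BLOB, otherwise fall back to getBytes.
  static byte[] readBytes(ResultSet rs, String column, boolean isBlobColumn)
      throws SQLException {
    if (isBlobColumn) {
      Blob blob = rs.getBlob(column);
      return blob.getBytes(1, (int) blob.length()); // Blob positions are 1-based
    }
    return rs.getBytes(column);
  }
}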

Changes to the Java domain class

If the developer changes the Java domain class (adding or removing fields, or changing a field’s data type) after the JDBC mapping has been created, the database table should be updated, and the JDBC mapping should be destroyed and re-created. Otherwise, data from Geode will not be persisted to the database table, and the JDBC driver may throw exceptions.
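
For example, for the employee mapping created in Getting Started (this assumes destroy jdbc-mapping also selects the mapping by region name):

gfsh>destroy jdbc-mapping --region=employee

gfsh>create jdbc-mapping --data-source=mysql_data_source --region=employee --catalog=geode_db --table=employee --id=id --pdx-name=com.example.Employee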

Entity Object Structure

Nested object structures in the Geode domain class are not supported. Flat table structures with unique key columns must be used. This JDBC connector is designed for persisting flat domain objects to a table and/or reading data from an external RDBMS into a Geode region for further processing.

The JDBC connector supports both single and composite keys (when more than one table column is needed to create a unique key).
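
For example, a hypothetical mapping keyed on two columns (this assumes --id accepts a comma-separated list of column names):

gfsh>create jdbc-mapping --data-source=mysql_data_source --region=employee --catalog=geode_db --table=employee --id=last_name,first_name --pdx-name=com.example.Employee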

If you don’t want to use reflection-based PDX serialization in your client application, your domain class has to implement one of the following interfaces for serialization: java.io.Serializable, DataSerializable, or PdxSerializable.
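
Below is a minimal sketch of the hypothetical Employee class implementing PdxSerializable rather than relying on reflection-based PDX serialization; the PDX field names written in toData must match the mapped column names.

import org.apache.geode.pdx.PdxReader;
import org.apache.geode.pdx.PdxSerializable;
import org.apache.geode.pdx.PdxWriter;

public class Employee implements PdxSerializable {
  private int id;
  private String name;

  public Employee() {} // required for PDX deserialization

  public Employee(int id, String name) {
    this.id = id;
    this.name = name;
  }

  @Override
  public void toData(PdxWriter writer) {
    writer.writeInt("id", id);        // matches the "id" key column
    writer.writeString("name", name); // matches the assumed "name" column
  }

  @Override
  public void fromData(PdxReader reader) {
    id = reader.readInt("id");
    name = reader.readString("name");
  }
}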




 
