
Problem

Currently, Geode supports inserting JSON documents using the JSONFormatter utility. Although this utility is designed for performance, it cannot determine whether a JSON document partially matches an existing PdxType definition. As a result, the following two documents, which logically represent the same object, would generate two different PdxType definitions.

JSON example 1
{
  "name": {
    "firstName": "John",
    "surname": "Doe"
  }
}
JSON example 2
{
  "name": {
    "firstName": "John",
    "surname": "Doe"
  },
  "address": {
    "addressLine1":"Suite 200",
    "addressLine2":"1235 South Rd",
    "addressLine3":null,
    "state":"OH",
    "zipCode":"287233",
    "country":"USA"
  }
}


In addition, field order is a factor in determining the PdxType definition: changing the order of any field within a JSON document can cause a new PdxType definition to be generated.

This behavior stems from JSON being a loosely structured format, whereas Java objects and PdxTypes have strictly defined structures.
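To make the effect concrete, the following Java sketch models a registry that, purely as an assumption for illustration, identifies a type by the ordered list of its top-level field names. `ExactMatchTypeRegistry`, `typeIdFor` and `doc` are hypothetical names, not Geode's API or its actual implementation, and the sketch ignores nested objects such as `name`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical exact-match registry (not Geode's implementation): a type is keyed by the
// ordered list of top-level field names, so a different field set OR a different field
// order produces a new type id.
final class ExactMatchTypeRegistry {
  private final Map<List<String>, Integer> idsBySignature = new HashMap<>();

  // Returns the id for the document's field signature, allocating the next id if unseen.
  int typeIdFor(Map<String, ?> document) {
    List<String> signature = new ArrayList<>(document.keySet()); // order-sensitive key
    Integer existing = idsBySignature.get(signature);
    if (existing != null) {
      return existing;
    }
    int id = idsBySignature.size() + 1;
    idsBySignature.put(signature, id);
    return id;
  }

  // Builds a document whose fields appear in the given order (values are irrelevant here).
  static Map<String, Object> doc(String... fieldNames) {
    Map<String, Object> document = new LinkedHashMap<>();
    for (String fieldName : fieldNames) {
      document.put(fieldName, null);
    }
    return document;
  }

  public static void main(String[] args) {
    ExactMatchTypeRegistry registry = new ExactMatchTypeRegistry();
    System.out.println(registry.typeIdFor(doc("name")));            // prints 1 (example 1)
    System.out.println(registry.typeIdFor(doc("name", "address"))); // prints 2 (example 2)
    System.out.println(registry.typeIdFor(doc("address", "name"))); // prints 3 (same fields, new order)
    System.out.println(registry.typeIdFor(doc("name")));            // prints 1 (exact repeat)
  }
}
```

Under this model only an exact repeat of a field list reuses a type, which is the proliferation the proposal sets out to avoid.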

Proposed Solution

To avoid the side effect that every JSON document can potentially generate a new PdxType definition, I suggest the following:

  1. Provide a generic document format that can be used to describe the definition of a data type
  2. Register a type definition for a JSON document that contains the superset of fields for that kind of document
  3. Include the type definition id within the JSON document as a hint to the PdxType registry about which PdxType definition to use for the incoming document
  4. Provide a service to add new type definitions to a running cluster
  5. Export the registered type definitions into the generic type definition format

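Steps 2 to 4 could work roughly as in the following Java 17 sketch. It assumes that a document has already been parsed into a `Map`, that the hint is carried in an `@typeId` property of the document (mirroring the attribute name used in the definition format), and that fields outside the registered superset are rejected. `TypeDefinitionRegistry`, `TypeDefinition`, `FieldDefinition` and `resolve` are hypothetical names, not an existing Geode API, and only field names and types are modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical field and type definitions (only name and data type are modelled here).
record FieldDefinition(String fieldName, String dataType) {}

record TypeDefinition(int typeId, List<FieldDefinition> fields) {}

// Hypothetical registry: stores the superset definition per type id (steps 2 and 4)
// and uses the "@typeId" hint in an incoming document to pick a definition (step 3).
final class TypeDefinitionRegistry {
  static final String TYPE_ID_KEY = "@typeId"; // assumed name of the hint property

  private final Map<Integer, TypeDefinition> definitions = new ConcurrentHashMap<>();

  // Registers a definition; refuses to silently replace an existing id.
  void register(TypeDefinition definition) {
    if (definitions.putIfAbsent(definition.typeId(), definition) != null) {
      throw new IllegalStateException("type id already registered: " + definition.typeId());
    }
  }

  // Maps a document onto its registered superset: fields come out in definition order and
  // absent fields become null. An empty result means "no hint", i.e. fall back to inference.
  Optional<Map<String, Object>> resolve(Map<String, ?> document) {
    Object hint = document.get(TYPE_ID_KEY);
    if (!(hint instanceof Integer typeId)) {
      return Optional.empty();
    }
    TypeDefinition definition = definitions.get(typeId);
    if (definition == null) {
      throw new IllegalArgumentException("unknown type id: " + typeId);
    }
    Map<String, Object> normalized = new LinkedHashMap<>();
    for (FieldDefinition field : definition.fields()) {
      normalized.put(field.fieldName(), document.get(field.fieldName()));
    }
    // Any field the definition does not list means the superset is incomplete.
    for (String key : document.keySet()) {
      if (!key.equals(TYPE_ID_KEY) && !normalized.containsKey(key)) {
        throw new IllegalArgumentException("field not in type " + typeId + ": " + key);
      }
    }
    return Optional.of(normalized);
  }
}
```

Because every document carrying the same hint is mapped onto the same ordered superset, field order and unpopulated fields no longer influence which type is chosen, while documents without a hint fall through to the existing behavior.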
Advantages

The benefits of this solution are:

  • Fewer PdxType definitions for JSON documents representing the same logical domain object
  • Improved JSON processing performance, as the PdxType is known up front and does not have to be determined for each document
  • Easier management of type definitions
  • Ability to ingest JSON documents representing the same type with a varying set of populated fields
  • Ability to export existing type definitions into a human-readable format
  • Ability to register data types externally

Disadvantages

The drawbacks of this solution are:

  • The superset of all the fields for a JSON structure must be known in advance
  • The type definition id needs to be provided for each sub-element within the JSON structure
  • If the type definition id is not provided, the type definition generator behaves as it does today: every JSON document can potentially generate a new type definition
  • All clients need to know the registered definitions and their ids in order to provide the correct id for any given structure within the JSON document
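The second and third drawbacks can be checked on the client before a document is sent. The following Java 17 sketch assumes the same hypothetical `@typeId` hint property and a document already parsed into nested `Map`s and `List`s; it reports every object that lacks a hint and would therefore fall back to per-document type inference. `TypeHintChecker` is a hypothetical helper, not part of Geode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side check: every JSON object in a document (the root, each nested
// object, and objects inside lists) should carry an "@typeId" hint. Objects without one
// would fall back to today's per-document type inference.
final class TypeHintChecker {
  static final String TYPE_ID_KEY = "@typeId"; // assumed name of the hint property

  // Returns the paths of all objects that lack a type id hint; "$" denotes the root.
  static List<String> objectsMissingTypeId(Map<String, ?> document) {
    List<String> missing = new ArrayList<>();
    visit("$", document, missing);
    return missing;
  }

  // Depth-first walk over objects and lists; scalar values are ignored.
  private static void visit(String path, Object node, List<String> missing) {
    if (node instanceof Map<?, ?> object) {
      if (!object.containsKey(TYPE_ID_KEY)) {
        missing.add(path);
      }
      for (Map.Entry<?, ?> entry : object.entrySet()) {
        visit(path + "." + entry.getKey(), entry.getValue(), missing);
      }
    } else if (node instanceof List<?> list) {
      for (int i = 0; i < list.size(); i++) {
        visit(path + "[" + i + "]", list.get(i), missing);
      }
    }
  }
}
```

For a root object with a hint whose `name` sub-object has none, the checker would report `$.name`, which is exactly the sub-element a client would have to annotate.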

Proposed Type Definition Format


Example Type Definition
[
  {
    "@type": "JSON_Document",
    "@typeId": 1,
    "fields": [
      {
        "fieldName": "firstName",
        "dataType": "String"
      },
      {
        "fieldName": "lastName",
        "dataType": "String"
      },
      {
        "fieldName": "age",
        "dataType": "Integer"
      },
      {
        "fieldName": "currentAddress",
        "@refTypeId": 2
      },
      {
        "fieldName": "previousAddresses",
        "dataType": "List",
        "@refTypeId": 2
      },
      {
        "fieldName": "luckyNumber",
        "dataType": "List"
      },
      {
        "fieldName": "dateOfBirth",
        "dataType": "Date",
        "format": "MM/dd/yyyy"
      },
      {
        "fieldName": "lastUpdated",
        "dataType": "Date",
        "format": "MM/dd/yyyy hh:mm:ss:SSS"
      }
    ]
  },
  {
    "@type": "JSON_Document",
    "@typeId": 2,
    "fields": [
      {
        "fieldName": "addressLine1",
        "dataType": "String"
      },
      {
        "fieldName": "addressLine2",
        "dataType": "String"
      },
      {
        "fieldName": "addressLine3",
        "dataType": "String"
      },
      {
        "fieldName": "state",
        "dataType": "String"
      },
      {
        "fieldName": "zipCode",
        "dataType": "String"
      },
      {
        "fieldName": "country",
        "dataType": "String"
      }
    ]
  }
]
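A fuller in-memory model of this format, together with step 5 (export), might look like the following Java 17 sketch. It assumes `@type` is always `JSON_Document`, represents absent optional attributes (`dataType`, `@refTypeId`, `format`) as `null`, and checks that every `@refTypeId` refers to a type in the same set. The record and method names are hypothetical, and the exporter emits compact JSON rather than the indented layout shown above.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the proposed format; optional attributes are null when absent.
record FieldDefinition(String fieldName, String dataType, Integer refTypeId, String format) {}

record TypeDefinition(int typeId, List<FieldDefinition> fields) {}

final class TypeDefinitionFormat {
  // Checks that every "@refTypeId" points at a type id defined in the same set.
  static void validateReferences(List<TypeDefinition> types) {
    Set<Integer> ids = types.stream().map(TypeDefinition::typeId).collect(Collectors.toSet());
    for (TypeDefinition type : types) {
      for (FieldDefinition field : type.fields()) {
        if (field.refTypeId() != null && !ids.contains(field.refTypeId())) {
          throw new IllegalArgumentException("type " + type.typeId() + " field "
              + field.fieldName() + " references unknown type " + field.refTypeId());
        }
      }
    }
  }

  // Writes the definitions as compact JSON; "@type" is hard-coded to "JSON_Document".
  static String export(List<TypeDefinition> types) {
    StringBuilder out = new StringBuilder("[");
    for (int t = 0; t < types.size(); t++) {
      TypeDefinition type = types.get(t);
      if (t > 0) out.append(',');
      out.append("{\"@type\":\"JSON_Document\",\"@typeId\":").append(type.typeId())
          .append(",\"fields\":[");
      for (int f = 0; f < type.fields().size(); f++) {
        FieldDefinition field = type.fields().get(f);
        if (f > 0) out.append(',');
        out.append("{\"fieldName\":").append(quote(field.fieldName()));
        if (field.dataType() != null) out.append(",\"dataType\":").append(quote(field.dataType()));
        if (field.refTypeId() != null) out.append(",\"@refTypeId\":").append(field.refTypeId());
        if (field.format() != null) out.append(",\"format\":").append(quote(field.format()));
        out.append('}');
      }
      out.append("]}");
    }
    return out.append(']').toString();
  }

  // Minimal JSON string quoting: escapes backslashes and double quotes only
  // (control characters are not handled in this sketch).
  private static String quote(String s) {
    return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
  }
}
```

Validating references before registration would catch a definition whose `@refTypeId` points at a type that has not been registered yet.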