Problem

Currently, Geode supports inserting JSON documents using the JSONFormatter utility. Although this utility is designed for performance, it cannot determine whether a JSON document partially matches an existing PdxType definition. As a result, the following documents, which logically represent the same object, would generate different PdxType definitions.

...

The reason for this behavior is that JSON is not a strongly structured format, whereas a Java Object and PdxType have strongly defined structures.
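
The effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that a type's identity is derived from the names of the fields present in a document; the helper `signatureOf` and the sample field names are hypothetical and are not part of Geode's API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FieldSignatureSketch {
  // Hypothetical helper: derives a type signature from the field names alone.
  // A real PdxType also records field types; this is a simplification.
  static String signatureOf(Map<String, Object> document) {
    Set<String> sortedNames = new TreeSet<>(document.keySet());
    return String.join(",", sortedNames);
  }

  public static void main(String[] args) {
    // Two documents describing the same logical object; one omits "age".
    Map<String, Object> full = Map.of("firstName", "John", "lastName", "Doe", "age", 42);
    Map<String, Object> partial = Map.of("firstName", "John", "lastName", "Doe");

    System.out.println(signatureOf(full));    // age,firstName,lastName
    System.out.println(signatureOf(partial)); // firstName,lastName
    // Different signatures would mean two separate type definitions.
    System.out.println(signatureOf(full).equals(signatureOf(partial))); // false
  }
}
```

Under this assumption, every distinct combination of populated fields yields its own type, which is the growth in type definitions the proposal aims to avoid.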

Proposed Solution

In order to avoid the possible side effect that each JSON document has the potential to generate a new PdxType definition, I suggest the following:

  1. Provide a generic document format which can be used to describe the definition of a data type
  2. Register a type definition for a JSON document, which contains the superset of fields for a given JSON document
  3. Provide the type definition id within the JSON document, as a hint to the PdxType registry about which PdxType definition to use for the incoming JSON document
  4. Provide a service to add new type definitions to a running cluster
  5. Export the defined type definitions into the generic type definition format

Advantages

The benefits of this solution are:

  • Reduced PdxType definitions for JSON documents representing the same logical domain object
  • Improved performance in processing of JSON documents, as the PdxType is known and does not have to be defined and generated for each submitted document
  • Ease of type definition management
  • Ability to ingest JSON documents representing the same type with a variance of populated fields
  • Ability to export existing type definitions into a human readable format
  • Ability to externally register data types

Disadvantages

The drawbacks to this solution:

  • The superset of all the fields for a JSON structure needs to be known
  • The type definition id needs to be provided for each sub-element within the JSON structure
  • If the type definition id is not provided, the type definition generator will behave the same way it currently does: for every provided JSON document the system might generate a new type definition.
  • All clients need to know the registered definitions and ids in order to provide the correct id for any given structure within the JSON document

Limitations

 

Type Registry Service

Types can be defined in the following manner:

...

Note

The Type Registry is not bound to any one serialization mechanism and provides the means to define data types irrespective of serialization mechanism or format

 

...

Usage

The type definition registry will behave in the following manner:

  1. When inserting a new definition, the registry will check whether a definition already exists for the given typeId. If so, a "TypeDefinitionException" will be thrown. If more than one type definition is being registered, the whole batch will be rejected until the offending type definition is fixed.
  2. If a referencedTypeId does not exist within the registry, a "TypeDefinitionException" will be thrown. If more than one type definition is being registered, the whole batch will be rejected until the offending type definition is fixed.
  3. An existing type definition cannot be updated. Any changes to an existing type definition will require a complete redefinition of the type including a new typeId.
  4. In the event where an incoming data entry references an existing type and defines extra fields in the document, that are not defined in any definition, a new type will automatically be created. This might cause a
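
The first three rules can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the classes `TypeRegistrySketch` and `TypeDefinition` are hypothetical simplifications, the exception is modelled as unchecked, and a definition is assumed to be allowed to reference another definition registered in the same batch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TypeRegistrySketch {
  /** Hypothetical exception; only the name comes from the rules above. */
  static final class TypeDefinitionException extends RuntimeException {
    TypeDefinitionException(String message) {
      super(message);
    }
  }

  /** Hypothetical, simplified definition: a typeId plus the typeIds it references. */
  static final class TypeDefinition {
    final long typeId;
    final List<Long> referencedTypeIds;

    TypeDefinition(long typeId, List<Long> referencedTypeIds) {
      this.typeId = typeId;
      this.referencedTypeIds = referencedTypeIds;
    }
  }

  private final Map<Long, TypeDefinition> definitions = new HashMap<>();

  void addTypeDefinitions(List<TypeDefinition> batch) {
    // Work on a staged copy so that a failure leaves the registry untouched.
    Map<Long, TypeDefinition> staged = new HashMap<>(definitions);
    for (TypeDefinition def : batch) {
      // Rules 1 and 3: an existing typeId can be neither re-registered nor updated.
      if (staged.putIfAbsent(def.typeId, def) != null) {
        throw new TypeDefinitionException("typeId already defined: " + def.typeId);
      }
    }
    for (TypeDefinition def : batch) {
      // Rule 2: every referenced typeId must resolve (assumption: same batch counts).
      for (long ref : def.referencedTypeIds) {
        if (!staged.containsKey(ref)) {
          throw new TypeDefinitionException("unknown referenced typeId: " + ref);
        }
      }
    }
    definitions.clear();
    definitions.putAll(staged);
  }

  boolean contains(long typeId) {
    return definitions.containsKey(typeId);
  }
}
```

Validating against a staged copy is one way to get the all-or-nothing batch behaviour; a real implementation in a distributed registry would need a different mechanism.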

 

Type Registry Service API

The type registry service has to provide the following services:

  • addTypeDefinitions(List<TypeDefinition>)
  • exportTypeDefinition(long typeDefId) : TypeDefinition
  • exportTypeDefinitions() : List<TypeDefinition>
  • removeTypeDefinitions(List<TypeDefinition>)
  • lookupTypeDefinition(long typeDefId) : TypeDefinition
  • lookupTypeIdByTypeDefinition(TypeDefinition) : long
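
Expressed as a Java interface, the service list above might look like the following sketch. The interface name `TypeRegistryService`, the placeholder `TypeDefinition` type, the parameter names and the `void` return types are assumptions. The lookup and export methods are assumed to return a single `TypeDefinition` or a `List<TypeDefinition>`.

```java
import java.util.List;

/** Hypothetical placeholder for the type definition model described in this proposal. */
interface TypeDefinition {
  long getTypeId();
}

/** Sketch of the service contract; method names follow the list above. */
interface TypeRegistryService {
  // Registers a batch of definitions.
  void addTypeDefinitions(List<TypeDefinition> definitions);

  // Exports a single definition by its id.
  TypeDefinition exportTypeDefinition(long typeDefId);

  // Exports every registered definition.
  List<TypeDefinition> exportTypeDefinitions();

  // Removes the given definitions from the registry.
  void removeTypeDefinitions(List<TypeDefinition> definitions);

  // Resolves a definition from its id.
  TypeDefinition lookupTypeDefinition(long typeDefId);

  // Resolves the id under which a definition is registered.
  long lookupTypeIdByTypeDefinition(TypeDefinition definition);
}
```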

 

Proposed Type Definition Format

The type definition format is described below.

Code Block
Language: xml
Title: Type Definition Spec
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="typeDefinitons" type="typeDefinitonsType"/>
  <xs:complexType name="entryType">
    <xs:annotation>
      <xs:documentation>
        The type field represents the entry within a List | Set | Array. It can be either a primitive defined by the type attribute
        or referencing a defined type by using the refTypeId attribute.
      </xs:documentation>
    </xs:annotation>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:long" name="refTypeId" use="optional"/>
        <xs:attribute type="xs:string" name="type" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="valueType">
    <xs:annotation>
      <xs:documentation>
        The type field represents the value in the key:value for a map or dictionary. It can be either a primitive defined by the type attribute
        or referencing a defined type by using the refTypeId attribute. The name attribute allows for the specification of a named field in the incoming
        document to map to.
      </xs:documentation>
    </xs:annotation>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:long" name="refTypeId" use="optional"/>
        <xs:attribute type="xs:string" name="name" use="optional"/>
        <xs:attribute type="xs:string" name="type" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="typeDefinitonsType">
    <xs:sequence>
      <xs:element type="typeDefinitionType" name="typeDefinition" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="typeDefinitionType">
    <xs:annotation>
      <xs:documentation>
        This field represents the type that is to be defined. The type attribute represents the class of the object. The typeId represents the externally defined typeId
        for this definition. This typeId will be used in other definitions referring to this definition.
      </xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element type="fieldsType" name="fields"/>
    </xs:sequence>
    <xs:attribute type="xs:string" name="type"/>
    <xs:attribute type="xs:long" name="typeId"/>
  </xs:complexType>
  <xs:complexType name="keyType">
    <xs:annotation>
      <xs:documentation>
        The type field represents the key in the key:value for a map or dictionary. It can be either a primitive defined by the type attribute
        or referencing a defined type by using the refTypeId attribute. The name attribute allows for the specification of a named field in the incoming
        document to map to.
      </xs:documentation>
    </xs:annotation>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:long" name="refTypeId" use="optional"/>
        <xs:attribute type="xs:string" name="name" use="optional"/>
        <xs:attribute type="xs:string" name="type" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="fieldType">
    <xs:sequence>
      <xs:element name="name">
        <xs:annotation>
          <xs:documentation>The name of field</xs:documentation>
        </xs:annotation>
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="primitiveTypeField"/>
            <xs:enumeration value="existingTypeField"/>
            <xs:enumeration value="mapTypeField"/>
            <xs:enumeration value="listTypeField"/>
            <xs:enumeration value="formattedDateTypeField"/>
            <xs:enumeration value="formattedDecimalTypeField"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element type="keyType" name="key" minOccurs="0">
        <xs:annotation>
          <xs:documentation>The type of the key within the map. Optional name attribute to define the corresponding field in the incoming document</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element type="valueType" name="value" minOccurs="0">
        <xs:annotation>
          <xs:documentation>The type of the value within the map. Optional name attribute to define the corresponding field in the incoming document</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element type="entryType" name="entry" minOccurs="0">
        <xs:annotation>
          <xs:documentation>The type of the entry that is stored within the List | Set | Array</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="formattingString" minOccurs="0">
        <xs:annotation>
          <xs:documentation>Optional formatting string for the data. This is used for importing/exporting of the field.</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
    <xs:attribute type="xs:string" name="type" use="optional"/>
    <xs:attribute type="xs:long" name="refTypeId" use="optional"/>
  </xs:complexType>
  <xs:complexType name="fieldsType">
    <xs:sequence>
      <xs:element type="fieldType" name="field" maxOccurs="unbounded" minOccurs="0">
        <xs:annotation>
          <xs:documentation>This is the field definition. It can be either a primitive type, an existing definition of a type, a mapType, a listType or a formatted Date/Decimal type</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Code Block
Language: text
Title: Type Definition Spec
{
  "@type" : "JSON_Document" | Fully qualified className,  <-- The name of the object type being defined
  "@typeId" : ,  <-- A numeric field which uniquely identifies the type
  "fields" : [  <-- A collection of fields
    {
      "fieldName" : ,  <-- The name of the field
      "dataType" : "String" | "Integer" | "Double" | "Date" | "Float" | "Boolean" | "Long" | Object (default),  <-- The data type of the field (not required when using @refTypeId)
      "format" : "MM/dd/yyyy hh:mm:ss:SSS" | "#0.00",  <-- Optional field to provide a format when dealing with dates or doubles
      "@refTypeId" :   <-- The reference to an already defined type
    },
    {
      "fieldName" : ,  <-- The name of the field
      "dataType" : "List",  <-- Indicates the data type is a list/array
      "@refTypeId" :   <-- Numeric reference to an already defined type, which will be populated in the list
    },
    {
      "fieldName" : ,  <-- The name of the field
      "dataType" : "List",  <-- Indicates the data type is a list/array
      "subType" : "String" | "Integer" | "Double" | "Date" | "Float" | "Boolean" | "Long" | Object (default),  <-- The data type of the elements of the list
      "format" : "MM/dd/yyyy hh:mm:ss:SSS" | "#0.00"  <-- Optional field to provide a format when dealing with dates or doubles
    }
  ]
}

 

The example below shows how to define the type definition for a domain object.

...

Code Block
Language: text
Title: Example Type Definition
[
  {
    "@type": "JSON_Document",
    "@typeId": 1,
    "fields": [
      {
        "fieldName": "firstName",
        "dataType": "String"
      },
      {
        "fieldName": "lastName",
        "dataType": "String"
      },
      {
        "fieldName": "age",
        "dataType": "Integer"
      },
      {
        "fieldName": "currentAddress",
        "@refTypeId": 2
      },
      {
        "fieldName": "previousAddresses",
        "dataType": "List",
        "@refTypeId": 2
      },
      {
        "fieldName": "luckyNumber",
        "dataType": "List"
      },
      {
        "fieldName": "dateOfBirth",
        "dataType": "Date",
        "format": "MM/dd/yyyy"
      },
      {
        "fieldName": "lastUpdated",
        "dataType": "Date",
        "format": "MM/dd/yyyy hh:mm:ss:SSS"
      }
    ]
  },
  {
    "@type": "JSON_Document",
    "@typeId": 2,
    "fields": [
      {
        "fieldName": "addressLine1",
        "dataType": "String"
      },
      {
        "fieldName": "addressLine2",
        "dataType": "String"
      },
      {
        "fieldName": "addressLine3",
        "dataType": "String"
      },
      {
        "fieldName": "state",
        "dataType": "String"
      },
      {
        "fieldName": "zipCode",
        "dataType": "String"
      },
      {
        "fieldName": "country",
        "dataType": "String"
      }
    ]
  }
]
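
The "format" strings in the definition resemble the patterns used by `java.text.SimpleDateFormat` and `java.text.DecimalFormat`. Assuming that is the intent (the proposal does not say so explicitly), a field value could be converted as in the following sketch; the class `FieldFormatSketch` and its methods are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class FieldFormatSketch {
  // Parses a date value using the "format" string of its field definition.
  static Date parseDate(String value, String format) {
    SimpleDateFormat parser = new SimpleDateFormat(format, Locale.ROOT);
    parser.setLenient(false); // reject out-of-range values such as month 13
    try {
      return parser.parse(value);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Value does not match format " + format, e);
    }
  }

  // Renders a double using the "format" string of its field definition.
  static String formatDecimal(double value, String format) {
    // Fixed symbols keep the decimal separator independent of the default locale.
    DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
    return new DecimalFormat(format, symbols).format(value);
  }

  public static void main(String[] args) {
    System.out.println(parseDate("01/31/1980", "MM/dd/yyyy"));
    System.out.println(formatDecimal(3.14159, "#0.00")); // 3.14
  }
}
```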

Example JSON document with typeIds

Code Block
{
  "@typeId": 1,
  "firstName": "John",
  "lastName": "Doe",
  "currentAddress": {
    "@typeId": 2,
    "addressLine1": "Suite 200",
    "addressLine2": "1235 South Rd",
    "addressLine3": null,
    "state": "OH",
    "zipCode": "287233",
    "country": "USA"
  }
}

In the above example, the type definitions from the Example Type Definition above are used. Not all of the fields defined for typeId=1 have been populated. Without the proposed change, this JSON document would cause a new type to be generated. With the proposed solution, the type registry is informed that this JSON object belongs to the type defined by typeId=1 and maps it accordingly.
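
The mapping decision can be sketched as a subset check: a document may reuse the registered type if every field it carries appears in the definition named by its "@typeId". The class `TypeMatchSketch`, the hard-coded `REGISTERED` map and the method `matchesRegisteredType` are hypothetical, and nested documents are not traversed in this simplified version.

```java
import java.util.Map;
import java.util.Set;

final class TypeMatchSketch {
  // Hypothetical registry content: typeId -> superset of field names,
  // taken from the example type definitions for typeId 1 and 2.
  static final Map<Long, Set<String>> REGISTERED = Map.of(
      1L, Set.of("firstName", "lastName", "age", "currentAddress",
          "previousAddresses", "luckyNumber", "dateOfBirth", "lastUpdated"),
      2L, Set.of("addressLine1", "addressLine2", "addressLine3",
          "state", "zipCode", "country"));

  /** Returns true when the document can reuse the type named by its "@typeId". */
  static boolean matchesRegisteredType(Map<String, Object> document) {
    Object id = document.get("@typeId");
    if (!(id instanceof Number)) {
      return false; // no hint given: the current behaviour applies
    }
    Set<String> allowed = REGISTERED.get(((Number) id).longValue());
    if (allowed == null) {
      return false; // unknown typeId
    }
    for (String field : document.keySet()) {
      if (!field.equals("@typeId") && !allowed.contains(field)) {
        return false; // extra field: a new type would be created
      }
    }
    return true; // populated fields are a subset of the registered superset
  }
}
```

A document that populates only some of the registered fields passes the check, while one carrying an unregistered field name falls back to automatic type creation.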

...