Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


IDIEP-
[NUMBER]
54
Author
Sponsor
Created

  

Status
Status
colourGrey
titleDRAFT


Table of Contents

Motivation

// Define the problem to be solved.

Description

// Provide the design of the solution.

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets

The way Ignite works with data schemas is inconsistent:

  • The binary protocol creates schemas for anything that is serialized. These schemas are updated implicitly – the user doesn't have any control over them.
  • SQL engine has its own schema that is separate from the binary schema, although SQL runs on top of binary objects. SQL schema is created and updated explicitly by the user.
  • Caches themselves are basically schemaless – you're allowed to store multiple versions of multiple data types in a single cache.

This creates multiple usability issues:

  • SQL schema can be inconsistent or even incompatible with the binary schema. If either of them is updated, the second is not affected.
  • SQL can't be used by default. The user has to explicitly create the SQL schema, listing all fields and making sure the list is consistent with the content of binary objects.
  • Binary schemas are decoupled from caches. So, if a cache is destroyed, the binary schema is not removed.
  • Etc.

Description

The general idea is to have a one-to-one mapping between data schemas and caches/tables. There is a single unified schema for every cache, it is applied to both data storage itself and to the SQL.

When a cache is created, it is configured with a corresponding data schema. There must be an API and a tool to see the current version of the schema for any cache, as well as make updates to it. Schema updates are applied dynamically without downtime.

DDL should work on top of this API providing similar functionality. E.g. CREATE TABLE invocation translates to a cache creation with the schema described in the statement.

Anything stored in a cache/table must be compliant with the current schema. An attempt to store incompatible data should fail.

The binary protocol should be used only as the data storage format. All serialization that happens for communication only should be performed by a different protocol. The data storage format will be coupled with the schemas, while the communication is independent of them. As a bonus, this will likely allow for multiple optimizations on both sides, as serialization protocols will become more narrow purposed.

BinaryObject API should be reworked, as it will not represent actual serialized objects anymore. It should be replaced with something like BinaryRecord or DataRecord representing a record in a cache or table. Similarly to the current binary objects, records will provide access to individual fields. A record can also be deserialized into a class with any subset of fields represented in the record.

Risks and Assumptions

n/a

Discussion Links

TBD

Reference Links

n/a

Tickets

n/a// Links or report with relevant JIRA tickets.