Background

Protocol

We've been writing a new client-server protocol as a public API for the creation of new Geode clients. We settled on using Protobuf as the encoding for the protocol as it makes writing clients easier by abstracting away a lot of the encoding details.

Encoding

One of the big challenges in designing a new protocol has been how to encode values. Like the old binary client protocol, the PDX encoding is complicated, underdocumented, and stateful. However, we need a way for users to send values that are more complex than primitives or maps of primitives.

The first approach was to use the JSON-PDX conversion that is already used for the REST API. Many languages have libraries to encode objects as JSON, and it's a familiar format to many developers. However, JSON has downsides as an encoding: the encoded values are large, and encoding and decoding them is slow.

Selecting an Encoding Mechanism

Adding this encoding mechanism to the protocol does not rule out JSON or custom encodings: the object encoding can be pluggable, and the user's desired response encoding should be selected during the handshake process.

Regardless of which proposal is adopted, we should allow a pluggable object encoding, for which users register a handler on the server. The handler will receive a byte array and return an Object. This allows users to do custom serialization if desired.
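As a rough illustration of the pluggable handler described above, the following Java sketch shows one possible shape for it. The names ObjectDecoder, DecoderRegistry, and Utf8StringDecoder are hypothetical and not part of Geode; the only detail taken from the proposal is that a handler receives a byte array and returns an Object.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical handler interface: takes the raw bytes of a value and returns an Object.
interface ObjectDecoder {
    Object decode(byte[] encoded);
}

// Hypothetical server-side registry, keyed by the encoding name selected in the handshake.
final class DecoderRegistry {
    private final Map<String, ObjectDecoder> decoders = new ConcurrentHashMap<>();

    void register(String encodingName, ObjectDecoder decoder) {
        decoders.put(encodingName, decoder);
    }

    Object decode(String encodingName, byte[] encoded) {
        ObjectDecoder decoder = decoders.get(encodingName);
        if (decoder == null) {
            throw new IllegalArgumentException("No decoder registered for encoding: " + encodingName);
        }
        return decoder.decode(encoded);
    }
}

// Example custom handler: treats the payload as a UTF-8 string.
final class Utf8StringDecoder implements ObjectDecoder {
    @Override
    public Object decode(byte[] encoded) {
        return new String(encoded, StandardCharsets.UTF_8);
    }
}
```

A server would register handlers at startup, for example registry.register("utf8", new Utf8StringDecoder()), and look one up by the encoding name agreed on in the handshake.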

Protobuf Struct Encoding and Extension

This section gives background on our early thinking about encoding a PDX-like type with Protobuf; for the proposed encoding, see "The Proposed Encoding", below.

...

Ideally, a driver developer would provide annotations or registration for application developers to specify the manner in which a type should be serialized. In languages that use setters and getters by convention, it would probably be more idiomatic to use the setters and getters for reflection rather than the member variables of the object.
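For a Java driver, getter-based reflection might look like the sketch below. It is only an illustration: GetterFieldExtractor and User are hypothetical names, and a real driver would also need to handle annotations, setters for deserialization, and boolean "is" getters.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: derives field names and values from public getters
// instead of reading member variables directly.
final class GetterFieldExtractor {
    static Map<String, Object> extract(Object bean) {
        // TreeMap keeps field names in a deterministic (alphabetical) order.
        Map<String, Object> fields = new TreeMap<>();
        for (Method method : bean.getClass().getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get")
                    && name.length() > 3
                    && method.getParameterCount() == 0
                    && method.getReturnType() != void.class
                    && !name.equals("getClass"); // inherited from Object, not a field
            if (!isGetter) {
                continue;
            }
            // "getName" -> "name"
            String fieldName = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            try {
                fields.put(fieldName, method.invoke(bean));
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Could not read property " + fieldName, e);
            }
        }
        return fields;
    }
}

// Hypothetical application type used only for this example.
final class User {
    private final String name;
    private final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }
}
```

Calling GetterFieldExtractor.extract(new User("Amy", 64)) yields a map with the keys "age" and "name", which a driver could then translate into field names and values for the wire.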

The Proposed Encoding

As an optimization to the "Struct" encoding, we can cache metadata using type registration. This encoding

Type registration

As an optimization to avoid sending field names with every message, allow clients (and servers) to cache metadata for data they are about to send. This is done by registering an ID that can be used in future messages to refer to the metadata without retransmitting that metadata. This encoding will not actually be smaller for single values of a type, but if multiple values of the same type are sent the savings can be significant.
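The receiving side of type registration could be sketched in Java roughly as follows. StructTypeRegistry and its methods are hypothetical names chosen for illustration, and the sketch assumes that field values arrive in the same order as the registered field names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-connection cache of struct metadata, keyed by registered type ID.
final class StructTypeRegistry {
    // Metadata sent once with the first value of a type.
    static final class StructType {
        final String typeName;
        final List<String> fieldNames;

        StructType(String typeName, List<String> fieldNames) {
            this.typeName = typeName;
            this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
        }
    }

    private final Map<Integer, StructType> typesById = new ConcurrentHashMap<>();

    // Called when a message carries a new type definition.
    void register(int typeId, String typeName, List<String> fieldNames) {
        StructType previous = typesById.putIfAbsent(typeId, new StructType(typeName, fieldNames));
        if (previous != null) {
            throw new IllegalStateException("Type ID already registered: " + typeId);
        }
    }

    // Called when a later message refers to the type by ID and carries only field values.
    Map<String, Object> resolve(int typeId, List<Object> fieldValues) {
        StructType type = typesById.get(typeId);
        if (type == null) {
            throw new IllegalArgumentException("Unknown type ID: " + typeId);
        }
        if (type.fieldNames.size() != fieldValues.size()) {
            throw new IllegalArgumentException("Expected " + type.fieldNames.size()
                    + " field values but received " + fieldValues.size());
        }
        // LinkedHashMap preserves the registered field order.
        Map<String, Object> struct = new LinkedHashMap<>();
        for (int i = 0; i < fieldValues.size(); i++) {
            struct.put(type.fieldNames.get(i), fieldValues.get(i));
        }
        return struct;
    }
}
```

With this shape, the field names travel once per type, and every later value of that type carries only the ID and the field values.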

...

Code Block
PutRequest{
    key: EncodedValue{intValue: 12},
    value: EncodedValue{
        newStructType: NewStructType{
            typename: "User",
            typeID: 42,
            fieldNames: ["name", "age"],
            fieldValues: [
                ValueField{stringField: "Amy"},
                ValueField{intField: 64}
            ]
        }
    }
}

A later PutRequest would encode the value like this (enclosing Request omitted for brevity):

Code Block
PutRequest{
  key: EncodedValue{intValue: 111},
  value: EncodedValue{
    structById: StructById{
      id: 42,
      fields: [
        ValueField{stringField: "Amy"},
        ValueField{intField: 64}
      ]
    }
  }
}

Message Definitions 

This is the proposed EncodedValue message that will contain values a client sends to the server or the server sends to the client:

...

Whether a client must send all subsequent values of a registered type by ID, or may instead send the full type definition with each value, should be configurable in the handshake.

Considerations

In order to avoid arbitrary object serialization (which can lead to gadget chain exploits), we will probably need to constrain valid types to those registered as DataSerializable, or possibly even only those registered with the ReflectionBasedAutoSerializer. This may also mean that we need a special class of typenames for those types that are put first by a client.

...

NumericArray is used for all the integral types because they share the same varint encoding and will encode the same way on the wire. It may be advisable to use more restricted types and separate messages to get better typing in the generated Protobuf code.
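To illustrate why the integral types look identical on the wire, here is a minimal Java sketch of base-128 varint encoding. It is written from the general description of Protobuf varints rather than taken from the Protobuf library, and the Varint class is a hypothetical name. Each narrower Java integral type is widened to long before encoding, which matches how Protobuf sign-extends negative int32 values to 64 bits.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper showing base-128 varint encoding of a 64-bit value.
final class Varint {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits at a time, least-significant group first.
        // The high bit of each byte signals that more bytes follow.
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift so negative values terminate
        }
        out.write((int) value);
        return out.toByteArray();
    }
}
```

Because a byte, short, int, or long holding the same number widens to the same long, Varint.encode produces the same bytes for each of them, so the declared width of the source type is not recoverable from the wire format alone.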

Type Mappings

Each of the primitives maps to the corresponding Java primitive. Arrays map to arrays of Java primitives. Other fields will encode to the corresponding objects.