...

We've been writing a new client-server protocol as a public API for the creation of new Geode clients. We settled on using Protobuf as the encoding for the protocol because it abstracts away many of the encoding details, which makes writing clients significantly easier.

Encoding

One of the big challenges in designing a new protocol has been how to encode values. Like the old binary client protocol, the PDX encoding is complicated, underdocumented, and stateful. However, we need a way for users to send values that are more complex than mere primitives or maps of primitives.

The first approach was to use the JSON-PDX conversion that is already used for the REST API. Many languages have libraries to encode objects as JSON, and it's a familiar format to many developers. However, JSON has downsides as an encoding, chief among them that it is large and slow.

...

Selecting an Encoding Mechanism

Adding this encoding mechanism to the protocol does not rule out also allowing JSON or custom encodings: the object encoding can be pluggable, and the client's desired response encoding should be selected during the handshake.
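One possible shape for that handshake-time selection, as a Java sketch: it assumes, hypothetically, that the client sends the encodings it supports in order of preference and the server picks the first one it also supports. The encoding names and the `EncodingNegotiator` class are illustrative and not part of any existing Geode API.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical handshake step: the client lists the value encodings it can
// handle, most preferred first; the server picks the first one it supports.
class EncodingNegotiator {
  private final Set<String> supported;

  EncodingNegotiator(Set<String> supported) {
    this.supported = supported;
  }

  // Returns the chosen encoding, or empty if client and server share none
  // (in which case the handshake would presumably be rejected).
  Optional<String> select(List<String> clientPreferences) {
    for (String encoding : clientPreferences) {
      if (supported.contains(encoding)) {
        return Optional.of(encoding);
      }
    }
    return Optional.empty();
  }
}
```

Letting the client express an ordered preference keeps the decision on the server while still honoring what the client would rather receive.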

Regardless of which proposal is chosen, we should allow users to register a pluggable object-encoding handler on the server. This handler will receive a byte array and return an Object, which allows users to do custom serialization if desired.
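A minimal Java sketch of such a server-side extension point, assuming a registry keyed by encoding name. `ObjectDecoder` and `DecoderRegistry` are hypothetical names, not existing Geode interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical user-supplied handler: turns the raw bytes of an encoded
// value into a Java Object.
interface ObjectDecoder {
  Object decode(byte[] bytes);
}

// Hypothetical server-side registry mapping an encoding name (for example,
// one chosen during the handshake) to the handler registered for it.
class DecoderRegistry {
  private final Map<String, ObjectDecoder> decoders = new HashMap<>();

  void register(String encodingName, ObjectDecoder decoder) {
    decoders.put(encodingName, decoder);
  }

  Object decode(String encodingName, byte[] bytes) {
    ObjectDecoder decoder = decoders.get(encodingName);
    if (decoder == null) {
      throw new IllegalArgumentException("No decoder registered for " + encodingName);
    }
    return decoder.decode(bytes);
  }
}
```

A user could then register, for example, a UTF-8 string handler with `registry.register("utf8-string", bytes -> new String(bytes, StandardCharsets.UTF_8))`.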

The Proposed Encoding

Two encodings are presented below: the first is simpler and the second more complex. The second avoids sending field names every time an object is serialized by caching type metadata through type registration. We could potentially support both, but JSON is already the easy option, and because type registration is per-connection, caching registered types on the client side should not be a big burden. Option 1 is included mostly to make the motivation for Option 2 clearer.

Option 1: Struct encoding

Protobuf ships with a file, struct.proto, defining messages that can be recursively nested to encode arbitrary JSON.

...

Code Block
Struct{
  typeName: "User",
  entries: [
    StructEntry{
      fieldName: "name",
      value: stringValue{"Amy"}
    },
    StructEntry{
      fieldName: "age",
      value: intValue{44}
    }
  ]
}

...

As an optimization to avoid sending field names with every message, clients can register types to communicate the metadata for data they are about to send. The server will return an ID for that type, and the ID can be used in future messages to refer to the metadata without retransmitting it. This encoding will not actually be smaller for single values, but if multiple values of the same type are sent, the savings can be significant.
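To make the trade-off concrete, the rough Java model below counts only metadata bytes: the type name and field names, which Option 1 repeats in every value, versus one registration plus a type ID per value under Option 2 (assumed here to fit in a single varint byte). It ignores protobuf tags, length prefixes, and the field values themselves, so the numbers are illustrative only; `MetadataOverhead` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Rough model of metadata cost. Only the UTF-8 bytes of the type name and
// field names are counted; protobuf tags, length prefixes and the field
// values themselves are ignored.
class MetadataOverhead {
  static int metadataBytes(String typeName, List<String> fieldNames) {
    int total = typeName.getBytes(StandardCharsets.UTF_8).length;
    for (String name : fieldNames) {
      total += name.getBytes(StandardCharsets.UTF_8).length;
    }
    return total;
  }

  // Option 1 (struct encoding): every value repeats the type and field names.
  static int structEncoding(String typeName, List<String> fieldNames, int valueCount) {
    return metadataBytes(typeName, fieldNames) * valueCount;
  }

  // Option 2 (type registration): the names are sent once, in the
  // registration request; each value then carries only a one-byte type ID.
  static int registeredEncoding(String typeName, List<String> fieldNames, int valueCount) {
    return metadataBytes(typeName, fieldNames) + valueCount;
  }
}
```

For the User type (type name "User", fields "name" and "age"), this model gives 11 bytes versus 12 for a single value, but 11,000 versus 1,011 for 1,000 values.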

Type registration will be per-connection (meaning IDs cannot be reused across connections). This removes the need for the server to synchronize registrations across connections and decouples client registrations from the internal details of PDX. It also means that each client only has to keep track of a relatively small amount of data.

It will be safe for a client to register the same type multiple times on a single connection, and it should get back the same ID every time.
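A minimal sketch of the server-side bookkeeping this implies, written in Java under some assumptions: `ConnectionTypeRegistry`, `TypeDefinition`, `FieldDefinition` and `FieldType` are hypothetical names (the records loosely mirror the protobuf messages under Message Definitions), IDs start at 1, and two definitions count as the same type only if their type name and ordered field list are equal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified mirrors of the type-definition protobuf messages.
enum FieldType { INT, LONG, SHORT, BYTE, BOOLEAN, DOUBLE, FLOAT, BINARY, STRING }

record FieldDefinition(String fieldName, FieldType fieldType) {}

record TypeDefinition(String typeName, List<FieldDefinition> fields) {}

// Hypothetical registry, one instance per connection: no synchronization
// with other connections is needed, and its IDs mean nothing elsewhere.
class ConnectionTypeRegistry {
  private final Map<TypeDefinition, Integer> idsByDefinition = new HashMap<>();
  private final Map<Integer, TypeDefinition> definitionsById = new HashMap<>();
  private int nextId = 1;

  // Registering an equal definition again returns the existing ID.
  int register(TypeDefinition definition) {
    Integer existing = idsByDefinition.get(definition);
    if (existing != null) {
      return existing;
    }
    int id = nextId++;
    idsByDefinition.put(definition, id);
    definitionsById.put(id, definition);
    return id;
  }

  // Used when decoding values that refer to a previously registered type.
  TypeDefinition lookup(int typeId) {
    TypeDefinition definition = definitionsById.get(typeId);
    if (definition == null) {
      throw new IllegalArgumentException("Unknown type ID " + typeId);
    }
    return definition;
  }
}
```

Records provide value-based `equals` and `hashCode`, which is what makes repeated registration of the same definition idempotent here.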

The outline of type registration for the client is this:

  1. Send a type definition
  2. Get back a type ID that references the type description
  3. Use that type ID when encoding values of that type
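On the client, these three steps amount to a per-connection cache from class to type ID. The Java sketch below is hypothetical: `ClientTypeCache` is an illustrative name, and the `registerWithServer` function stands in for the actual request/response round trip, which is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side cache, one per connection. registerWithServer
// stands in for sending a TypeRegistrationRequest and reading the typeID
// from the TypeRegistrationResponse.
class ClientTypeCache {
  private final Map<Class<?>, Integer> typeIds = new HashMap<>();
  private final Function<Class<?>, Integer> registerWithServer;

  ClientTypeCache(Function<Class<?>, Integer> registerWithServer) {
    this.registerWithServer = registerWithServer;
  }

  // Steps 1 and 2 run only the first time a class is seen on this
  // connection; afterwards the cached ID is reused for step 3.
  int typeIdFor(Class<?> type) {
    return typeIds.computeIfAbsent(type, registerWithServer);
  }
}
```

Because registering the same type twice is safe, a cache miss (for example, after reconnecting with a fresh cache) costs only an extra round trip.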

So for example, using the same User from above:

Code Block
languagejava
collapsetrue
class User {
  String name;
  int age;

  User(String name, int age) {
    this.name = name;
    this.age = age;
  }
}

User value = new User("Amy", 44);

the client would send the following message to register the type

...

(definitions below):

Code Block
TypeRegistrationRequest{
  typeDefinition: ValueTypeDefinition{
    typeName: "User",
    definition: [
      ValueTypeFieldDefinition{
        fieldName: "name",
        fieldType: stringField
      },
      ValueTypeFieldDefinition{
        fieldName: "age",
        fieldType: intField
      }
    ]
  }
}
    

which might get a `TypeRegistrationResponse` with an ID of 42.

The returned ID could then be used in a PutRequest like this (enclosing Request omitted for brevity):

Code Block
PutRequest{
  key: EncodedValue{intValue: 42},
  value: EncodedValue{
    structValue: Value{
      typeID: 42,
      fields: [
        ValueField{stringField: "Amy"},
        ValueField{intField: 44}
      ]
    }
  }
}
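The following Java sketch shows the client-side half of this example under stated assumptions: `EncodedStruct` is a hypothetical, simplified stand-in for the `Value` message (a type ID plus positional field values), and `UserEncoder` is an illustrative name. The key point is that field names are not sent; the server recovers them from each value's position in the registered definition.

```java
import java.util.List;

// Simplified stand-in for the Value message: the type ID plus field values
// in the same order as the registered definition (no field names).
record EncodedStruct(int typeId, List<Object> fields) {}

class User {
  final String name;
  final int age;

  User(String name, int age) {
    this.name = name;
    this.age = age;
  }
}

// Hypothetical encoder for User. The field order must match the
// ValueTypeDefinition sent at registration: name (string), then age (int).
class UserEncoder {
  private final int typeId;

  UserEncoder(int typeId) {
    this.typeId = typeId;
  }

  // List.of rejects nulls; a real encoder would emit nullField instead.
  EncodedStruct encode(User user) {
    return new EncodedStruct(typeId, List.of(user.name, user.age));
  }
}
```

Encoding `new User("Amy", 44)` with type ID 42 yields type ID 42 and the positional fields "Amy" and 44, matching the PutRequest above.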

Message Definitions 

The message for registering a type definition will look like this:

Code Block
titleProtobuf Type registration
linenumberstrue
collapsetrue
syntax = "proto3";

message TypeRegistrationRequest {
  ValueTypeDefinition typeDefinition = 1;
}
message TypeRegistrationResponse {
  int32 typeID = 1;
}

message ValueTypeDefinition {
  string typeName = 1;
  repeated ValueTypeFieldDefinition definition = 2;
}
message ValueTypeFieldDefinition {
  // proto3 requires every enum value to have a number, starting at 0.
  enum FieldType {
    intField = 0;
    longField = 1;
    shortField = 2;
    byteField = 3;
    booleanField = 4;
    doubleField = 5;
    floatField = 6;
    binaryField = 7;
    stringField = 8;
  }
  string fieldName = 1;
  FieldType fieldType = 2;
}

and for sending values:

Code Block
languagetext
titleProtobuf Values
linenumberstrue
collapsetrue
syntax = "proto3";

import "google/protobuf/struct.proto";

message Value {
  int32 typeID = 1;
  repeated ValueField fields = 2;
}
message ValueField {
  oneof value {
    int32 intField = 1;
    int64 longField = 2;
    int32 shortField = 3;
    // Protobuf has no single-byte scalar type, so bytes travel as int32.
    int32 byteField = 4;
    bool booleanField = 5;
    double doubleField = 6;
    float floatField = 7;
    bytes binaryField = 8;
    string stringField = 9;
    google.protobuf.NullValue nullField = 11;
  }
}
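On the server, a `Value` can be turned back into named fields by looking up the registered definition for its `typeID` and pairing the values with field names by position. The Java sketch below is a simplified, hypothetical illustration: it covers only three of the field types, represents field values as plain Java objects rather than generated protobuf classes, and treats `null` as the `nullField` case. What the server then does with the resulting map (for example, converting it to PDX) is not shown.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified mirrors of the messages above (only three field types shown).
enum FieldType { INT, LONG, STRING }

record FieldDefinition(String fieldName, FieldType fieldType) {}

record TypeDefinition(String typeName, List<FieldDefinition> fields) {}

// Hypothetical decoding step: pair each positional value with the name from
// the registered definition, checking the field count and field types.
class ValueDecoder {
  static Map<String, Object> decode(TypeDefinition definition, List<Object> values) {
    List<FieldDefinition> fields = definition.fields();
    if (fields.size() != values.size()) {
      throw new IllegalArgumentException(
          "Expected " + fields.size() + " fields, got " + values.size());
    }
    Map<String, Object> result = new LinkedHashMap<>();
    for (int i = 0; i < fields.size(); i++) {
      FieldDefinition field = fields.get(i);
      Object value = values.get(i);
      // null models nullField and is accepted for any field type.
      if (value != null && !matches(field.fieldType(), value)) {
        throw new IllegalArgumentException(
            "Field " + field.fieldName() + " is not " + field.fieldType());
      }
      result.put(field.fieldName(), value);
    }
    return result;
  }

  private static boolean matches(FieldType type, Object value) {
    return switch (type) {
      case INT -> value instanceof Integer;
      case LONG -> value instanceof Long;
      case STRING -> value instanceof String;
    };
  }
}
```

Validating against the definition at decode time means a client that sends values in the wrong order or with the wrong types gets an error rather than silently corrupted data.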

...

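Client developers will have to make sure that the types they want to share between clients written in different languages can be correlated by type name. Because package naming conventions differ between languages, including package names in type names may or may not make sense.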