Versions Compared

Comment: Switch to a newer scheme

Background

Protocol

We've been writing a new client-server protocol as a public API for the creation of new Geode clients. We settled on using Protobuf as the encoding for the protocol as it makes writing clients easier by abstracting away a lot of the encoding details.

...

Regardless of proposal, we should allow users to plug in a custom object encoding by registering a handler on the server. This handler will receive a byte array and return an Object, which allows users to do custom serialization if desired.
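As a rough illustration of the pluggable encoding described above, a handler could be modeled as a function from a byte array to an Object. The names `ValueDecoder`, `DecoderRegistry` and `Utf8StringDecoder`, and the idea of keying handlers by a format name, are assumptions made for this sketch; they are not taken from Geode's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical handler interface: receives the raw bytes, returns the decoded Object.
interface ValueDecoder {
    Object decode(byte[] encoded);
}

// Hypothetical server-side registry of handlers, keyed by a format name.
class DecoderRegistry {
    private final Map<String, ValueDecoder> decoders = new ConcurrentHashMap<>();

    void register(String formatName, ValueDecoder decoder) {
        decoders.put(formatName, decoder);
    }

    Object decode(String formatName, byte[] encoded) {
        ValueDecoder decoder = decoders.get(formatName);
        if (decoder == null) {
            throw new IllegalArgumentException("No decoder registered for " + formatName);
        }
        return decoder.decode(encoded);
    }
}

// Example handler: treats the payload as UTF-8 text.
class Utf8StringDecoder implements ValueDecoder {
    @Override
    public Object decode(byte[] encoded) {
        return new String(encoded, StandardCharsets.UTF_8);
    }
}
```

A user would register a handler once, for example `registry.register("utf8", new Utf8StringDecoder())`, and the server would then route custom-encoded values through `registry.decode(...)`.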

...

Protobuf Struct Encoding

...

and Extension

This section gives background on early thinking about encoding a PDX-like type with Protobuf; for the proposed encoding, see "The Proposed Encoding", below.

Below are presented two encodings. The first is the simpler option and the second the more complex. The difference is that the second is optimized to avoid sending field names every time an object is serialized, which is done by caching metadata using type registration. We could potentially support both, but given that JSON is already the easy option, and since type registration is per-connection, caching classes on the client side should not be a big burden. The recommendation is for Option 2. Option 1 below is included mostly to make the motivation for Option 2 clearer.

Option 1: Struct encoding

Protobuf ships with a Struct message, defined in struct.proto, that can be recursively nested to encode JSON.

...

The typeName field can be used for other clients to recognize the same type. Internally, it will be stored in the PDXInstance that this is converted to, but that detail shouldn't need to be exposed to the driver developer.

So for example, given the following class and value:

...

Ideally, a driver developer would provide annotations or registration for application developers to specify the manner in which a type should be serialized. In languages that use setters and getters by convention, it would probably be more idiomatic to refer to getters for reflection rather than the member variables of the object.
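To make the getter-based approach concrete, here is a minimal sketch of how a Java driver might discover serializable properties through getters instead of member variables. `GetterInspector` and `UserBean` are hypothetical names introduced for this example, and the alphabetical field ordering is an assumption chosen only to make the output stable.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class GetterInspector {
    // Collects property name -> value by calling public zero-argument getXxx() methods.
    // A TreeMap gives a stable (alphabetical) order, since reflection order is unspecified.
    static Map<String, Object> readProperties(Object target) {
        Map<String, Object> properties = new TreeMap<>();
        for (Method method : target.getClass().getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && method.getParameterCount() == 0
                    && method.getReturnType() != void.class
                    && !name.equals("getClass");
            if (!isGetter) {
                continue;
            }
            // "getName" -> "name"
            String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            try {
                properties.put(property, method.invoke(target));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not read property " + property, e);
            }
        }
        return properties;
    }
}

// Hypothetical domain class that follows the getter convention.
class UserBean {
    private final String name;
    private final int age;

    UserBean(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
}
```

The resulting map could supply both the field names and the field values of a struct. An annotation-based design would work similarly but would let the application developer opt fields in or out explicitly.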

...

The Proposed Encoding

As an optimization to the "Struct" encoding, we can cache metadata using type registration.

Type registration

As an optimization to avoid sending field names with every message, allow clients to register types to communicate the metadata for data they are about to send. The server will give back an ID for that datatype, and the ID can be used in future messages to refer to the metadata without retransmitting that metadata. This encoding will not actually be smaller for single values, but if multiple values of the same type are sent the savings can be significant.

Type registration will be per-connection (meaning IDs cannot be cached between connections). This eliminates the need for synchronization on the server and decouples type registrations from the internal details of PDX. It also means that the drivers only have to keep track of a relatively small amount of data.

It will be safe for a driver to register the same type multiple times on a single connection, and it should get back the same ID every time.
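The per-connection, repeat-safe behavior described above could be sketched as follows. `ConnectionTypeRegistry` is a hypothetical class, and identifying a type by its name plus its ordered field names is an assumption of this sketch rather than something the proposal specifies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: one instance per connection, discarded when the connection closes.
// No synchronization is used because the instance is never shared between connections.
class ConnectionTypeRegistry {
    private final Map<List<Object>, Integer> idsByDefinition = new HashMap<>();
    private int nextId = 1;

    // Returns the ID for a type definition, assigning a new one only on first registration.
    int register(String typeName, List<String> fieldNames) {
        List<Object> key = List.of(typeName, List.copyOf(fieldNames));
        Integer existing = idsByDefinition.get(key);
        if (existing != null) {
            return existing; // registering the same type again yields the same ID
        }
        int id = nextId++;
        idsByDefinition.put(key, id);
        return id;
    }
}
```

Because each connection owns its own registry, an ID obtained on one connection says nothing about IDs on another, which is why drivers cannot cache IDs across connections.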

The outline of type registration for the client is this:

...

Code Block
class User {
  String name;
  int age;

  User(String name, int age) {
    this.name = name;
    this.age = age;
  }
}

User value = new User("Amy", 64);

The client would send the following messages to register the type (definitions below):

Code Block
TypeRegistrationRequest{
  typeDefinition: ValueTypeDefinition{
    typeName: "User",
    definition: [
      ValueTypeFieldDefinition{
        fieldName: "name",
        fieldType: stringField
      },
      ValueTypeFieldDefinition{
        fieldName: "age",
        fieldType: intField
      },
    ]
  }
}

which might get a `TypeRegistrationResponse` with an ID of 42.

Alternatively, suppose the client chose ID 42 for this type. Then the first put message using such a value would have a value like so:

Code Block
PutRequest{
  key: EncodedValue{intResult: 12},
  value: EncodedValue{
    newStruct: NewStruct{
      typename: "User",
      typeID: 42,
      fieldNames: ["name", "age"],
      fields: [
        EncodedValue{stringResult: "Amy"},
        EncodedValue{intResult: 64}
      ]
    }
  }
}

A later PutRequest would encode the value like this (enclosing Request omitted for succinctness):

Code Block
PutRequest{
  key: EncodedValue{intResult: 111},
  value: EncodedValue{
    structById: StructByID{
      typeID: 42,
      fields: [
        EncodedValue{stringResult: "Amy"},
        EncodedValue{intResult: 64}
      ]
    }
  }
}

Message Definitions 

For registering, a type definition will look like this:

Code Block
message TypeRegistrationRequest {
  ValueTypeDefinition typeDefinition = 1;
}
message TypeRegistrationResponse {
  int32 typeID = 1;
}

message ValueTypeDefinition {
  string typeName = 1;
  repeated ValueTypeFieldDefinition definition = 2;
}
message ValueTypeFieldDefinition {
  enum FieldType {
    intField = 0;
    longField = 1;
    shortField = 2;
    byteField = 3;
    booleanField = 4;
    doubleField = 5;
    floatField = 6;
    binaryField = 7;
    stringField = 8;
  }
  string fieldName = 1;
  FieldType fieldType = 2;
}

and for sending values:

Code Block
message Value {
  int32 typeID = 1;
  repeated ValueField fields = 2;
}
message ValueField {
  oneof value {
    int32 intField = 1;
    int64 longField = 2;
    int32 shortField = 3;
    // Protobuf has no single-byte scalar type, so a byte travels as an int32.
    int32 byteField = 4;
    bool booleanField = 5;
    double doubleField = 6;
    float floatField = 7;
    bytes binaryField = 8;
    string stringField = 9;
    google.protobuf.NullValue nullField = 11;
  }
}

The client sends a registration request, and the server can determine the typeID.

If a server sends back a value of a type a client has not registered, the client can send a TypeDefinitionLookupRequest:

Code Block
message TypeDefinitionLookupRequest {
  int32 typeId = 1;
}
message TypeDefinitionLookupResponse {
  int32 typeId = 1;
  string fieldName = 2;
  ValueTypeDefinition typeDefinition = 3;
}

This way a client can implement logic to find the correct type and deserialize the value.
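One way a client might implement that logic is to keep a local cache of type definitions and fall back to a lookup only for IDs it has not seen. In this sketch `TypeDefinitionCache` is a hypothetical class, and the `IntFunction` passed to it merely stands in for a `TypeDefinitionLookupRequest` round trip; no real network call is shown.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical client-side cache of field names per type ID.
class TypeDefinitionCache {
    private final Map<Integer, List<String>> fieldNamesById = new HashMap<>();
    // Stand-in for sending a TypeDefinitionLookupRequest and reading the response.
    private final IntFunction<List<String>> lookup;

    TypeDefinitionCache(IntFunction<List<String>> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached definition, asking the server only the first time an ID is seen.
    List<String> fieldNamesFor(int typeId) {
        List<String> known = fieldNamesById.get(typeId);
        if (known == null) {
            known = lookup.apply(typeId);
            fieldNamesById.put(typeId, known);
        }
        return known;
    }

    // Pairs the positional field values of a by-ID struct with their names.
    Map<String, Object> decode(int typeId, List<?> values) {
        List<String> names = fieldNamesFor(typeId);
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("Field count mismatch for type " + typeId);
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            result.put(names.get(i), values.get(i));
        }
        return result;
    }
}
```

With this shape, decoding many values of the same unregistered type costs a single lookup.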

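This is the proposed EncodedValue message that will contain values a client sends to the server or the server sends to the client:

Code Block
message Entry {
    EncodedValue key = 1;
    EncodedValue value = 2;
}

message EncodedValue {
    oneof value {
        // primitives
        int32 intResult = 1;
        int64 longResult = 2;
        int32 shortResult = 3;
        int32 byteResult = 4;
        bool booleanResult = 5;
        double doubleResult = 6;
        float floatResult = 7;
        bytes binaryResult = 8;
        string stringResult = 9;
        google.protobuf.NullValue nullResult = 11;
        NewStruct newStruct = 12;
        StructByID structById = 13;

        // Result serialized using a custom serialization format. This can only be used if
        // A HandshakeRequest is sent with valueFormat set to a valid format.
        //
        // See HandshakeRequest.valueFormat.
        bytes customObjectResult = 14;

        // Collections
        List listResult = 15;
        Map mapResult = 16;

        // Primitive arrays
        NumericArray intArray = 17;
        NumericArray longArray = 18;
        NumericArray shortArray = 19;
        NumericArray booleanArray = 20;
        ByteArrayArray byteArrayArray = 21;
        ObjectArray objectArray = 22;

        // Used in NewStruct messages for defining fields that can be of multiple types.
        // This encoded value will contain the actual type of the field but the type
        // definition will have Object for the field type.
        EncodedValue objectField = 23;
    }
}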

message NewStruct {
    string typename = 1;
    int32 typeID = 2;
    repeated string fieldNames = 3;
    repeated EncodedValue fields = 4;
}

message StructByID {
    int32 typeID = 1;
    repeated EncodedValue fields = 2;
}

message List {
    repeated EncodedValue elements = 1;
}

message Map {
    repeated Entry entries = 1;
}

// All numeric values in Protobuf are encoded using the same varint encoding,
// so this encodes identically for all numbers and booleans.
message NumericArray {
    repeated int64 elements = 1;
}

message ByteArrayArray {
    repeated bytes arrays = 1;
}

message ObjectArray {
    repeated EncodedValue objects = 1;
}

Under this EncodedValue scheme, types defined by the server and types defined by the client will use different sets of IDs (though these can refer to the same cached values if they are the same). This is because we intend to add support for asynchronous messages and/or multiplexing of multiple channels of communication over one socket, and this avoids having the server and client race to assign IDs. If IDs were shared, the server would need to send back new IDs when it sent back types the client had not seen before.
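The separate ID spaces can be pictured as two independent tables on each side of the connection, one for types this side defined and one for types the peer defined. `TypeIdTables` and its method names are hypothetical and exist only to illustrate why the same numeric ID can safely mean different things in each direction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for one side of a connection.
class TypeIdTables {
    private final Map<Integer, String> assignedByUs = new HashMap<>();
    private final Map<Integer, String> assignedByPeer = new HashMap<>();
    private int nextLocalId = 1;

    // We pick the ID ourselves when we first send a type, so no round trip is needed.
    int defineOutgoing(String typeName) {
        int id = nextLocalId++;
        assignedByUs.put(id, typeName);
        return id;
    }

    // The peer picked this ID when it first sent us the type.
    void recordIncoming(int peerId, String typeName) {
        assignedByPeer.put(peerId, typeName);
    }

    String resolveOutgoing(int id) {
        return assignedByUs.get(id);
    }

    String resolveIncoming(int id) {
        return assignedByPeer.get(id);
    }
}
```

Since each side only ever writes to its own table, neither side has to wait for the other before assigning an ID, which is the race the paragraph above is avoiding.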

The Object field is for fields that may be an Integer, String or Array type but have a broader type on the server side. Structs are viewed as Object type – more complex typing is not present. This is, in significant part, a leaky abstraction due to the way PDX saves values.

If a client is sending mutually recursive types or types that contain instances of themselves, it should send the type definition the first time one is seen (or in the parent instance) and send the type with ID in each later instance.
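The rule for self-referential types can be sketched with a linked-list node. The encoder below emits a text rendering of the messages rather than real Protobuf objects, and `Node` and `NodeEncoder` are hypothetical names. The point it illustrates is that the type ID is recorded before the children are encoded, so nested instances are already sent by ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical self-referential domain class.
class Node {
    final int value;
    final Node next;

    Node(int value, Node next) {
        this.value = value;
        this.next = next;
    }
}

// Hypothetical encoder that renders messages as text for illustration only.
class NodeEncoder {
    private final Map<String, Integer> sentTypeIds = new HashMap<>();
    private int nextId = 1;

    String encode(Node node) {
        if (node == null) {
            return "null";
        }
        Integer id = sentTypeIds.get("Node");
        boolean firstUse = (id == null);
        if (firstUse) {
            id = nextId++;
            // Record the ID before recursing so nested instances refer to it.
            sentTypeIds.put("Node", id);
        }
        String fields = "[" + node.value + ", " + encode(node.next) + "]";
        if (firstUse) {
            return "NewStruct{typename: \"Node\", typeID: " + id
                    + ", fieldNames: [\"value\", \"next\"], fields: " + fields + "}";
        }
        return "StructByID{typeID: " + id + ", fields: " + fields + "}";
    }
}
```

Encoding a two-element list with a fresh encoder yields a `NewStruct` for the outer node that contains a `StructByID` for the inner one. Any later value sent through the same encoder starts directly with `StructByID`.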

Still to be decided: whether a client must send all following values by ID, or whether the values can be sent with a full definition each time.

Considerations

In order to avoid arbitrary object serialization (which can lead to gadget chain exploits), we will probably need to constrain valid types to those registered as DataSerializable, or possibly even only those registered with the ReflectionBasedAutoSerializer. This may also mean that we need a special class of typenames for those types that are put first by a client.

A driver developer may wish to provide a way for users to register types before sending values.

Driver developers will have to make sure that types they want to use in different language clients can be correlated. So package names may or may not make sense. The naming convention is not entirely decided, nor is whether we can register nameless types. It may be wise to reserve a set of names with special meaning ("JSON" perhaps?) and perhaps a set of names that would correspond to classes that have no domain class in Java (leading underscore, or just those with no package name?)

If a server sends back a value of a type a client has not registered, the client can send a TypeDefinitionLookupRequest.

The use of NumericArray for all the integral types is because they all have the same varint encoding and will encode the same way on the wire. It may be advisable to use more restricted types and separate messages to get better typing in the generated Protobuf code.
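To see why one message type can serve these element types, here is a small stand-alone varint encoder. It is a sketch of Protobuf's base-128 varint format written for illustration, not code from the Protobuf library, and `Varint` is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;

class Varint {
    // Base-128 varint: 7 payload bits per byte, least significant group first,
    // with the high bit set on every byte except the last.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // A boolean travels as the varint 1 or 0.
    static byte[] encode(boolean value) {
        return encode(value ? 1L : 0L);
    }
}
```

An `int` 300, a `long` 300 and a `short` 300 all widen to the same `long` and so produce the same two bytes, and `true` produces the same single byte as the number 1. The wire bytes therefore carry no information about which of these types was intended, which is the typing concern raised above.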