Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Requirements

When using more than geode distributed system connected through the WAN, multiple WAN sites with geode users are required to set the distributed-system-id property (DSID). Each site must have its own unique DSID, and ids are limited to being between 1 -and 255. This limits the number of WAN sites to 255, and it . It also requires that external coordination before setting up a WAN site to make it that it gets a DSID. Once data is generated within a site, it is not possible to change the DSID without exporting and reimporting the data.

...

  • Allow more than 255 WAN sites
  • Allow Be able to do a rolling upgrade to the new systema version that supports more that 255 WAN sites
  • Nice to have - reduce coordination required when setting up a new WAN site. Do we have to specify and ID at all?

Not in scope:

  • Support old clients, peers on WAN sites while also having large DSIDs. Before using larger IDs users will have to upgrade everything.

Background

The distributed-system-id is used for three things: WAN receiver discovery, WAN conflict detection and PDX type ID generation.

...

When creating a gateway-sender , users specify the remote-distributed-system-id property. That property , which indicates which WAN site the sender should send to. Users configure the remote-locators property on their locator to point to locators in other WAN sites. Geode then will discover the DSID and addresses of remote gateway-receivers and connect the sender to those receivers.

...

Because WAN is asynchronous it's possible for two WAN sites to modify the same entry at the same time. For this case geode uses an eventual consistency model where all sites should end up choosing the same modification as the final value.

In order to detect that two modifications to the same entry came there is a conflict between updates from two different WAN sites geode compares the DSID of the two modifications. If the DSIDS are different , geode then performs WAN specific conflict resolution. By default that uses uses timestamps but can also be user code - see GateConflictResolver. In GateConflictResolver, dsid is an int, which involves comparing timestamps or invoking a user conflict resolver.

Geode passes the distributed-system-id along with events when doing a put over the WAN see . See VersionTag.toData and GatewaySenderGFEBatchOpImpl for a couple of places where this is sent over the wire.

Pdx Type ID Generation

When serializing a an object with PDX , geode generates a PDXType which describes how to deserialize the object. It embeds and id Geode generates a unique ID for that type (PDXID) into the serialized data. This avoid having to include metadata like field names and embeds the ID in the serialized data. Geode stores the PDXID to PdxType mapping The PdxType is stored in the PDX type registry , which is available everywhere, so that all members can find the PDXType using the PDXID.

In order to generate unique PDXIDs each WAN site embeds the DSID in PDXIDs generated in that site. That way types generated in different ensures two WAN sites never conflictgenerate the same PDXID. PDXIDs are 4 bytes, where one . The high byte is the DSID and the remaining 3 bytes are generated under a lock within a given WAN site.

...

Most of the public APIs treat DSID as an integer. So we could just increase the size of the DSID to be an integer. We probably ought to just make the typeID an integer at the same type. This would add at least an extra 4 3 bytes of overhead to every update message and to every entry in the system, since DSIDs are passed around and stored for WAN Conflict Detection. It would also add an extra 4 3 bytes for every serialized object. Because values stored in geode are often object graphs and not single objects, that means that a single serialized value may have a much larger overhead increase than 4 3 bytes.

Option 2 - Change the ratio of DSID to other bytes in the PDXID

The PDXID currently allocates 1 byte for the DSID and 3 bytes for the unique ID of the type. Although some people end up generating lots of types, many customers probably don't have more than a few hundred types. We could give the DSID an extra byte without changing the size of the PDXID. This would allow 64K DSIDs and 64K types instead.

Since the DSID is larger update messages and entries will need an extra byte of overhead, but serialized values would stay the same size.

Unfortunately , there is a lot of existing data out there that has a 4 byte PDXID that was generated with the old format. So this proposal would probably still require all of the rolling upgrade work to translate between formats that the other options require, even though the number of bytes are the same.

Option 3 - Use string IDs to identify WAN sites, switch to a 16 byte hash for the PDXID

This is the "if space is cheap" option. If we are willing to take on a bunch of extra overhead , we could get some usability benefits by getting rid of the integer dsid entirely. Here's one way that might look:

  1. Change the id used for WAN discovery to a string. Since this is really just a label for the remote WAN site , users should be able to put in whatever string they want , and change it whenever they want.
  2. Generate a UUID to identify WAN sites for conflict detection. This would also allow us to detect that two changes came from different WAN sites , but doesn't require without requiring the user to select a unique IDan id. This would increase the overhead of update messages and the storage space for every entry by 15 bytes.
  3. Using a cryptographic hash function, generate the ID of a PdxType by taking a 128 bit hash of the PDX type. This would effectively guarantee that two types don't have the same ID, while also guaranteeing that the same PdxType always gets the same ID. The fringe benefit of the last part is that maintaining the PDX type registry becomes much less complicated, and recovering from failures where we somehow lost the PdxType registry becomes much easier. That's because if we lose the type registry, it would be easy to regenerate it from the user's class definitions. This would however add 12 bytes of overhead to every serialized objectWith this solution we can generate a unique id for the type without any need to include a DSID or even do all of the complicated locking we do now. By using a hash instead of UUID we get the benefit that we can regenerate the type registry from a users domain classes if the registry gets lost somehow.

Rolling Upgrades

Rolling upgrade concerns with WAN receiver discovery and conflict detection

For receiver discovery and conflict detection, we are mostly just talking about changing have to change the size of the dsid DSID that is passed around in various messages - In the VersionTag, in GatewaySenderGFEBatchOpImpl, etc, as well as changing the format that values are written on to disk to include a larger DSID. We could support a rolling upgrade by looking at the receiver version and sending the DSID as a byte using the old format if the receiver has an older version. Similarly with disk files each file has a version, so we could know if we are reading from an older version file and expect the DSID to be a byte. We also have to change the size of the DSID written to disk. When sending messages to old members or reading from old disk files we would need to translate the ID for all of these messages.

If a newer member is trying to send a DSID larger than a byte to an older member, it should throw an exception. Truncating the data is just doing to result incorrect behavior.

This means that a user would be able to upgrade all of the WAN sites/clients, etc. while still using the small DSIDs. Once that is done they could start introducing sites with larger DSIDs.

 

Rolling upgrade concerns with the PDX Type ID

There are special concerns with changing the format of anything that is stored as a region value, because we pass around and persist values as opaque byte arrays, and we . We don't keep track of the version of the member that generated the byte array.

One option for upgrading the type ID would be introduce a new DataSerializer code for PDX values with larger ids. PDX may be more properly concerned is really a subformat of DataSerializable. When a user stores a PDX object it is serialized using DataSerializable.writeObject. The actual bytes generated look something like this

...