Requirements

When more than one Geode distributed system is connected over a WAN, users are required to set the distributed-system-id property (DSID). Each site must have its own unique DSID, and IDs are limited to the range 1-255. This limits the number of WAN sites to 255, and it also requires external coordination before setting up a WAN site to make sure that it gets a unique DSID. Once data is generated within a site, it is not possible to change the DSID without exporting and reimporting the data.

Goals:

  • Allow more than 255 WAN sites
  • Allow a rolling upgrade to the new system
  • Nice to have - reduce the coordination required when setting up a new WAN site. Do we have to specify an ID at all?

Background

The distributed-system-id is used for three things: WAN receiver discovery, WAN conflict detection and PDX type ID generation.

WAN receiver discovery

When creating a gateway-sender, users specify the remote-distributed-system-id property, which indicates which WAN site the sender should send to. Users configure the remote-locators property on their locator to point to locators in other WAN sites. Geode then will discover the DSID and addresses of remote gateway-receivers and connect the sender to those receivers.

WAN Conflict Detection

Because WAN replication is asynchronous, it's possible for two WAN sites to modify the same entry at the same time. For this case Geode uses an eventual consistency model where all sites should end up choosing the same modification as the final value.

In order to detect that two modifications to the same entry came from two different WAN sites, Geode compares the DSIDs of the two modifications. If the DSIDs are different, Geode then performs conflict resolution. By default that uses timestamps, but it can also be user code; see GatewayConflictResolver. In GatewayConflictResolver, the DSID is an int.

Geode passes the distributed-system-id along with events when doing a put over the WAN. See VersionTag.toData and GatewaySenderGFEBatchOpImpl for a couple of places where this is sent over the wire.
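To make the detection step concrete, here is a minimal sketch of the decision described above. The Stamp class and the shouldApply method are hypothetical names used only for illustration, not Geode's actual classes, and the tie-break on equal timestamps is an assumption added so that every site picks the same winner.

```java
/** Illustrative sketch only; Stamp and shouldApply are hypothetical, not Geode APIs. */
final class WanConflictSketch {

    /** Hypothetical stand-in for the version information carried with an update. */
    static final class Stamp {
        final int dsid;        // DSID of the site that made the modification
        final long timestamp;  // time of the modification

        Stamp(int dsid, long timestamp) {
            this.dsid = dsid;
            this.timestamp = timestamp;
        }
    }

    /** Returns true if the incoming update should replace the existing entry. */
    static boolean shouldApply(Stamp existing, Stamp incoming) {
        if (existing.dsid == incoming.dsid) {
            // Same site: not a WAN conflict, so no resolution is needed here.
            return true;
        }
        if (incoming.timestamp != existing.timestamp) {
            // Different sites: default resolution compares timestamps.
            return incoming.timestamp > existing.timestamp;
        }
        // Assumption: break timestamp ties by DSID so all sites converge on one value.
        return incoming.dsid > existing.dsid;
    }

    public static void main(String[] args) {
        Stamp fromSite1 = new Stamp(1, 100L);
        Stamp fromSite2 = new Stamp(2, 200L);
        System.out.println(shouldApply(fromSite1, fromSite2));
        System.out.println(shouldApply(fromSite2, fromSite1));
    }
}
```

Whatever rule is used, it has to give the same answer at every site regardless of the order in which the two updates arrive, which is why the sketch never returns true in both directions for updates from different sites.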

Pdx Type ID Generation

When serializing an object with PDX, Geode generates a PdxType which describes how to deserialize the object. It embeds an ID for that type (PDXID) into the serialized data. This avoids having to include metadata like field names in the serialized data. Geode stores the PDXID to PdxType mapping in the PDX type registry, which is available everywhere.

In order to generate unique PDXIDs each WAN site embeds the DSID in PDXIDs generated in that site. That way types generated in different WAN sites never conflict. PDXIDs are 4 bytes, where one byte is the DSID and the remaining 3 bytes are generated under a lock within a given WAN site.
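The 1-byte plus 3-byte split can be sketched as simple bit packing. This is an illustration under one assumption: that the DSID occupies the most significant byte of the 4-byte ID. The class and method names are hypothetical.

```java
/** Sketch of a 4-byte PDXID: 1 byte of DSID plus a 3-byte per-site type number. */
final class PdxIdLayout {
    static final int MAX_DSID = 0xFF;             // DSID must fit in one byte
    static final int MAX_TYPE_NUMBER = 0xFFFFFF;  // type number must fit in three bytes

    /** Assumption: the DSID sits in the most significant byte. */
    static int pack(int dsid, int typeNumber) {
        if (dsid < 0 || dsid > MAX_DSID) {
            throw new IllegalArgumentException("DSID out of range: " + dsid);
        }
        if (typeNumber < 0 || typeNumber > MAX_TYPE_NUMBER) {
            throw new IllegalArgumentException("type number out of range: " + typeNumber);
        }
        return (dsid << 24) | typeNumber;
    }

    static int dsidOf(int pdxId) {
        return pdxId >>> 24;  // unsigned shift keeps DSIDs above 127 positive
    }

    static int typeNumberOf(int pdxId) {
        return pdxId & MAX_TYPE_NUMBER;
    }

    public static void main(String[] args) {
        // The same per-site type number yields different PDXIDs in different sites.
        System.out.printf("site 1: 0x%08X%n", pack(1, 42));
        System.out.printf("site 2: 0x%08X%n", pack(2, 42));
    }
}
```

Because the DSID is part of the ID, two sites can hand out the same type number independently without ever producing the same PDXID.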

Proposals

Option 1 - Just make the ID bigger

Most of the public APIs treat DSID as an integer, so we could just increase the size of the DSID to be an integer. We probably ought to make the type ID an integer at the same time. This would add at least an extra 4 bytes of overhead to every update message and to every entry in the system, since DSIDs are passed around and stored for WAN Conflict Detection. It would also add an extra 4 bytes for every serialized object. Because values stored in Geode are often object graphs and not single objects, a single value may have a much larger overhead increase than 4 bytes.

Option 2 - Change the ratio of DSID to other bytes in the PDXID

The PDXID currently allocates 1 byte for the DSID and 3 bytes for the unique ID of the type. Although some people end up generating lots of types, many customers probably don't have more than a few hundred types. We could give the DSID an extra byte without changing the size of the PDXID. This would allow 64K DSIDs and 64K types instead.

Unfortunately, there is a lot of existing data out there that has a 4 byte PDXID that was generated with the old format. So this proposal would probably still require all of the rolling upgrade work to translate between formats that the other options require, even though the number of bytes is the same.
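A small sketch shows why the same 4 bytes cannot simply be reinterpreted. It assumes the DSID occupies the high-order bits in both layouts; the class and method names are hypothetical.

```java
/** Sketch comparing the current 8/24-bit PDXID split with the 16/16-bit split of Option 2. */
final class PdxIdSplit {
    // Current layout per the text: 8-bit DSID, 24-bit type number (bit order is an assumption).
    static int dsidOld(int pdxId) { return pdxId >>> 24; }
    static int typeOld(int pdxId) { return pdxId & 0xFFFFFF; }

    // Option 2 layout: 16-bit DSID, 16-bit type number.
    static int dsidNew(int pdxId) { return pdxId >>> 16; }
    static int typeNew(int pdxId) { return pdxId & 0xFFFF; }

    static int packNew(int dsid, int typeNumber) {
        if (dsid < 0 || dsid > 0xFFFF || typeNumber < 0 || typeNumber > 0xFFFF) {
            throw new IllegalArgumentException("value does not fit in 16 bits");
        }
        return (dsid << 16) | typeNumber;
    }

    public static void main(String[] args) {
        // An ID written by site 1 for its 70000th type under the current layout.
        int legacy = (1 << 24) | 70000;
        System.out.println("old reading: dsid=" + dsidOld(legacy) + " type=" + typeOld(legacy));
        System.out.println("new reading: dsid=" + dsidNew(legacy) + " type=" + typeNew(legacy));
    }
}
```

Read with the new split, the legacy value looks like type 4464 from site 257, so a reader still has to know which format produced a given ID.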

Option 3 - Use string IDs to identify WAN sites, switch to a 16 byte hash for the PDXID

This is the "if space is cheap" option. If we are willing to take on a bunch of extra overhead, we could get some usability benefits by getting rid of the DSID entirely. Here's one way that might look:

  1. Change the id used for WAN discovery to a string. Since this is really just a label for the remote WAN site, users should be able to put in whatever string they want, and change it whenever they want.
  2. Generate a UUID to identify WAN sites for conflict detection. This would also allow us to detect that two changes came from different WAN sites, but doesn't require the user to select a unique ID. This would increase the overhead of update messages and the storage space for every entry by 15 bytes (a 16 byte UUID in place of the 1 byte DSID).
  3. Using a cryptographic hash function, generate the ID of a PdxType by taking a 128 bit hash of the PDX type. This would effectively guarantee that two types don't have the same ID, while also guaranteeing that the same PdxType always gets the same ID. The fringe benefit of the last part is that maintaining the PDX type registry becomes much less complicated, and recovering from failures where we somehow lost the PdxType registry becomes much easier. That's because if we lose the type registry, it would be easy to regenerate it from the user's class definitions. This would however add 12 bytes of overhead to every serialized object.
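Steps 2 and 3 can be sketched with standard JDK classes. The canonical form that gets hashed (class name followed by one field descriptor per line) and the choice of SHA-256 truncated to 128 bits are assumptions made for illustration; the proposal does not specify either. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

/** Sketch of content-derived type IDs and generated site IDs. */
final class HashedPdxTypeId {

    /** Derives a 16-byte ID from a canonical description of a type (format is an assumption). */
    static byte[] idFor(String className, List<String> fieldDescriptors) {
        StringBuilder canonical = new StringBuilder(className);
        for (String field : fieldDescriptors) {
            canonical.append('\n').append(field);
        }
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, 16);  // keep the first 128 bits
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Generates a site identifier without any user coordination. */
    static UUID newSiteId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        byte[] id = idFor("example.Customer", Arrays.asList("id:int", "name:String"));
        System.out.println("type id length: " + id.length);
        System.out.println("site id: " + newSiteId());
    }
}
```

Since the ID depends only on the type's description, any member can recompute it from the class definition, which is what makes a lost registry recoverable.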

Rolling Upgrades

Rolling upgrade concerns with WAN receiver discovery and conflict detection

For receiver discovery and conflict detection, we are mostly just talking about changing the size of the DSID that is passed around in various messages (in the VersionTag, in GatewaySenderGFEBatchOpImpl, etc.), as well as changing the format in which values are written to disk to include a larger DSID. We could support a rolling upgrade by looking at the receiver version and sending the DSID as a byte using the old format if the receiver has an older version. Similarly, each disk file has a version, so we could know if we are reading from an older version file and expect the DSID to be a byte.

If a newer member is trying to send a DSID larger than a byte to an older member, it should throw an exception. Truncating the data is just going to result in incorrect behavior.
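The version check and the refusal to truncate could look roughly like this. The version ordinal, the 4-byte width of the new format, and all names are assumptions for illustration; this is not how VersionTag actually writes its data.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch of writing a DSID in the format the receiver understands. */
final class DsidWireFormat {
    /** Hypothetical ordinal of the first version that understands wide DSIDs. */
    static final int FIRST_WIDE_DSID_VERSION = 2;

    static void writeDsid(DataOutputStream out, int dsid, int receiverVersion) throws IOException {
        if (receiverVersion >= FIRST_WIDE_DSID_VERSION) {
            out.writeInt(dsid);  // new format: assumed here to be 4 bytes
            return;
        }
        if (dsid < 0 || dsid > 0xFF) {
            // Fail loudly instead of truncating to a byte.
            throw new IllegalArgumentException(
                "DSID " + dsid + " cannot be sent to a member at version " + receiverVersion);
        }
        out.writeByte(dsid);  // old format: a single byte
    }

    /** Convenience wrapper that returns the encoded bytes. */
    static byte[] encode(int dsid, int receiverVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeDsid(out, dsid, receiverVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(200, 1).length + " byte(s) to an old receiver");
        System.out.println(encode(300, 2).length + " byte(s) to a new receiver");
    }
}
```

The same branching would apply when reading disk files, keyed on the file's version instead of the receiver's.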

This means that a user would be able to upgrade all of the WAN sites/clients, etc. while still using the small DSIDs. Once that is done they could start introducing sites with larger DSIDs.

Rolling upgrade concerns with the PDX Type ID

There are special concerns with changing the format of anything that is stored as a region value, because we pass around and persist values as opaque byte arrays, and we don't keep track of the version of the member that generated the byte array.

One option for upgrading the type ID would be to introduce a new DataSerializer code for PDX values with larger IDs. PDX may be more properly considered a subformat of DataSerializable. When a user stores a PDX object, it is serialized using DataSerializer.writeObject. The actual bytes generated look something like this:

93 | PDXID | pdx bytes

The leading 93 is the DataSerializable constant that indicates that the following bytes are PDX (see DSCODE.PDX). So we could introduce a new code like DSCODE.PDX_2 that indicates PDX bytes with a different ID format. New members could use the old code if the DSID is at most 255, and the new code if it is larger. In that way a user would be able to upgrade all of their sites, clients, etc. before introducing members with larger DSIDs.

If an older member was passed a value that was serialized using DSCODE.PDX_2, it would just fail to deserialize the value.
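The choice between the two codes can be sketched as follows. The value 93 comes from the text above; the value used for PDX_2 is an arbitrary placeholder, and the wide layout (a 4-byte DSID followed by a 4-byte type number) is an assumption borrowed from Option 1. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch of choosing the serialization code from the size of the IDs. */
final class PdxHeaderSketch {
    static final byte PDX = 93;      // value given in the text for DSCODE.PDX
    static final byte PDX_2 = 127;   // hypothetical placeholder, not a real DSCODE assignment

    /** Writes the code and type ID that would precede the pdx bytes. */
    static byte[] header(int dsid, int typeNumber) {
        if (dsid < 0 || typeNumber < 0) {
            throw new IllegalArgumentException("IDs must be non-negative");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (dsid <= 0xFF && typeNumber <= 0xFFFFFF) {
                // Fits the existing 4-byte PDXID, so older members can still read it.
                out.writeByte(PDX);
                out.writeInt((dsid << 24) | typeNumber);
            } else {
                // Assumed wide format: 4-byte DSID followed by 4-byte type number.
                out.writeByte(PDX_2);
                out.writeInt(dsid);
                out.writeInt(typeNumber);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("small DSID uses code " + header(5, 10)[0]);
        System.out.println("large DSID uses code " + header(300, 10)[0]);
    }
}
```

As long as every site keeps its DSID within a byte, all values continue to be written with the old code, so nothing unreadable reaches a member that has not been upgraded yet.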
