Geode Compatibility with Redis data sharding and cluster changes

To be Reviewed By:

Authors: Jens Deppe

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

In its initial implementation, the Geode compatibility with Redis API provides a hybrid clustering mode in which clients can connect to any member and interact with it as if it were a standalone Redis server. However, this is not true clustering: it does not expose the functionality that existing cluster-capable clients need in order to make use of data sharding and high availability. This proposal aims to provide that support. In addition, the current approach is not very performant, because accessing a given key may require more than one network hop.

Anti-Goals

We will not provide a choice between a pure cluster mode and the current hybrid mode. Cluster mode will be the only mode.

Solution

Redis enables data sharding by partitioning data into 16384 slots. Each key is hashed using a well-defined hashing algorithm (CRC16/XMODEM), and the result is taken modulo 16384 to determine the key's slot. Each primary server is responsible for hosting a non-overlapping set of slots, and various cluster commands provide information on the slot-to-server allocation. Thus, given a key, a client can determine which server hosts that data and direct the command to the correct server. See also Redis Cluster Specification.
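To make the key-to-slot step concrete, here is a minimal Java sketch. It is not Geode's implementation: the class name RedisSlots is hypothetical, it uses a simple bitwise CRC16/XMODEM (polynomial 0x1021, initial value 0) rather than a table-driven one, and it hashes the entire key, ignoring Redis hash tags.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the Redis key-to-slot mapping (hash tags not handled).
final class RedisSlots {
  static final int SLOT_COUNT = 16384;

  private RedisSlots() {}

  // Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0x0000,
  // no reflection, no final XOR. Returns a value in [0, 65535].
  static int crc16(byte[] data) {
    int crc = 0;
    for (byte b : data) {
      crc ^= (b & 0xFF) << 8;
      for (int i = 0; i < 8; i++) {
        if ((crc & 0x8000) != 0) {
          crc = ((crc << 1) ^ 0x1021) & 0xFFFF;
        } else {
          crc = (crc << 1) & 0xFFFF;
        }
      }
    }
    return crc;
  }

  // Slot for a key: CRC16 of the whole key modulo 16384.
  static int slotFor(byte[] key) {
    return crc16(key) % SLOT_COUNT;
  }

  static int slotFor(String key) {
    return slotFor(key.getBytes(StandardCharsets.UTF_8));
  }
}
```

For example, RedisSlots.slotFor("foo") returns 12182, the same slot Redis assigns to that key. Because 16384 is 2^14, the modulo simply keeps the low 14 bits of the 16-bit hash.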

Geode already has a data sharding concept in the form of buckets, and keys can be mapped to buckets using a custom PartitionResolver. To map Redis' concept of slots onto Geode's buckets, we can do the following:

// Calculate the CRC16 of a given key using the CRC16/XMODEM hashing algorithm (this produces an unsigned 16-bit integer)
crc16Hash = CRC16(key)

// A custom PartitionResolver can then compute the correct bucket with:
slot = crc16Hash % 16384
bucket = int(slot / (16384 / BUCKETS_IN_REGION))   // real division; with integer division this is only correct when BUCKETS_IN_REGION divides 16384

Ideally, the number of buckets in the region would be a power of 2 (that is, a factor of 16384 = 2^14) so that slots, and therefore data, are allocated equally across buckets. The current default of 113 buckets would cause a slight imbalance (each bucket would cover either 144 or 145 slots), so we should use 128 as the default instead.
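The arithmetic behind this recommendation can be checked with a short Java sketch. It computes the bucket as floor(slot * buckets / 16384), an integer-only form that gives the same result as the bucket formula above evaluated with real division, and then counts how many slots land in each bucket. The class and method names are hypothetical, and wiring this into a Geode PartitionResolver is not shown.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the slot-to-bucket arithmetic.
final class SlotBuckets {
  static final int SLOT_COUNT = 16384;

  private SlotBuckets() {}

  // floor(slot * numBuckets / 16384); always in [0, numBuckets).
  // For a power-of-2 bucket count this equals slot / (16384 / numBuckets).
  static int bucketForSlot(int slot, int numBuckets) {
    if (slot < 0 || slot >= SLOT_COUNT) {
      throw new IllegalArgumentException("slot out of range: " + slot);
    }
    return (int) ((long) slot * numBuckets / SLOT_COUNT);
  }

  // Number of slots mapped to each bucket.
  static int[] slotsPerBucket(int numBuckets) {
    int[] counts = new int[numBuckets];
    for (int slot = 0; slot < SLOT_COUNT; slot++) {
      counts[bucketForSlot(slot, numBuckets)]++;
    }
    return counts;
  }

  // Returns {fewest, most} slots held by any single bucket.
  static int[] minMax(int numBuckets) {
    int[] counts = slotsPerBucket(numBuckets);
    return new int[] {
      Arrays.stream(counts).min().getAsInt(), Arrays.stream(counts).max().getAsInt()
    };
  }
}
```

With 128 buckets every bucket covers exactly 128 slots (bucket b holds slots b*128 through b*128+127), so minMax(128) is {128, 128}; with 113 buckets, minMax(113) is {144, 145}.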

In addition, at least the following cluster-specific commands will be implemented to enable clients to take advantage of clustering.

CLUSTER NODES - this command produces a list identifying each member of the cluster, indicating whether it is a primary or a replica, and listing the slots it hosts. Since every Geode member hosts both primary and redundant (replica) buckets, each member can only be designated as a primary. Below is an example of the proposed output, using the literals (such as master and myself) required by the Redis API.

07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 master - 0 1426238317239 4 connected 7500-10922
67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-7499
292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-12999
6ec23923021cf3ffec47632106199cb7f496ce01 127.0.0.1:30005@31005 master - 0 1426238316232 5 connected 13000-16383
824fe116063bc5fcf9f4ffd895bc17aee7731ac3 127.0.0.1:30006@31006 master - 0 1426238317741 6 connected 2500-5460
e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-2499
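One way a member could derive the slot ranges for its CLUSTER NODES line is to convert each primary bucket it hosts into the contiguous run of slots that maps to it, merging runs from consecutive buckets. This Java sketch assumes slots are mapped to buckets by the bucket formula above; the class name and its input (a sorted set of primary bucket IDs) are hypothetical. With 128 buckets, ranges produced this way always start and end on 128-slot boundaries, unlike the illustrative slot numbers in the example output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper: formats the slot ranges a member would report in CLUSTER NODES.
final class ClusterNodesRanges {
  static final int SLOT_COUNT = 16384;

  private ClusterNodesRanges() {}

  // First slot that maps to the bucket under bucket = floor(slot * numBuckets / 16384).
  static int firstSlot(int bucket, int numBuckets) {
    return (int) (((long) bucket * SLOT_COUNT + numBuckets - 1) / numBuckets);
  }

  // Space-separated slot ranges, e.g. "0-383 640-767", for the given primary buckets.
  static String slotRanges(SortedSet<Integer> primaryBuckets, int numBuckets) {
    List<String> ranges = new ArrayList<>();
    int start = -1;
    int end = -1;
    for (int bucket : primaryBuckets) {
      int first = firstSlot(bucket, numBuckets);
      int last = firstSlot(bucket + 1, numBuckets) - 1;
      if (start >= 0 && first == end + 1) {
        end = last; // consecutive bucket: extend the current range
      } else {
        if (start >= 0) {
          ranges.add(format(start, end));
        }
        start = first;
        end = last;
      }
    }
    if (start >= 0) {
      ranges.add(format(start, end));
    }
    return String.join(" ", ranges);
  }

  // Redis prints a single slot without a dash.
  private static String format(int start, int end) {
    return start == end ? Integer.toString(start) : start + "-" + end;
  }
}
```

For example, a member holding primary buckets 0, 1, 2 and 5 out of 128 would report "0-383 640-767".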

CLUSTER SLOTS - this command produces structured array output with slot information similar to CLUSTER NODES, including only the primary nodes.

1) 1) (integer) 0
   2) (integer) 2499
   3) 1) "127.0.0.1"
      2) (integer) 30001
2) 1) (integer) 2500
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 30006
3) 1) (integer) 5461
   2) (integer) 7499
   3) 1) "127.0.0.1"
      2) (integer) 30002
4) 1) (integer) 7500
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 30004
5) 1) (integer) 10923
   2) (integer) 12999
   3) 1) "127.0.0.1"
      2) (integer) 30003
6) 1) (integer) 13000
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 30005
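A cluster-aware client typically caches this slot table and uses it to route each command. The following Java sketch, with hypothetical class and method names, stores the ranges from the example output above in a sorted map keyed by start slot and finds the owning server with TreeMap.floorEntry; a real client would also refresh its table when it receives a -MOVED redirection.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side slot table built from CLUSTER SLOTS output.
final class SlotTable {
  private static final class Range {
    final int end;
    final String hostAndPort;

    Range(int end, String hostAndPort) {
      this.end = end;
      this.hostAndPort = hostAndPort;
    }
  }

  // Ranges keyed by their first slot.
  private final TreeMap<Integer, Range> byStart = new TreeMap<>();

  void addRange(int start, int end, String host, int port) {
    byStart.put(start, new Range(end, host + ":" + port));
  }

  // Returns "host:port" of the server owning the slot, or null if no range covers it.
  String serverFor(int slot) {
    Map.Entry<Integer, Range> entry = byStart.floorEntry(slot);
    if (entry == null || slot > entry.getValue().end) {
      return null;
    }
    return entry.getValue().hostAndPort;
  }
}
```

Loaded with the six ranges shown above, serverFor(12182) resolves to 127.0.0.1:30003, since slot 12182 falls in 10923-12999.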

-MOVED - If a client sends a request to a server that is not hosting the given key, the server must respond with a -MOVED error indicating which server hosts the key. The current implementation avoids this by using a function call to route the request to the member hosting the key. This proposal removes the need for that layer of indirection, which improves performance.
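Per the Redis Cluster Specification, the redirection has the form -MOVED <slot> <host>:<port>, for example -MOVED 3999 127.0.0.1:6381. The Java sketch below is hypothetical: it assumes the server can look up the host and port of the member holding the primary bucket for a slot (modelled here as an IntFunction), and it returns the RESP error line to send, or null when the slot is hosted locally and the command can proceed.

```java
import java.util.function.IntFunction;

// Hypothetical server-side check that decides whether to answer with -MOVED.
final class MovedCheck {
  private MovedCheck() {}

  // localAddress is this member's "host:port"; primaryAddressForSlot is an assumed
  // lookup returning the "host:port" of the member hosting the slot's primary bucket.
  static String redirectIfRemote(int slot, String localAddress,
      IntFunction<String> primaryAddressForSlot) {
    String owner = primaryAddressForSlot.apply(slot);
    if (owner.equals(localAddress)) {
      return null; // key is hosted here; execute the command locally
    }
    // RESP error reply "-MOVED <slot> <host>:<port>", terminated by CRLF
    return "-MOVED " + slot + " " + owner + "\r\n";
  }
}
```

On receiving this error, a cluster-aware client retries the command against the named server and updates its cached slot table.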

Although transactions are not currently implemented, this change will greatly ease the development of transactional support using Redis' MULTI/EXEC commands. Redis transactions require all keys participating in a transaction to be colocated in the same slot. With the current implementation, transactions would require a much more complicated locking scheme that allows multiple keys to be locked across multiple members.
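In Redis Cluster, applications colocate the keys of a transaction using hash tags: if a key contains a '{' followed later by a '}' with at least one character between them, only the substring between the first '{' and the next '}' is hashed. The Java sketch below (class name hypothetical) applies that rule before the CRC16/XMODEM slot computation, so that, for example, {user1000}.following and {user1000}.followers land in the same slot and could take part in the same MULTI/EXEC transaction.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: Redis slot computation including the hash-tag rule.
final class HashTagSlots {
  static final int SLOT_COUNT = 16384;

  private HashTagSlots() {}

  // Bitwise CRC16/XMODEM (polynomial 0x1021, initial value 0).
  static int crc16(byte[] data) {
    int crc = 0;
    for (byte b : data) {
      crc ^= (b & 0xFF) << 8;
      for (int i = 0; i < 8; i++) {
        crc = (crc & 0x8000) != 0 ? ((crc << 1) ^ 0x1021) & 0xFFFF : (crc << 1) & 0xFFFF;
      }
    }
    return crc;
  }

  // The bytes Redis hashes: the content of the first non-empty {...} tag, else the whole key.
  static byte[] hashedPart(byte[] key) {
    int open = -1;
    for (int i = 0; i < key.length; i++) {
      if (key[i] == '{') {
        open = i;
        break;
      }
    }
    if (open < 0) {
      return key;
    }
    for (int j = open + 1; j < key.length; j++) {
      if (key[j] == '}') {
        // An empty tag "{}" means the whole key is hashed.
        return j == open + 1 ? key : Arrays.copyOfRange(key, open + 1, j);
      }
    }
    return key; // no closing brace: hash the whole key
  }

  static int slotFor(String key) {
    return crc16(hashedPart(key.getBytes(StandardCharsets.UTF_8))) % SLOT_COUNT;
  }
}
```

Note that a transaction needs its keys in the same slot, not merely the same bucket: with 128 buckets, keys in one slot always share a bucket, but keys in one bucket do not necessarily share a slot.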

Additional CLUSTER commands, for example CLUSTER REPLICAS and CLUSTER INFO, may also be implemented in the future.

Changes and Additions to Public Interfaces

This change will remove the ability for non-cluster-aware clients to interact with the data unless the cluster consists of a single member. This is a change from the current implementation, which allows clients to connect to any server without any knowledge of data locality.

Performance Impact

This change will have a significant positive performance impact since requests will be targeted directly at servers hosting the data. Thus requests will not incur the penalty of an additional network hop to reach the data (this is analogous to Geode's 'single-hop' concept).

Backwards Compatibility and Upgrade Path

Applications deployed against the current functionality in Geode 1.14 will need to be updated to be cluster aware. Typically this should only require a different form of client initialization.

Prior Art

N/A
