Geode Compatibility with Redis data sharding and cluster changes
To be Reviewed By:
Authors: Jens Deppe
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
In its initial implementation, the Geode compatibility with Redis API provides a hybrid clustering mode whereby clients can connect to any node and interact as with a standalone Redis server. This, however, does not provide true clustering as it does not expose the functionality for existing cluster-capable clients to make use of data sharding and high availability. This proposal aims to provide that support.
Anti-Goals
We will not support operating in either a pure cluster mode or a hybrid mode; cluster mode will be the only mode. (This does not preclude non-cluster clients that are capable of interpreting MOVED responses from working.)
Solution
Redis enables data sharding by partitioning the keyspace into 16384 slots. Keys are hashed using a well-defined algorithm (CRC16/XMODEM), and the result is taken modulo 16384 to determine the slot. Each primary server hosts a non-overlapping set of slots. Various cluster commands provide information on slot-to-server allocation. Thus, given a key, a client can determine which server hosts that data and direct the command to that server. See also the Redis Cluster Specification.
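The slot calculation described above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not Geode code; the names crc16_xmodem and key_slot are hypothetical. It deliberately ignores Redis hash tags (the {...} convention), which this proposal does not discuss.

```python
SLOT_COUNT = 16384  # number of Redis Cluster slots


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to its slot (hash tags are not handled in this sketch)."""
    return crc16_xmodem(key) % SLOT_COUNT


print(key_slot(b"foo"))  # 12182
```

A production implementation would more likely use a table-driven CRC; the bitwise form is shown only because it is short and easy to check against the algorithm's parameters.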
Geode already has a data sharding concept in the form of buckets, with the ability to map keys to buckets using a custom PartitionResolver. In order to map Redis' concept of slots to Geode's buckets we can perform the following:
    // Calculate the CRC16 of a given key using the CRC16/XMODEM hashing algorithm
    // (this produces a short unsigned integer)
    crc16Hash = CRC16(key)
    // A custom PartitionResolver can then compute the correct bucket with:
    slot = crc16Hash % 16384
    bucket = int(slot / (16384 / BUCKETS_IN_REGION))
Ideally the number of buckets in the region would be a power of 2 no greater than 16384 (that is, a divisor of 16384) so that every bucket covers the same number of slots. The current default of 113 buckets does not divide 16384 evenly and would cause a slight imbalance of data, so we should use 128 as the default instead.
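To make the imbalance concrete, the sketch below counts how many slots land in each bucket for 113 and for 128 buckets. It is an illustration only; slot_to_bucket and slots_per_bucket are hypothetical names. The mapping is written as an integer multiply-then-divide, which equals the slot-to-bucket formula when 16384 / BUCKETS_IN_REGION is treated as an exact fraction. If that inner division were truncated to an integer first (144 for 113 buckets), slot 16383 would map to bucket 113, which is out of range.

```python
from collections import Counter

SLOT_COUNT = 16384


def slot_to_bucket(slot: int, buckets_in_region: int) -> int:
    """Integer form of int(slot / (16384 / BUCKETS_IN_REGION))."""
    return slot * buckets_in_region // SLOT_COUNT


def slots_per_bucket(buckets_in_region: int) -> Counter:
    """Count how many of the 16384 slots map to each bucket."""
    return Counter(slot_to_bucket(s, buckets_in_region) for s in range(SLOT_COUNT))


for buckets in (113, 128):
    counts = slots_per_bucket(buckets)
    print(buckets, min(counts.values()), max(counts.values()))
# 113 144 145
# 128 128 128
```

With 113 buckets, 112 buckets cover 145 slots and one covers 144; with 128 buckets every bucket covers exactly 128 slots.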
In addition, at least the following cluster-specific commands will be implemented to enable clients to take advantage of clustering:
CLUSTER NODES
- this command produces a list identifying each member of the cluster, indicating whether it is a primary or secondary and listing the slots which it hosts - https://redis.io/commands/cluster-nodes
    07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
    67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-10922
    292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-16383
    6ec23923021cf3ffec47632106199cb7f496ce01 127.0.0.1:30005@31005 slave 67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 0 1426238316232 5 connected
    824fe116063bc5fcf9f4ffd895bc17aee7731ac3 127.0.0.1:30006@31006 slave 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 0 1426238317741 6 connected
    e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-5460
Since every Geode member hosts both primary and secondary buckets, each member can be designated as a primary ('master' in the output above), and no member will be designated as 'slave'.
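One way a member's primary buckets could be turned into a CLUSTER NODES line is sketched below. This is an assumption-laden illustration, not the proposed implementation: the function names, the choice to report only primary buckets, the zeroed ping/pong/epoch fields, and the reuse of the client port as the cluster bus port are all placeholders. It assumes the bucket count divides 16384.

```python
SLOT_COUNT = 16384


def bucket_slot_range(bucket, buckets_in_region):
    """Inclusive slot range covered by one bucket (bucket count must divide 16384)."""
    width = SLOT_COUNT // buckets_in_region
    return bucket * width, (bucket + 1) * width - 1


def merge_ranges(ranges):
    """Merge adjacent inclusive ranges, e.g. (0, 127) and (128, 255) -> (0, 255)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def cluster_nodes_line(node_id, host, port, primary_buckets,
                       buckets_in_region=128, myself=False):
    """Render one CLUSTER NODES line; every member is reported as 'master'."""
    ranges = merge_ranges(bucket_slot_range(b, buckets_in_region)
                          for b in primary_buckets)
    slots = " ".join(f"{s}-{e}" if s != e else str(s) for s, e in ranges)
    flags = "myself,master" if myself else "master"
    # Placeholders: bus port equals client port, no master id ("-"),
    # and zeroed ping-sent, pong-recv and config-epoch fields.
    return f"{node_id} {host}:{port}@{port} {flags} - 0 0 0 connected {slots}"


print(cluster_nodes_line("abc", "127.0.0.1", 6379, [0, 1, 2, 5], myself=True))
# abc 127.0.0.1:6379@6379 myself,master - 0 0 0 connected 0-383 640-767
```

Real Redis node ids are 40 hexadecimal characters; "abc" is used here only to keep the example readable.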
CLUSTER SLOTS
- this command produces structured array output with slot information similar to CLUSTER NODES:
    1) 1) (integer) 0
       2) (integer) 5460
       3) 1) "127.0.0.1"
          2) (integer) 30001
       4) 1) "127.0.0.1"
          2) (integer) 30006
    2) 1) (integer) 5461
       2) (integer) 10922
       3) 1) "127.0.0.1"
          2) (integer) 30002
       4) 1) "127.0.0.1"
          2) (integer) 30004
    3) 1) (integer) 10923
       2) (integer) 16383
       3) 1) "127.0.0.1"
          2) (integer) 30003
       4) 1) "127.0.0.1"
          2) (integer) 30005
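The nested reply above is what a client sees after its library decodes the wire format. As a rough sketch of how such a reply could be assembled and serialized with RESP2 framing, consider the following; resp_encode and cluster_slots_reply are hypothetical helpers, and the input shape (start slot, end slot, list of host/port pairs) is an assumption made for the example.

```python
def resp_encode(value) -> bytes:
    """Encode ints, strings and nested lists using RESP2 framing."""
    if isinstance(value, int):
        return b":%d\r\n" % value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"$%d\r\n%s\r\n" % (len(data), data)
    if isinstance(value, list):
        return b"*%d\r\n" % len(value) + b"".join(resp_encode(v) for v in value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def cluster_slots_reply(allocations):
    """allocations: iterable of (start_slot, end_slot, [(host, port), ...])."""
    return [[start, end] + [[host, port] for host, port in nodes]
            for start, end, nodes in allocations]


reply = cluster_slots_reply(
    [(0, 5460, [("127.0.0.1", 30001), ("127.0.0.1", 30006)])]
)
wire = resp_encode(reply)
print(wire)
```

The first host/port pair in each entry is the node serving the range; any further pairs are replicas, matching the layout of the sample output.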
If a client makes a request to a server that is not hosting the given key, the server will need to respond with a -MOVED error indicating which server is hosting the key.
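The redirect decision can be sketched as follows. This is a simplified illustration under stated assumptions: slot_owners is a hypothetical 16384-entry lookup from slot to "host:port", and check_ownership is a made-up name. The error text follows the Redis Cluster form, -MOVED followed by the slot and the owning host:port.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: bytes) -> int:
    # binascii.crc_hqx with an initial value of 0 computes CRC16/XMODEM.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


def check_ownership(key, local_node, slot_owners):
    """Return None if local_node owns the key's slot, else a RESP -MOVED error.

    slot_owners is a hypothetical list of 16384 "host:port" strings.
    """
    slot = key_slot(key)
    owner = slot_owners[slot]
    if owner == local_node:
        return None
    return f"-MOVED {slot} {owner}\r\n".encode("ascii")


# Two members, each owning half of the slot space (example data only).
owners = ["127.0.0.1:30001"] * 8192 + ["127.0.0.1:30002"] * 8192
print(check_ownership(b"foo", "127.0.0.1:30001", owners))
# b'-MOVED 12182 127.0.0.1:30002\r\n'
```

A cluster-aware client that receives this error is expected to update its slot table and retry against the indicated server.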
Although transactions are not currently implemented, this change will greatly ease the development of transactional support using Redis' MULTI/EXEC commands. Redis transactions require all keys participating in a transaction to be colocated in the same slot. The current implementation would require a much more complicated locking scheme in which multiple keys are locked across multiple members.
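The same-slot requirement reduces to a simple check before a transaction is accepted, sketched below. The function name transaction_slot is hypothetical, and the error text mirrors the CROSSSLOT message used by Redis Cluster; whether Geode would report it the same way is an assumption.

```python
import binascii


def key_slot(key: bytes) -> int:
    # CRC16/XMODEM via binascii.crc_hqx with an initial value of 0.
    return binascii.crc_hqx(key, 0) % 16384


def transaction_slot(keys):
    """Return the single slot shared by all keys, or raise if they span several.

    An empty key list is also rejected, since no slot can be determined.
    """
    slots = {key_slot(k) for k in keys}
    if len(slots) != 1:
        raise ValueError("CROSSSLOT Keys in request don't hash to the same slot")
    return slots.pop()


print(transaction_slot([b"foo", b"foo"]))  # 12182
```

Because all keys in one slot live in one bucket on one member, a transaction that passes this check needs locking only on that member.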
Changes and Additions to Public Interfaces
This change will remove the ability for non-cluster-aware clients to interact with the data unless only a single member is used. This is a change from the current implementation, which allows clients to connect to any server without needing any knowledge of data locality.
Performance Impact
This change will have a significant positive performance impact since requests will be targeted directly at servers hosting the data. Thus requests will not incur the penalty of an additional network hop to reach the data (this is analogous to Geode's 'single-hop' concept).
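The single-hop behaviour relies on the client keeping a slot table and choosing the target server before sending a command. A minimal sketch of that lookup, assuming the client has already obtained (start slot, end slot, node) ranges such as those reported by CLUSTER SLOTS, might look like this; SlotTable is a hypothetical class, not part of any client library.

```python
import bisect
import binascii


def key_slot(key: bytes) -> int:
    # CRC16/XMODEM via binascii.crc_hqx with an initial value of 0.
    return binascii.crc_hqx(key, 0) % 16384


class SlotTable:
    """Client-side routing table built from (start_slot, end_slot, node) ranges."""

    def __init__(self, ranges):
        self._ranges = sorted(ranges)
        self._starts = [start for start, _, _ in self._ranges]

    def node_for(self, key: bytes) -> str:
        slot = key_slot(key)
        # Find the last range whose start is <= slot.
        index = bisect.bisect_right(self._starts, slot) - 1
        if index >= 0:
            start, end, node = self._ranges[index]
            if start <= slot <= end:
                return node
        raise LookupError(f"no node known for slot {slot}")


table = SlotTable([
    (0, 5460, "127.0.0.1:30001"),
    (5461, 10922, "127.0.0.1:30002"),
    (10923, 16383, "127.0.0.1:30003"),
])
print(table.node_for(b"foo"))  # 127.0.0.1:30003
```

Existing cluster-capable clients already maintain an equivalent table internally, so applications do not need to write this themselves.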
Backwards Compatibility and Upgrade Path
Applications deployed against the current functionality in Geode 1.14 will need to be updated to be cluster-aware. Typically this should only require a different form of client initialization.
Prior Art
N/A