
Geode Compatibility with Redis data sharding and cluster changes

To be Reviewed By:

Authors: Jens Deppe

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

In its initial implementation, the Geode compatibility with Redis API provides a hybrid clustering mode whereby clients can connect to any node and interact with it as they would with a standalone Redis server. This, however, does not provide true clustering, as it does not expose the functionality that existing cluster-capable clients need in order to make use of data sharding and high availability. This proposal aims to provide that support. In addition, the current approach is not very performant, since accessing any given key may require additional network hops.

Anti-Goals

We will not provide the ability to choose between a pure cluster mode and a hybrid mode: cluster mode will be the only mode. (This does not preclude non-cluster clients that are capable of interpreting MOVED responses from working.)

Solution

Redis enables data sharding by partitioning data into 16384 slots. Each key is hashed using a well-defined hashing algorithm (CRC16/XMODEM), and the result is taken modulo 16384 to determine the key's slot. Each primary server is responsible for hosting a non-overlapping set of slots, and various cluster commands provide information on the slot-to-server allocation. Thus, given a key, a client is able to determine which server is hosting that data and send the command directly to that server. See also Redis Cluster Specification.
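The slot calculation described above can be sketched in Java as follows. This is an illustrative sketch, not the Geode implementation: the names `KeySlot`, `crc16` and `keySlot` are hypothetical. It uses a plain bitwise CRC16/XMODEM (a production version would more likely use a lookup table) and also applies the hash-tag rule from the Redis Cluster Specification, under which only the text inside the first non-empty `{...}` is hashed. Keys are treated as UTF-8 strings here for simplicity, whereas Redis keys are arbitrary byte strings.

```java
import java.nio.charset.StandardCharsets;

class KeySlot {
  static final int SLOT_COUNT = 16384;

  // Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection, no final XOR.
  static int crc16(byte[] data) {
    int crc = 0;
    for (byte b : data) {
      crc ^= (b & 0xFF) << 8;
      for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
        crc &= 0xFFFF; // keep the running value within 16 bits
      }
    }
    return crc;
  }

  // Slot for a key. Per the Redis Cluster Specification, if the key contains a non-empty
  // "{...}" hash tag, only the text between the first '{' and the next '}' is hashed.
  static int keySlot(String key) {
    int open = key.indexOf('{');
    if (open >= 0) {
      int close = key.indexOf('}', open + 1);
      if (close > open + 1) {
        key = key.substring(open + 1, close);
      }
    }
    return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
  }

  public static void main(String[] args) {
    System.out.printf("0x%04X%n", crc16("123456789".getBytes(StandardCharsets.UTF_8))); // 0x31C3
    System.out.println(keySlot("foo"));
    System.out.println(keySlot("{user1000}.following") == keySlot("{user1000}.followers")); // true
  }
}
```

`0x31C3` is the standard CRC16/XMODEM check value for the input `123456789`, which makes it a convenient sanity test for any implementation of the hash.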

Geode already has a data sharding concept in the form of buckets, and keys can be mapped to buckets using a custom PartitionResolver. To map Redis' concept of slots onto Geode's buckets we can do the following:

// Calculate the CRC16 of a given key using the CRC16/XMODEM hashing algorithm (this produces an unsigned 16-bit integer)
crc16Hash = CRC16(key)

// A custom PartitionResolver can then compute the correct bucket with:
slot = crc16Hash % 16384
bucket = int(slot / (16384 / BUCKETS_IN_REGION))

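Ideally the number of buckets in the region would be a divisor of 16384 (= 2^14), that is, a power of 2 no greater than 16384, so that slots, and therefore data, are distributed evenly across buckets. The current default of 113 buckets would cause a slight imbalance (most buckets would receive 145 slots and one would receive 144), so we should use 128 as the default instead.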
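The effect of the bucket count can be checked with a short sketch that maps every slot to a bucket and tallies the results. The names `SlotToBucket`, `slotToBucket` and `slotsPerBucket` are hypothetical, and how a PartitionResolver would turn the bucket id into a routing object is deliberately left out. The sketch computes `slot * bucketCount / 16384` in integer arithmetic. This gives the same result as `slot / (16384 / bucketCount)` whenever the bucket count divides 16384, and it also stays in range for counts such as 113, where the integer division `16384 / 113` would truncate to 144 and map the highest slots to a non-existent bucket 113.

```java
import java.util.Arrays;

class SlotToBucket {
  static final int SLOT_COUNT = 16384;

  // Maps a slot (0..16383) to a bucket id (0..bucketCount-1).
  // Equal to slot / (16384 / bucketCount) when bucketCount divides 16384;
  // multiplying before dividing keeps the result in range for any bucketCount <= 16384.
  static int slotToBucket(int slot, int bucketCount) {
    return slot * bucketCount / SLOT_COUNT; // max 16383 * 16384, which fits in an int
  }

  // Counts how many slots land in each bucket.
  static int[] slotsPerBucket(int bucketCount) {
    int[] counts = new int[bucketCount];
    for (int slot = 0; slot < SLOT_COUNT; slot++) {
      counts[slotToBucket(slot, bucketCount)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    for (int buckets : new int[] {113, 128}) {
      int[] counts = slotsPerBucket(buckets);
      int min = Arrays.stream(counts).min().getAsInt();
      int max = Arrays.stream(counts).max().getAsInt();
      System.out.printf("%d buckets: %d to %d slots per bucket%n", buckets, min, max);
    }
  }
}
```

With 128 buckets every bucket receives exactly 128 slots; with 113 buckets the per-bucket counts range from 144 to 145.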

In addition, at least the following cluster-specific commands will be implemented so that clients can take advantage of clustering.

  • CLUSTER NODES - this command produces a list identifying each member of the cluster, indicating whether it is a primary or secondary and listing the slots which it hosts - https://redis.io/commands/cluster-nodes

    07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
    67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-10922
    292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-16383
    6ec23923021cf3ffec47632106199cb7f496ce01 127.0.0.1:30005@31005 slave 67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 0 1426238316232 5 connected
    824fe116063bc5fcf9f4ffd895bc17aee7731ac3 127.0.0.1:30006@31006 slave 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 0 1426238317741 6 connected
    e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-5460

    Since Geode members host both primary and secondary buckets, each member can be designated as a primary (specifically, as 'master' in the output above), and no member will be designated as 'slave'.

  • CLUSTER SLOTS - this command produces structured array output with slot information similar to CLUSTER NODES:

    1) 1) (integer) 0
       2) (integer) 5460
       3) 1) "127.0.0.1"
          2) (integer) 30001
       4) 1) "127.0.0.1"
          2) (integer) 30006
    2) 1) (integer) 5461
       2) (integer) 10922
       3) 1) "127.0.0.1"
          2) (integer) 30002
       4) 1) "127.0.0.1"
          2) (integer) 30004
    3) 1) (integer) 10923
       2) (integer) 16383
       3) 1) "127.0.0.1"
          2) (integer) 30003
       4) 1) "127.0.0.1"
          2) (integer) 30005

If a client sends a request to a server that is not hosting the given key, the server will need to respond with a -MOVED error indicating which server is hosting the key. The current implementation avoids this by using a function call to route the request to the member hosting the key. This proposal removes the need for that layer of indirection, which will improve performance.
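A sketch of the redirect decision: if the member that owns the key's slot is not the local member, the server replies with a RESP error of the form `-MOVED <slot> <host>:<port>`, as defined in the Redis Cluster Specification, and the client retries against that address. The slot-to-owner lookup is passed in as a function because how Geode would resolve bucket primaries is outside the scope of this sketch; `MovedRedirect`, `redirectIfNotLocal` and the two-member layout in `main` are hypothetical.

```java
import java.util.function.IntFunction;

class MovedRedirect {
  // Returns null if this member owns the slot (the command can run locally, with no extra hop);
  // otherwise returns the RESP error line that redirects a cluster-aware client.
  static String redirectIfNotLocal(int slot, String localAddress, IntFunction<String> ownerOfSlot) {
    String owner = ownerOfSlot.apply(slot);
    if (owner.equals(localAddress)) {
      return null;
    }
    return "-MOVED " + slot + " " + owner + "\r\n";
  }

  public static void main(String[] args) {
    // Hypothetical layout: slots 0-8191 on port 6379, slots 8192-16383 on port 6380.
    IntFunction<String> owner = slot -> slot < 8192 ? "127.0.0.1:6379" : "127.0.0.1:6380";
    String local = "127.0.0.1:6380";

    System.out.print(redirectIfNotLocal(3999, local, owner)); // -MOVED 3999 127.0.0.1:6379

    String reply = redirectIfNotLocal(12000, local, owner);
    System.out.println(reply == null ? "slot 12000 served locally" : reply);
  }
}
```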

Although transactions are not currently implemented, this change will make it much easier to develop transactional support using Redis' MULTI/EXEC commands. Redis transactions require all keys participating in a transaction to be colocated in the same slot; since each slot maps to a single bucket, all of a transaction's keys would then live in one bucket on one member. The current implementation would require a much more complicated locking scheme that allows multiple keys to be locked across multiple members.
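The same-slot requirement can be checked before a transaction runs. The hypothetical helper below computes each key's slot (repeating the CRC16/XMODEM and hash-tag logic so that the block stands alone) and reports whether all keys share one slot. Redis users normally satisfy this requirement with hash tags: keys such as `{user:1000}:cart` and `{user:1000}:orders` hash only `user:1000` and therefore land in the same slot, and so in the same bucket. `SameSlotCheck` and its methods are illustrative names, not an existing Geode API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class SameSlotCheck {
  // Bitwise CRC16/XMODEM (polynomial 0x1021, initial value 0).
  static int crc16(byte[] data) {
    int crc = 0;
    for (byte b : data) {
      crc ^= (b & 0xFF) << 8;
      for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
        crc &= 0xFFFF;
      }
    }
    return crc;
  }

  // Slot for a key, hashing only a non-empty "{...}" hash tag when one is present.
  static int keySlot(String key) {
    int open = key.indexOf('{');
    if (open >= 0) {
      int close = key.indexOf('}', open + 1);
      if (close > open + 1) {
        key = key.substring(open + 1, close);
      }
    }
    return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
  }

  // True if every key maps to the same slot, i.e. the keys could share one transaction.
  static boolean allInSameSlot(List<String> keys) {
    if (keys.isEmpty()) {
      return true;
    }
    int slot = keySlot(keys.get(0));
    for (String key : keys) {
      if (keySlot(key) != slot) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(allInSameSlot(List.of("{user:1000}:cart", "{user:1000}:orders"))); // true
    System.out.println(allInSameSlot(List.of("foo", "hello"))); // false
  }
}
```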

Changes and Additions to Public Interfaces

This change will remove the ability for non-cluster-aware clients to interact with the data, unless only a single member is used. This differs from the current implementation, which allows clients to connect to any server without needing any knowledge of data locality.

Performance Impact

This change is expected to have a significant positive performance impact, since requests will be sent directly to the servers hosting the data. Requests will therefore not incur the penalty of an additional network hop to reach the data (this is analogous to Geode's 'single-hop' concept).

Backwards Compatibility and Upgrade Path

Applications deployed against the current functionality in Geode 1.14 will need to be updated to be cluster-aware. Typically, this should only require a different form of client initialization.

Prior Art

N/A

FAQ


Errata

