Geode Compatibility with Redis data sharding and cluster changes

To be Reviewed By:

Authors: Jens Deppe

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

In its initial implementation, the Geode compatibility with Redis API provides a hybrid clustering mode whereby clients can connect to any node and interact as with a standalone Redis server. This, however, does not provide true clustering as it does not expose the functionality for existing cluster-capable clients to make use of data sharding and high availability. This proposal aims to provide that support. In addition, the current approach is not very performant as any given key access may require multiple network hops.

Anti-Goals

We will not provide the ability to operate in either a pure cluster mode or hybrid mode. Cluster mode will be the only mode.

Solution

Redis enables data sharding by partitioning data into 16384 slots. Keys are hashed using a well defined hashing algorithm (CRC16/XMODEM) the result of which is then modded with 16384 to determine the slot. Each primary server is responsible for hosting a non-overlapping set of slots. Various cluster commands provide information on slot-to-server allocation. Thus, given a key, a client is able to determine which server is hosting that data and direct the command to the correct server. See also Redis Cluster Specification.

Geode already has a data sharding concept in the form of buckets with the ability to map keys to buckets using a custom PartitionResolver . In order to map Redis' concept of slots to Geode's buckets we can perform the following:

// Calculate the CRC16 of a given key using the CRC16/XMODEM hashing algorithm (this produces a short unsigned integer)
crc16Hash = CRC16(key)

// A custom PartitionResolver can then compute the correct bucket with:
slot = crc16Hash % 16384
bucket = int(slot / (16384 / BUCKETS_IN_REGION))

Ideally the number of buckets in the region would be a power of 2 (or factor of 16384) so that data is allocated equally. The current default bucket size of 113 would cause a slight imbalance of data so we should use 128 as the default instead.

In addition, at least the following cluster-specifc commands will be implemented in order to enable clients to take advantage of clustering.

  • CLUSTER NODES - this command produces a list identifying each member of the cluster, indicating whether it is a primary or replica and listing the slots which it hosts. Since Geode members host both primary and replica buckets, each member can only be designated as a primary. Below is an example of the proposed output using the literals required by the Redis API.

    07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 master - 0 1426238317239 4 connected 7500-10922
    67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-7499
    292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-12999
    6ec23923021cf3ffec47632106199cb7f496ce01 127.0.0.1:30005@31005 master - 0 1426238316232 5 connected 13000-16383
    824fe116063bc5fcf9f4ffd895bc17aee7731ac3 127.0.0.1:30006@31006 master - 0 1426238317741 6 connected 2500-5460
    e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-2499
  • CLUSTER SLOTS - this command produces structured array output with slot information similar to CLUSTER NODES, including only the primary nodes. 

    1) 1) (integer) 0
       2) (integer) 2499
       3) 1) "127.0.0.1"
          2) (integer) 30001
    2) 1) (integer) 2500
       2) (integer) 5460
       3) 1) "127.0.0.1"
          2) (integer) 30006
    3) 1) (integer) 5461
       2) (integer) 7499
       3) 1) "127.0.0.1"
          2) (integer) 30002
    4) 1) (integer) 7500
       2) (integer) 10922
       3) 1) "127.0.0.1"
          2) (integer) 30004
    5) 1) (integer) 10923
       2) (integer) 12999
       3) 1) "127.0.0.1"
          2) (integer) 30003
    6) 1) (integer) 13000
       2) (integer) 16383
       3) 1) "127.0.0.1"
          2) (integer) 30005
    
    
  • -MOVEDIf a client makes a request to a server that is not hosting the given key, the server will need to respond with a -MOVED error indicating which server is hosting the key. The current implementation avoids this by using a function call to route the request to the member hosting the key. With this proposal, the need for this layer of indirection will be removed and will improve performance.

Although not currently implemented, this change will greatly ease the ability to develop transactional support using Redis' MULTI/EXEC commands. Redis transactions require all keys, participating in a transaction, to be colocated in the same slot. The current implementation would require a much more complicated locking scheme allowing multiple keys to be locked across multiple members.

Additional CLUSTER commands may also be implemented in the future - for example CLUSTER REPLICAS and CLUSTER INFO .

Changes and Additions to Public Interfaces

This change will remove the ability for non-cluster aware clients to interact with the data unless only a single member is used. This is a change from the current implementation which allows clients to connect to any server without needing any knowledge of data locality.

Performance Impact

This change will have a significant positive performance impact since requests will be targeted directly at servers hosting the data. Thus requests will not incur the penalty of an additional network hop to reach the data (this is analogous to Geode's 'single-hop' concept).

Backwards Compatibility and Upgrade Path

Applications deployed against the current functionality in Geode 1.14 will need to be updated to be cluster aware. Typically this should only require a different form of client initialization.

Prior Art

N/A

FAQ


Errata


  • No labels

6 Comments

  1. I don't understand this line: "(This would not preclude non-cluster clients capable of interpreting MOVED responses from working)."
    and this line later:
    "This change will remove the ability for non-cluster aware clients to interact with the data unless only a single member is used."
    This line seems to be related to the above, but I'm not sure I understand the full impact.
    Also, do we have a sense of how many users/use cases this might impact, i.e. how many Redis clients run as single node Redis apps?

  2. Thanks for the comment Diane. I've removed the 'MOVED' sentence to avoid confusion. I'm not sure what the answer to your second question is - Pulkit or John Martin might be better to answer that.

  3. I am not an redis expert and haven't read through any redis reference provided here...i may be asking basic/dumb question is here...

    Based on the RFC it looks like a user/application can seek data in redis based on its "slot" and primary/secondary data info; is that true? If not how is this matters; the geode side code could piggy back on the partition algorithms Geode is supporting...Is this because redis plugin in Geode is not (embedded) a true Geode client that can not take advantage of Single-hop mechanism; where meta-data about the PR is stored/available.

    1. Redis defines its own single hop partitioning mechanism that we must conform to. This RFC is how we intend to to that. The concepts are similar. Slots are like our buckets but rather than have 16K buckets we will spread the 16K slots evenly across 128 buckets. Since our nodes act as primary and replica nodes we will only expose them as primary nodes to Redis clients. There are few reasons for this, Redis clients rarely use replica nodes directly because of consistency lag and the meta data protocols, of which there are two, don't support the concept of mixed primary/replica state on the same node. Similar to Geode's PR metadata, Redis has CLUSTER INFO and CLUSTER SLOTS for getting metadata about what nodes hosts what slots. The clients are responsible for hashing the keys, figuring out the slot, finding the primary for the slot and sending the command to the primary. Unlike Geode, if the node is not the primary it will send a MOVED response and the client will retry the operation on the given node and update its metadata for that slot.

  4. >> Redis defines its own single hop partitioning mechanism that we must conform to. This RFC is how we intend to to that.

    >>Redis has CLUSTER INFO and CLUSTER SLOTS for getting metadata 

    These are all seems internal; not user or application exposed; Is our Redis implementation is based/uses these internal information. Do we need to be conformant to Redis storage mechanism/protocols as Geode has its own way of storing data...Again, pardon my questions, I am not expert on Redis and its usage in Geode.


    1. Redis clients are expected to use the info from these CLUSTER  commands to perform optimized access to the data. For example a client will call CLUSTER SLOTS  and figure out which servers host specific slots. Then, for a given key, the client can use the fixed hashing algorithm to determine which slot a key maps to and thus which server to send the request to.