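Due to licensing restrictions, we must replace the old JGroups communication stack embedded in Geode. The old JGroups is LGPL-licensed, which is not compatible with Apache 2.0 licensing. This document explains Geode's requirements for the membership system and some possible replacements, and finally settles on replacing JGroups with a Geode-centric implementation.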

 

...

Geode

...

How JGroups is used


...

Geode’s dynamic membership model is based on the JGroups dynamic model provided by pbcast.GMS, where new nodes can be added at will without taking the distributed system down.  

Geode uses the membership view primarily for replication and event delivery. For replication we form a subset of the membership view that is known to have a cache Region and label this a DistributionAdvisor. Our current replication scheme requires a return-receipt from any specified recipient as long as the recipient is in the membership view.
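The return-receipt rule above can be illustrated with a minimal sketch. It is not Geode's actual implementation: the class name ReplyTracker, its methods, and the use of plain strings as member identifiers are all assumptions made for illustration. The sender waits for a receipt from each recipient, but stops waiting for any recipient that drops out of the membership view.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: tracks outstanding return-receipts for one replicated message. */
class ReplyTracker {
    // Recipients that have not yet acknowledged the message.
    private final Set<String> pending = new HashSet<>();

    ReplyTracker(Set<String> recipients) {
        pending.addAll(recipients);
    }

    /** Called when a recipient's return-receipt arrives. */
    synchronized void ackReceived(String member) {
        pending.remove(member);
    }

    /** Called when a new membership view is installed: departed members are no longer awaited. */
    synchronized void viewInstalled(Set<String> currentView) {
        pending.retainAll(currentView);
    }

    /** True once every recipient that is still in the view has acknowledged. */
    synchronized boolean isComplete() {
        return pending.isEmpty();
    }
}
```

In this sketch a recipient that leaves the view is simply dropped from the pending set, so a failed member cannot block the sender indefinitely once the view change is installed.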

The membership view is also used for selecting an Elder for the distributed lock service. The Elder keeps track of who is allowed to grant lock requests, and it is the oldest non-admin member in the view.
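As a rough sketch of the Elder rule described above, assuming the view is available as a list ordered oldest-first and that each member carries an admin flag (the ElderSelection and Member names and fields are hypothetical, not Geode classes):

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of Elder selection from a membership view. */
class ElderSelection {
    /** Illustrative member descriptor; the field names are assumptions. */
    static final class Member {
        final String name;
        final boolean admin;

        Member(String name, boolean admin) {
            this.name = name;
            this.admin = admin;
        }
    }

    /**
     * Returns the oldest non-admin member, or empty if the view holds only admin members.
     * Assumes the list is ordered oldest-first.
     */
    static Optional<Member> selectElder(List<Member> viewOldestFirst) {
        return viewOldestFirst.stream()
                .filter(m -> !m.admin)
                .findFirst();
    }
}
```

Because the choice depends only on the view contents, every member holding the same view would arrive at the same Elder without extra coordination.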

Network partition detection is also custom-built into the JGroups GMS protocol, and the failure-detection protocols have been altered to let Geode feed “suspicion” into JGroups to help it determine that failures have happened.

Peer authentication has been built into the 2.2.9 JGroups stack by introducing a custom AUTH protocol that intercepts Join requests and requires authentication checks to pass before allowing the request to reach the GMS membership protocol.
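The interception pattern can be sketched as follows. This does not use the JGroups protocol API; AuthInterceptor, JoinRequest, and the "REJECTED" reply are hypothetical names chosen for illustration. The point is only that the membership layer never sees a Join request that fails authentication.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch: a layer that authenticates Join requests before the membership layer sees them. */
class AuthInterceptor {
    /** Illustrative join request; the fields are assumptions. */
    static final class JoinRequest {
        final String sender;
        final String credentials;

        JoinRequest(String sender, String credentials) {
            this.sender = sender;
            this.credentials = credentials;
        }
    }

    private final Predicate<JoinRequest> authenticator;
    private final Function<JoinRequest, String> membershipLayer;

    AuthInterceptor(Predicate<JoinRequest> authenticator,
                    Function<JoinRequest, String> membershipLayer) {
        this.authenticator = authenticator;
        this.membershipLayer = membershipLayer;
    }

    /** Rejects unauthenticated requests; only authenticated ones reach the membership layer. */
    String onJoinRequest(JoinRequest request) {
        if (!authenticator.test(request)) {
            return "REJECTED";
        }
        return membershipLayer.apply(request);
    }
}
```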

Geode servers were also going to rely on JGroups in the future for reliable UDP transmission of messages that are broadcast to the whole membership set, such as StartupMessage, ShutdownMessage, CreateRegionMessage and PDX registrations.  Sending these messages over TCP/IP stream connections is a barrier to increasing the size of the distributed system, especially at startup time, when we must create 4M of these connections (where M is the member count) just to join the distributed system.


Reliable UDP communication is also needed for out-of-band low-priority communication, such as sending alerts to management nodes.  Creating TCP/IP connections to send alerts can block operations during periods when the system is already in trouble.  We recently saw this in a large production system, where an alert that members weren’t acknowledging a membership view change blocked operations because the management node that was to receive the alert was unhealthy and not accepting connections.

...