
...

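Peer authentication has been built on top of the 2.2.9 JGroups stack by introducing a custom authentication protocol that intercepts 'join' requests and requires the sender to be authenticated before the request is allowed to reach the GMS membership protocol.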
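The join interception described above can be pictured as one layer in a protocol stack. The following is a minimal sketch, assuming a simplified stack: `AuthLayerSketch`, `Message`, `UpHandler` and `Authenticator` are hypothetical names, not JGroups or Geode classes, and the password check only stands in for Geode's authentication service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these types are JGroups classes. It models a
// protocol layer placed below the GMS membership layer that lets a JOIN request
// through only after the sender's credentials have been checked.
final class AuthLayerSketch {

    enum Kind { JOIN_REQUEST, DATA }

    record Message(Kind kind, String sender, Map<String, String> credentials) {}

    /** Next layer up; stands in for the GMS membership protocol. */
    interface UpHandler {
        void up(Message msg);
    }

    /** Pluggable check; stands in for Geode's authentication service. */
    interface Authenticator {
        boolean authenticate(String sender, Map<String, String> credentials);
    }

    static final class AuthLayer {
        private final Authenticator authenticator;
        private final UpHandler gms;

        AuthLayer(Authenticator authenticator, UpHandler gms) {
            this.authenticator = Objects.requireNonNull(authenticator);
            this.gms = Objects.requireNonNull(gms);
        }

        /** Passes the message up and returns true, or drops a failed JOIN and returns false. */
        boolean up(Message msg) {
            if (msg.kind() == Kind.JOIN_REQUEST
                    && !authenticator.authenticate(msg.sender(), msg.credentials())) {
                return false; // the rejected join never reaches GMS
            }
            gms.up(msg); // non-join traffic and authenticated joins pass through
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> seenByGms = new ArrayList<>();
        AuthLayer layer = new AuthLayer(
                (sender, creds) -> "secret".equals(creds.get("password")),
                msg -> seenByGms.add(msg.sender()));
        boolean accepted = layer.up(new Message(Kind.JOIN_REQUEST, "memberA", Map.of("password", "secret")));
        boolean rejected = layer.up(new Message(Kind.JOIN_REQUEST, "memberB", Map.of("password", "wrong")));
        System.out.println(accepted + " " + rejected + " " + seenByGms); // true false [memberA]
    }
}
```

Placing the check below GMS means the membership protocol itself never has to know about credentials; it simply never sees an unauthenticated join.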

In the future Geode servers were also going to rely on JGroups for reliable UDP transmission of messages that are broadcast to the whole membership set, such as StartupMessage, ShutdownMessage, CreateRegionMessage and PDX registrations.  Sending these messages over TCP/IP stream connections is a barrier to increasing the size of the distributed system, especially at startup time when we must create 4M of these connections (M=member count) just to join the distributed system.


Reliable UDP communication is also needed for out-of-band low-priority communication, such as sending alerts to management nodes.  Creating TCP/IP connections to send alerts can block operations during periods when there are already bad things happening.  We recently saw this in a large production system, where an alert that members weren’t acknowledging a membership view change blocked operations because the management node that was to receive the alert was sick and not accepting connections.
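To illustrate why a connectionless transport avoids this failure mode, here is a minimal sketch using only the JDK's `DatagramSocket`. It rests on assumptions: `UdpAlertSketch` and `sendAlert` are hypothetical names, and the datagram is neither acknowledged nor retransmitted, so it shows only the non-blocking half of reliable UDP. Reliability would have to come from a protocol layered on top, such as JGroups.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Sketch only: plain, unacknowledged UDP. A reliable protocol (acks,
// retransmission) would sit on top of this, as JGroups provides.
final class UdpAlertSketch {

    /**
     * Hands one alert datagram to the OS. There is no connection setup, so a
     * management node that is sick or not accepting connections cannot make
     * the caller wait for a handshake.
     */
    static boolean sendAlert(DatagramSocket socket, InetAddress host, int port, String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try {
            socket.send(new DatagramPacket(payload, payload.length, host, port));
            return true;
        } catch (IOException e) {
            return false; // a dropped alert must not block the operation that raised it
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000);
            sendAlert(sender, loopback, receiver.getLocalPort(), "member X has not acknowledged view 7");
            DatagramPacket packet = new DatagramPacket(new byte[1024], 1024);
            receiver.receive(packet);
            System.out.println(new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8));
        }
    }
}
```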

Geode integrates a JGroups GossipServer into the Locator service.  GossipServer is used to provide information on who is in the distributed system when a new member is joining the distributed system.
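The locator's role in a join can be modeled as a shared lookup table. This is a hypothetical in-memory sketch (`LocatorGossipSketch` and its methods are invented for illustration and are not the GossipServer API): a joining member receives the current member list and would then use it to find an existing member, in JGroups typically the coordinator, to send its actual join request to.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the locator-side lookup; this is not the
// JGroups GossipServer API.
final class LocatorGossipSketch {
    // Insertion order approximates join order, so the oldest member comes first.
    private final Set<String> members = new LinkedHashSet<>();

    /** Returns the members already present, then records the newcomer. */
    synchronized List<String> join(String newMember) {
        List<String> existing = new ArrayList<>(members);
        members.add(newMember);
        return existing;
    }

    /** Forgets a member that has left or been removed. */
    synchronized void leave(String member) {
        members.remove(member);
    }

    public static void main(String[] args) {
        LocatorGossipSketch locator = new LocatorGossipSketch();
        System.out.println(locator.join("server1")); // []
        System.out.println(locator.join("server2")); // [server1]
        locator.leave("server1");
        System.out.println(locator.join("server3")); // [server2]
    }
}
```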


Finally, Geode clients use the membership system's classes to form IDs, and these contain a JGroups IpAddress.

 

...

Requirements

In brief, the membership service must

  1. deliver notification of membership changes to the DistributedSystem’s MembershipManager

  2. allow new members to join without taking down the system

  3. provide identity for each peer in the distributed system and allow clients to have a similar identity.  The identity must be unique for the peer and old identities should not be reused (at least not very quickly)

  4. transmit information about each member’s DistributedSystem characteristics (VM type, DirectChannel port, Groups, Name, etc) in the member’s ID

  5. efficiently and quickly detect loss of a member (failure detection)

  6. support the notion of an Elder member for Geode’s Distributed Lock Service

  7. support Geode’s model of handling network partitions (winning/losing partitions)

  8. allow Geode to give advice on which members might be sick or out of action

  9. support rolling upgrade (old members can’t rejoin once upgrade has begun, and the service itself must support backward compatibility)

  10. integrate with Geode’s authentication service and require authentication before allowing a new member to join
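Requirements 3 and 4 can be made concrete with a member ID type. In this sketch every name (`MemberIdSketch`, `VmKind`, `incarnation` and the other fields) is hypothetical rather than Geode's real ID class; the assumption is that a random per-process tag is enough to keep an ID from being reused when a process restarts on the same host and port, while the remaining fields carry the member's DistributedSystem characteristics.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of a member ID covering requirements 3 and 4;
// field names are illustrative and are not Geode's actual ID classes.
final class MemberIdSketch {

    enum VmKind { NORMAL, LOCATOR, ADMIN }

    record MemberId(String host, int membershipPort, UUID incarnation,
                    VmKind vmKind, int directChannelPort, List<String> groups, String name) {

        MemberId {
            groups = List.copyOf(groups); // IDs are shared widely, so keep them immutable
        }

        /** Each process start gets a fresh random tag, so an old ID is never reused. */
        static MemberId create(String host, int port, VmKind kind, int directChannelPort,
                               List<String> groups, String name) {
            return new MemberId(host, port, UUID.randomUUID(), kind, directChannelPort, groups, name);
        }

        /** True when both IDs point at the same host and port, even across restarts. */
        boolean sameEndpoint(MemberId other) {
            return host.equals(other.host) && membershipPort == other.membershipPort;
        }
    }

    public static void main(String[] args) {
        MemberId first = MemberId.create("10.0.0.5", 41000, VmKind.NORMAL, 41001, List.of("groupA"), "server1");
        MemberId restarted = MemberId.create("10.0.0.5", 41000, VmKind.NORMAL, 41001, List.of("groupA"), "server1");
        System.out.println(first.sameEndpoint(restarted)); // true
        System.out.println(first.equals(restarted));       // false: different incarnation
    }
}
```

Because record equality includes `incarnation`, a restarted process is a different member even though it listens on the same endpoint.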

A UDP messaging service must

  1. Be compatible with the membership service’s IDs (an ID from membership identifies endpoints in the UDP messaging service)

  2. Support rolling upgrade (on-wire compatibility across releases)
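One way to meet both UDP requirements is to prefix each message with a format version and to identify the sender by its membership ID. The sketch below assumes a hypothetical alert message (`WireVersionSketch`, `Alert`, `V1` and `V2` are invented names): a sender in a mixed-version system writes at the oldest version its recipients understand, and a newer reader fills in defaults for fields an older sender omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical message format; the version numbers and fields are illustrative.
final class WireVersionSketch {
    static final short V1 = 1; // sender ID + text
    static final short V2 = 2; // adds a severity field

    record Alert(String senderId, String text, int severity) {}

    /** Writes at the requested version, e.g. the oldest version any recipient runs. */
    static byte[] write(Alert alert, short version) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(version);
            out.writeUTF(alert.senderId()); // endpoint identified by its membership ID
            out.writeUTF(alert.text());
            if (version >= V2) {
                out.writeInt(alert.severity());
            }
        }
        return bytes.toByteArray();
    }

    /** Reads any known version, filling in a default for fields an older sender omitted. */
    static Alert read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            short version = in.readShort();
            String senderId = in.readUTF();
            String text = in.readUTF();
            int severity = version >= V2 ? in.readInt() : 0;
            return new Alert(senderId, text, severity);
        }
    }

    public static void main(String[] args) throws IOException {
        Alert alert = new Alert("10.0.0.5:41000", "low memory", 3);
        System.out.println(read(write(alert, V2)).severity()); // 3
        System.out.println(read(write(alert, V1)).severity()); // 0
    }
}
```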

...




Options for replacing JGroups v2.2.9

There are several options open to us:

...

Move to a newer version of JGroups

Use Zookeeper

Use Akka

Create a custom solution

...



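One of the best things about Geode is that membership management is dynamic, with no single point of failure.  A slightly weaker point is the Locator service: if all of the Locators crash, clients can't get information about the servers, and new servers can't be added to the cluster until a Locator becomes available again.  Even when the Locators are down, the server cluster keeps running and clients can stay connected to the servers.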


...

For this reason a solution like Zookeeper seems inadequate.  Users would have to configure Zookeeper clusters and make sure the cluster is configured so that servers have minimal risk of losing contact with it.  Losing contact with the cluster would require a server to shut down.  Zookeeper clusters are typically pretty small, so a Geode user with 200 servers might feel at risk when using even a large 7-node Zookeeper cluster.  Zookeeper also doesn't answer requirements 7 through 11 or offer UDP messaging.
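The cluster-size concern comes down to quorum arithmetic: a ZooKeeper ensemble can make progress only while a majority of its servers are up. This small sketch (the class name is hypothetical) computes the majority and the number of server failures tolerated for a few ensemble sizes. A 7-node ensemble survives 3 failures, and adding a sixth server to a 5-node ensemble adds no tolerance, which is why odd sizes are usual.

```java
// Majority-quorum arithmetic for a ZooKeeper ensemble of n servers.
final class QuorumSketch {

    /** Smallest majority of n servers. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** Server failures the ensemble survives while still having a majority. */
    static int toleratedFailures(int n) {
        return n - quorum(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 6, 7}) {
            System.out.println(n + " servers: quorum " + quorum(n)
                    + ", tolerates " + toleratedFailures(n) + " failures");
        }
    }
}
```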

JGroups has evolved and solved a lot of problems that had to be fixed in the 2.2.9 copy currently in the Geode repository.  However, in order to use it we would have to fork it and modify certain parts in order to answer requirements 4, 7, 9 and 10.  We could fork only parts, such as GMS and the failure detection protocols but the View class needs to carry Credentials for authentication in order to be useful to Geode.  If not used for cluster membership, JGroups might still be useful for reliable UDP messaging.

Akka looks promising and a lot of people are using it.  

A custom solution that does not leverage other projects for clustering could also be implemented, especially if JGroups is used for reliable UDP messaging.

Examination of the options

In this section we'll look at each of the options and see how it might address the requirements.

JGroups

A newer version of JGroups could be modified to fit the requirements, just as is being done with the initial incubating version of Geode.

...

So, to sum up, Zookeeper could be used as the basis for a membership management service, replacing some of the functionality we currently have built into JGroups.  We would have to implement a fair portion of what we need outside of Zookeeper, and using Zookeeper comes with some risks.  We'd still need a different solution for UDP messaging.

Akka

...

Clustering

Akka is used by Google Compute Engine and other projects for clustering.  Google has posted that it achieved 1500 nodes in a cluster with stable performance using a simple application and fairly loose timeouts.

Akka clustering documentation can be found here: http://doc.akka.io/docs/akka/snapshot/common/cluster.html

Akka uses configured seed nodes to join the cluster, which is compatible with Geode’s Locator discovery pattern, but Akka has not solved the concurrent-startup problem for seed nodes that Geode solved in its JGroups improvements.  The first seed node must be running before any of the other seed nodes can form a cluster.
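The seed-node rule can be modeled as a small decision function. This is a simplified sketch of the behavior described in the Akka clustering documentation, not Akka's API: `SeedJoinSketch` and `chooseJoinTarget` are hypothetical, and timeouts and retries are left out. It shows why the first seed node matters: if it is down, no other seed may bootstrap a new cluster on its own.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of the seed-node rule described in the Akka clustering
// documentation; this is not Akka's API and it ignores timeouts and retries.
final class SeedJoinSketch {

    /**
     * Decides which node to join. seedsInCluster holds the seed nodes that are
     * already members of a running cluster and so can accept a join.
     */
    static Optional<String> chooseJoinTarget(String self, List<String> seeds, Set<String> seedsInCluster) {
        for (String seed : seeds) {
            if (!seed.equals(self) && seedsInCluster.contains(seed)) {
                return Optional.of(seed); // join an existing cluster through this seed
            }
        }
        if (!seeds.isEmpty() && seeds.get(0).equals(self)) {
            return Optional.of(self); // only the first seed may bootstrap by joining itself
        }
        return Optional.empty(); // everyone else keeps waiting and retrying
    }

    public static void main(String[] args) {
        List<String> seeds = List.of("seed1", "seed2", "seed3");
        System.out.println(chooseJoinTarget("seed2", seeds, Set.of()));        // Optional.empty
        System.out.println(chooseJoinTarget("seed1", seeds, Set.of()));        // Optional[seed1]
        System.out.println(chooseJoinTarget("seed2", seeds, Set.of("seed1"))); // Optional[seed1]
    }
}
```

Reserving bootstrap for a single node avoids forming separate islands, at the cost of making that one node a startup dependency.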

Akka delivers membership change notifications through its Actor model as individual MemberUp/MemberRemoved events, and you can also get the cluster state.  Cluster state is fairly complete: it has a leader, a sorted set of members and a set of unreachable members.  One thing missing from cluster state is any form of unique identifier.
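The cost of a missing unique identifier shows up when a member restarts on the same address. In this sketch, `MemberUp` and `MemberRemoved` are hypothetical records that only mirror the names of Akka's events (they are not Akka's `ClusterEvent` classes), and the `incarnation` field is an assumed unique identifier of the kind Akka's cluster state lacks. With it, a late removal of the old process is ignored; keyed by address alone, the restarted member would be dropped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-ins that only mirror Akka's event names; the incarnation
// field is an assumed unique identifier, not something Akka's events carry.
final class MembershipEventsSketch {

    interface Event {}

    record MemberUp(String address, long incarnation) implements Event {}

    record MemberRemoved(String address, long incarnation) implements Event {}

    /** Tracks live members by address, remembering which incarnation is current. */
    static final class Tracker {
        private final Map<String, Long> live = new HashMap<>();

        void apply(Event event) {
            if (event instanceof MemberUp up) {
                live.put(up.address(), up.incarnation());
            } else if (event instanceof MemberRemoved removed) {
                // Remove only if the event names the incarnation we currently know;
                // a late removal of an older process is ignored.
                live.remove(removed.address(), removed.incarnation());
            }
        }

        SortedSet<String> members() {
            return new TreeSet<>(live.keySet());
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.apply(new MemberUp("nodeA:2552", 1));
        tracker.apply(new MemberUp("nodeB:2552", 1));
        tracker.apply(new MemberUp("nodeA:2552", 2));      // nodeA restarted on the same address
        tracker.apply(new MemberRemoved("nodeA:2552", 1)); // late removal of the old process
        System.out.println(tracker.members()); // [nodeA:2552, nodeB:2552]
    }
}
```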

...