Status

This is still a work in progress.

Overview

Flume's master design has evolved over time rather than being planned up front, and a number of deficiencies have become clear along the way.

  • Not all features are supported in multi-master mode.
  • Not all state is available in ZooKeeper (which leads to the above problem).
  • Clients are statically aware of the list of masters. This introduces potential delays in locating a live master if one or more have died.

The Flume committers have decided to address these issues by redesigning and reimplementing the Flume master.

Note that the E2E mechanics, notably the reliance on the master for ACK delivery, are being broken out into a separate discussion and are out of scope for this document. That said, it may be worth spending time to at least discuss whether or not we want to deal with E2E mode in the context of a new master. It may be more work to rearchitect the master while preserving the current ACK mechanism than to simply address ACK delivery concurrently. This should be discussed before development begins so that there is a clear path through implementation.

The primary JIRA tracking this work is https://issues.cloudera.org/browse/FLUME-617

High Level Options

There are a few possible approaches with varying degrees of effort and functionality.

  1. Allow all masters to remain active, pushing all state into ZK so it's shared between them. Clients retain the list of all possible masters and pick one at random to connect to. Deal with E2E ACKs by pushing them into ZK.
  2. Have masters go through an election process and push all state into ZK so it's available to the other masters in the case of failure. Clients no longer contain the list of masters and instead contain the ZK quorum node list. The current master is fetched from ZK. Deal with E2E ACKs by pushing them into ZK so that no ACKs are lost in the case of master failover.
  3. Have masters go through an election process and push all state into ZK so it's available to the other masters in the case of failure. Clients no longer contain the list of masters and instead contain the ZK quorum node list. The current master is fetched from ZK. Deal with E2E ACKs by simply letting them expire and be retransmitted in the case of master failover.
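
To make the election step in options 2 and 3 more concrete, here is a minimal sketch of one way it could behave. It is an in-memory stand-in only and does not use the ZooKeeper client API. The class name MasterElectionSketch and its methods are hypothetical and chosen purely for illustration. It assumes an election in the spirit of the common ZooKeeper recipe, where each candidate registers under an increasing sequence number and the lowest live number is the master. Nothing here reflects a decided design.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of master election. It stands in for the
 * coordination service; a real implementation would talk to ZooKeeper.
 */
class MasterElectionSketch {
    // Live candidates keyed by sequence number. Assumption: the lowest
    // live sequence number identifies the current master.
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSequence = 0;

    /** A master joins the election and receives an increasing sequence number. */
    synchronized long join(String masterAddress) {
        long sequence = nextSequence++;
        candidates.put(sequence, masterAddress);
        return sequence;
    }

    /** Models a master dying: its registration disappears. */
    synchronized void leave(long sequence) {
        candidates.remove(sequence);
    }

    /** What a client would look up instead of holding a static list of masters. */
    synchronized Optional<String> currentMaster() {
        Map.Entry<Long, String> lowest = candidates.firstEntry();
        return lowest == null ? Optional.empty() : Optional.of(lowest.getValue());
    }
}
```

In this model, if two masters join and the first one leaves, currentMaster() returns the second, which is the failover behavior options 2 and 3 describe. What happens to in-flight E2E ACKs at that moment is the only point on which those two options differ, and the sketch deliberately leaves it out.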