Warning

The wiki pages are no longer used for documentation. Please visit http://bookkeeper.apache.org for the latest documentation.

 

BookKeeper auto-recovery is discussed in the BOOKKEEPER-237 JIRA, and many of its sub-tasks have already been implemented.
The Fsck feature still needs to be discussed.

...

When a Bookie goes down in a BookKeeper cluster, there is no way to recover the data that was lost with that Bookie server. For example, if a ledger has 2 replicas in the BK cluster and one of the nodes holding it goes down, the cluster keeps running with a single replica of that ledger. Running a cluster with single or no replicas is risky, because nodes can fail at any time. To avoid such situations, we need a mechanism that re-replicates the data to new bookies until the required replica count (quorum size) is met again. In BookKeeper this mechanism is called Auto-Recovery.

...

  •  Auditor
  •  ReplicationWorker

AutoRecoveryMain is an Auto-recovery node that internally initializes and starts the Auditor and ReplicationWorker threads. Each Auto-recovery node therefore has two threads running.
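The two-thread layout can be pictured with a minimal Java sketch. The class `AutoRecoveryNodeSketch` and its methods are hypothetical names invented for illustration and are not the actual AutoRecoveryMain code. The thread bodies are placeholders that only record that they ran, whereas the real threads would presumably run long-lived loops.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: this is NOT the real AutoRecoveryMain class.
class AutoRecoveryNodeSketch {
    // Records which components ran; thread-safe because both threads write to it.
    private final List<String> started = new CopyOnWriteArrayList<>();
    private final Thread auditor;
    private final Thread replicationWorker;

    AutoRecoveryNodeSketch() {
        // Placeholder bodies (assumption): each thread only records that it ran.
        auditor = new Thread(() -> started.add("Auditor"), "Auditor");
        replicationWorker =
            new Thread(() -> started.add("ReplicationWorker"), "ReplicationWorker");
    }

    // Starts both threads, so one node ends up with two running threads.
    void start() {
        auditor.start();
        replicationWorker.start();
    }

    // Blocks until both placeholder threads have finished.
    void awaitTermination() {
        try {
            auditor.join();
            replicationWorker.join();
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
        }
    }

    List<String> startedComponents() {
        return started;
    }
}
```

Calling `start()` followed by `awaitTermination()` leaves both component names in `startedComponents()`, in whichever order the threads happened to run.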

An Auto-recovery node has to be started on each Bookie machine. All recovery nodes participate in leader election, and one of them is elected as the Auditor. The others simply watch for the failure of the elected Auditor so that they can take part in the next election.
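The election behaviour can be illustrated with a small in-memory Java sketch. This page does not say how the winner is chosen, so the rule used here (the candidate with the lowest join sequence number wins) is an assumption borrowed from the common ZooKeeper election recipe. `AuditorElectionSketch` and its methods are hypothetical names, and no real coordination service is involved.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory model of the election; not BookKeeper's elector.
class AuditorElectionSketch {
    // Candidates ordered by the sequence number they received when joining.
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSequence = 0;

    // A recovery node joins the election and receives its sequence number.
    long join(String nodeId) {
        long sequence = nextSequence++;
        candidates.put(sequence, nodeId);
        return sequence;
    }

    // Models a node failure: the candidate disappears from the election.
    void leave(long sequence) {
        candidates.remove(sequence);
    }

    // Assumed rule: the lowest surviving sequence number is the Auditor.
    Optional<String> currentAuditor() {
        return candidates.isEmpty()
            ? Optional.empty()
            : Optional.of(candidates.firstEntry().getValue());
    }
}
```

If three nodes join and the first one later leaves, `currentAuditor()` switches from the first node to the second. This mirrors the remaining nodes taking part in a new election after the Auditor fails.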

Auditor:

Once the Auditor thread is started, the auditor elector takes part in the election to win the auditing job for the Bookie cluster. The auditing job is to detect the ledgers that have become under-replicated because of Bookie failures.
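As a rough illustration of the auditing job, the Java sketch below flags every ledger that has at least one replica on a bookie that is no longer alive. The class `AuditSketch`, the method `underReplicatedLedgers`, and the in-memory maps are assumptions made for this example. How the real Auditor learns which bookies are alive and where ledgers are stored is not described here.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the detection step; not the real Auditor code.
class AuditSketch {
    /**
     * Returns the ids of ledgers that have a replica on a failed bookie.
     *
     * @param ledgerToBookies ledger id -> bookies that should hold a replica
     * @param liveBookies     bookies currently considered alive
     */
    static SortedSet<Long> underReplicatedLedgers(
            Map<Long, Set<String>> ledgerToBookies, Set<String> liveBookies) {
        SortedSet<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Set<String>> entry : ledgerToBookies.entrySet()) {
            // A ledger is under-replicated if any of its bookies is not alive.
            if (!liveBookies.containsAll(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For example, take a ledger with 2 replicas stored on `bookie-a` and `bookie-b`. If only `bookie-b` is alive, the ledger is reported as under-replicated, which matches the single-replica situation described earlier.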

...