Reliability Requirements

Fail-over (session state)

A cluster member informs its clients of backup candidates for each session. It can update the list periodically.

After an unexpected disconnect, the client can connect to one of the candidates and resume its session transparently. All session state is preserved, including:

  • Open references
  • Active consumers
  • Commands-in-flight
  • Open transactions (question: Is there any value in fail-over that aborts TX and/or DTX transactions?)
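The client side of this fail-over can be sketched as follows. This is only an illustration under assumptions: the `Session` class, the `update_candidates` and `fail_over` names, and the `is_reachable` callback are hypothetical and not part of any broker or client API described here.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical client-side view of one session."""
    session_id: str
    current_node: str
    candidates: list = field(default_factory=list)  # backup nodes for this session

    def update_candidates(self, new_list):
        # The cluster member may resend the list periodically; keep only the latest.
        self.candidates = list(new_list)


def fail_over(session, is_reachable):
    """Reconnect to the first reachable backup candidate.

    Returns the chosen node, or None when the current node and every
    backup candidate have failed (the session does not survive that case).
    """
    for node in session.candidates:
        if is_reachable(node):
            session.current_node = node
            return node
    return None
```

For example, a session with candidates `["b", "c"]` where only `c` is reachable resumes on `c`; if no candidate is reachable, `fail_over` returns `None` and the session is lost.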

Sessions do not survive:

  • multiple failures that include the current node and all backup nodes for that session.
  • shutdown/restart of the cluster.

Cluster Restart (durable resources)

The AMQP entities that survive a restart are those defined by AMQP to survive broker restart. AMQP defines durable exchanges and queues and persistent messages. Some further definitions:

  • durable message: a persistent message on a durable queue.
  • durable enqueue: act of enqueuing a persistent message on a durable queue.
  • durable binding: binding between durable exchange and durable queue.
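The three definitions above reduce to simple predicates. A minimal sketch, assuming hypothetical `Exchange`, `Queue` and `Message` records that carry only the durability flags AMQP defines:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    name: str
    durable: bool


@dataclass(frozen=True)
class Queue:
    name: str
    durable: bool


@dataclass(frozen=True)
class Message:
    body: str
    persistent: bool


def is_durable_enqueue(message, queue):
    # Durable enqueue: a persistent message placed on a durable queue.
    return message.persistent and queue.durable


def is_durable_message(message, queue):
    # A message is durable only in the context of the queue that holds it.
    return is_durable_enqueue(message, queue)


def is_durable_binding(exchange, queue):
    # Both ends of the binding must be durable.
    return exchange.durable and queue.durable
```

Note that a persistent message on a non-durable queue is not a durable message under these definitions.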

The following are preserved if the entire cluster shuts down/crashes and is re-started:

  • Durable wiring: durable exchanges, queues and bindings.
  • Durable messages
  • Prepared DTX transactions

The following do not survive a restart:

  • Session state
  • Non-durable wiring
  • TX transactions are aborted.
  • Unprepared DTX transactions are aborted.
  • Non-durable effects of prepared DTX transactions are lost.
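The transaction rules in the two lists above can be summarised as a small decision function. This is a sketch only; the `Outcome` enum and the `"tx"`/`"dtx"` string tags are hypothetical names chosen for illustration.

```python
from enum import Enum


class Outcome(Enum):
    PRESERVED = "preserved"
    ABORTED = "aborted"


def outcome_after_restart(kind, prepared=False):
    """Fate of a transaction across a full cluster restart.

    kind is "tx" (local transaction) or "dtx" (distributed transaction).
    Only prepared DTX transactions are preserved, and only their durable
    effects survive; everything else is aborted.
    """
    if kind == "dtx" and prepared:
        return Outcome.PRESERVED
    if kind in ("tx", "dtx"):
        return Outcome.ABORTED
    raise ValueError(f"unknown transaction kind: {kind!r}")
```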

Restarting DTX Transactions

What happens on restart if a DTX transaction:

  • enqueues non-persistent messages.
  • enqueues to non-durable queues.
  • dequeues from non-durable queues.
  • dequeues non-persistent messages.

Consistent semantics: the outcome is equivalent to the transaction committing before the restart.

  • Non-durable enqueues are dropped without error.
  • Durable enqueues take effect.
    This is equivalent to the transaction committing before the restart, and non-durable messages
    being lost in the restart.
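The enqueue side of these semantics can be sketched as a filter applied when a prepared DTX transaction is recovered. The `Enqueue` record and `recover_prepared_enqueues` function are hypothetical; dequeues are deliberately left out because their treatment is still an open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enqueue:
    """One enqueue recorded by a prepared DTX transaction (hypothetical record)."""
    queue: str
    queue_durable: bool
    message_persistent: bool
    body: str


def recover_prepared_enqueues(enqueues):
    """Return the enqueues that take effect after a cluster restart.

    Durable enqueues (persistent message, durable queue) are kept.
    Non-durable enqueues are dropped silently, which matches a commit
    before the restart followed by loss of non-durable messages.
    """
    return [e for e in enqueues if e.queue_durable and e.message_persistent]
```

For example, given one enqueue to a durable queue with a persistent message and one to a non-durable queue, only the first is returned, and no error is raised for the second.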

TODO: What about dequeue? Is AMQP "get" transactional? If so, how can we provide consistent semantics without making every queue and message durable? Should we abort transactions that include a get() on restart?
