You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Overview

 

The WAN gateway allows 2 distinct data centers, or 2 availability zones, to maintain data consistency. In the case where one data center cannot process incoming events for any reason, the other data center should retain events so that no data is lost. Currently if one site can connect to another and send it events, those events are removed from the queue when the ack is received, regardless of what happens to them on the remote site. This behavior is controlled by the system property REMOVE_FROM_QUEUE_ON_EXCEPTION which defaults to true. It is unacceptable to simply store events that did not get successfully processed on the receiving end somewhere without replaying them. Customer needs to only know what events did not get transmitted. Most common exceptions thrown from a receiving site include:

  • Low Memory Exception
  • Malformed data exception (unable to deserialize)


Goals:

  1. Deprecate (and later remove) the internal system property REMOVE_FROM_QUEUE_ON_EXCEPTION, but detect if it is set to false and support existing behavior (infinite retries)
  2. Create a new callback API that will be executed when an exception returned with the acknowledgement from the receiver
  3. Provide an example implementation of the callback that saves events with exceptions returned from the receiver in a 'dead-letter' queue on the sender (on disk)
  4. Add 2 new properties for the gateway receiver to control when to send the acknowledgement with the exceptions:
    1. the number of retries for events failing with an exception
    2. the wait time between retries 

Not in Scope

  1. Providing the ability to directly replay events from the dead-letter queue.

Approach

 Our current design approach is as follows:

  1.  Deprecate existing internal boolean system property: REMOVE_FROM_QUEUE_ON_EXCEPTION
    • Continue to support default behavior if boolean set to false by setting # retries on receiver to -1

API Change



Risks and Unknowns

  1.  

  • No labels