...
- The CDC utility is expected to be started, and automatically restarted on failure, by the OS or an external tool so that change events are processed reliably.
- The CDC feature may be used for deployments that have WAL only.
- When CDC starts, the first consumed event is the first event available in the WAL archive.
- The lag between a record change and the CDC consumer notification depends on the segment archiving timeout, which requires additional configuration from the user.
- CDC failover depends on the WAL archive segment count. If the CDC application is down for a relatively long time, Ignite may delete some archive segments. In that case the consumer cannot continue from where it stopped and must restart from the segments that still exist.
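The failover rule in the last item can be sketched as follows. This is a minimal illustration, not Ignite code: the class `CdcRestartPoint`, its methods, and the use of plain segment indexes instead of WAL pointers are assumptions made for the example.

```java
/** Hypothetical helper, not part of Ignite: picks the WAL archive segment to resume from. */
final class CdcRestartPoint {
    private CdcRestartPoint() {
    }

    /**
     * @param savedSegment   index of the segment the consumer was processing when it stopped
     * @param oldestArchived index of the oldest segment still present in the WAL archive
     * @return index of the segment the consumer should resume from
     */
    static long resumeSegment(long savedSegment, long oldestArchived) {
        // If the saved segment was already deleted, the consumer can only
        // restart from the oldest segment that still exists.
        return Math.max(savedSegment, oldestArchived);
    }

    /** @return number of archived segments that were deleted before the consumer saw them. */
    static long lostSegments(long savedSegment, long oldestArchived) {
        return Math.max(0L, oldestArchived - savedSegment);
    }
}
```

A non-zero result from `lostSegments` means some changes were never delivered, so the downstream system may need to be resynchronized by other means.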
Online (real-time) CDC
When a separate process captures data changes from WAL archives, the lag between a CDC event happening and the consumer being notified about it is relatively large. It is proposed to provide the ability to capture data and notify consumers directly from the Ignite process. This minimizes the lag at the cost of additional memory usage.
Issues to solve:
- Behavior after the CDC buffer is full. Options are:
  - Stop online CDC and delegate capturing to the ignite-cdc.sh process (CdcMain, based on WAL archives).
  - Temporarily switch to CdcMain, and switch back to online CDC after closing the gap.
- From which point online CDC starts capturing:
  - Check the locally persisted OnlineCdcConsumerState to find the last captured WALPointer.
    - What if the pointer is less than any pointer recovered during Ignite node startup?
    - What if ignite-cdc.sh streamed before the node stopped?
User interface:
- IgniteConfiguration#cdcConsumer - implementation of the CdcConsumer interface.
- IgniteConfiguration#cdcBufSize - size of the buffer used by CDC to store captured changes. Default is (walSegCount * walSegSize); for the default values it is 640MB.
- Logs:
  - Initialization info.
  - Switch between working modes.
- Metrics:
  - Ordinary CDC metrics (count of captured WAL segments and entries).
  - Current working mode.
  - Used buffer space.
  - Lag between buffer and WAL archive (segments).
  - Lag between writing to WAL and capturing by CDC (milliseconds).
  - Last captured WALPointer.
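The default buffer size quoted above is a simple product. The sketch below reproduces the arithmetic; the class and constant names are made up for the example, and the values (10 segments of 64 MB each) are assumed to be the WAL defaults, which is consistent with the 640MB figure.

```java
/** Illustrative arithmetic only; not an Ignite class. */
final class CdcBufferDefaults {
    /** Assumed default number of WAL segments. */
    static final int WAL_SEGMENTS = 10;

    /** Assumed default WAL segment size in bytes (64 MB). */
    static final long WAL_SEGMENT_SIZE = 64L * 1024 * 1024;

    private CdcBufferDefaults() {
    }

    /** @return proposed default for cdcBufSize in bytes: walSegCount * walSegSize. */
    static long defaultCdcBufSize() {
        return WAL_SEGMENTS * WAL_SEGMENT_SIZE;
    }
}
```

With these values `defaultCdcBufSize()` evaluates to 671088640 bytes, that is 640 MB of additional memory per node unless the buffer size is configured explicitly.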
Segments:
Note that the word "segment" is used in two different senses:
- WAL segments are represented as numbered files. The size of a WAL segment is configured with DataStorageConfiguration#walSegmentSize.
- ReadSegment is a slice of the mmap WAL segment. It contains WAL records to sync with the actual file. The size of this segment varies over time, and its maximum can be configured with DataStorageConfiguration#walBuffSize.
CdcWorker:
CdcWorker is a thread responsible for collecting WAL records, transforming them into CDC events, and submitting them to the CdcConsumer. The worker has 2 modes:
- BUFFER_MODE - consumes WAL records from the CdcBufferQueue, which is filled directly from the WAL manager.
- ARCHIVE_MODE - consumes WAL records from archived WAL segments.
  - Note that the CdcBufferQueue is still being filled in the background in this mode.
Initialization:
- CdcWorker is initialized with CdcConsumerState#loadWalState.
- Initial mode is ARCHIVE_MODE. It switches to the CdcBufferQueue after:
  - The loaded pointer is not reached in the archive.
  - OR the head of the buffer queue is less than the loaded pointer.
Capturing from the buffer (wal-sync-thread):
- Runs in wal-sync-thread (the only reader of mmap WAL), under the lock that synchronizes preparing ReadSegment and rolling the WAL segment, to guarantee there are no changes in the underlying buffer.
- Offers a deep copy of the ReadSegments being flushed to the CdcWorker.
- CdcWorker checks the remaining capacity and the buffer size.
  - If the size fits the capacity, store the offered buffer data into the Queue.
  - Otherwise:
    - Remove segments from the queue tail to free space for the offered buffer.
    - Store the head of the offered buffer as nextHead (WALPointer).
    - Capture data from the Queue while nextHead is not reached.
    - Switch to ARCHIVE_MODE.
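The overflow handling described above can be sketched as a bounded queue. This is one possible reading of the steps, not the proposed implementation: the class `CdcBufferQueueSketch`, its methods, the use of a `long` in place of WALPointer, and the rule that everything at or after a previous nextHead is dropped on a repeated overflow (so that the queue never holds more than one gap) are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the proposed CdcBufferQueue; not an Ignite class. */
final class CdcBufferQueueSketch {
    /** A copied ReadSegment: start position (stand-in for WALPointer) plus its bytes. */
    static final class Segment {
        final long pointer;
        final byte[] data;

        Segment(long pointer, byte[] data) {
            this.pointer = pointer;
            this.data = data.clone(); // deep copy: the WAL buffer may be reused
        }
    }

    private final Deque<Segment> queue = new ArrayDeque<>();
    private final long capacity;
    private long used;

    /** Start of the first segment stored after an overflow, or -1 if there is no gap. */
    private long nextHead = -1;

    CdcBufferQueueSketch(long capacity) {
        this.capacity = capacity;
    }

    /** Called from the WAL sync thread. Returns false if the segment can never fit. */
    synchronized boolean offer(long pointer, byte[] data) {
        if (data.length > capacity)
            return false;

        if (used + data.length > capacity) {
            // Overflow: drop the newest segments (queue tail) until the offer fits.
            // Segments stored after an earlier gap are dropped too (assumption),
            // so the queue contains at most one gap.
            while (!queue.isEmpty() && (used + data.length > capacity
                || (nextHead >= 0 && queue.peekLast().pointer >= nextHead)))
                used -= queue.removeLast().data.length;

            // Records between the remaining data and this pointer must come from the archive.
            nextHead = pointer;
        }

        queue.addLast(new Segment(pointer, data));
        used += data.length;

        return true;
    }

    /** Called from the CDC worker. Returns null when the queue is empty. */
    synchronized Segment poll() {
        Segment seg = queue.pollFirst();

        if (seg != null)
            used -= seg.data.length;

        return seg;
    }

    /** Called by the worker once the archive has been read up to nextHead. */
    synchronized void gapClosed() {
        nextHead = -1;
    }

    synchronized long nextHead() {
        return nextHead;
    }

    synchronized long used() {
        return used;
    }
}
```

A worker using this sketch would compare the pointer of each polled segment with `nextHead()`: reaching it means the preceding records were dropped from the buffer, which is the signal to switch to ARCHIVE_MODE.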
Body loop (cdc-worker-thread):
- BUFFER_MODE:
  - Polls the Queue, transforms ReadSegment data to Iterator<CdcEvent>, pushes them to CdcConsumer.
  - Optimization: transform segment buffers to CDC events in background (to reduce the buffer usage). CdcConsumer should be async then?
- ARCHIVE_MODE:
  - Similar to CdcMain - await archived segments.
  - Submits the read WAL records to the CdcConsumer.
  - For every segment/record checks a condition to switch to the bufferMode:
    - Check the loaded WALPointer after initialization.
    - OR while nextHead is not reached.
- In both modes it persists CdcConsumerState. Policy for committing the progress: by WAL segment.
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-IEP-59-CDC-Capture-Data-Change-tc49677.html
...