IDIEP-104
Author
Sponsor
Created 26/05/2023
Status ACTIVE


Motivation

IEP-59 Change Data Capture defines CDC that runs in near realtime. The background process ignite-cdc waits for WAL segments to be archived before capturing data. This waiting causes a lag between the moment an event happens and the moment a consumer is notified about it. The lag can be relatively large (1 to 10 seconds). It is proposed to make it possible to capture data and notify consumers directly from the Ignite node process. This minimizes the lag at the cost of additional memory usage.

Description

FileWriteAheadLogManager logs records into mmap files, each represented as a byte buffer (FileWriteHandleImpl#SegmentedRingByteBuffer). The buffer is designed for multiple writers and a single reader.

The reader is the thread responsible for fsync'ing the file content to disk. This role is performed by one of the following threads: wal-segment-syncer, db-checkpoint-thread, or a user thread in case of a WAL segment rollover.

It is guaranteed that the reader reads the buffer sequentially, from the first byte until the buffer is full. Therefore, it is safe to notify CDC about new events from the reader thread.
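The reader-side notification described above can be sketched as follows. This is a minimal model under stated assumptions, not Ignite code: `SyncingReader` and its `Consumer<ByteBuffer>` callback are hypothetical names, the segmented ring buffer is reduced to a plain `ByteBuffer` without wrap-around, and the fsync step is only marked by a comment. Because the single reader advances sequentially, each notification covers exactly the bytes written since the previous pass, in WAL order, and is handed over as a read-only view instead of a copy.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

/**
 * Hypothetical model of the single WAL buffer reader. Writers append bytes; the
 * reader periodically "syncs" everything written since its previous pass and then
 * notifies CDC with a read-only view of exactly that range.
 */
final class SyncingReader {
    private final ByteBuffer buf;           // stands in for SegmentedRingByteBuffer (no wrap-around)
    private final Consumer<ByteBuffer> cdc; // hypothetical CDC callback
    private int readPos;                    // first byte not yet synced and delivered

    SyncingReader(ByteBuffer buf, Consumer<ByteBuffer> cdc) {
        this.buf = buf;
        this.cdc = cdc;
    }

    /** Called by the reader thread; writtenPos is the end of fully written records. */
    void syncAndNotify(int writtenPos) {
        if (writtenPos <= readPos)
            return; // nothing new since the previous pass

        // The fsync of bytes [readPos, writtenPos) would happen here.

        ByteBuffer view = buf.asReadOnlyBuffer();
        view.limit(writtenPos);
        view.position(readPos);
        cdc.accept(view.slice()); // ordered, zero-copy view of the new bytes

        readPos = writtenPos;
    }
}
```

Only the reader thread touches `readPos`, so no extra synchronization or sorting is needed on the notification path.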

Performance suggestions:

  1. The reader reads the buffer regularly; the default period is 500 ms. The period can be configured with IGNITE_WAL_SEGMENT_SYNC_TIMEOUT.
  2. A single thread prepares events: the reader buffer is already ordered, so no additional resources are spent on sorting events.
  3. Handling events as ByteBuffer representations is memory efficient: no additional heap usage is required.
  4. Iterating and filtering within such a buffer is fast, as only a few bytes (record type and offset) need to be read per record.
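To illustrate points 3 and 4, here is a sketch of scanning a buffer while reading only record headers. The record layout (1-byte type, 4-byte big-endian payload length, payload) is a hypothetical simplification chosen for this example; Ignite's actual WAL record serialization is different. `RecordScanner` and `countOfType` are illustrative names.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical, simplified record layout used only for illustration:
 *   [type: 1 byte][payload length: 4 bytes, big-endian][payload]
 * Ignite's real WAL format differs; this only shows the access pattern.
 */
final class RecordScanner {
    static final int HEADER_SIZE = 1 + Integer.BYTES;

    /** Counts records of the given type, reading headers only and skipping payloads. */
    static int countOfType(ByteBuffer segment, byte wantedType) {
        ByteBuffer buf = segment.duplicate(); // own position/limit, shared bytes, no copy
        int count = 0;

        while (buf.remaining() >= HEADER_SIZE) {
            byte type = buf.get();
            int len = buf.getInt();

            if (len < 0 || len > buf.remaining())
                break; // incomplete record at the end of the buffer

            if (type == wantedType)
                count++; // a consumer could take a slice of [position, position + len) here

            buf.position(buf.position() + len); // skip the payload without reading it
        }

        return count;
    }
}
```

Since `duplicate()` gives the scanner its own position over the same memory, the caller's buffer is left untouched and no payload bytes are copied to the heap.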

Restoring state after a node restart

During startup, a node restores memory based on the WAL: it restores the physical state and replays logical updates. At this stage CDC should collect events from the WAL starting from CdcConsumerState#walState up to the restored pointer.

State restoration should be completed before any new events happen.
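The recovery order can be sketched as below. Everything here is hypothetical: WAL pointers are modelled as `long` values instead of `WALPointer`, and `CdcRecovery`, `recover` and `onNewEvent` are illustrative names. The sketch also assumes that the record at `CdcConsumerState#walState` was already delivered and that the restored pointer is included; the exact boundaries are not specified above.

```java
import java.util.function.Consumer;

/**
 * Hypothetical sketch of CDC recovery on node start. WAL pointers are plain longs
 * here; the real WALPointer identifies a segment and a position inside it.
 */
final class CdcRecovery {
    /** Simplified WAL entry: pointer plus an opaque event payload. */
    static final class Entry {
        final long ptr;
        final String event;

        Entry(long ptr, String event) {
            this.ptr = ptr;
            this.event = event;
        }
    }

    private boolean recovered;

    /**
     * Delivers events after the consumer's saved pointer up to the restored pointer.
     * Assumption: committedPtr was already delivered, restoredPtr is included.
     */
    void recover(Iterable<Entry> wal, long committedPtr, long restoredPtr, Consumer<String> cdc) {
        for (Entry e : wal) {
            if (e.ptr > committedPtr && e.ptr <= restoredPtr)
                cdc.accept(e.event);
        }

        recovered = true;
    }

    /** New events are accepted only after recovery, so the delivery order is preserved. */
    void onNewEvent(String event, Consumer<String> cdc) {
        if (!recovered)
            throw new IllegalStateException("CDC state is not restored yet");

        cdc.accept(event);
    }
}
```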

User paths

Enable realtime CDC on cluster:

  1. Configure CDC in Ignite (set cdcEnabled=true and provide an implementation of CdcManager)
  2. Start the Ignite node
  3. Start the background process ignite-cdc (by default, it starts in PASSIVE mode)

Ignite node restart after failure:

  1. Start the Ignite node as usual (Ignite should automatically recover the CDC state)

User interface

Ignite

  1. CdcManager interface that provides 

CdcWorker

WAL records
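/** Handled by ignite-cdc in PASSIVE mode: clears obsolete links from the CDC directory. */
class RealtimeCdcRecord extends WALRecord {
	private WALPointer last;
}

/** Handled by ignite-cdc in PASSIVE mode: switches it to ACTIVE mode. */
class StopRealtimeCdcRecord extends WALRecord {
	private WALPointer last;
}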

ignite-cdc in PASSIVE mode

  1. Parses WAL records, looking for RealtimeCdcRecord and StopRealtimeCdcRecord
  2. For RealtimeCdcRecord - clears obsolete links from the CDC directory
  3. For StopRealtimeCdcRecord - switches to ACTIVE mode and starts capturing from the last WALPointer (taken from the previous RealtimeCdcRecord).
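A sketch of the PASSIVE-mode cleanup step follows. It assumes, beyond what is stated above, that the CDC directory holds links named after WAL segment indexes in the `%016d.wal` pattern, and that a link is obsolete once its segment index is below the segment index of `last` from RealtimeCdcRecord, since capturing may later restart from that pointer. `PassiveCleanup` is a hypothetical name, and the directory is modelled as a list of file names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical PASSIVE-mode cleanup. Links are named "%016d.wal" (assumption);
 * a link is obsolete when its segment index is below the segment of 'last'.
 */
final class PassiveCleanup {
    static List<String> obsoleteLinks(List<String> links, long lastSegmentIdx) {
        List<String> obsolete = new ArrayList<>();

        for (String name : links) {
            if (!name.endsWith(".wal"))
                continue; // not a WAL segment link

            long idx = Long.parseLong(name.substring(0, name.length() - ".wal".length()));

            if (idx < lastSegmentIdx)
                obsolete.add(name); // segment fully covered by realtime CDC
        }

        return obsolete;
    }
}
```

Real code would then delete the returned links, for example with `java.nio.file.Files#delete`, while keeping the segment that contains `last`.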

ignite-cdc in ACTIVE mode

  1. Captures WAL records
  2. Looks for TryStartRealtimeCdcRecord - after reaching it, persists CdcConsumerState locally and switches to PASSIVE mode.
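The mode switching described in both lists can be modelled as a small state machine. This is a hypothetical sketch, not the ignite-cdc implementation: records are reduced to a kind, their own position (`pos`) and their `last` field, all pointers are `long` values, and persisting CdcConsumerState is replaced by storing a field. `CdcModeMachine` and its members are illustrative names.

```java
/**
 * Hypothetical model of ignite-cdc switching between PASSIVE and ACTIVE modes.
 * pos is the record's own WAL position; last is its 'last' field (or -1 if absent).
 */
final class CdcModeMachine {
    enum Mode { PASSIVE, ACTIVE }

    enum Kind { DATA, REALTIME_CDC, STOP_REALTIME_CDC, TRY_START_REALTIME_CDC }

    Mode mode = Mode.PASSIVE;  // ignite-cdc starts in PASSIVE mode by default
    long lastRealtimePtr = -1; // 'last' of the most recent RealtimeCdcRecord
    long resumeFrom = -1;      // pointer where ACTIVE capturing starts
    long persistedState = -1;  // stands in for the locally persisted CdcConsumerState
    int captured;              // data records captured in ACTIVE mode

    void onRecord(Kind kind, long pos, long last) {
        switch (mode) {
            case PASSIVE:
                if (kind == Kind.REALTIME_CDC)
                    lastRealtimePtr = last; // obsolete links would be cleared here
                else if (kind == Kind.STOP_REALTIME_CDC) {
                    mode = Mode.ACTIVE;
                    resumeFrom = lastRealtimePtr; // from the previous RealtimeCdcRecord
                }
                break;

            case ACTIVE:
                if (kind == Kind.DATA)
                    captured++; // would be passed to the CDC consumer
                else if (kind == Kind.TRY_START_REALTIME_CDC) {
                    persistedState = pos; // persist CdcConsumerState locally
                    mode = Mode.PASSIVE;
                }
                break;
        }
    }
}
```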


Risks and Assumptions

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets
