...
Motivation
Many use cases build on observing and processing changed records.
...
- Independence from the server node process (JVM) - issues and failures of the consumer shouldn't lead to server node instability.
- Notification guarantees and failover - i.e. track and save a pointer to the last consumed record. Continue notification from this pointer in case of restart.
- Resilience for the consumer - it must not be an issue when a consumer temporarily consumes more slowly than data appears.
Description:
[Diagram: CDC]
...
When a segment is archived, the utility iterates over it using the existing WALIterator and notifies CDCConsumer of each record from the segment.
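The iterate-and-notify step, combined with the saved consumer offset described in the design choices, can be sketched roughly as follows. This is a minimal illustration, not the Ignite API: `CdcLoopSketch`, `ChangeConsumer`, the string records, and the plain-text offset file are all hypothetical stand-ins for WALIterator, CDCConsumer, and the real offset format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of the CDC loop; none of these names come from Ignite. */
class CdcLoopSketch {
    /** Hypothetical consumer callback; returns true once the record is handled. */
    interface ChangeConsumer {
        boolean onChange(String record);
    }

    /** Reads the index of the next record to deliver, or 0 if nothing was saved yet. */
    static int loadOffset(Path offsetFile) {
        try {
            if (!Files.exists(offsetFile))
                return 0;
            byte[] raw = Files.readAllBytes(offsetFile);
            return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Persists the index of the next record to deliver. */
    static void saveOffset(Path offsetFile, int offset) {
        try {
            Files.write(offsetFile, Integer.toString(offset).getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Delivers the records of one archived segment, starting from the saved offset. */
    static int processSegment(List<String> segment, Path offsetFile, ChangeConsumer consumer) {
        int offset = loadOffset(offsetFile);
        for (int i = offset; i < segment.size(); i++) {
            if (!consumer.onChange(segment.get(i)))
                break; // Not handled: the same record is delivered again on the next run.
            offset = i + 1;
            saveOffset(offsetFile, offset); // Save only after the consumer accepted the record.
        }
        return offset;
    }
}
```

Because the offset is saved only after the consumer accepts a record, a crash between the two steps would redeliver that record, so this sketch gives at-least-once notification. A real offset would also need to identify the segment, not only the position inside it.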
Design choices:
- CDC application works as a separate process.
- CDC relies on the existing Ignite mechanism - WAL.
- IEP Scope - deliver local data change events to a local consumer.
- CDC keeps the consumer offset in a special file. WAL processing will start from this offset on restart.
- To prevent interference between the WAL archive process and CDC, Ignite will create a hard link to each newly created segment in a special folder. After successful processing, CDC will delete this link. Note that data will be removed from the disk only after both CDC and Ignite have removed their links to the segment from the corresponding folders.
- To keep the event gap minimal, a new configuration timeout is introduced - WalForceArchiveTimeout.
- A flag is introduced to distinguish DataEntry on primary and backup nodes.
- All public APIs are marked with @IgniteExperimental so that they can be improved based on real-world usage feedback.
- CDC consumer will be notified about binary metadata changes (Phase 2).
- Configuration parameter "Maximum CDC folder size" will be implemented to prevent the CDC folder from filling up the disk volume.
- CDC folder is resolved using the same logic as the Ignite node uses.
- CDC application should be restarted by the OS mechanism in case of any error (destination unavailability, for example).
- Initially, a single CDC consumer is supported. Support for several concurrently running consumers will be implemented in Phase 2.
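The hard-link design choice above can be illustrated with the standard `java.nio.file.Files.createLink` call. The sketch below is only an illustration under assumptions: the class name, the folder names `archive` and `cdc`, and the segment file name are hypothetical, and it requires a file system that supports hard links with both folders on the same volume.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the hard-link hand-off; names are not taken from Ignite. */
class SegmentLinkSketch {
    /** Node side: expose an archived segment to CDC through a hard link. */
    static Path linkForCdc(Path archivedSegment, Path cdcDir) {
        try {
            Files.createDirectories(cdcDir);
            return Files.createLink(cdcDir.resolve(archivedSegment.getFileName()), archivedSegment);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** CDC side: drop the link once the segment is fully processed. */
    static void release(Path link) {
        try {
            Files.deleteIfExists(link);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Walks through the lifecycle once in a temporary directory. */
    static String demo() {
        try {
            Path root = Files.createTempDirectory("cdc-sketch");
            Path archive = Files.createDirectories(root.resolve("archive"));
            Path segment = Files.write(archive.resolve("segment-1.wal"),
                "records".getBytes(StandardCharsets.UTF_8));

            Path link = linkForCdc(segment, root.resolve("cdc"));

            Files.delete(segment); // The node cleans its archive independently of CDC.

            // The data is still reachable through the CDC link.
            String viaLink = new String(Files.readAllBytes(link), StandardCharsets.UTF_8);

            release(link); // Last link removed: the data can now be freed.

            return viaLink + ":" + Files.exists(link);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the hard link is that either side can delete its own directory entry without coordinating with the other; the file system frees the data only when the last link is gone.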
...
- CDC utility will be started and automatically restarted in the case of failure by the OS or some external tools to provide stable change event processing.
- CDC feature may be used for the deployment that has WAL only.
- At the start of CDC, the first consumed event will be the first event available in the WAL archive.
- The lag between the record change and CDC consumer notification will depend on segment archiving timeout and requires additional configuration from the user.
- CDC failover depends on the WAL archive segment count. If the CDC application is down for a relatively long time, Ignite may delete certain archive segments; the consumer then can't continue to receive the changed records and must restart from the existing segments.
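The failover risk in the last item can be made concrete with a small sketch of how a resume point might be chosen after a restart. This is an assumption-laden illustration, not part of the proposal: `ResumePointSketch`, `resumeFrom`, and `hasGap` are hypothetical names, and segments are identified here by plain numeric indexes.

```java
import java.util.SortedSet;

/** Hypothetical sketch of choosing a resume point after CDC downtime. */
class ResumePointSketch {
    /** True if segments between the saved position and the oldest available one were deleted. */
    static boolean hasGap(long savedIdx, SortedSet<Long> available) {
        return !available.isEmpty() && available.first() > savedIdx;
    }

    /**
     * Returns the segment index to continue from: the saved one if it can still be reached,
     * otherwise the oldest segment that still exists (changes in between are lost to CDC).
     */
    static long resumeFrom(long savedIdx, SortedSet<Long> available) {
        return hasGap(savedIdx, available) ? available.first() : savedIdx;
    }
}
```

For example, with a saved index of 3 and only segments 5, 6, and 7 left in the archive, the sketch reports a gap and resumes from segment 5. A consumer would typically want such a gap surfaced explicitly instead of silently skipped.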
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-IEP-59-CDC-Capture-Data-Change-tc49677.html
Reference Links
https://dev.mysql.com/doc/refman/8.0/en/mysqlbinlog.html
...
https://docs.microsoft.com/ru-ru/sql/relational-databases/track-changes/track-data-changes-sql-server?view=sql-server-ver15
Tickets
ASF JIRA issues matching the query: labels = IEP-59 ORDER BY status ASC