...

That means, if a checkpoint succeeds but we do not receive the success event, the in-flight instant will span two (or more) checkpoints.

Work Flow

The function first buffers the data as a batch of {@link HoodieRecord}s. It flushes (writes) a record bucket when the bucket size exceeds the configured threshold {@link FlinkOptions#WRITE_BUCKET_SIZE},
when the total data buffer size exceeds the configured threshold {@link FlinkOptions#WRITE_BUFFER_SIZE},
or when a Flink checkpoint starts. After a batch has been written successfully,
the function notifies its operator coordinator {@link StreamWriteOperatorCoordinator} to mark the write as successful.
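The buffering and flush triggers above can be sketched in plain Java. This is a minimal illustration with hypothetical names (`BufferSketch`, `bufferRecord`, string records) and made-up threshold values standing in for {@link FlinkOptions#WRITE_BUCKET_SIZE} and {@link FlinkOptions#WRITE_BUFFER_SIZE}; it is not the actual function implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: records are grouped into per-bucket lists and flushed when a single
// bucket exceeds the bucket-size threshold, or the whole buffer exceeds the
// buffer-size threshold (the checkpoint-triggered flush is omitted here).
class BufferSketch {
  static final int WRITE_BUCKET_SIZE = 3;  // stand-in for FlinkOptions#WRITE_BUCKET_SIZE
  static final int WRITE_BUFFER_SIZE = 8;  // stand-in for FlinkOptions#WRITE_BUFFER_SIZE

  final Map<String, List<String>> buckets = new HashMap<>();
  int totalBuffered = 0;
  final List<String> flushedBatches = new ArrayList<>();

  void bufferRecord(String bucketId, String record) {
    buckets.computeIfAbsent(bucketId, k -> new ArrayList<>()).add(record);
    totalBuffered++;
    if (buckets.get(bucketId).size() >= WRITE_BUCKET_SIZE) {
      flushBucket(bucketId);                // one bucket grew too large
    } else if (totalBuffered >= WRITE_BUFFER_SIZE) {
      flushLargestBucket();                 // the whole buffer grew too large
    }
  }

  void flushBucket(String bucketId) {
    List<String> bucket = buckets.remove(bucketId);
    totalBuffered -= bucket.size();
    flushedBatches.add(bucketId + ":" + bucket.size());  // pretend-write
  }

  void flushLargestBucket() {
    String largest = buckets.entrySet().stream()
        .max(Map.Entry.comparingByValue((a, b) -> a.size() - b.size()))
        .get().getKey();
    flushBucket(largest);
  }
}
```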

The Semantics

The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator
starts a new instant on the timeline when a checkpoint triggers. The coordinator's checkpoint always
starts before its operator's, so when this function starts a checkpoint, a REQUESTED instant already exists.

To improve throughput, the function's processing thread does not block data buffering
after the checkpoint thread starts flushing the existing data buffer. There is therefore a possibility that part of the next
checkpoint's batch is written into the current checkpoint. When a checkpoint failure triggers a write rollback, there may be some duplicate records
(e.g. from the eagerly written batch); the semantics are still correct when using the UPSERT operation. We may improve this to be strictly exactly-once once the writer supports writing records one by one instead of in batches (the writer would shift the file handles on checkpointing).
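The reason replayed duplicates are harmless under UPSERT can be shown with a toy example: merging the same keyed record twice leaves the table in the same state, i.e. the operation is idempotent per key. The names below (`UpsertSketch`, a map standing in for the table) are illustrative only.

```java
import java.util.Map;

// Sketch: UPSERT keyed by record key is idempotent, so a batch replayed after
// a checkpoint failure does not change the final table state.
class UpsertSketch {
  static void upsert(Map<String, String> table, String key, String value) {
    table.put(key, value);  // last write for a given key wins
  }
}
```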

Fault Tolerance

...

The operator coordinator checks and commits the last instant, then starts a new one, when a checkpoint finishes successfully.
The operator rolls back the written data and throws an exception to trigger a failover when any error occurs.
This means one Hoodie instant may span one or more checkpoints (some checkpoint notifications may be skipped).
If a checkpoint times out, the next checkpoint rewrites the leftover buffered data (the buffer is cleaned in the last
step of the #flushBuffer method).

The operator coordinator retries several times when committing the write status.
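The retry-on-commit behaviour can be sketched as a bounded retry loop. Everything here (`CommitRetry`, `commitWithRetry`, the retry count) is a hypothetical stand-in for the coordinator's internal logic, not the real API.

```java
import java.util.function.Supplier;

// Sketch: attempt the commit a bounded number of times; if all attempts fail,
// return false so the caller can trigger a rollback/failover.
class CommitRetry {
  static boolean commitWithRetry(Supplier<Boolean> commit, int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      if (commit.get()) {
        return true;   // commit succeeded on this attempt
      }
    }
    return false;      // exhausted all attempts
  }
}
```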

Note: the function task requires the input stream to be shuffled by the file IDs.
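The shuffle requirement means all records of a file group must land on the same write task; in Flink this corresponds to keying the stream by file ID before the write operator. A minimal hash-partitioning sketch (hypothetical `taskFor` helper) shows the invariant:

```java
import java.util.Objects;

// Sketch: partition records by file ID so every record of a given file group
// is routed to the same write subtask (deterministic hash partitioning).
class ShuffleSketch {
  static int taskFor(String fileId, int parallelism) {
    // floorMod keeps the result in [0, parallelism) even for negative hashes
    return Math.floorMod(Objects.hashCode(fileId), parallelism);
  }
}
```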

...

When all the files have been checked, the index is marked as being in pure state mode; from then on, we only need to check the state for records that arrive later.
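The lookup path described above can be sketched as: check the state first, fall back to a one-time file scan only while the index has not yet entered pure state mode. `IndexSketch`, `locate`, and `scanFiles` are hypothetical names; a plain map stands in for Flink keyed state.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: once existing files have been scanned into state (pure state mode),
// later records are located by a state lookup alone, without re-checking files.
class IndexSketch {
  final Map<String, String> keyToFileId = new HashMap<>(); // stand-in for keyed state
  boolean pureStateMode = false;

  String locate(String recordKey) {
    String fileId = keyToFileId.get(recordKey);
    if (fileId == null && !pureStateMode) {
      fileId = scanFiles(recordKey);          // bootstrap path: check the files once
      if (fileId != null) {
        keyToFileId.put(recordKey, fileId);   // cache the result in state
      }
    }
    return fileId;                            // null means a new record (insert)
  }

  String scanFiles(String recordKey) {
    return null;  // hypothetical file scan; always misses in this sketch
  }
}
```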


The new index does not need a bootstrap for an existing/historical dataset; it also works when different Flink jobs write to the same table.

...