
Status

Current state:  Accepted

Discussion thread:  https://lists.apache.org/thread.html/rf09dfeeaf35da5ee98afe559b5a6e955c9f03ade0262727f6b5c4c1e%40%3Cdev.flink.apache.org%3E

...


Motivation

As discussed in FLIP-131, Flink will deprecate the DataSet API in favor of the DataStream API and the Table API. Users should be able to use the DataStream API to write jobs that support both bounded and unbounded execution modes. However, Flink does not yet provide a sink API that guarantees exactly-once semantics in both bounded and unbounded scenarios, which blocks the unification.

...

The document includes three parts: the first part describes the semantics the unified sink API should support. Building on the first part, the second part proposes a new transactional unified sink API. In the last part we introduce two open questions related to the API.

Semantics

This section tries to figure out which parts the sink developer should be responsible for and which parts Flink should be responsible for in terms of ensuring consistency. From this we can derive what semantics the sink API needs in order to support exactly-once semantics.

...

In summary, the sink API should be responsible for producing What to commit and for providing How to commit. Flink should be responsible for guaranteeing exactly-once semantics. Flink could “optimize” When & Where to commit according to the execution mode, and these optimizations should be transparent to the sink developer.
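To make this division of labor concrete, consider a file sink. The following is a minimal, illustrative sketch (none of these names are part of the proposed API): the staged file describes What to commit, the rename is How to commit, and When & Where to commit is deliberately absent, because Flink triggers the commit, e.g. on checkpoint completion in streaming mode or at the end of input in batch mode.

Code Block: FileSinkExample (illustrative sketch)
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: how a file sink splits the work described above. */
final class FileSinkExample {

	/** "What to commit": a committable produced by the sink, pointing at staged data. */
	static final class FileCommittable {
		final Path inProgressFile; // data already written, but not yet visible
		final Path targetFile;     // final location, visible only after the commit

		FileCommittable(Path inProgressFile, Path targetFile) {
			this.inProgressFile = inProgressFile;
			this.targetFile = targetFile;
		}
	}

	/** "How to commit": the action the sink provides for publishing staged data. */
	static void commit(FileCommittable committable) throws IOException {
		Files.move(committable.inProgressFile, committable.targetFile,
				StandardCopyOption.ATOMIC_MOVE);
	}

	// "When & Where to commit" is intentionally missing here: the framework
	// invokes commit(...) at the point that matches the execution mode.
}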

...

Sink API


Code Block: Sink
/**
 * This interface lets the sink developer build a simple transactional sink topology pattern, which satisfies the HDFS/S3/Iceberg sinks.
 * This sink topology includes one {@link Writer} + one {@link Committer} + one {@link GlobalCommitter}.
 * The {@link Writer} is responsible for producing the committable.
 * The {@link Committer} is responsible for committing a single committable.
 * The {@link GlobalCommitter} is responsible for committing an aggregated committable, which we call the global committable.
 * The parallelism of the {@link GlobalCommitter} is always 1.
 * Both the {@link Committer} and the {@link GlobalCommitter} are optional.
 * @param <InputT>       The type of the sink's input
 * @param <CommT>        The type of the committable data
 * @param <GlobalCommT>  The type of the aggregated committable
 * @param <WriterStateT> The type of the writer's state
 */
interface TransactionalSink<InputT, CommT, GlobalCommT, WriterStateT> {

	/**
	 * Create a {@link Writer}.
	 * @param context the runtime context
	 * @param states the previous writers' state
	 * @return A sink writer
	 */
	Writer<InputT, CommT, WriterStateT> createWriter(InitContext context, List<WriterStateT> states);

	/**
	 * @return a {@link Committer}
	 */
	Optional<Committer<CommT>> createCommitter();

	/**
	 * @return a {@link GlobalCommitter}
	 */
	Optional<GlobalCommitter<CommT, GlobalCommT>> createGlobalCommitter();

	Optional<SimpleVersionedSerializer<CommT>> getCommittableSerializer();

	Optional<SimpleVersionedSerializer<GlobalCommT>> getGlobalCommittableSerializer();

	Optional<SimpleVersionedSerializer<WriterStateT>> getWriterStateSerializer();

	interface InitContext {

		int getSubtaskId();

		MetricGroup metricGroup();
	}
}
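As an illustration, here is a hypothetical sketch of an Iceberg-style sink wired into this interface. Only TransactionalSink and its nested types come from the proposal; DataFile, Snapshot, DataFileWriter and SnapshotGlobalCommitter are made-up names (the latter two are sketched after the corresponding interfaces below), and the sketch assumes the Committer interface (elided from this excerpt) and Flink's SimpleVersionedSerializer are on the classpath. A production sink would have to return real serializers so that committables and writer state survive failover.

Code Block: IcebergStyleSink (hypothetical sketch)
import java.util.List;
import java.util.Optional;

import org.apache.flink.core.io.SimpleVersionedSerializer;

/** Hypothetical committable: one data file already staged in the table's storage. */
final class DataFile {
	final String path;

	DataFile(String path) {
		this.path = path;
	}
}

/** Hypothetical global committable: all files produced for one checkpoint. */
final class Snapshot {
	final List<DataFile> files;

	Snapshot(List<DataFile> files) {
		this.files = files;
	}
}

/**
 * The writer stages DataFiles; there is no per-committable Committer; the
 * GlobalCommitter folds all DataFiles of a checkpoint into one Snapshot.
 */
class IcebergStyleSink implements TransactionalSink<String, DataFile, Snapshot, Void> {

	@Override
	public Writer<String, DataFile, Void> createWriter(InitContext context, List<Void> states) {
		// The previous writers' state would be used to resume in-progress work;
		// this sketch keeps no writer state (hence the Void state type).
		return new DataFileWriter(context.getSubtaskId());
	}

	@Override
	public Optional<Committer<DataFile>> createCommitter() {
		// No per-file commit step: files only become visible through a snapshot.
		return Optional.empty();
	}

	@Override
	public Optional<GlobalCommitter<DataFile, Snapshot>> createGlobalCommitter() {
		return Optional.of(new SnapshotGlobalCommitter());
	}

	@Override
	public Optional<SimpleVersionedSerializer<DataFile>> getCommittableSerializer() {
		return Optional.empty(); // a real sink must provide this for failover
	}

	@Override
	public Optional<SimpleVersionedSerializer<Snapshot>> getGlobalCommittableSerializer() {
		return Optional.empty(); // a real sink must provide this for failover
	}

	@Override
	public Optional<SimpleVersionedSerializer<Void>> getWriterStateSerializer() {
		return Optional.empty(); // no writer state in this sketch
	}
}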

...

Code Block: Writer
/**
 * The interface is responsible for writing data and handling any temporary area used to stage data that is not yet committed, e.g. in-progress files.
 * As soon as some data is ready to commit, it (or metadata pointing to where the actual data is staged) is shipped to an operator that knows when to commit it.
 *
 * @param <InputT>       The type of the writer's input
 * @param <CommT>        The type of the committable data
 * @param <WriterStateT> The type of the writer's state
 */
interface Writer<InputT, CommT, WriterStateT> {

	/**
	 * Add an element to the writer.
	 * @param element The input record
	 * @param ctx The additional information about the input record
	 * @param output The committable data can be shipped to the committer through this
	 */
	void write(InputT element, Context ctx, WriterOutput<CommT> output);

	/**
	 * Prepare for a commit.
	 * @param flush  whether to flush all un-staged committables or not
	 * @param output The committable data can be shipped to the committer through this
	 */
	void prepareCommit(boolean flush, WriterOutput<CommT> output);


	/**
	 * @return the writer's state.
	 */
	List<WriterStateT> snapshotState();

	interface Context {

		long currentProcessingTime();

		long currentWatermark();

		Long timestamp();
	}
}
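For illustration, here is a hypothetical DataFileWriter matching the sink sketch above (DataFile as defined there). It stages records in an in-progress file and ships the file as a committable in prepareCommit. Because the interface methods above declare no checked exceptions, this sketch wraps I/O errors in UncheckedIOException; that is an assumption of the sketch, not part of the proposal.

Code Block: DataFileWriter (hypothetical sketch)
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a Writer that produces DataFile committables. */
class DataFileWriter implements Writer<String, DataFile, Void> {

	private final int subtaskId;
	private Path inProgressPath;
	private BufferedWriter inProgress; // the current staging file
	private long fileCounter;

	DataFileWriter(int subtaskId) {
		this.subtaskId = subtaskId;
		openNewFile();
	}

	@Override
	public void write(String element, Context ctx, WriterOutput<DataFile> output) {
		try {
			// Stage the record; nothing becomes visible to readers here.
			inProgress.write(element);
			inProgress.newLine();
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}

	@Override
	public void prepareCommit(boolean flush, WriterOutput<DataFile> output) {
		try {
			// Close the staging file and hand it over as a committable; the
			// framework decides when the corresponding commit actually runs.
			inProgress.close();
			output.sendToCommit(new DataFile(inProgressPath.toString()));
			if (!flush) {
				openNewFile(); // keep writing into a fresh staging file
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}

	@Override
	public List<Void> snapshotState() {
		return Collections.emptyList(); // this sketch keeps no writer state
	}

	private void openNewFile() {
		try {
			inProgressPath = Paths.get(
					String.format("/tmp/part-%d-%d.inprogress", subtaskId, fileCounter++));
			inProgress = Files.newBufferedWriter(inProgressPath);
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}
}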

...

Code Block: GlobalCommitter
/**
 * The {@link GlobalCommitter} is responsible for committing an aggregated committable, which we call the global committable.
 *
 * @param <CommT>   The type of the committable data
 * @param <GlobalCommT>  The type of the aggregated committable
 */
interface GlobalCommitter<CommT, GlobalCommT> {

	/**
	 * This method is called when restoring from a failover.
	 * @param globalCommittables the global committables that were not committed in the previous session.
	 * @return the global committables that should be committed again in the current session.
	 */
	List<GlobalCommT> filterRecoveredCommittables(List<GlobalCommT> globalCommittables);

	/**
	 * Compute an aggregated committable from a collection of committables.
	 * @param committables a collection of committables that need to be combined
	 * @return an aggregated committable
	 */
	GlobalCommT combine(List<CommT> committables);

	/**
	 * Commit the given aggregated committable.
	 * @param globalCommittable the global committable to commit to the external system
	 * @return whether the commit succeeded, failed, or should be retried
	 */
	CommitResult commit(GlobalCommT globalCommittable);

	/**
	 * Signals that there will be no more committables.
	 */
	void endOfInput();

	enum CommitResult {
		SUCCESS, FAILURE, RETRY
	}
}
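To complete the sketch, here is a hypothetical SnapshotGlobalCommitter for the snapshot-based sink above (DataFile/Snapshot as defined there). The in-memory “table metadata” is a stand-in: a real table format would record a stable snapshot id in the table metadata and compare against that in filterRecoveredCommittables, which is exactly what makes commit(...) safe to retry.

Code Block: SnapshotGlobalCommitter (hypothetical sketch)
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a GlobalCommitter that publishes snapshots atomically. */
class SnapshotGlobalCommitter implements GlobalCommitter<DataFile, Snapshot> {

	// Stand-in for the table's metadata store (illustrative only; a real
	// implementation would consult the external system, not local memory).
	private final Set<Snapshot> tableMetadata = new HashSet<>();

	@Override
	public List<Snapshot> filterRecoveredCommittables(List<Snapshot> globalCommittables) {
		// Drop snapshots that were already committed before the failover;
		// committing them again would duplicate data.
		List<Snapshot> stillPending = new ArrayList<>();
		for (Snapshot snapshot : globalCommittables) {
			if (!tableMetadata.contains(snapshot)) {
				stillPending.add(snapshot);
			}
		}
		return stillPending;
	}

	@Override
	public Snapshot combine(List<DataFile> committables) {
		// All files of one checkpoint become visible atomically as one snapshot.
		return new Snapshot(committables);
	}

	@Override
	public CommitResult commit(Snapshot globalCommittable) {
		try {
			tableMetadata.add(globalCommittable); // publish all files at once
			return CommitResult.SUCCESS;
		} catch (RuntimeException e) {
			// A transient conflict (e.g. a concurrent snapshot) is safe to
			// retry because filterRecoveredCommittables keeps commits idempotent.
			return CommitResult.RETRY;
		}
	}

	@Override
	public void endOfInput() {
		// The bounded input is exhausted; nothing left to flush in this sketch.
	}
}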

Code Block: WriterOutput
/**
 * The {@link Writer} uses this interface to send the committable data to the operator who knows when to commit to the external system.
 *
 * @param <CommT> The type of the committable data.
 */
public interface WriterOutput<CommT> {

	/**
	 * Send the committable data to the operator who knows when to commit.
	 *
	 * @param committable The data that is ready for committing.
	 */
	void sendToCommit(CommT committable) throws IOException;
}
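Note that sendToCommit may be called from write(...) as well as from prepareCommit(...), so a writer can ship a committable as soon as it is complete, e.g. when an in-progress file reaches its rolling size. A trivial, hypothetical WriterOutput implementation that simply collects committables, which can be handy for unit-testing a Writer, might look as follows.

Code Block: CollectingWriterOutput (hypothetical sketch)
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: collects committables instead of forwarding them, e.g.
 * for unit-testing a Writer. The real implementation is provided by the
 * framework and ships committables to the committer operator.
 */
class CollectingWriterOutput<CommT> implements WriterOutput<CommT> {

	final List<CommT> collected = new ArrayList<>();

	@Override
	public void sendToCommit(CommT committable) {
		collected.add(committable);
	}
}

// Example usage with the DataFileWriter sketch above:
//   CollectingWriterOutput<DataFile> out = new CollectingWriterOutput<>();
//   DataFileWriter writer = new DataFileWriter(0);
//   writer.write("a record", null, out);
//   writer.prepareCommit(true, out);
//   // out.collected now holds the staged file(s)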


Open Questions

There are still two open questions related to the unified sink API.

How does the sink API support writing to Hive?

In general, the HiveSink needs three steps before committing the data to Hive:

...

Following the discussion, the HiveSink will not be supported in the first version.

Is the sink an operator or a topology?

The scenario could be more complicated. For example, some users want to merge the files in one bucket before committing them to the HMS. Where should this logic be placed? Do we want to put all of this logic into a single operator, or into a sink topology?

The discussion concluded that in the long run we should give the sink developer the ability to build “arbitrary” topologies. For Flink 1.12, however, we should focus on satisfying only the S3/HDFS/Iceberg sinks.

Compatibility, Deprecation, and Migration Plan

  • We do not change the current streaming and batch style sink APIs, so there is no compatibility issue for existing sink implementations.
  • In the long run we might need to deprecate the old streaming and batch style sink APIs.
  • As a first step, we plan to migrate the StreamingFileSink to this new API.

Rejected Alternatives

The difference between the rejected version and the accepted version is how the state is exposed to the user. The accepted version gives the framework a greater opportunity to optimize state handling.

...