Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagejava
titleSourceReader reading methods
linenumberstrue
public interface SplitReader<E, SplitT extends SourceSplit> {

	RecordsWithSplitIds<E> fetch() throws InterruptedException;

	void handleSplitsChanges(Queue<SplitsChange<SplitT>> splitsChanges);

	void wakeUp();
}

The RecordsWithSplitIds returned by the SplitReader will be passed to an RecordEmitter one by one. The RecordEmitter is responsible for the following:

  • Convert the raw record type <E> into the eventual record type <T>
  • Provide an event time timestamp for the record that it processes.

Failover

The state of the SplitEnumerator includes the following:

...

Code Block
languagejava
titleRecordWithSplitIds
linenumberstrue
/**
 * An interface for the elements passed from the SplitReader to the source reader.
 */
public interface RecordsWithSplitIds<E> {

	/**
	 * Get all the split ids.
	 *
	 * @return a collection of split ids.
	 */
	Collection<String> getSplitIds();

	/**
	 * Get all the records by Splits;
	 *
	 * @return a mapping from split ids to the records.
	 */
	Map<String, Collection<E>> getRecordsBySplits();

	/**
	 * Get the finished splits.
	 *
	 * @return the finished splits after this RecordsWithSplitIds is returned.
	 */
	Set<String> getFinishedSplits();
}

Anchor
RecordEmitter
RecordEmitter
RecordEmitter

Code Block
languagejava
titleRecordEmitter
linenumberstrue
/**
 * Emit a record to the downstream.
 *
 * @param <E> the type of the record emitted by the {@link SplitReader}
 * @param <T> the type of records that are eventually emitted to the {@link SourceOutput}.
 * @param <SplitStateT> the mutable type of split state.
 */
public interface RecordEmitter<E, T, SplitStateT> {

	/**
	 * Process and emit the records to the {@link SourceOutput}. A few recommendations to the implementation
	 * are following:
	 *
	 * <ul>
	 * 	<li>The method maybe interrupted in the middle. In that case, the same set of records will be passed
	 * 	to the record emitter again later. The implementation needs to make sure it reades
	 * 	<li>
	 * </ul>
	 *
	 * @param element The intermediate element read by the SplitReader.
	 * @param output The output to which the final records are emit to.
	 * @param splitState The state of the split.
	 */
	void emitRecord(E element, SourceOutput<T> output, SplitStateT splitState) throws Exception;
}

...