...

The index feature is implemented as one part of the Flink job DAG. Flink's stateful API provides state management, so we can store the index in Flink state. At a low level, a window in unbounded streaming is a mechanism that splits the unbounded stream into bounded chunks. We can therefore use a Flink window as the counterpart of the micro-batch (RDD) in Spark.
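The two ideas above (a window standing in for a micro-batch, and the index kept as state) can be modeled in a few lines. The following is a minimal plain-Java sketch with no Flink dependency: the class `WindowedIndexSketch`, the count-based windowing, the `file-N` naming, and the `I:`/`U:` tags are all assumptions made for illustration, and the `HashMap` only stands in for what Flink keyed state would hold in a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (hypothetical, not Flink API) of "window = micro-batch" with the index kept as state. */
class WindowedIndexSketch {

    /** Stands in for keyed state: record key -> id of the file the key was first written to. */
    private final Map<String, String> indexState = new HashMap<>();
    private int nextFileId = 0;

    /** Splits a stream into count-based windows; each window plays the role of one micro-batch. */
    static <T> List<List<T>> countWindows(Iterable<T> stream, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T element : stream) {
            current.add(element);
            if (current.size() == size) {
                windows.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            windows.add(current); // trailing partial window
        }
        return windows;
    }

    /** Tags each key of one window: unseen keys become inserts ("I"), known keys become updates ("U"). */
    List<String> tagWindow(List<String> window) {
        List<String> tagged = new ArrayList<>();
        for (String key : window) {
            String fileId = indexState.get(key);
            if (fileId == null) {
                fileId = "file-" + nextFileId++;
                indexState.put(key, fileId); // remember the location for later windows
                tagged.add("I:" + key + "@" + fileId);
            } else {
                tagged.add("U:" + key + "@" + fileId);
            }
        }
        return tagged;
    }

    public static void main(String[] args) {
        WindowedIndexSketch sketch = new WindowedIndexSketch();
        List<String> stream = Arrays.asList("a", "b", "a", "c", "b");
        // The index state survives across windows, so later windows can see earlier keys.
        for (List<String> window : countWindows(stream, 2)) {
            System.out.println(sketch.tagWindow(window));
        }
    }
}
```

Because the state outlives each window, a key first seen in one window is recognized as an update in any later window, which is the behavior the index needs across micro-batches.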

Indexing is one step of the write process. It runs in the context of the computation and is closely tied to the computation engine. Therefore, each existing index also needs a corresponding implementation for each computation engine. The whole class diagram is shown below:

[Image: class diagram]


HoodieIndex, as a public interface, should be refactored into an engine-independent class. We can generalize the Spark-related types, like this:

...
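To illustrate the general idea of replacing engine-specific types with type parameters, here is a minimal sketch. It is not the proposal's actual code: `GenericIndex`, `ListBackedIndex`, the method signatures, and the `NEW` marker are hypothetical names chosen for this example, and `java.util.List` stands in for whatever collection type an engine binding would supply (for example an RDD type in a Spark binding or a stream type in a Flink binding).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Entry point for the sketch; all names in this file are hypothetical. */
class EngineIndexSketch {
    public static void main(String[] args) {
        GenericIndex<List<String>, List<String>> index = new ListBackedIndex();
        index.updateLocation(Arrays.asList("k1"), "file-0");
        System.out.println(index.tagLocation(Arrays.asList("k1", "k2")));
    }
}

/**
 * Hypothetical engine-independent contract.
 * I is the engine's input collection type, O is its output collection type.
 */
interface GenericIndex<I, O> {
    /** Looks up each incoming key and annotates it with its known location, if any. */
    O tagLocation(I keys);

    /** Records that the given keys now live in the given file. */
    void updateLocation(I keys, String fileId);
}

/** Hypothetical binding that uses plain lists in place of an engine-specific collection. */
class ListBackedIndex implements GenericIndex<List<String>, List<String>> {
    private final Map<String, String> locations = new HashMap<>();

    @Override
    public List<String> tagLocation(List<String> keys) {
        List<String> tagged = new ArrayList<>();
        for (String key : keys) {
            String location = locations.get(key);
            tagged.add(key + "->" + (location == null ? "NEW" : location));
        }
        return tagged;
    }

    @Override
    public void updateLocation(List<String> keys, String fileId) {
        for (String key : keys) {
            locations.put(key, fileId);
        }
    }
}
```

With this shape, code that depends only on `GenericIndex` does not need to know which engine supplies the collection types; each engine provides its own binding.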