Data consistency for the ETL topology is the first phase of our work. Once it is complete, we plan to continue improving Flink + Table Store, mainly in the following areas.

  1. Improve data consistency semantics. As mentioned above, we need to implement "Timestamp Barrier" to support full data consistency semantics, replacing the "Aligned Checkpoint" used in the first stage.

  2. Materialized View in SQL. Next, we hope to introduce materialized view syntax into Flink to improve the user experience. Queries can also be optimized based on materialized views to improve performance.

  3. Improve MetaService capabilities. MetaService is a single point of failure in the system and should support failover. In addition, MetaService should manage Flink ETL jobs and Table Store tables, be accessible from other computing engines such as Spark, and later act as an agent of the Hive Metastore.

  4. Improve OLAP performance. We have created [FLINK-25318] Improvement of scheduler and execution for Flink OLAP to track OLAP improvements in Flink. At the same time, we hope to keep enhancing the online query capability of Table Store and to improve the OLAP performance of Flink + Table Store.

  5. Improve real-time data freshness. At present, our consistency design is based on the Flink checkpoint mechanism and supports minute-level delay. In the future, we hope to support second-level or even millisecond-level delay while still ensuring data consistency, which requires continuous optimization in both computing and storage.

By carrying out the optimizations and implementations above, we hope that Flink + Table Store can provide full StreamingWarehouse capabilities. Users will be able to create materialized views and run OLAP queries in the system, as they would in a database or data warehouse, and output data to the application layer (such as a KV store) as required.