...

  1. Snapshot File Scanner: Reads the latest snapshots of the table and emits file changes to the downstream Executors.
  2. Executor: Maintains the files for specific buckets and provides the query service.
  3. Address Server: Collects the addresses of all Executors and registers its own address in the Paimon table file system.
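
As a rough illustration of the first two components, the sketch below routes file changes from a scanner to the executor that owns each bucket. The `FileChange` and `Executor` classes, the `ADD`/`DELETE` kinds, and the "bucket modulo executor count" routing rule are all assumptions made for this example; the proposal does not define them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ScannerToExecutorSketch {

    enum Kind { ADD, DELETE }

    /** Hypothetical file-change event emitted by the snapshot file scanner. */
    static final class FileChange {
        final int bucket;
        final String file;
        final Kind kind;

        FileChange(int bucket, String file, Kind kind) {
            this.bucket = bucket;
            this.file = file;
            this.kind = kind;
        }
    }

    /** Hypothetical executor state: the live files of each bucket it owns. */
    static final class Executor {
        private final Map<Integer, Set<String>> liveFiles = new HashMap<>();

        void apply(FileChange change) {
            Set<String> files = liveFiles.computeIfAbsent(change.bucket, b -> new TreeSet<>());
            if (change.kind == Kind.ADD) {
                files.add(change.file);
            } else {
                files.remove(change.file);
            }
        }

        Set<String> files(int bucket) {
            return liveFiles.getOrDefault(bucket, Set.of());
        }
    }

    /** Assumed routing rule: bucket index modulo the number of executors. */
    static void dispatch(List<FileChange> changes, List<Executor> executors) {
        for (FileChange change : changes) {
            int target = Math.floorMod(change.bucket, executors.size());
            executors.get(target).apply(change);
        }
    }

    public static void main(String[] args) {
        List<Executor> executors = List.of(new Executor(), new Executor());
        dispatch(
                List.of(
                        new FileChange(0, "data-1", Kind.ADD),
                        new FileChange(1, "data-2", Kind.ADD),
                        new FileChange(0, "data-3", Kind.ADD),
                        new FileChange(0, "data-1", Kind.DELETE)),
                executors);
        System.out.println(executors.get(0).files(0)); // [data-3]
        System.out.println(executors.get(1).files(1)); // [data-2]
    }
}
```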


How to Query

Users only need to get the Paimon table from the Catalog (which requires the warehouse path) and create a TableQuery object. The TableQuery will then:

  1. Find the address of the address server in the Paimon table file system.
  2. Connect to the address server to get the addresses of all executors.
  3. Connect to the executors to look up values by key.
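
The three steps can be sketched with in-memory stand-ins, as below. This is not the TableQuery API: the `AddressServer` and `Executor` classes are hypothetical, discovery through the file system is elided, and the routing rule (key hash modulo the number of executors) is an assumption made only so the example can pick a target executor.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class QueryFlowSketch {

    /** Hypothetical stand-in for an executor that serves one bucket. */
    static final class Executor {
        private final Map<String, String> rows = new HashMap<>();

        void put(String key, String value) {
            rows.put(key, value);
        }

        Optional<String> lookup(String key) {
            return Optional.ofNullable(rows.get(key));
        }
    }

    /** Hypothetical stand-in for the address server: knows every executor. */
    static final class AddressServer {
        private final List<Executor> executors;

        AddressServer(List<Executor> executors) {
            this.executors = executors;
        }

        List<Executor> allExecutors() {
            return executors;
        }
    }

    /** Assumed routing rule: non-negative key hash modulo the bucket count. */
    static int bucketOf(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    static Optional<String> lookup(AddressServer server, String key) {
        // Step 2: ask the address server for all executors.
        List<Executor> executors = server.allExecutors();
        // Step 3: contact the executor that owns the key's bucket.
        Executor target = executors.get(bucketOf(key, executors.size()));
        return target.lookup(key);
    }

    public static void main(String[] args) {
        List<Executor> executors = List.of(new Executor(), new Executor());
        // Step 1 (finding the address server via the table file system) is elided here.
        AddressServer server = new AddressServer(executors);

        String key = "user-42";
        executors.get(bucketOf(key, executors.size())).put(key, "Alice");

        System.out.println(lookup(server, key).orElse("<missing>"));       // Alice
        System.out.println(lookup(server, "unknown").orElse("<missing>")); // <missing>
    }
}
```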


Implementation

  1. Distributed: In the first version, we can launch this service as a separate Flink job. The topology should be a simple DAG.
  2. RPC: The RPC for the Executors and the Address Server can be gRPC.
  3. TableQuery client: 
    1. Maintain the addresses of the Address Server and the Executors. If an exception occurs, retry by fetching a new address.
    2. Maintain the connections to the Address Server and the Executors. If an exception occurs, retry by opening a new connection.
    3. Use the LookupLevels class for lookups, which already handles caching, IO, and disk management.
    4. Provide both single-key lookup and batch-key lookup.
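
The retry behaviour described for the TableQuery client could look roughly like the sketch below: keep a cached address, and on any exception drop it, resolve a fresh one, and try again. `RetryingEndpoint`, its `resolver` supplier, and the attempt limit are hypothetical names for illustration; addresses are plain strings and no real gRPC channel is opened.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class RetryingClientSketch {

    /** Runs requests against a cached address and re-resolves the address after a failure. */
    static final class RetryingEndpoint<A> {
        private final Supplier<A> resolver; // e.g. re-read the address from the address server
        private final int maxAttempts;
        private A cached;

        RetryingEndpoint(Supplier<A> resolver, int maxAttempts) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be at least 1");
            }
            this.resolver = resolver;
            this.maxAttempts = maxAttempts;
        }

        <T> T call(Function<A, T> request) {
            RuntimeException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    if (cached == null) {
                        cached = resolver.get();
                    }
                    return request.apply(cached);
                } catch (RuntimeException e) {
                    last = e;
                    cached = null; // drop the stale address so the next attempt re-resolves it
                }
            }
            throw last;
        }
    }

    public static void main(String[] args) {
        // The first resolved address is stale; the second one works.
        Iterator<String> addresses = List.of("stale:1", "fresh:2").iterator();
        RetryingEndpoint<String> endpoint = new RetryingEndpoint<>(addresses::next, 3);

        // Single-key lookup: fails on the stale address, succeeds after re-resolving.
        String single = endpoint.call(addr -> {
            if (addr.startsWith("stale")) {
                throw new IllegalStateException("connection refused: " + addr);
            }
            return "k1@" + addr;
        });
        System.out.println(single); // k1@fresh:2

        // Batch lookup: reuses the cached, working address.
        List<String> keys = List.of("k1", "k2");
        List<String> batch = endpoint.call(
                addr -> keys.stream().map(k -> k + "@" + addr).collect(Collectors.toList()));
        System.out.println(batch); // [k1@fresh:2, k2@fresh:2]
    }
}
```

Routing both the single-key and the batch path through the same `call` method keeps the retry policy in one place.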

...