Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
-- Query upstream table snapshot for given table and snapshot
SELECT S.database, S.table, S.snapshot_id FROM
    source_snapshot_lineage S
    JOIN sink_snapshot_lineage T
    ON S.job=T.job AND S.barrier_id=T.barrier_id
    where T.`database`='myDatabase' and T.`table`='myTable' and T.snapshot_id=123;

-- Query downstream table snapshot for given table and snapshot
SELECT T.database, T.table, T.snapshot_id FROM
    source_job_lineage S
    JOIN sink_job_lineage T
    ON S.job=T.job AND S.barrier_id=T.barrier_id
    where S.`database`='myDatabase' and S.`table`='myTable' and S.snapshot_id=123;

...

1. Concurrently Writing In Lineage Tables

There will be multiple jobs concurrently writing records to data lineage tables in System Database. To avoid conflicts, the data lineage tables such as source_snapshot_lineage and sink_snapshot_lineage use job name as partition field. The source reader operator and commit operator for each Flink ETL job writes data to different partitions independently.

draw.io Diagram
bordertrue
diagramNamedata lineage partition
simpleViewerfalse
width
linksauto
tbstyletop
lboxtrue
diagramWidth981
revision2

2. Snapshots In Lineage Tables


3. Actions For Lineage Tables

Tables in system database are read-only and users can query data from system tables by SQL just like a regular table. But users can not alter the system tables and update the data in them. We provide Flink actions for users to delete data from system tables as follows

action

argument

note

delete-table-lineage--job <job-name>: specify name of the job.delete table lineage created by the given job.
delete-data-lineage--job <job-name>: specify name of job.delete data lineage created by the given job.


We introduce a new option table-lineage for Paimon catalog, users can set this option when they create a new catalog.

...