...

The customer has a large volume of order transaction data and expects it to flow into the offline table automatically whenever new transaction data becomes available. A batch statistics job then runs at 12 o'clock every day.
With the Flink compute engine, we can create an append-only table with a negative bucket number (that is, a table that ignores buckets):

CreateTable

CREATE TABLE Orders (
    order_id INT,
    order_type STRING,
    `date` TIMESTAMP,
    price INT,
    number INT
) PARTITIONED BY (order_type) WITH (
    'write-mode' = 'append-only',
    'bucket' = '-1'
);


There isn’t any property about bucket.
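The daily batch statistics mentioned above could be sketched as the query below. This is only an illustration: the aggregation, the output column names, and the date range are assumptions, and the 12 o'clock trigger is assumed to come from an external scheduler. The `number` column is quoted defensively in case the parser treats it as a keyword.

```sql
-- Assumption: submitted from the Flink SQL client by an external scheduler at 12:00 each day.
SET 'execution.runtime-mode' = 'batch';

-- Hypothetical daily statistic: order count and total amount per order type.
-- The date range is a placeholder for the day being summarized.
SELECT
    order_type,
    COUNT(*) AS order_count,
    SUM(CAST(price AS BIGINT) * `number`) AS total_amount
FROM Orders
WHERE `date` >= TIMESTAMP '2023-01-01 00:00:00'
  AND `date` <  TIMESTAMP '2023-01-02 00:00:00'
GROUP BY order_type;
```

Casting price to BIGINT before multiplying avoids overflowing the INT type when the amounts are summed.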

...

The SQL statement INSERT INTO Orders SELECT * FROM OrderSource creates a DAG like the one below:


[Image: DAG of the INSERT INTO job]

All writes belong to one bucket, and the writers can insert into that bucket in parallel, so insert performance is not a concern.
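A minimal sketch of such a streaming insert follows. OrderSource is not defined in this document, so the definition here is a hypothetical stand-in backed by Flink's datagen connector, and the parallelism value is an arbitrary assumption chosen only to show several writers inserting at once.

```sql
-- Hypothetical source for illustration only: datagen produces random rows.
CREATE TEMPORARY TABLE OrderSource (
    order_id INT,
    order_type STRING,
    `date` TIMESTAMP,
    price INT,
    `number` INT
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '100',
    'fields.order_type.length' = '1'
);

-- Assumption: the job runs with parallelism 4, so several writers insert concurrently.
SET 'parallelism.default' = '4';

INSERT INTO Orders SELECT * FROM OrderSource;
```

Because the table ignores buckets, no shuffle by bucket key is expected between the source and the writers in this sketch.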

...

At the computing engine level, we build the following topology when writing in real time:


[Image: real-time write topology]

1. In the prepareSnapshot phase, the writer flushes the new file and the compaction coordinator receives it. The compaction coordinator also reads the last delta file from the latest snapshot and adds it to the restored files.

...