Batch reading from Doris is currently exposed as a bounded stream and is typically used for data synchronization or joint analysis with other data sources.
1. The query statement is first assembled and sent to Doris to obtain the query plan.
2. The response contains the tablets involved in the query and the BE nodes on which they are located.
3. The TaskManagers then read the data of the specific tablets concurrently.
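
The steps above can be sketched as follows. This is a minimal illustration rather than the connector's implementation: the shape of `plan_response` (a `partitions` map from tablet id to a `routings` list of BE addresses) is an assumption, and `plan_splits` and `tablets_per_split` are hypothetical names. Each resulting split could then be handed to one parallel reader task.

```python
from collections import defaultdict


def plan_splits(plan_response, tablets_per_split=2):
    """Group the tablets of a query plan by BE node and cut them into splits.

    Assumed input shape: {"partitions": {tablet_id: {"routings": [be, ...]}}}.
    Each returned split is (be_address, [tablet_id, ...]).
    """
    tablets_by_be = defaultdict(list)
    for tablet_id, info in sorted(plan_response["partitions"].items()):
        routings = info["routings"]
        if not routings:
            raise ValueError(f"tablet {tablet_id} has no BE replica")
        # Choose the replica on the BE that has the fewest tablets assigned so far.
        target = min(routings, key=lambda be: len(tablets_by_be[be]))
        tablets_by_be[target].append(tablet_id)

    splits = []
    for be, tablets in sorted(tablets_by_be.items()):
        # Cut each BE's tablet list into chunks that one reader task can scan.
        for i in range(0, len(tablets), tablets_per_split):
            splits.append((be, tablets[i:i + tablets_per_split]))
    return splits


# Hypothetical response for a table with three tablets on two BE nodes.
example_plan = {
    "partitions": {
        "10001": {"routings": ["be1:9060", "be2:9060"]},
        "10002": {"routings": ["be1:9060", "be2:9060"]},
        "10003": {"routings": ["be1:9060"]},
    }
}
example_splits = plan_splits(example_plan)
```

Grouping by BE node keeps each reader talking to a single backend, and the chunk size bounds how much work one task receives.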

Configurations:

1.2. LOOKUP JOIN

When the dimension table resides in Doris, a lookup join is performed, with queries issued mainly over JDBC.

Configurations:

2. Sink

Writes to Doris are performed mainly through the Stream Load API. The Doris sink will provide two writing methods.

...

Stream Load provides a two-phase commit API; see https://github.com/apache/doris/issues/7141.
By combining Stream Load's two-phase commit with Flink's two-phase commit, end-to-end data consistency can be achieved.
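
The interplay between the two commit protocols can be illustrated with a small sketch. Everything here is hypothetical: `FakeStreamLoad` is an in-memory stand-in, not the Stream Load API, and `TwoPhaseSink` simplifies the flow by buffering rows until the checkpoint instead of streaming them. The point is only the ordering: pre-commit at the checkpoint barrier, commit once the checkpoint completes, abort on failure.

```python
class FakeStreamLoad:
    """In-memory stand-in for a two-phase load endpoint (hypothetical API)."""

    def __init__(self):
        self.visible = []   # rows that readers can see
        self.pending = {}   # txn_id -> rows pre-committed but not yet visible
        self._next_txn = 1

    def pre_commit(self, rows):
        txn_id = self._next_txn
        self._next_txn += 1
        self.pending[txn_id] = list(rows)
        return txn_id

    def commit(self, txn_id):
        # Committing an unknown transaction is a no-op, so retries are safe.
        self.visible.extend(self.pending.pop(txn_id, []))

    def abort(self, txn_id):
        self.pending.pop(txn_id, None)


class TwoPhaseSink:
    def __init__(self, client):
        self.client = client
        self.buffer = []
        self.pending_txn = None

    def write(self, row):
        self.buffer.append(row)

    def snapshot(self):
        """Checkpoint barrier: pre-commit buffered rows, return the txn id as state."""
        self.pending_txn = self.client.pre_commit(self.buffer)
        self.buffer = []
        return self.pending_txn

    def checkpoint_complete(self):
        """The checkpoint succeeded: make the pre-committed rows visible."""
        if self.pending_txn is not None:
            self.client.commit(self.pending_txn)
            self.pending_txn = None

    def fail(self):
        """Failure before the checkpoint completed: discard uncommitted work."""
        if self.pending_txn is not None:
            self.client.abort(self.pending_txn)
            self.pending_txn = None
        self.buffer = []
```

In this sketch, rows never become visible unless the checkpoint that covers them has completed, which is the property the end-to-end consistency argument relies on.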

Configurations:


2.2. Batch writing

Streaming writes are committed on checkpoint and are therefore tightly bound to it: data only becomes visible once per checkpoint interval. In some scenarios, however, the latency of user data needs to be decoupled from the checkpoint interval.
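
One way to decouple visibility from the checkpoint interval is to flush a buffer whenever it grows too large or too old. The sketch below is an assumption about how such a buffer might look, not the connector's code; `BatchBuffer`, `max_rows`, `max_interval_s` and the injectable `clock` are hypothetical names.

```python
import time


class BatchBuffer:
    """Buffers rows and flushes on size or age, independently of checkpoints."""

    def __init__(self, flush, max_rows=3, max_interval_s=1.0, clock=time.monotonic):
        self.flush_fn = flush
        self.max_rows = max_rows
        self.max_interval_s = max_interval_s
        self.clock = clock
        self.rows = []
        self.first_row_at = None

    def add(self, row):
        if not self.rows:
            # Remember when the current batch started so its age can be checked.
            self.first_row_at = self.clock()
        self.rows.append(row)
        self.maybe_flush()

    def maybe_flush(self):
        """Called on every add() and, in a real sink, from a periodic timer."""
        if not self.rows:
            return
        too_many = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.first_row_at >= self.max_interval_s
        if too_many or too_old:
            batch, self.rows = self.rows, []
            self.flush_fn(batch)
```

Passing the clock in as a parameter keeps the flush logic deterministic under test; a real sink would also need to flush remaining rows on close.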

...

Note that batch writing provides at-least-once semantics and does not guarantee exactly-once semantics. However, it can be combined with a Doris primary key table, where a replayed row overwrites the existing row with the same key, to achieve exactly-once results.
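
The reasoning in the note above can be checked with a toy model. A plain dictionary stands in for a primary key table here, and `apply_batches` is a hypothetical helper; the sketch also assumes that retried batches are replayed in their original order.

```python
def apply_batches(batches, key="id"):
    """Apply row batches to a keyed table; a later row overwrites an earlier one."""
    table = {}
    for batch in batches:
        for row in batch:
            table[row[key]] = row
    return table


batch_1 = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
batch_2 = [{"id": 2, "v": "c"}]

# Each batch delivered exactly once.
once = apply_batches([batch_1, batch_2])
# At-least-once delivery: batch_1 is retried after a failure and arrives twice.
replayed = apply_batches([batch_1, batch_1, batch_2])
```

Because writes are keyed upserts, the duplicated batch leaves the final table unchanged, so `once` and `replayed` hold the same rows.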


3. Configuration

3.1 General options


3.2 Source options


3.3 Lookup Join options


3.4 Sink options



4. Datatype Mapping

Doris Type    Flink Type
NULL_TYPE     NULL
BOOLEAN       BOOLEAN
TINYINT       TINYINT
SMALLINT      SMALLINT
INT           INT
BIGINT        BIGINT
FLOAT         FLOAT
DOUBLE        DOUBLE
DATE          DATE
DATETIME      TIMESTAMP
DECIMAL       DECIMAL
CHAR          STRING
LARGEINT      STRING
VARCHAR       STRING
STRING        STRING
Bitmap        Unsupported datatype
HLL           Unsupported datatype

...