...
3. Configuration
3.1 General options
Key | Default Value | Required | Comment |
---|---|---|---|
fenodes | -- | Y | Doris FE HTTP address; multiple addresses are supported, separated by commas |
benodes | -- | N | Doris BE HTTP address; multiple addresses are supported, separated by commas. See #187 |
jdbc-url | -- | N | JDBC connection information, e.g. jdbc:mysql://127.0.0.1:9030 |
table.identifier | -- | Y | Doris table name, such as: db.tbl |
username | -- | Y | username to access Doris |
password | -- | Y | Password to access Doris |
auto-redirect | false | N | Whether to redirect Stream Load requests. When enabled, Stream Load writes go through the FE and BE information is no longer displayed. Enabling this parameter also allows writing to SelectDB Cloud. |
doris.request.retries | 3 | N | Number of retries to send requests to Doris |
doris.request.connect.timeout.ms | 30000 | N | Connection timeout for sending requests to Doris |
doris.request.read.timeout.ms | 30000 | N | Read timeout for sending requests to Doris |
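As a sketch, the general options above map onto a Flink SQL `CREATE TABLE` statement's WITH clause. The addresses, credentials, and table names below are placeholders, and the schema is hypothetical:

```sql
-- Minimal sketch of a Doris-backed Flink table (placeholder values).
CREATE TABLE doris_example (
  id INT,
  name STRING
) WITH (
  'connector' = 'doris',
  'fenodes' = '127.0.0.1:8030',     -- Doris FE HTTP address (required)
  'table.identifier' = 'db.tbl',    -- database.table (required)
  'username' = 'root',
  'password' = '',
  'doris.request.retries' = '3'     -- optional, shown with its default
);
```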
3.2 Source options
Key | Default Value | Required | Comment |
---|---|---|---|
doris.request.query.timeout.s | 3600 | N | Query timeout for Doris, in seconds. The default is 1 hour; -1 means no timeout limit. |
doris.request.tablet.size | Integer.MAX_VALUE | N | The number of Doris Tablets corresponding to one Partition. The smaller this value, the more Partitions are generated, which increases parallelism on the Flink side but also puts more pressure on Doris. |
doris.batch.size | 1024 | N | The maximum number of rows read from BE at a time. Increasing this value reduces the number of connections established between Flink and Doris, thereby reducing the extra overhead caused by network latency. |
doris.exec.mem.limit | 2147483648 | N | Memory limit for a single query, in bytes. The default is 2GB. |
doris.deserialize.arrow.async | false | N | Whether to support asynchronous conversion from the Arrow format to the RowBatch needed for flink-doris-connector iteration |
doris.deserialize.queue.size | 64 | N | The size of the internal processing queue for asynchronous Arrow conversion; effective when doris.deserialize.arrow.async is true |
doris.read.field | -- | N | List of column names to read from the Doris table, separated by commas |
doris.filter.query | -- | N | The expression used to filter the read data; it is passed transparently to Doris, which uses it to perform source-side filtering. For example: age=18. |
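A hedged sketch of how the source options combine: the read is limited to two columns via doris.read.field, and a predicate is pushed down to Doris via doris.filter.query. Table and column names are placeholders:

```sql
-- Source table sketch: column pruning plus filter pushdown (placeholder names).
CREATE TABLE doris_source (
  age INT,
  name STRING
) WITH (
  'connector' = 'doris',
  'fenodes' = '127.0.0.1:8030',
  'table.identifier' = 'db.users',
  'username' = 'root',
  'password' = '',
  'doris.read.field' = 'age,name',   -- read only these columns
  'doris.filter.query' = 'age=18'    -- evaluated on the Doris side
);

SELECT * FROM doris_source;
```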
3.3 Lookup Join options
Key | Default Value | Required | Comment |
---|---|---|---|
lookup.cache.max-rows | -1 | N | The maximum number of rows in the lookup cache; the default of -1 means the cache is disabled |
lookup.cache.ttl | 10s | N | The maximum time of lookup cache, the default is 10s |
lookup.max-retries | 1 | N | The number of retries after a lookup query fails |
lookup.jdbc.async | false | N | Whether to enable asynchronous lookup, the default is false |
lookup.jdbc.read.batch.size | 128 | N | Under asynchronous lookup, the maximum batch size for each query |
lookup.jdbc.read.batch.queue-size | 256 | N | The size of the intermediate buffer queue during asynchronous lookup |
lookup.jdbc.read.thread-size | 3 | N | The number of jdbc threads for lookup in each task |
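To illustrate, a sketch of a lookup join against a Doris dimension table with caching and asynchronous lookup enabled. The table names, schema, and the `orders` stream (with its processing-time attribute `proc_time`) are placeholders:

```sql
-- Dimension table with lookup caching and async lookup (placeholder values).
CREATE TABLE dim_doris (
  id INT,
  city STRING
) WITH (
  'connector' = 'doris',
  'fenodes' = '127.0.0.1:8030',
  'jdbc-url' = 'jdbc:mysql://127.0.0.1:9030',
  'table.identifier' = 'db.dim',
  'username' = 'root',
  'password' = '',
  'lookup.cache.max-rows' = '1000',
  'lookup.cache.ttl' = '10s',
  'lookup.jdbc.async' = 'true'
);

-- Lookup join: enrich an orders stream with the dimension table.
SELECT o.order_id, d.city
FROM orders AS o
JOIN dim_doris FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.id = d.id;
```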
3.4 Sink options
Key | Default Value | Required | Comment |
---|---|---|---|
sink.label-prefix | -- | Y | The label prefix used by Stream load import. In the 2pc scenario, global uniqueness is required to ensure Flink's EOS semantics. |
sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ',' defines the column delimiter; 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so '\x01' is converted to binary 0x01; for JSON-format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For the full list of parameters, refer to the Stream Load documentation. |
sink.enable-delete | true | N | Whether to enable deletion. This option requires the Doris table to have the batch delete feature enabled (enabled by default in Doris 0.15+), and it only supports the Unique model. |
sink.enable-2pc | true | N | Whether to enable two-phase commit (2pc), which is enabled by default to guarantee Exactly-Once semantics. For details, refer to the two-phase commit documentation. |
sink.buffer-size | 1MB | N | The size of the write-data cache buffer, in bytes. Modifying this is not recommended; the default is sufficient. |
sink.buffer-count | 3 | N | The number of write-data buffers. Modifying this is not recommended; the default is sufficient. |
sink.max-retries | 3 | N | Maximum number of retries after Commit failure, default 3 |
sink.use-cache | false | N | Whether to use the in-memory cache for recovery when an exception occurs. When enabled, data from the Checkpoint period is retained in the cache. |
sink.enable.batch-mode | false | N | Whether to write to Doris in batch mode. When enabled, the write timing no longer depends on Checkpoint; instead, flushing is controlled by the sink.buffer-flush.max-rows / sink.buffer-flush.max-bytes / sink.buffer-flush.interval parameters. Note that Exactly-Once semantics are not guaranteed in this mode, but the Uniq model can be used to achieve idempotence. |
sink.flush.queue-size | 2 | N | In batch mode, the size of the cache queue. |
sink.buffer-flush.max-rows | 50000 | N | In batch mode, the maximum number of data rows written in a single batch. |
sink.buffer-flush.max-bytes | 10MB | N | In batch mode, the maximum number of bytes written in a single batch. |
sink.buffer-flush.interval | 10s | N | In batch mode, the interval for asynchronously flushing the cache. |
sink.ignore.update-before | true | N | Whether to ignore the update-before event, ignored by default. |
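Putting the sink options together, a hedged sketch of a JSON-format Stream Load sink. The label prefix, table names, and source table are placeholders; note that under 2pc the label prefix must be globally unique:

```sql
-- Sink table sketch: JSON-format Stream Load with 2pc enabled (placeholder values).
CREATE TABLE doris_sink (
  id INT,
  name STRING
) WITH (
  'connector' = 'doris',
  'fenodes' = '127.0.0.1:8030',
  'table.identifier' = 'db.tbl',
  'username' = 'root',
  'password' = '',
  'sink.label-prefix' = 'doris_demo',            -- must be globally unique under 2pc
  'sink.enable-2pc' = 'true',
  'sink.properties.format' = 'json',
  'sink.properties.read_json_by_line' = 'true'
);

INSERT INTO doris_sink SELECT id, name FROM source_table;
```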
4. Datatype Mapping
Doris Type | Flink Type |
---|---|
NULL_TYPE | NULL |
BOOLEAN | BOOLEAN |
TINYINT | TINYINT |
SMALLINT | SMALLINT |
INT | INT |
BIGINT | BIGINT |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
DATE | DATE |
DATETIME | TIMESTAMP |
DECIMAL | DECIMAL |
CHAR | STRING |
LARGEINT | STRING |
VARCHAR | STRING |
STRING | STRING |
BITMAP | Unsupported datatype |
HLL | Unsupported datatype |
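To illustrate the mapping, a sketch of the Flink-side declaration for a hypothetical Doris table whose columns include LARGEINT and DATETIME (names and schema are placeholders):

```sql
-- Doris side (hypothetical): db.events (id BIGINT, big_id LARGEINT, ts DATETIME)
-- Flink side declares the mapped types from the table above:
CREATE TABLE events_flink (
  id BIGINT,          -- BIGINT maps to BIGINT
  big_id STRING,      -- LARGEINT maps to STRING
  ts TIMESTAMP        -- DATETIME maps to TIMESTAMP
) WITH (
  'connector' = 'doris',
  'fenodes' = '127.0.0.1:8030',
  'table.identifier' = 'db.events',
  'username' = 'root',
  'password' = ''
);
```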
...