...
In high concurrency scenarios, the CPU usage of the Flink cluster was reduced by 20% with no performance degradation in QPS by using proper retry strategy configuration.
initial-delay
200 ms
increment
100 ms
max-delay
700 ms
Baseline(128 Concurrency) | Experiment(128 Concurrency) | |||
---|---|---|---|---|
Retry Strategy | DEFAULT_RETRY_MILLIS=100 ms |
collect-strategy.type
incremental-delay
Optimized retry strategy |
Join QPS | 47 | 48 | ||
JM CPU | 17.3 | 14.5 | ||
TM CPU | 9.0 | 7.2 |
Changed Public Interfaces
DeploymentOptions
Option Name | Default Value | Description |
execution.interactive-client | false | Enabling this means the Flink client will be used in an interactive scenario. It should reuse all the necessary resources to improve interactive performance. Notice: This option will only take effect when Flink is deployed in session-mode. |
CollectOptions
Option Name | Default Value | Description |
collect-strategy.type
fixed-delay
CollectResultFetcher
. The default RetryStrategy is
FixedDelayCollectStrategy
, which means CollectResultFetcher
will keep fetch results from sinkoperator at fixed intervals.collect-strategy. |
Integer.MAX_VALUE
Config options for FixedRetryStrategy
, the strategy will retries at a fixed delay.
collect-strategy.fixed-delay.delay
100 ms
exponential-delay.initial-backoff | 100 ms | Config options for |
| |
collect-strategy.exponential-delay.max-backoff | 1 s |
collect-strategy.exponential-delay. |
Integer.MAX_VALUE
collect-strategy.incremental-delay.initial-delay
100 ms
IncrementalDelayRetryStrategy
, the strategy will increase the delay with a fixed increment.backoff-multiplier | 1.0 |
collect-strategy. |
100 ms
collect-strategy.incremental-delay.max-delay
1 s
exponential-delay.attempts | Integer.MAX_VALUE |
Proposed Change
Overview
- Reuse
ClusterDescriptor
when interacting with a dedicated session cluster - Reuse
ClientHighAvailabilityServices
for leader retrieval with the same session cluster - Reuse
RestClusterClient
when sending requests to the same session cluster - Reuse the HTTP connections between JDBC Driver to SQL Gateway and Flink Client to Flink Cluster with the same target address
- Use
ThreadLocal<ObjectMapper>
instead of a globalObjectMapper
instance - Use a configurable retry strategy in
CollectResultFetcher
to reduce unnecessary workload
...