...
Unreused connections/threads, which add overhead for initializing and releasing resources:
Frequently creating and destroying HA (ZK, K8S) connections for leader retrieval
Frequently opening and closing a Netty channel for each request
Frequently creating and destroying the ThreadPool in RestClusterClient and RestClient
Unreused instances, which add GC overhead:
For each operation, the Flink Client creates many new instances, such as ClusterDescriptor, RestClusterClient, ClientHighAvailabilityServices and RestClient
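The reuse problem described above can be illustrated with a small cache that keeps one client per cluster instead of building a new one for every operation. This is only a minimal sketch under stated assumptions: `PooledClient` and `ClientCache` are hypothetical names that stand in for an expensive object such as RestClusterClient, and they are not Flink classes or the actual optimization.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive client that owns a thread pool. */
class PooledClient implements AutoCloseable {
    /** Counts constructions so the effect of caching is observable. */
    static final AtomicInteger CREATED = new AtomicInteger();

    final String clusterId;
    final ExecutorService executor;

    PooledClient(String clusterId) {
        this.clusterId = clusterId;
        // The pool is created once per cluster rather than once per request.
        this.executor = Executors.newFixedThreadPool(2);
        CREATED.incrementAndGet();
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}

/** Hypothetical cache that hands out one shared client per cluster ID. */
class ClientCache implements AutoCloseable {
    private final ConcurrentHashMap<String, PooledClient> clients = new ConcurrentHashMap<>();

    PooledClient get(String clusterId) {
        // computeIfAbsent is atomic per key, so concurrent callers share one instance.
        return clients.computeIfAbsent(clusterId, PooledClient::new);
    }

    @Override
    public void close() {
        // Release the shared resources once, when the cache itself is closed.
        clients.values().forEach(PooledClient::close);
        clients.clear();
    }
}
```

With such a cache, repeated calls like `cache.get("cluster-1")` return the same object, so connection setup, thread pool creation, and the related garbage are paid once per cluster rather than once per operation.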
Concurrency bottlenecks:
One global ObjectMapper instance handles data serialization/deserialization for all HTTP requests and responses
Unnecessary workload:
For example, CollectResultFetcher uses a fixed collect retry interval (100 ms) to fetch results from the Flink cluster. This retry operation can be very resource-consuming under high concurrency.
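One common way to reduce the cost of fixed-interval polling is to let the retry delay grow while no result is available. The sketch below shows that general technique only. It is an assumption made for illustration, not necessarily how the optimization described here changes CollectResultFetcher, and `BackoffPolicy` is a hypothetical class.

```java
/** Hypothetical policy: delays start small and double up to a cap. */
class BackoffPolicy {
    private final long initialMillis;
    private final long maxMillis;
    private long currentMillis;

    BackoffPolicy(long initialMillis, long maxMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("require 0 < initialMillis <= maxMillis");
        }
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.currentMillis = initialMillis;
    }

    /** Returns the delay before the next attempt, then doubles it up to the cap. */
    long nextDelayMillis() {
        long delay = currentMillis;
        // Compare against half the cap to avoid overflow when doubling.
        currentMillis = (currentMillis > maxMillis / 2) ? maxMillis : currentMillis * 2;
        return delay;
    }

    /** Call after a successful fetch so the next poll starts fast again. */
    void reset() {
        currentMillis = initialMillis;
    }
}
```

Starting at 100 ms with a 1000 ms cap, successive empty polls would wait 100, 200, 400, 800, and then 1000 ms, so an idle query issues far fewer requests than one that polls every 100 ms.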
...
To verify this optimization, we designed internal test scenarios.
The agent process continuously submits SQL queries to the SQL Gateway service at different concurrency levels (1, 32, and 64) and monitors the end-to-end latency.
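A test agent of this kind can be sketched as a fixed-size thread pool in which each worker repeatedly runs a query and records its latency. The code below is a simplified illustration, not the actual test agent: `LatencyBenchmark` is a hypothetical name, and the `Runnable` task is a placeholder for submitting a query to the SQL Gateway and waiting for its result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness that measures per-request latency at a given concurrency. */
class LatencyBenchmark {

    /** Runs the task requestsPerWorker times on each worker; returns latencies in nanoseconds. */
    static List<Long> run(int concurrency, int requestsPerWorker, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int w = 0; w < concurrency; w++) {
                Callable<List<Long>> worker = () -> {
                    List<Long> latencies = new ArrayList<>();
                    for (int i = 0; i < requestsPerWorker; i++) {
                        long start = System.nanoTime();
                        task.run(); // placeholder for "submit query and fetch result"
                        latencies.add(System.nanoTime() - start);
                    }
                    return latencies;
                };
                futures.add(pool.submit(worker));
            }
            // Collect the latencies from every worker into one list.
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                try {
                    all.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for workers", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("query task failed", e.getCause());
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    /** Mean latency in milliseconds, or 0.0 for an empty list. */
    static double meanMillis(List<Long> latenciesNanos) {
        return latenciesNanos.stream().mapToLong(Long::longValue).average().orElse(0.0) / 1_000_000.0;
    }
}
```

Calling `LatencyBenchmark.run(c, n, task)` for `c` in 1, 32, and 64 would give one latency sample set per concurrency level, which can then be summarized with `meanMillis` or with percentiles.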
Test Queries
...