Authors: Aihua Li, George Chen, Yu Li
Status
Current state: "Under Discussion"
Discussion thread: https://lists.apache.org/thread.html/5aac294120d93b418bd6900eeb2416f4f49010241a847830e6ea2ff1@%3Cdev.flink.apache.org%3E
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)
Released: <Flink Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Since there is currently no widely accepted performance testing method in the stream-computing field, we have built an end-to-end performance testing framework for Flink that collects the latency and throughput of test jobs. These metrics directly indicate the engine's performance. Comparing them across engine versions reveals performance regressions.
Goal
We propose at least three categories of end-to-end performance test suites:
- Test suite for basic operations
- Test suite for state backend
- Test suite for shuffle service
The results should be monitored in two main aspects:
- Job performance, mainly throughput and latency
- Hardware consumption, mainly CPU, memory, network and disk usage
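The sketch below illustrates how the two job-performance metrics could be computed. It assumes that each record carries the wall-clock time at which the source emitted it, and that the sink notes the time at which it received the record. Throughput is the number of records per second over a measurement window. Latency is the per-record difference between the two timestamps, summarized here as a nearest-rank percentile. The class `PerfMetrics` and the percentile method are hypothetical choices for illustration. The proposal does not specify either.

```java
import java.util.Arrays;

/** Hypothetical helper that summarizes one measurement window of a test job. */
final class PerfMetrics {

    private PerfMetrics() { }

    /** Throughput in records per second, given the records counted over windowMillis milliseconds. */
    static double throughput(long recordCount, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        return recordCount * 1000.0 / windowMillis;
    }

    /**
     * Latency percentile in milliseconds, using the nearest-rank method.
     * emitMillis[i]: time the source emitted record i (assumed to travel inside the record);
     * receiveMillis[i]: time the sink received record i.
     */
    static long latencyPercentile(long[] emitMillis, long[] receiveMillis, double percentile) {
        if (emitMillis.length == 0 || emitMillis.length != receiveMillis.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] latencies = new long[emitMillis.length];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = receiveMillis[i] - emitMillis[i];
        }
        Arrays.sort(latencies);
        // Smallest sample such that at least `percentile` percent of samples are <= it.
        int rank = (int) Math.ceil(percentile / 100.0 * latencies.length);
        return latencies[rank - 1];
    }
}
```

If the source and sink run on different machines, clock skew between them adds error to the measured latency.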
Roadmap
We plan to split the implementation into three phases:
- Phase 1: Add the test suite for basic operations, plus a web UI for viewing throughput and latency data, similar to the existing Flink speed center.
- Phase 2: Add the test suite for state backends, plus more monitoring of hardware metrics.
- Phase 3: Add the test suite for the shuffle service.
Design
This section describes the detailed design of each test suite.
Test suite for basic operations
This test suite uses the default state backend (heap) and the default shuffle service, to ensure there is no regression in the basic end-to-end performance of Flink jobs.
Job Topology
Instead of simulating every user scenario, we choose the most basic topologies for the performance tests, i.e. SingleInputOperator and TwoInputOperator. Any complex topology can be built by combining and varying these two. Figures 1 and 2 show the two basic topologies:
Figure 1. One Input Topology
Figure 2. Two Input Topology
Test Scenarios
The following dimensions are taken into account when defining the test scenarios. Each column lists the possible values of one dimension, independently of the other columns:
Topology | Logical Attributes of Edges | Schedule Mode | Checkpoint Mode |
OneInput | Broadcast | LazyFromSource | ExactlyOnce |
TwoInput | Rescale | Eager | AtLeastOnce |
 | Rebalance | | |
 | KeyBy | | |
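The sketch below makes the edge dimension of the table concrete. It models which downstream subtasks receive a record under each logical edge attribute. This is a hypothetical, simplified model, not Flink's partitioner code, and it differs from Flink in three ways:
- Flink's keyBy routes records through key groups using murmur hashing. This sketch uses a plain modulo of the key's hash code.
- Flink's rebalance starts its round robin at a random channel. This sketch starts at channel 0.
- Rescale is modeled as each upstream subtask cycling over its own contiguous slice of downstream subtasks. This assumes the downstream parallelism is a multiple of the upstream parallelism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical, simplified model of how each edge type picks downstream subtasks. */
final class EdgeRouting {

    enum Edge { BROADCAST, RESCALE, REBALANCE, KEY_BY }

    private EdgeRouting() { }

    /**
     * Returns the downstream subtask indices that receive the n-th record (n counts from 0)
     * emitted by upstream subtask upstreamIndex.
     */
    static List<Integer> route(Edge edge, int upstreamIndex, int upstreamParallelism,
                               int downstreamParallelism, long n, Object key) {
        List<Integer> targets = new ArrayList<>();
        switch (edge) {
            case BROADCAST:
                // Every downstream subtask receives a copy of the record.
                for (int i = 0; i < downstreamParallelism; i++) {
                    targets.add(i);
                }
                break;
            case REBALANCE:
                // Round robin over all downstream subtasks (starting at 0 in this model).
                targets.add((int) (n % downstreamParallelism));
                break;
            case RESCALE: {
                // Round robin over this upstream subtask's own slice of downstream subtasks.
                if (downstreamParallelism % upstreamParallelism != 0) {
                    throw new IllegalArgumentException("model assumes downstream is a multiple of upstream");
                }
                int slice = downstreamParallelism / upstreamParallelism;
                targets.add(upstreamIndex * slice + (int) (n % slice));
                break;
            }
            case KEY_BY:
                // The same key always lands on the same subtask (plain modulo, not Flink's key groups).
                Objects.requireNonNull(key, "keyBy needs a key");
                targets.add(Math.floorMod(key.hashCode(), downstreamParallelism));
                break;
            default:
                throw new IllegalArgumentException("unknown edge " + edge);
        }
        return targets;
    }
}
```

In this model, broadcast multiplies the number of records on the edge by the downstream parallelism. The other three edge types send each record to exactly one subtask.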
Test Job List
Combining these dimensions yields 32 (2 × 4 × 2 × 2) test jobs, as shown below:
- OneInput + Broadcast + LazyFromSource + ExactlyOnce
- OneInput + Rescale + LazyFromSource + ExactlyOnce
- OneInput + Rebalance + LazyFromSource + ExactlyOnce
- OneInput + KeyBy + LazyFromSource + ExactlyOnce
- OneInput + Broadcast + Eager + ExactlyOnce
- OneInput + Rescale + Eager + ExactlyOnce
- OneInput + Rebalance + Eager + ExactlyOnce
- OneInput + KeyBy + Eager + ExactlyOnce
- OneInput + Broadcast + LazyFromSource + AtLeastOnce
- OneInput + Rescale + LazyFromSource + AtLeastOnce
- OneInput + Rebalance + LazyFromSource + AtLeastOnce
- OneInput + KeyBy + LazyFromSource + AtLeastOnce
- OneInput + Broadcast + Eager + AtLeastOnce
- OneInput + Rescale + Eager + AtLeastOnce
- OneInput + Rebalance + Eager + AtLeastOnce
- OneInput + KeyBy + Eager + AtLeastOnce
- TwoInput + Broadcast + LazyFromSource + ExactlyOnce
- TwoInput + Rescale + LazyFromSource + ExactlyOnce
- TwoInput + Rebalance + LazyFromSource + ExactlyOnce
- TwoInput + KeyBy + LazyFromSource + ExactlyOnce
- TwoInput + Broadcast + Eager + ExactlyOnce
- TwoInput + Rescale + Eager + ExactlyOnce
- TwoInput + Rebalance + Eager + ExactlyOnce
- TwoInput + KeyBy + Eager + ExactlyOnce
- TwoInput + Broadcast + LazyFromSource + AtLeastOnce
- TwoInput + Rescale + LazyFromSource + AtLeastOnce
- TwoInput + Rebalance + LazyFromSource + AtLeastOnce
- TwoInput + KeyBy + LazyFromSource + AtLeastOnce
- TwoInput + Broadcast + Eager + AtLeastOnce
- TwoInput + Rescale + Eager + AtLeastOnce
- TwoInput + Rebalance + Eager + AtLeastOnce
- TwoInput + KeyBy + Eager + AtLeastOnce
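Because the job list is the full cross product of the four dimensions, it can be generated rather than maintained by hand. The sketch below uses hypothetical enum names that mirror the scenario table. The loop nesting runs over topology, then checkpoint mode, then schedule mode, then edge. This order reproduces the order of the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for the test job matrix; constants are named to match the table. */
final class JobMatrix {

    enum Topology { OneInput, TwoInput }
    enum Edge { Broadcast, Rescale, Rebalance, KeyBy }
    enum ScheduleMode { LazyFromSource, Eager }
    enum CheckpointMode { ExactlyOnce, AtLeastOnce }

    private JobMatrix() { }

    /** Returns one name per job, e.g. "OneInput + Broadcast + LazyFromSource + ExactlyOnce". */
    static List<String> allJobs() {
        List<String> jobs = new ArrayList<>();
        for (Topology topology : Topology.values()) {
            for (CheckpointMode checkpoint : CheckpointMode.values()) {
                for (ScheduleMode schedule : ScheduleMode.values()) {
                    for (Edge edge : Edge.values()) {
                        jobs.add(String.join(" + ", topology.name(), edge.name(),
                                schedule.name(), checkpoint.name()));
                    }
                }
            }
        }
        return jobs;
    }
}
```

Adding a value to any enum extends the matrix automatically. For example, a third checkpoint mode would raise the count to 48.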
Result Check
In this initial phase, only job throughput and latency will be monitored and displayed.
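The motivation describes finding regressions by comparing results across engine versions. The sketch below shows a minimal version of such a check. It assumes a relative-tolerance rule, for example 5%, which the proposal does not specify. Under this rule, throughput regresses if it falls more than the tolerance below the baseline version. Latency regresses if it rises more than the tolerance above the baseline. `RegressionCheck` and its methods are hypothetical names.

```java
/** Hypothetical check comparing a candidate engine version against a baseline run. */
final class RegressionCheck {

    private RegressionCheck() { }

    /** True if candidate throughput is more than `tolerance` (e.g. 0.05 = 5%) below the baseline. */
    static boolean throughputRegressed(double baseline, double candidate, double tolerance) {
        checkArgs(baseline, tolerance);
        return candidate < baseline * (1.0 - tolerance);
    }

    /** True if candidate latency is more than `tolerance` above the baseline (higher is worse). */
    static boolean latencyRegressed(double baseline, double candidate, double tolerance) {
        checkArgs(baseline, tolerance);
        return candidate > baseline * (1.0 + tolerance);
    }

    private static void checkArgs(double baseline, double tolerance) {
        if (baseline <= 0 || tolerance < 0) {
            throw new IllegalArgumentException("baseline must be > 0 and tolerance >= 0");
        }
    }
}
```

Throughput and latency of the same job fluctuate from run to run. A practical check would therefore likely compare averages over several runs and tune the tolerance per job.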
Test suite for state backend
This test suite mainly guards the performance of IO-intensive applications. We plan to implement it in phase 2, together with additional hardware monitoring.
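The sketch below shows one possible approach to the hardware monitoring planned for phase 2. It assumes sampling from inside the JVM through the standard `java.lang.management` API. This approach covers only part of the hardware metrics: the system load average, the processor count and heap usage. Network and disk usage would need OS-level tools. The class name and the choice of metrics are assumptions, not part of the proposal.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical JVM-level sampler for a subset of the hardware metrics. */
final class HardwareSampler {

    /** One point-in-time snapshot. */
    static final class Sample {
        final long timestampMillis;
        final double systemLoadAverage;  // last-minute load average; negative if unavailable
        final int availableProcessors;
        final long heapUsedBytes;
        final long heapCommittedBytes;

        Sample(long timestampMillis, double systemLoadAverage, int availableProcessors,
               long heapUsedBytes, long heapCommittedBytes) {
            this.timestampMillis = timestampMillis;
            this.systemLoadAverage = systemLoadAverage;
            this.availableProcessors = availableProcessors;
            this.heapUsedBytes = heapUsedBytes;
            this.heapCommittedBytes = heapCommittedBytes;
        }
    }

    private HardwareSampler() { }

    /** Takes one snapshot; a test driver could call this periodically while a job runs. */
    static Sample sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new Sample(System.currentTimeMillis(), os.getSystemLoadAverage(),
                os.getAvailableProcessors(), heap.getUsed(), heap.getCommitted());
    }
}
```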
Test suite for shuffle service
This test suite mainly guards the performance of batch applications. We plan to implement it in phase 3.
Implementation
The test cases are written in Java and the scripts in Python. We propose a separate directory/module named flink-end-to-end-perf-tests, alongside flink-end-to-end-tests.