...

In the following experiments, we used a simple topology (source → map → map → sleepy map → measure map), where each channel used a random shuffle. The sleepy map slept on average 0, 0.01, or 0.1 ms per record to induce backpressure. We compared a POC for ad-hoc spilling, a POC for continuous spilling, and the base commit on master (1.11-SNAPSHOT). For each approach and sleep time, we ran the job three times and took the median. Checkpoints were persisted to HDFS on an EMR cluster with 1 master and 5 worker nodes, each with 32 cores and 256 GB RAM, at a total parallelism of 160.
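The exact implementation of the sleepy map is not given here. The following is a minimal sketch, assuming the per-record delay is drawn from an exponential distribution with the configured mean. The class name `SleepyMap` and the choice of distribution are hypothetical. Because `Thread.sleep` cannot express delays like 0.01 ms, the sketch busy-waits on `System.nanoTime()`. In a real Flink job, the body of `apply` would sit inside a `MapFunction<T, T>`; the sketch uses `java.util.function.Function` so it runs without Flink on the classpath.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

/** Hypothetical stand-in for the "sleepy map": delays each record by a random amount. */
public class SleepyMap<T> implements Function<T, T> {
    private final long meanNanos;

    public SleepyMap(double meanMillis) {
        this.meanNanos = (long) (meanMillis * 1_000_000);
    }

    @Override
    public T apply(T record) {
        if (meanNanos > 0) {
            // Assumption: exponentially distributed delay with the given mean.
            // 1.0 - u lies in (0, 1], so the logarithm is always defined.
            double u = ThreadLocalRandom.current().nextDouble();
            long delayNanos = (long) (-Math.log(1.0 - u) * meanNanos);
            busyWait(delayNanos);
        }
        return record; // identity map: only the timing changes
    }

    // Spin until the deadline; sub-millisecond sleeps are not reliable with Thread.sleep.
    private static void busyWait(long nanos) {
        long deadline = System.nanoTime() + nanos;
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait();
        }
    }

    public static void main(String[] args) {
        SleepyMap<Integer> map = new SleepyMap<>(0.1); // 0.1 ms on average
        int n = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.apply(i);
        }
        double avgMillis = (System.nanoTime() - start) / 1e6 / n;
        System.out.printf("average delay per record: %.3f ms%n", avgMillis);
    }
}
```

A random delay rather than a fixed one makes the slow operator's throughput fluctuate, which is closer to real backpressure; a fixed delay would work just as well for inducing steady backpressure.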




We can see that for the baseline, checkpoint duration roughly correlates with end-to-end latency, as expected. For both POCs, however, checkpoint duration remains stable and even decreases under higher backpressure. Surprisingly, checkpoint durations for continuous spilling are higher than for ad-hoc spilling at lower sleep times. We suspect that continuously writing large amounts of data induces additional backpressure. As backpressure increases and the data volume therefore decreases, continuous spilling reaches sub-second checkpoint durations.

...