...

Here we propose a speculative execution strategy [FLINK-10644] to handle the problem. The basic idea is to run a copy of a task on another node when the original task is identified as a long-tail task. The speculative task executes in parallel with the original one and shares the same failure retry mechanism. Once either task completes, the scheduler accepts its output as the final result and cancels the other running task. I will introduce a blacklist module so that the speculative copy of a long-tail task is scheduled on a different machine from the original task, and modify FileOutputFormat.java to adapt it to the speculative execution mechanism.
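
The "first result wins, cancel the other" idea can be illustrated with plain java.util.concurrent classes. This is only a minimal sketch, not Flink scheduler code: the class SpeculativeRunner, the method run, and the parameter slowThresholdMs are hypothetical names, and a fixed time threshold stands in for whatever long-tail detection the scheduler actually uses. The shared failure retry mechanism is not modeled here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only (not part of Flink).
final class SpeculativeRunner {

    /**
     * Runs the original task. If it has not finished within slowThresholdMs
     * (an assumed stand-in for long-tail detection), a speculative copy is
     * started in parallel. The first attempt to finish supplies the result
     * and the other attempt is cancelled.
     */
    static <T> T run(Callable<T> original, Callable<T> speculativeCopy, long slowThresholdMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<T> completion = new ExecutorCompletionService<>(pool);
        try {
            Future<T> first = completion.submit(original);

            // Wait up to the threshold for the original attempt.
            Future<T> done = completion.poll(slowThresholdMs, TimeUnit.MILLISECONDS);
            if (done != null) {
                return done.get();
            }

            // The original looks slow: start the speculative copy.
            Future<T> second = completion.submit(speculativeCopy);

            // Block until either attempt finishes.
            Future<T> winner = completion.take();

            // Cancelling an already completed future is a no-op,
            // so only the still-running attempt is affected.
            first.cancel(true);
            second.cancel(true);
            return winner.get();
        } finally {
            // Interrupts any attempt that is still running.
            pool.shutdownNow();
        }
    }
}
```

In a real scheduler the two attempts would run on different machines, which is what the blacklist module is meant to ensure; here both simply run on a local thread pool.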

...

Blacklist is a kind of scheduling constraint. According to the description of FLINK-11000, this is a bigger feature.

...

I will implement a (Job, Host) blacklist for the speculative execution feature. To make FLINK-11000 easier to implement in the future, my interface also suits the other blacklists described above. The blacklist module is a thread that maintains the blacklisted machines of this job and removes expired elements periodically. Each element in the blacklist contains an IP and a timestamp. The timestamp is used to decide whether the element has expired.
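
A per-job blacklist with IP and timestamp entries could look roughly like the following. This is a sketch under assumptions, not the proposed Flink interface: the class HostBlacklist, its method names, and the ttlMs expiry parameter are all hypothetical. Time is passed in explicitly so that the expiry rule is easy to follow and to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical per-job host blacklist, for illustration only (not part of Flink).
final class HostBlacklist {
    // Maps a host IP to the time (in milliseconds) at which it was blacklisted.
    private final Map<String, Long> addedAtMs = new ConcurrentHashMap<>();
    // Assumed expiry rule: an entry expires ttlMs after it was added.
    private final long ttlMs;

    HostBlacklist(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Adds a host, or refreshes its timestamp if it is already listed. */
    void add(String ip, long nowMs) {
        addedAtMs.put(ip, nowMs);
    }

    /** Returns true if the host is listed and its entry has not expired. */
    boolean contains(String ip, long nowMs) {
        Long addedAt = addedAtMs.get(ip);
        return addedAt != null && nowMs - addedAt < ttlMs;
    }

    /** Drops every entry whose age has reached ttlMs. */
    void removeExpired(long nowMs) {
        addedAtMs.values().removeIf(addedAt -> nowMs - addedAt >= ttlMs);
    }

    int size() {
        return addedAtMs.size();
    }

    /** Starts a background thread that removes expired entries every periodMs. */
    ScheduledExecutorService startCleaner(long periodMs) {
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        cleaner.scheduleAtFixedRate(
                () -> removeExpired(System.currentTimeMillis()),
                periodMs, periodMs, TimeUnit.MILLISECONDS);
        return cleaner;
    }
}
```

When placing a speculative copy, a scheduler would skip any host for which contains(ip, now) returns true. The executor returned by startCleaner should be shut down when the job finishes.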


...