...

Logically, finished tasks do not need to be restarted on failover; only tasks that have operators which are not fully finished need to be restarted. In particular, if all the source operators have finished, the remaining operators would directly start the process of finishing. However, for the first version, we would still restart all the tasks and skip the execution of fully finished operators. This simplifies the implementation of both the task side and the scheduler.
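The first-version behavior described above can be illustrated with a minimal sketch. It assumes the restored state carries a per-operator "fully finished" flag; the function `plan_restart` and the input layout are hypothetical names for illustration and are not part of any real API.

```python
def plan_restart(tasks):
    """Return, for every task, the operators that still have to execute.

    `tasks` maps a task name to its chained operators as
    (operator_name, fully_finished) pairs. This layout is an assumption
    made for the sketch only.
    """
    plan = {}
    for task_name, operators in tasks.items():
        # Every task is restarted, so every task gets an entry in the plan.
        # Fully finished operators are skipped instead of being executed again.
        plan[task_name] = [name for name, finished in operators if not finished]
    return plan


restored = {
    "source-task": [("source", True)],
    "map-sink-task": [("map", True), ("sink", False)],
}
plan = plan_restart(restored)
```

A task whose list comes back empty has no operator left to run and would go straight to the finishing process after the restart.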

...

each_jv:
for each job vertex JV;do
    // A fully finished vertex has nothing to trigger
    if all tasks of JV finished;then
        continue;
    endif

    // The fast check: skip the whole vertex if an input edge already
    // guarantees that every task has a running precedent task
    for each input job edge IJE;do
        if (IJE is ALL_TO_ALL and some upstream tasks are running) or (IJE is POINTWISE and all upstream tasks are running);then
            continue each_jv;
        endif
    endfor

    // The per-task check: only POINTWISE edges remain to be examined
    for each task of JV;do
        has_running_precedent_tasks = false;
        for all the precedent tasks PT connected via POINTWISE edges;do
            if PT is running;then
                has_running_precedent_tasks = true;
                break;
            endif
        endfor
        if task is running and !has_running_precedent_tasks;then
            mark this task as need triggering;
        endif
    endfor
endfor
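
The pseudocode above can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the actual scheduler code: the classes `Task`, `JobEdge`, and `JobVertex` and the function `tasks_to_trigger` are hypothetical, and every task is assumed to be either running or finished.

```python
from dataclasses import dataclass, field
from enum import Enum


class Pattern(Enum):
    ALL_TO_ALL = "ALL_TO_ALL"
    POINTWISE = "POINTWISE"


@dataclass
class Task:
    name: str
    running: bool
    # Upstream tasks reached through POINTWISE edges only (assumed layout).
    pointwise_precedents: list = field(default_factory=list)


@dataclass
class JobEdge:
    pattern: Pattern
    source: "JobVertex"  # the upstream job vertex


@dataclass
class JobVertex:
    tasks: list
    input_edges: list = field(default_factory=list)


def _covers_whole_vertex(edge):
    """Fast check: does this edge give every downstream task a running precedent?"""
    upstream = edge.source.tasks
    if edge.pattern is Pattern.ALL_TO_ALL:
        return any(t.running for t in upstream)
    return all(t.running for t in upstream)


def tasks_to_trigger(job_vertices):
    """Collect running tasks that have no running precedent task."""
    result = []
    for jv in job_vertices:
        if not any(t.running for t in jv.tasks):
            continue  # all tasks of this vertex have finished
        if any(_covers_whole_vertex(e) for e in jv.input_edges):
            continue  # fast check passed, skip the per-task scan
        for task in jv.tasks:
            if task.running and not any(p.running for p in task.pointwise_precedents):
                result.append(task)
    return result


# Example: source (s0 finished, s1 running) -POINTWISE-> map -ALL_TO_ALL-> sink
src = JobVertex([Task("s0", False), Task("s1", True)])
m0 = Task("m0", True, [src.tasks[0]])
m1 = Task("m1", True, [src.tasks[1]])
map_vertex = JobVertex([m0, m1], [JobEdge(Pattern.POINTWISE, src)])
sink = JobVertex([Task("k0", True)], [JobEdge(Pattern.ALL_TO_ALL, map_vertex)])
triggered = [t.name for t in tasks_to_trigger([src, map_vertex, sink])]
```

In this example `s1` is selected because it is a running source, and `m0` is selected because its only precedent `s0` has finished. The sink is skipped by the fast check, since its ALL_TO_ALL input still has running upstream tasks.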