Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

This is an unfinished paper. I am writing. After finish I will send e-mail to dev@flink.apache.org.


Table of Contents

Status

Current stateUnder Discussion

...

I will implement (Job, Host) blacklist for speculative execution feature. In order to implement

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-11000
  feiendly in the future, my interface also suit other blacklist descripted above.

Classes and Interfaces of (JobVertex Host) blacklist are:

Init black list 

where register?

The blacklist module is a thread that maintains the black machines of this job and removes expired elements periodically.Each element in blacklist contains IP and timestamp. The timestamp is used to decide whether the elements of the blacklist is expired or not. 



Classes and Interfaces of (JobVertex Host) blacklist

(1)Abstract Class BlackList, each type of blacklist will extends it.

Code Block
languagejava
titleAbstract Class BlackList
public abstract class BlackList implements Serializable {
	/** Black list configuration. */
	protected final BlackListConfig blackListConfig;

	public BlackList(BlackListConfig blackListConfig) {
        this.blackListConfig = blackListConfig;
    }

    /**
     * Remove time out black list records.
     * @param timeout Time out time.
     * @return Minimum timestamp of black list records.
     */
    public abstract long removeTimeoutBlackList(long timeout);

    /** Clear black list. */
    public abstract void clear();

	/** Is black list empty. */
    public abstract boolean isEmpty();
}


(2)JobBlackList, the job-level black list.

Code Block
languagejava
titleThe job-level black list
public class JobBlackList extends BlackList {

	/** The list of the black ip. This list is mainly for time out checking. */
    private final Queue<BlackListRecord> jobToIpBlackListRecords;

	/** The set of the black ip. This set is mainly for black host filter. */
    private final Set<String> jobBlackIpSet;

	public JobBlackList(BlackListConfig blackListConfig) {
        super(blackListConfig);
        this.jobToIpBlackListRecords = new ConcurrentLinkedQueue<>();
        this.jobBlackIpSet = Collections.newSetFromMap(new ConcurrentHashMap<>());
    }
	
	/** Add a ip to this black list. */
	public boolean addIpToBlackList(String ip) {}

	public Set<String> getAllBlackListIpSet() {}

	/** clear (job, ip) blacklist. */
	@Override
    public void clear() {}

	@Override
    public long removeTimeoutBlackList(long timeout) {}

	@Override
    public boolean isEmpty() {}

	/** Whether the given host has been added to black list. */
	public boolean containsIp(String ip) {}
}


(3)BlackListRecord, the item of blackList

Code Block
languagejava
titleThe item of black list
public class BlackListRecord implements Serializable {
	/**
 	 * Black list record which stores the black host ip and
 	 * the time stamp when this record is added.
 	 */
	public class BlackListRecord implements Serializable {}

	/** The black host ip. */
    private final String ip;

    /** The time stamp when this black list record is added. */
    private final long timeStamp;
}


(4)A abstract Class called BlackListTracker, a thread that maintain blacklist info

Code Block
languagejava
titlea thread that maintain blacklist info
public abstract class BlackListTracker implements Runnable {

	/** The executor to run the time out checking task. */
    private final ScheduledExecutor executor;
	
	/** The black list configuration. */
    protected final BlackListConfig blackListConfig;

	/** The black list timeout check future, will be canceled when black black list destroyed. */
    private AtomicReference<ScheduledFuture> timeoutCheckFuture;

	public BlackListTracker(ScheduledExecutor executor, BlackListConfig blackListConfig) {
        Preconditions.checkNotNull(blackListConfig);
        this.executor = executor;
        this.blackListConfig = blackListConfig;
        this.timeoutCheckFuture = new AtomicReference<>();
    }

	/**
     * Given the minimum time stamp of black list record. The function schedules a task to remove the black list
     * record when it got timeout.
     * @param minTimeStamp The minimum time stamp of black list record.
     */
    public void scheduleTimeOutCheck(long minTimeStamp) {}

	public boolean isBlackListEnabled() {
        return blackListConfig.isBlackListEnabled();
    }

	/** Clear the black list. */
    public abstract void clearBlackList();

    /** Get all black list ip. */
    public abstract Set<String> getAllBlackListIp();

	/** Clear the black list and cancel the timeout check task. */
    public void destroy() {}
}


(5)JobBlackListSpeculativeListener, event listener

Code Block
languagejava
titleListener to be notified when speculative execution happened.
/** Listener to be notified when speculative execution happened. */
public interface JobBlackListSpeculativeListener {
    /**
     * When a speculative execution happened, the listener will be notified.
     * @param ip the ip
     */
    void onSpeculativeExecutionHappened(String ip);
}


(5)JobBlackListTracker, per-job blackList tracker

Code Block
languagejava
titleper-job blackList tracker
public class JobBlackListTracker extends BlackListTracker implements JobBlackListSpeculativeListener {

	/** The black list of this job. */
    private final JobBlackList jobBlackList;

	public JobBlackListTracker(ScheduledExecutor executor, BlackListConfig blackListConfig) {
        super(executor, blackListConfig);
        this.jobBlackList = new JobBlackList(blackListConfig);
        scheduleTimeOutCheck(blackListConfig.getBlackListTimeOutInMillis());
    }

	@Override
    public void clearBlackList() {
        jobBlackList.clear();
    }

	@Override
    public Set<String> getAllBlackListIp() {
        if (blackListConfig.isBlackListEnabled()) {
            return jobBlackList.getAllBlackListIpSet();
        }
        return Collections.emptySet();
    }

	/** The time out checking task to be scheduled. /
    @Override
    public void run() {
        long minTimeStamp = jobBlackList.removeTimeoutBlackList(blackListConfig.getBlackListTimeOutInMillis());
        scheduleTimeOutCheck(minTimeStamp);
    }

    @Override
    public void onSpeculativeExecutionHappened(String ip) {
        if (blackListConfig.isBlackListEnabled()) {
            jobBlackList.addIpToBlackList(ip);
        }
    }
}


Init black list 

In DefaultExecutionGraph I will add a member-variable jobBlackListSpeculativeListeners and when after create ExecutionGraph jobBlackListTracker will be add in jobBlackListSpeculativeListeners. JobBlackListTracker.onSpeculativeExecutionHappened() will be called when the SpeculativeExecution detected a long tail task and start notify scheduler to scheduling a speculative execution.

Code Block
languagejava
titleExecutionGraph
public interface ExecutionGraph extends AccessExecutionGraph {
	void registerJobBlackListSpeculativeListener(JobBlackListTracker jobBlackListTracker);
}


Code Block
languagejava
titleDefaultExecutionGraph
public class DefaultExecutionGraph implements ExecutionGraph, InternalExecutionGraphAccessor {

	private final List<JobBlackListSpeculativeListener> jobBlackListSpeculativeListeners;
	
	public DefaultExecutionGraph() {
		this.jobBlackListSpeculativeListeners = new ArrayList<>();
	}
	
	@Override
    public void registerJobBlackListSpeculativeListener(JobBlackListTracker listener) {
        if (listener != null) {
            jobBlackListSpeculativeListeners.add(listener);
        }
    }

	@Override
    public List<JobBlackListSpeculativeListener> getJobBlackListSpeculativeListeners() {
        return jobBlackListSpeculativeListeners;
    }
}


Add element to black list

We have implemented a machine-dimensional blacklist per job. The machine IP was added in the blacklist when an execution is recognized as a long-tail execution.

...

Manage middle ResultPartition 


Table of Contents
Metrics

// 必须要多少execution完成了,才开始预测执行的metrics
private AtomicInteger minFinishedForSpeculationThresholdMetrics;
// 已经完成的execution数的metrics
private AtomicInteger finishedExecutionCountMetrics;
// 有多少execution发生了预测执行的metrics
private Counter speculativeExecutionCountMetrics;
// execution运行时间维度,发生预测执行的阈值
private AtomicDouble speculativeThresholdOfTime;
// 每个ExecutionVertex中原execution的运行时间的metrics
private Map<String, AtomicDouble> executionIndex2RunningTimespan;
// 如果最快结束的execution是预测执行的execution,那么对相应的metrics进行汇报
private Counter speculativeExecutionFastestFinishedCountMetrics;


Web UI

Limitations

(1)JobType is Batch.

...