...

  1. type: The blocklisted item type, either TASK_MANAGER or NODE.
  2. timestamp: The creation timestamp of this item, used to check whether the item has timed out.
  3. cause: The cause for creating this item.
  4. identifier: The identifier of the blocklisted task manager or node.
  5. action: The action when a task manager/node is marked as blocklisted, including:
    1. MARK_BLOCKLISTED: Only mark the task manager or node as blocklisted. No further slots are allocated from the blocklisted task manager or node, but slots that are already allocated are not affected.
    2. MARK_BLOCKLISTED_AND_EVACUATE_TASKS: Mark the task manager or node as blocklisted, and evacuate all tasks on it. Evacuated tasks will be restarted on non-blocklisted task managers.
Code Block
title: BlocklistedItem
/**
 * This class represents a blocklisted item.
 *
 * @param <ID> Identifier of the blocklisted item.
 */
public abstract class BlocklistedItem<ID> {
    public BlocklistedItemType getType();

    public long getTimestamp();

    public BlocklistAction getAction();

    public Throwable getCause();

    public abstract ID getIdentifier();
}

/** This class represents a blocklisted node. */
public class BlocklistedNode extends BlocklistedItem<String> {
}

/** This class represents a blocklisted task manager. */
public class BlocklistedTaskManager extends BlocklistedItem<ResourceID> {
}
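To illustrate how the timestamp and action fields might be used, here is a minimal sketch. `BlocklistedNodeSketch` and `ActionSketch` are hypothetical, simplified stand-ins rather than the classes proposed above, and treating an item as timed out once its age reaches the timeout is an assumption.

```java
/** Hypothetical mirror of the two actions listed above. */
enum ActionSketch {
    MARK_BLOCKLISTED,
    MARK_BLOCKLISTED_AND_EVACUATE_TASKS
}

/** Hypothetical, simplified blocklisted node; not the proposal's BlocklistedNode. */
final class BlocklistedNodeSketch {
    final String nodeId;
    final long timestamp; // creation time in milliseconds
    final ActionSketch action;

    BlocklistedNodeSketch(String nodeId, long timestamp, ActionSketch action) {
        this.nodeId = nodeId;
        this.timestamp = timestamp;
        this.action = action;
    }

    /** Assumption: an item times out once its age reaches the timeout. */
    boolean isTimedOut(long nowMillis, long timeoutMillis) {
        return nowMillis - timestamp >= timeoutMillis;
    }

    /** Only the second action affects tasks that are already running. */
    boolean evacuatesRunningTasks() {
        return action == ActionSketch.MARK_BLOCKLISTED_AND_EVACUATE_TASKS;
    }
}
```

With a timeout of 1000 ms, an item created at 1000 is still active at 1500 and timed out at 2000.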

...

BlocklistStrategy is the component responsible for generating blocklisted items from the exceptions, and their locations, reported by the Scheduler. Different BlocklistStrategy implementations can be introduced to suit different scenarios. The first version introduces a no-op implementation as the default. In the future, we will introduce a configurable blocklist strategy and a plugin mechanism to load user-defined blocklist strategy implementations; details are described in Future improvements.

Code Block
title: BlocklistStrategy
public interface BlocklistStrategy {
    /**
     * Generate blocklisted items according to the abnormal tasks' locations and cause.
     *
     * @param locations the abnormal tasks' locations.
     * @param cause the cause of blocklisted items.
     * @param timestamp the creation timestamp of blocklisted items.
     * @return the generated blocklisted items.
     */
    Collection<BlocklistedItem<?>> generateBlocklistedItems(Collection<TaskManagerLocation> locations, Throwable cause, long timestamp);
}
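As a rough illustration of the strategy idea, the sketch below shows a no-op default next to one possible non-trivial strategy. Everything here is hypothetical: `StrategySketch` replaces TaskManagerLocation with plain node ids, and `ExceptionTypeStrategySketch` (blocklist all reporting nodes when the cause is of a chosen exception type) is an invented example, not a strategy proposed in this document.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;

/** Simplified stand-in for BlocklistStrategy; locations are plain node ids here. */
interface StrategySketch {
    Collection<String> generateBlocklistedNodes(
            Collection<String> nodeIds, Throwable cause, long timestamp);
}

/** No-op default: never blocklists anything. */
final class NoOpStrategySketch implements StrategySketch {
    @Override
    public Collection<String> generateBlocklistedNodes(
            Collection<String> nodeIds, Throwable cause, long timestamp) {
        return Collections.emptyList();
    }
}

/** Hypothetical strategy: blocklist every reporting node for one exception type. */
final class ExceptionTypeStrategySketch implements StrategySketch {
    private final Class<? extends Throwable> trigger;

    ExceptionTypeStrategySketch(Class<? extends Throwable> trigger) {
        this.trigger = trigger;
    }

    @Override
    public Collection<String> generateBlocklistedNodes(
            Collection<String> nodeIds, Throwable cause, long timestamp) {
        if (!trigger.isInstance(cause)) {
            return Collections.emptyList();
        }
        // De-duplicate node ids while keeping their first-seen order.
        return new LinkedHashSet<>(nodeIds);
    }
}
```

Keeping the strategy behind a small interface is what would let a configurable or user-defined implementation replace the no-op default later.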

...

BlocklistTracker is the component responsible for tracking blocklisted items. The tracker periodically removes blocklisted items that have timed out.

Code Block
title: BlocklistTracker
public interface BlocklistTracker {
    /** Starts the blocklist tracker. */
    void start(ComponentMainThreadExecutor mainThreadExecutor);

    /**
     * Add new blocklisted items or update existing items.
     *
     * @param items The items to add or update
     * @return Newly added or updated items.
     */
    Collection<BlocklistedItem<?>> addNewBlocklistedItems(Collection<BlocklistedItem<?>> items);

    /** Returns whether the given task manager is blocklisted. */
    boolean isBlocklistedTaskManager(ResourceID resourceID);

    /** Get all blocklisted nodes. */
    Set<String> getBlocklistedNodes();

    /** Close the blocklist tracker. */
    void close();
}
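The tracking and timeout behaviour could look roughly like the following in-memory sketch. `NodeTrackerSketch` is hypothetical and tracks nodes only. It also assumes that an "update" means a newer timestamp for an already blocklisted node, and that timeout removal is triggered by an explicit call instead of a scheduled task on the main thread executor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory tracker for blocklisted nodes; not the proposed implementation. */
final class NodeTrackerSketch {
    private final long timeoutMillis;
    private final Map<String, Long> nodeToTimestamp = new HashMap<>();

    NodeTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns true if the node was newly added or its timestamp advanced. */
    boolean addOrUpdate(String nodeId, long timestamp) {
        Long existing = nodeToTimestamp.get(nodeId);
        if (existing != null && existing >= timestamp) {
            return false; // nothing newer to record
        }
        nodeToTimestamp.put(nodeId, timestamp);
        return true;
    }

    /** Removes and returns the nodes whose age has reached the timeout. */
    List<String> removeTimedOut(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = nodeToTimestamp.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    /** Returns a sorted snapshot of the currently blocklisted nodes. */
    Set<String> getBlocklistedNodes() {
        return new TreeSet<>(nodeToTimestamp.keySet());
    }
}
```

Returning whether an item was actually added or updated mirrors the return value of addNewBlocklistedItems, which lets callers forward only the items that changed.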
     

...

BlocklistContext is the component responsible for performing blocklist actions on the SlotPool; details are described in SlotPool.

Code Block
title: BlocklistContext
public interface BlocklistContext {
    /** Apply the actions of the newly added or updated blocklisted items to resources. */
    void blocklistResources(Collection<BlocklistedItem<?>> newlyAddedOrUpdatedItems);
}
   


Code Block
title: BlocklistHandler & JobMasterBlocklistHandler
public interface BlocklistHandler extends BlocklistTracker {
}

public interface JobMasterBlocklistHandler extends BlocklistHandler {

    /**
     * Notify an exception that may generate blocklist items.
     *
     * @param locations locations of the exception
     * @param cause the exception
     */
    void notifyException(Collection<TaskManagerLocation> locations, Throwable cause);
}

...

ResourceManagerBlocklistHandler is a new component introduced in RM for the blocklist mechanism. It has only one sub-component: BlocklistTracker, which is responsible for managing cluster-level blocklisted items.

Code Block
title: ResourceManagerBlocklistHandler
public interface ResourceManagerBlocklistHandler extends BlocklistHandler {
}

...

  1. Once blocklisted items are newly added (or updated) in the JobMasterBlocklistHandler, the RM is notified of these items via ResourceManagerGateway#notifyNewBlocklistedItems.
  2. When the RM receives the blocklisted items notified by a JM, it adds them to the ResourceManagerBlocklistHandler and notifies all JMs of the successfully added (or updated) items through JobMasterGateway#notifyNewBlocklistedItems.
  3. Similarly, when a JM receives the blocklisted items notified by the RM, it also adds them to its JobMasterBlocklistHandler.
Code Block
title: BlocklistListener
public interface BlocklistListener {

    /** Notify new blocklisted items. */
    void notifyNewBlocklistedItems(Collection<BlocklistedItem<?>> newItems);
}

public interface JobMasterGateway extends BlocklistListener {
    //...
}

public interface ResourceManagerGateway extends BlocklistListener {     
    //...
}  
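The three synchronization steps above can be sketched as a small in-process simulation. `JobMasterSketch` and `ResourceManagerSketch` are hypothetical stand-ins that call each other directly instead of going through RPC gateways, and items are plain strings. As a simplification, a JM here does not report RM-pushed items back to the RM, and the RM broadcasts only the items it actually added.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a JobMaster holding its own blocklist. */
final class JobMasterSketch {
    final Set<String> blocklist = new HashSet<>();
    private final ResourceManagerSketch rm;

    JobMasterSketch(ResourceManagerSketch rm) {
        this.rm = rm;
        rm.jobMasters.add(this);
    }

    /** Step 1: add locally generated items and report the new ones to the RM. */
    void addLocalItems(List<String> items) {
        List<String> added = addAll(blocklist, items);
        if (!added.isEmpty()) {
            rm.notifyNewBlocklistedItems(added);
        }
    }

    /** Step 3: items pushed by the RM are recorded (assumption: not reported back). */
    void notifyNewBlocklistedItems(List<String> items) {
        blocklist.addAll(items);
    }

    /** Adds items to the target set and returns those that were not present before. */
    static List<String> addAll(Set<String> target, List<String> items) {
        List<String> added = new ArrayList<>();
        for (String item : items) {
            if (target.add(item)) {
                added.add(item);
            }
        }
        return added;
    }
}

/** Hypothetical stand-in for the ResourceManager holding the cluster-level blocklist. */
final class ResourceManagerSketch {
    final Set<String> blocklist = new HashSet<>();
    final List<JobMasterSketch> jobMasters = new ArrayList<>();

    /** Step 2: record the items, then broadcast only those that were actually added. */
    void notifyNewBlocklistedItems(List<String> items) {
        List<String> added = JobMasterSketch.addAll(blocklist, items);
        if (added.isEmpty()) {
            return;
        }
        for (JobMasterSketch jm : jobMasters) {
            jm.notifyNewBlocklistedItems(added);
        }
    }
}
```

Because the RM forwards only newly added items, reporting an item that is already known cluster-wide triggers no further broadcast.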

...