...

In this design, two granularities of blocked resources are supported: task managers and nodes. A record of blocklist information is called a blocked item; it is generally generated by the scheduler according to the exceptions of tasks. Blocked items are recorded in a dedicated component and affect the resource allocation of Flink clusters. However, blocked items are not permanent: each has a timeout. Once an item times out, it is removed and the resource becomes available again. The overall structure of the blocklist mechanism is shown in the figure below.

...

Code Block: BlockedItem
/**
 * This class represents a blocked item.
 *
 * @param <ID> Identifier of the blocked item.
 */
public abstract class BlockedItem<ID> {
    public BlockedItemType getType();

    public long getTimestamp();

    public BlockAction getAction();

    public Throwable getCause();

    public abstract ID getIdentifier();
}

/** This class represents a blocked node. */
public class BlockedNode extends BlockedItem<String> {
}

/** This class represents a blocked task manager. */
public class BlockedTaskManager extends BlockedItem<ResourceID> {
}

...

  1. Generate new blocked items by notifying the BlocklistStrategy of the exception.
  2. Add the new blocked items to the BlocklistTracker.
  3. Synchronize the new blocked items to the ResourceManager (RM).
  4. Perform block actions on the resources via the BlocklistContext.
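
The four steps above can be sketched as a single handler method. This is only an illustration of the ordering: every type and method name below (FlowItem, FlowStrategy, FlowTracker, FlowResourceManagerGateway, FlowContext, onTaskFailure) is a hypothetical, simplified stand-in and not the interface proposed in this document.

```java
import java.util.Collection;

/** Hypothetical, simplified blocked item: only an identifier and a cause. */
final class FlowItem {
    final String resourceId;
    final Throwable cause;

    FlowItem(String resourceId, Throwable cause) {
        this.resourceId = resourceId;
        this.cause = cause;
    }
}

/** Hypothetical stand-ins for the four collaborators named in the steps. */
interface FlowStrategy {
    Collection<FlowItem> onException(String resourceId, Throwable cause);
}

interface FlowTracker {
    void add(Collection<FlowItem> items);
}

interface FlowResourceManagerGateway {
    void notifyNewBlockedItems(Collection<FlowItem> items);
}

interface FlowContext {
    void blockResources(Collection<FlowItem> items);
}

/** Wires the four steps together in the order listed above. */
final class BlocklistFlowSketch {
    private final FlowStrategy strategy;
    private final FlowTracker tracker;
    private final FlowResourceManagerGateway resourceManager;
    private final FlowContext context;

    BlocklistFlowSketch(
            FlowStrategy strategy,
            FlowTracker tracker,
            FlowResourceManagerGateway resourceManager,
            FlowContext context) {
        this.strategy = strategy;
        this.tracker = tracker;
        this.resourceManager = resourceManager;
        this.context = context;
    }

    void onTaskFailure(String resourceId, Throwable cause) {
        // Step 1: let the strategy decide whether the exception yields blocked items.
        Collection<FlowItem> newItems = strategy.onException(resourceId, cause);
        if (newItems.isEmpty()) {
            return; // nothing to block, e.g. with a no-op strategy
        }
        // Step 2: record the new items.
        tracker.add(newItems);
        // Step 3: synchronize them to the ResourceManager.
        resourceManager.notifyNewBlockedItems(newItems);
        // Step 4: apply the block actions to local resources.
        context.blockResources(newItems);
    }
}
```

With a strategy that returns an empty collection, the method returns after step 1 and no other component is touched.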

...

BlocklistStrategy is the component responsible for generating blocked items according to the exceptions, and their locations, reported by the Scheduler. Different BlocklistStrategy implementations can be introduced to adapt to different scenarios. The first version will introduce a no-op implementation as the default. In the future, we will introduce a configurable blocklist strategy and a plugin mechanism to load user-defined blocklist strategy implementations; details are described in Future improvements.
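
As a rough illustration of what a no-op default could look like, the sketch below defines a strategy that never produces blocked items. The interface shape is an assumption: StrategySketch, its generateBlockedItems method, and the String location parameter are hypothetical names chosen for this example, since the actual BlocklistStrategy interface is not shown in this section.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical strategy interface; the real BlocklistStrategy signature may differ. */
interface StrategySketch<ITEM> {
    /** Returns the blocked items to create for an exception seen at the given location. */
    Collection<ITEM> generateBlockedItems(Throwable exception, String location);
}

/** No-op strategy: never blocks anything, so the blocklist stays empty. */
final class NoOpStrategySketch<ITEM> implements StrategySketch<ITEM> {
    @Override
    public Collection<ITEM> generateBlockedItems(Throwable exception, String location) {
        return Collections.emptyList();
    }
}
```

With such a default, the blocklist mechanism stays inactive until another strategy implementation is plugged in.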

...

BlocklistTracker is the component responsible for tracking blocked items. The tracker periodically removes timed-out blocked items.
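
The timeout-based removal can be illustrated with a minimal sketch. It assumes a single global timeout and plain String identifiers; the class ExpiringTrackerSketch and its methods are hypothetical and much simpler than the BlocklistTracker proposed here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker that remembers when each resource was blocked. */
final class ExpiringTrackerSketch {
    private final long timeoutMillis;
    // Identifier of the blocked resource -> timestamp at which it was blocked.
    private final Map<String, Long> blockedSince = new HashMap<>();

    ExpiringTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void add(String identifier, long timestampMillis) {
        blockedSince.put(identifier, timestampMillis);
    }

    boolean isBlocked(String identifier) {
        return blockedSince.containsKey(identifier);
    }

    /** Removes every item whose age has reached the timeout and returns the removed identifiers. */
    List<String> removeTimedOutItems(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = blockedSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }
}
```

A periodic task (for example, one scheduled on an executor) could call removeTimedOutItems with the current time so that resources become available again once their items expire.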

...

BlocklistContext is the component responsible for performing block actions on the SlotPool; the details are described in SlotPool.

Code Block: BlocklistContext
public interface BlocklistContext {
    /** Apply the newly added or updated blocked items to resources. */
    void blocklistResources(Collection<BlockedItem<?>> newlyAddedOrUpdatedItems);
}
   

...

Code Block: BlocklistHandler & JobMasterBlocklistHandler
public interface BlocklistHandler extends BlocklistTracker {
    /** Add a new blocked node. */
    void blockNode(String nodeId, BlockAction action, Throwable cause);

    /** Add a new blocked task manager. */
    void blockTaskManager(ResourceID taskManagerId, BlockAction action, Throwable cause);
}

public interface JobMasterBlocklistHandler extends BlocklistHandler {
}

...

ResourceManagerDriver uses the following APIs to tell external resource managers about blocked nodes:

  1. Yarn: AMRMClient#updateBlacklist 
  2. Kubernetes: NodeAffinity
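
For the Yarn case, AMRMClient#updateBlacklist takes a list of nodes to add and a list of nodes to remove, so a driver would need to turn the current set of blocked nodes into such a delta. The sketch below shows one way to compute it; YarnBlocklistDiffSketch is a hypothetical helper, and the Yarn call itself appears only in a comment so that the example compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: delta between the previously reported and the current blocked nodes. */
final class YarnBlocklistDiffSketch {
    final List<String> additions;
    final List<String> removals;

    private YarnBlocklistDiffSketch(List<String> additions, List<String> removals) {
        this.additions = additions;
        this.removals = removals;
    }

    static YarnBlocklistDiffSketch between(
            Set<String> previouslyBlocked, Set<String> currentlyBlocked) {
        // Nodes blocked now but not reported before must be added.
        Set<String> toAdd = new TreeSet<>(currentlyBlocked);
        toAdd.removeAll(previouslyBlocked);
        // Nodes reported before but no longer blocked (e.g. timed out) must be removed.
        Set<String> toRemove = new TreeSet<>(previouslyBlocked);
        toRemove.removeAll(currentlyBlocked);
        return new YarnBlocklistDiffSketch(new ArrayList<>(toAdd), new ArrayList<>(toRemove));
    }

    // A driver holding an AMRMClient could then call, for a computed diff:
    //   amrmClient.updateBlacklist(diff.additions, diff.removals);
}
```

Sending only the delta keeps the external resource manager in sync when items are added and when they time out.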

...

Add a REST API to obtain blocklist information. Each request returns all current blocked items, which are obtained from ResourceManagerBlocklistHandler.

...