...
We propose introducing a blacklist mechanism to solve this problem. A blacklist is a mechanism for filtering out problematic resources. Once a resource is judged to be abnormal, it is blacklisted so that no further tasks are assigned to it. We will introduce the following two ways to specify blacklisted resources:
- Manually specify the blacklisted resources through the REST API. When users find abnormal nodes/TMs, they can manually blacklist them in this way.
- Automatically detect abnormal resources and blacklist them. Users can specify a blacklist strategy, which identifies abnormal resources according to the received exception and the related locations.
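
To make the two paths concrete, here is a minimal in-memory sketch of how manual and automatic blacklisting could feed the same filter. Every name in it (`BlacklistSketch`, `BlacklistStrategy`, `identifyAbnormal`, `addManually`, `onFailure`, `filter`) is hypothetical and chosen only for illustration; none of them is part of the proposed Flink interfaces, and the example strategy (blacklist every location involved in an `IOException`) is an assumption.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only: all names are hypothetical, not the Flink API.
final class BlacklistSketch {

    /** Hypothetical strategy: maps a failure and the locations involved to the resources to blacklist. */
    interface BlacklistStrategy {
        Set<String> identifyAbnormal(Throwable cause, Collection<String> locations);
    }

    private final Set<String> blacklisted = new LinkedHashSet<>();
    private final BlacklistStrategy strategy;

    BlacklistSketch(BlacklistStrategy strategy) {
        this.strategy = strategy;
    }

    /** Manual path: what a REST handler might call for a user-supplied resource ID. */
    void addManually(String resourceId) {
        blacklisted.add(resourceId);
    }

    /** Automatic path: ask the strategy which of the involved locations are abnormal. */
    void onFailure(Throwable cause, Collection<String> locations) {
        blacklisted.addAll(strategy.identifyAbnormal(cause, locations));
    }

    /** Returns the candidates that are still eligible to receive tasks. */
    List<String> filter(Collection<String> candidates) {
        return candidates.stream()
                .filter(id -> !blacklisted.contains(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed example strategy: treat every location involved in an IOException as abnormal.
        BlacklistSketch tracker = new BlacklistSketch((cause, locations) -> {
            if (cause instanceof IOException) {
                return new LinkedHashSet<>(locations);
            }
            return Collections.<String>emptySet();
        });

        tracker.addManually("node1");
        tracker.onFailure(new IOException("disk error"), Arrays.asList("container_XXX_000002"));
        tracker.onFailure(new IllegalStateException("user code bug"), Arrays.asList("container_XXX_000003"));

        // Only the resource that was never blacklisted remains: [container_XXX_000003]
        System.out.println(tracker.filter(
                Arrays.asList("node1", "container_XXX_000002", "container_XXX_000003")));
    }
}
```

Keeping the strategy behind a single-method interface means the manual and automatic paths share one blacklist and one filtering step, which is the property the two bullets above rely on.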
Public Interfaces
We propose to introduce the following configuration options for the blacklist:
...

    public interface ResourceManagerGateway {
        CompletableFuture<BlacklistInfo> requestBlacklist(@RpcTimeout Time timeout);
        // ...
    }

GET: http://{jm_rest_address:port}/blacklist
Request: {}
Response:

    {
      /** This group only contains directly blacklisted task managers */
      "blacklistedTaskManagers": [
        {
          "id" : "container_XXX_000002",
          "timestamp" : "XXX",
          "action" : "MARK_BLACKLISTED"
        },
        {
          "id" : "container_XXX_000003",
          "timestamp" : "XXX",
          "action" : "MARK_BLACKLISTED"
        },
        ...
      ],
      "blacklistedNodes": [
        {
          "id" : "node1",
          "timestamp" : "XXX",
          "action" : "MARK_BLACKLISTED",
          "taskManagers" : ["container_XXX_000004", "container_XXX_000005", ...]
        },
        ...
      ]
    }

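
As a rough illustration of how a client might call the query endpoint, the sketch below builds the `GET` request with the JDK's `java.net.http` client (Java 11+). The class and method names (`BlacklistQuerySketch`, `buildQuery`) are hypothetical, and `localhost:8081` is only an assumed placeholder for `{jm_rest_address:port}`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative sketch only: class and method names are hypothetical.
final class BlacklistQuerySketch {

    /** Builds the GET request for the proposed /blacklist endpoint. */
    static HttpRequest buildQuery(String jmRestAddress) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + jmRestAddress + "/blacklist"))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed placeholder address; pass the real JobManager REST address as the first argument.
        String address = args.length > 0 ? args[0] : "localhost:8081";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildQuery(address), HttpResponse.BodyHandlers.ofString());
        // The body is expected to follow the response format shown above.
        System.out.println(response.body());
    }
}
```

Running `main` requires a reachable cluster that implements the proposed endpoint; `buildQuery` itself has no network dependency.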
Add
POST: http://{jm_rest_address:port}/blacklist/add
Request:

    {
      "newBlacklistedTaskManagers": [
        {
          "id" : "container_XXX_000002",
          "action" : "MARK_BLACKLISTED"
        },
        {
          "id" : "container_XXX_000003",
          "action" : "MARK_BLACKLISTED"
        },
        ...
      ],
      "newBlacklistedNodes": [
        {
          "id" : "node1",
          "action" : "MARK_BLACKLISTED"
        },
        ...
      ]
    }

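
The request above could be assembled and sent along the following lines. This is a sketch under assumptions: `BlacklistAddSketch`, `buildBody`, and `buildAdd` are hypothetical names, the JSON is concatenated by hand on the assumption that IDs contain no characters needing escaping, and the `Content-Type` header is an assumption rather than something the proposal specifies.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: class and method names are hypothetical.
final class BlacklistAddSketch {

    /** Renders one JSON array of entries; assumes IDs need no JSON escaping. */
    private static String entries(List<String> ids) {
        return ids.stream()
                .map(id -> "{\"id\":\"" + id + "\",\"action\":\"MARK_BLACKLISTED\"}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    /** Builds the request body in the format shown above. */
    static String buildBody(List<String> taskManagerIds, List<String> nodeIds) {
        return "{\"newBlacklistedTaskManagers\":" + entries(taskManagerIds)
                + ",\"newBlacklistedNodes\":" + entries(nodeIds) + "}";
    }

    /** Builds the POST request for the proposed /blacklist/add endpoint. */
    static HttpRequest buildAdd(String jmRestAddress, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + jmRestAddress + "/blacklist/add"))
                .header("Content-Type", "application/json") // assumed header
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed placeholder address; pass the real JobManager REST address as the first argument.
        String address = args.length > 0 ? args[0] : "localhost:8081";
        String body = buildBody(
                Arrays.asList("container_XXX_000002", "container_XXX_000003"),
                Arrays.asList("node1"));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildAdd(address, body), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```

A real client would normally use a JSON library instead of string concatenation; the manual version is used here only to keep the sketch dependency-free.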
...