...

[Gliffy Diagram: OpenWhisk-with-ClusterManager-simplified]
 

Originally, OpenWhisk was built on the assumption that each Invoker is responsible for a single VM in the cluster. With a Cluster Manager this premise changes, as a single Invoker could be in charge of the entire cluster. The Cluster Manager is responsible for each VM. From the Invoker's perspective, the entire cluster looks like a single pool of resources.

...

[Gliffy Diagram: OpenWhisk-ManagementControlData-plane]

Management Plane

The management layer exposes an API that primarily serves developers who manage actions, triggers, rules, and APIs. The wsk CLI interacts with this layer.

...

The Control Plane should provide an API used by the Data Plane to cold-start actions, and it should also emit events each time a change in the resource allocation happens; each time GC removes idle containers, or each time a new action is created, the Control Plane should notify all Data Plane instances of such changes. 
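
The notification behaviour described above could be sketched as a simple publish/subscribe fan-out. This is only an illustration under stated assumptions: `ControlPlaneEvents`, `ResourceEvent`, and the event kinds are hypothetical names, and a real deployment would deliver events over the network rather than through in-process callbacks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ResourceEvent:
    kind: str    # hypothetical kinds: "container_created", "container_removed", "action_created"
    action: str  # name of the affected action


class ControlPlaneEvents:
    """Fans out every resource-allocation change to all subscribed Data Plane instances."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[ResourceEvent], None]] = []

    def subscribe(self, callback: Callable[[ResourceEvent], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, event: ResourceEvent) -> None:
        for callback in self._subscribers:
            callback(event)


# Two Data Plane instances, each recording the events it has seen.
router_a: List[ResourceEvent] = []
router_b: List[ResourceEvent] = []
events = ControlPlaneEvents()
events.subscribe(router_a.append)
events.subscribe(router_b.append)
events.emit(ResourceEvent("container_removed", "guest/hello"))  # e.g. GC removed an idle container
```

After the `emit` call both instances hold the same event, which is the property the Data Plane relies on to keep its routing view current.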

Info

To be detailed: The current OW is "async by default", whereas the new design is "sync by default". The open question is how to handle async cases.


Data Plane

The data plane layer invokes actions as fast as possible. When an action needs to be cold-started, the data plane delegates this to the Control Plane and waits for the action to become ready before invoking it. Once an action is warmed up, the data plane is notified, and if it was waiting for such an event in order to invoke an activation, it should resume the execution.
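
The wait-then-resume behaviour can be illustrated with `asyncio`. This is a minimal sketch, not the proposed implementation: the `DataPlane` class, its method names, and the `cold_start_requests` list (a stand-in for the call to the Control Plane) are all assumptions made for the example.

```python
import asyncio
from typing import Dict, List, Set


class DataPlane:
    def __init__(self) -> None:
        self.warm: Set[str] = set()
        self._ready: Dict[str, asyncio.Event] = {}
        self.cold_start_requests: List[str] = []  # stand-in for Control Plane calls

    def on_action_ready(self, action: str) -> None:
        """Called when the Control Plane reports that `action` is warmed up."""
        self.warm.add(action)
        self._ready.setdefault(action, asyncio.Event()).set()

    async def invoke(self, action: str) -> str:
        if action not in self.warm:
            event = self._ready.setdefault(action, asyncio.Event())
            self.cold_start_requests.append(action)  # delegate the cold start
            await event.wait()                       # resume once the action is ready
        return f"invoked {action}"


async def demo() -> str:
    plane = DataPlane()
    pending = asyncio.create_task(plane.invoke("guest/hello"))
    await asyncio.sleep(0)                # let invoke() reach the wait
    plane.on_action_ready("guest/hello")  # simulated Control Plane notification
    return await pending


result = asyncio.run(demo())
```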

...

The Data Plane should stop sending traffic to actions that are marked for removal by the Control Plane. The only exception is when an action marked for removal receives an activation in the meantime, in which case the Data Plane informs the Control Plane, which may choose to remove the "mark for removal" and keep the action running, or to replace the action with a new one.

This layer should have support for sequences, and concurrency of 1 for action invocation.

The Data Plane should perform the Authentication and Authorization that the OpenWhisk Controller does currently, and it should decorate the request with the context set by OpenWhisk (e.g. __OW_NAMESPACE, __OW_ACTION_NAME, __OW_ACTIVATION_ID).
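
As an illustration of the decoration step, the sketch below attaches the three context variables named above to an activation payload. The `decorate_request` function and the `{"context": ..., "value": ...}` envelope are hypothetical; how the context actually reaches the action is left open by this proposal.

```python
import uuid
from typing import Dict, Optional


def decorate_request(namespace: str, action: str, payload: dict,
                     activation_id: Optional[str] = None) -> dict:
    """Wraps an activation payload with the OpenWhisk context variables."""
    context: Dict[str, str] = {
        "__OW_NAMESPACE": namespace,
        "__OW_ACTION_NAME": f"/{namespace}/{action}",
        # A fresh id is generated when the caller does not supply one.
        "__OW_ACTIVATION_ID": activation_id or uuid.uuid4().hex,
    }
    return {"context": context, "value": payload}  # hypothetical envelope


request = decorate_request("guest", "hello", {"name": "world"}, activation_id="abc123")
generated = decorate_request("guest", "hello", {})
```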

CNCF Projects to integrate with

Data Plane

Candidates: 

  • Envoy
  • Nginx

Requirements for Data Plane:

  • Invoke an authN/authZ service, reusing the existing authN/authZ implementation in OpenWhisk
  • Routing 
    • Including support for sequences
  • Throttling 
    • Respect namespace limits
    • Respect Action level concurrency 
  • Caching the response, based on what the action returns. For example, an action that validates an OAuth token could instruct the system to cache the response for that token until it expires.
  • Support API Management
    • Otherwise the existing OpenWhisk Gateway can be reused
  • Support for Observability: Metrics, activation info, tracing, etc
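
The caching requirement above could look roughly like the following. It is a sketch under assumptions: `ResponseCache` is a hypothetical name, and the idea that the action communicates a TTL in seconds alongside its response is an illustration, not something the proposal specifies.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ResponseCache:
    """Caches a response only when the action asks for it by supplying a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return value

    def put(self, key: str, value: Any, ttl_seconds: Optional[float]) -> None:
        if ttl_seconds and ttl_seconds > 0:  # no TTL from the action means "do not cache"
            self._entries[key] = (self._clock() + ttl_seconds, value)


# A fake clock makes the expiry visible without sleeping.
now = [0.0]
cache = ResponseCache(clock=lambda: now[0])
cache.put("token-abc", {"valid": True}, ttl_seconds=60)  # token expires in 60 seconds
hit = cache.get("token-abc")
now[0] = 61.0
miss = cache.get("token-abc")
```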

Flow for warm container

[Gliffy Diagram: OpenWhisk-activation-flow-warmed-action]

  1. The request arrives from a client
  2. Authentication and

...

  1. Authorization
    1. The Container Router validates the Authorization header with the OpenWhisk Auth Service
    2. The response of the Auth Service is cached 
  2. Routing
    1. Check namespace limits
    2. Forward the request to a container selected from a list of warmed actions that the Action Router keeps. 
      1. (new) Streaming the request to the action would be nice; OpenWhisk doesn't support this, and such a feature could remove the max payload limits
      2. (new) Websockets could also be supported, another missing feature in OpenWhisk.
  3. Container Proxy sidecar
    1. Check action concurrency limit
    2. Buffer a few more requests by queueing them into an overflow buffer; this may be useful when a cold start would take longer than simply queueing a few more requests. Blackbox actions that need to download the Docker image may benefit the most. This idea is inspired by Knative Serving
  4. Invoke the action and return the response
    1. (new) Caching the action response could be another nice-to-have feature, which is not implemented in OpenWhisk. Caching should be controlled by the action response.
  5. Collect activation info.
  6. Sequence support.
    1. If the action is part of a sequence, then the Router should have logic to invoke the next action in the sequence.
      1. Other ideas to explore for supporting sequences, should support in the ContainerRouter prove too difficult to implement:
        1. ContainerProxy could "understand" sequences
        2. Or reuse Composer and implement sequence-as-an-action.  
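
The sequence logic in step 6 can be sketched as a loop that feeds each action's result into the next action. The function name `invoke_sequence` and the in-memory `actions` table are hypothetical, and stopping at the first result that contains an `error` field is an assumption made for this example.

```python
from typing import Callable, Dict, List

Action = Callable[[dict], dict]


def invoke_sequence(actions: Dict[str, Action], sequence: List[str], payload: dict) -> dict:
    """Invokes each action in order, passing each result to the next action."""
    result = payload
    for name in sequence:
        result = actions[name](result)
        if "error" in result:  # assumed convention: stop at the first failing action
            break
    return result


actions: Dict[str, Action] = {
    "split": lambda p: {"words": p["text"].split()},
    "count": lambda p: {"count": len(p["words"])},
    "fail": lambda p: {"error": "boom"},
}
ok = invoke_sequence(actions, ["split", "count"], {"text": "a b c"})
failed = invoke_sequence(actions, ["fail", "count"], {"text": "a b c"})
```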

Flow for cold-start

When the Action Proxy is at capacity, it should return a 429 response to the Container Router. A Retry-After header could specify <delay-seconds> or <http-date>, which a CircuitBreaker in the ContainerRouter can use to avoid routing to that action. The time window for retry should ideally be computed from the response times observed by the Container Proxy.
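
A minimal sketch of such a circuit breaker follows. The `CircuitBreaker` class and its methods are hypothetical, only the <delay-seconds> form of Retry-After is handled, and the one-second fallback for a missing header is an assumption.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """Skips a container until the Retry-After window from its last 429 has passed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._open_until: Dict[str, float] = {}

    def record_response(self, container: str, status: int, headers: Dict[str, str]) -> None:
        if status == 429:
            # Only <delay-seconds> is parsed here; <http-date> is not handled.
            delay = float(headers.get("Retry-After", "1"))
            self._open_until[container] = self._clock() + delay

    def allows(self, container: str) -> bool:
        until = self._open_until.get(container)
        return until is None or self._clock() >= until


# A fake clock makes the retry window visible without sleeping.
now = [100.0]
breaker = CircuitBreaker(clock=lambda: now[0])
breaker.record_response("hello-c1", 429, {"Retry-After": "5"})
blocked = breaker.allows("hello-c1")   # still inside the 5 second window
now[0] = 105.0
reopened = breaker.allows("hello-c1")  # window has elapsed
```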

[Gliffy Diagram: OpenWhisk-ColdStart-ControlPlane]


The green steps are additional steps required for cold-start:

4. Container Proxy returns a 429  indicating the action has reached its max concurrency and can't take more activations. If there's no container running for that action, skip to step 5.

5. Container Router goes to the DistributedContainerPool  to request a new container to be created

6. After the container is created, all Container Router instances are informed, and the activation proceeds as in the Flow for the warm container described above.
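
Steps 4 to 6 can be sketched with a single router and a stand-in pool. `FakePool`, `ContainerRouter`, and their methods are hypothetical names; the sketch also models only one Container Router instance, so the broadcast to all instances in step 6 is reduced to updating a local list.

```python
from typing import Dict, List


class FakePool:
    """Hypothetical stand-in for the DistributedContainerPool."""

    def __init__(self) -> None:
        self.created = 0

    def create_container(self, action: str) -> str:
        self.created += 1
        return f"{action}-c{self.created}"


class ContainerRouter:
    def __init__(self, pool: FakePool, max_concurrency: int = 1) -> None:
        self.pool = pool
        self.max_concurrency = max_concurrency
        self.warm: Dict[str, List[str]] = {}  # action -> warm containers
        self.in_flight: Dict[str, int] = {}   # container -> running activations

    def _proxy_status(self, container: str) -> int:
        # Step 4: the Container Proxy answers 429 when at max concurrency.
        busy = self.in_flight.get(container, 0) >= self.max_concurrency
        return 429 if busy else 200

    def route(self, action: str) -> str:
        for container in self.warm.get(action, []):
            if self._proxy_status(container) == 200:
                break
        else:
            # Step 5: no container could take the activation, so request a new one.
            container = self.pool.create_container(action)
            # Step 6: record the new container, then continue as in the warm flow.
            self.warm.setdefault(action, []).append(container)
        self.in_flight[container] = self.in_flight.get(container, 0) + 1
        return container

    def complete(self, container: str) -> None:
        self.in_flight[container] -= 1


router = ContainerRouter(FakePool())
first = router.route("hello")   # no container yet: cold start
second = router.route("hello")  # first container is busy (429): cold start again
router.complete(first)
third = router.route("hello")   # first container is free again: warm path
```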

Control Plane

Candidates:

  • OpenWhisk Controller and Invoker - refactored into a single service that meets the requirements

Control Plane concerns:

  1. Cold-start actions - allocate resources
  2. Garbage Collect idle actions - de-allocate resources

The Control Plane should be used by the Data Plane only when cold-starting new actions.

DistributedContainerPool

This Component is at the core of the Control Plane. It should be concerned with the following:

  • globalPool
    • Cluster Wide view of all running actions
    • Distributed Map with minimum data about actions needed for ResourceAllocator and GC
    • It should sync with Kubernetes from time to time to update the state, in case a container dies, or a Kubernetes operation kills that container
  • resourceAllocator - SingletonActor
    • It is in charge of starting containers on a node that has resources
    • When allocating resources, Placement Strategies should consider CPU, MEM, GPU, Network, and other resources an action might consume.
  • garbageCollector - SingletonActor
    • It removes idle actions
    • It needs to be a singleton so that, when deciding which resources to free, it can avoid fragmentation. In other words, it should free resources so as to make the free space as compact as possible.
      • This is particularly important when scaling down the nodes running actions
    • Its implementation should be configurable and swappable 
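
One possible heuristic for the fragmentation concern is to remove idle containers from the least-loaded nodes first, so that those nodes empty out and can be scaled down. The sketch below is only an illustration: the `Container` record and `pick_for_removal` function are hypothetical, and a real strategy would also weigh the CPU, memory, and other resources mentioned above.

```python
from collections import Counter
from typing import List, NamedTuple


class Container(NamedTuple):
    name: str
    node: str
    idle: bool


def pick_for_removal(containers: List[Container], count: int) -> List[str]:
    """Picks up to `count` idle containers, emptying the least-loaded nodes first."""
    load = Counter(c.node for c in containers)  # containers per node
    idle = [c for c in containers if c.idle]
    idle.sort(key=lambda c: (load[c.node], c.node, c.name))
    return [c.name for c in idle[:count]]


cluster = [
    Container("a", "node-1", idle=False),
    Container("b", "node-1", idle=True),
    Container("c", "node-1", idle=True),
    Container("d", "node-2", idle=True),
]
# Removing "d" leaves node-2 empty, whereas removing "b" or "c" would free no node.
chosen = pick_for_removal(cluster, count=1)
```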

Management Plane

This can reuse the OpenWhisk implementation.

Candidates:

  • OpenWhisk Controller, slimmed for Management APIs


Logs

Candidates:

  • Fluentd

Logs could be captured through Fluentd and forwarded to ElasticSearch, Splunk, or other log stores (RDBMS, NoSQL, Hadoop).

The log format for the actions should be updated so that each log line includes the activationID, namespace, and other identifying information needed to serve the wsk activation logs <activationID> and wsk activation get <activationID> commands.
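
As an illustration of such a format, each log line could be a JSON record that carries the identifiers, which makes retrieving the logs of a single activation a simple filter. The field names (`activationId`, `namespace`, `stream`, `message`) and both helper functions are assumptions made for this sketch.

```python
import json
from typing import Iterable, List


def format_log_line(activation_id: str, namespace: str, stream: str, message: str) -> str:
    """Serializes one action log line together with its identifying fields."""
    record = {
        "activationId": activation_id,
        "namespace": namespace,
        "stream": stream,  # "stdout" or "stderr"
        "message": message,
    }
    return json.dumps(record, sort_keys=True)


def messages_for(lines: Iterable[str], activation_id: str) -> List[str]:
    """Returns the messages that belong to one activation, in their original order."""
    records = (json.loads(line) for line in lines)
    return [r["message"] for r in records if r["activationId"] == activation_id]


lines = [
    format_log_line("a1", "guest", "stdout", "hello"),
    format_log_line("b2", "guest", "stdout", "other"),
    format_log_line("a1", "guest", "stderr", "warning"),
]
found = messages_for(lines, "a1")
```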

Monitoring

Candidates:

  • Prometheus


OpenWhisk already integrates with Prometheus. See https://github.com/adobe-apiplatform/openwhisk-user-events 

Throttling/Limits

Namespace limits

Limits such as max concurrent action containers per namespace could be enforced by the Container Router. 

Action limits

Limits such as concurrent requests per action container could be enforced by the Container Proxy.
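
Both kinds of limit reduce to counting against a per-key maximum, as the sketch below suggests. `KeyedLimit` is a hypothetical helper and the limit values are arbitrary; the same class is used once keyed by namespace (Container Router) and once keyed by container (Container Proxy).

```python
from collections import defaultdict
from typing import DefaultDict


class KeyedLimit:
    """Tracks usage per key and refuses requests beyond a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._used: DefaultDict[str, int] = defaultdict(int)

    def try_acquire(self, key: str) -> bool:
        if self._used[key] >= self.limit:
            return False
        self._used[key] += 1
        return True

    def release(self, key: str) -> None:
        if self._used[key] > 0:
            self._used[key] -= 1


namespace_containers = KeyedLimit(limit=2)  # enforced by the Container Router
container_requests = KeyedLimit(limit=1)    # enforced by the Container Proxy

# The third container for the namespace is refused.
granted = [namespace_containers.try_acquire("guest") for _ in range(3)]
# The second concurrent request to the same container is refused.
busy = (container_requests.try_acquire("hello-c1"), container_requests.try_acquire("hello-c1"))
```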

CNCF Projects to integrate with

TBD


Previous discussions

Provide support for integration with Kubernetes. One approach could be to deploy and run the components on a Kubernetes provider as we do for Vagrant, Docker, Docker-Compose, and OpenStack.

...