...

Creating a sequence of actions, or a composition, that processes the same asset with a size greater than the max payload limit makes it hard to benefit from the piping support in OpenWhisk, which passes the output of one action in a sequence as the input to the next action.
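
For context, a minimal sketch of how this piping works for ordinary (small) payloads; the Python action below is purely illustrative and its parameter names are assumptions, not part of the proposal:

```python
# A minimal OpenWhisk Python action. In a sequence, the dict returned here
# is delivered as the `args` of the next action in the sequence.
def main(args):
    # Stand-in for real work: produce an asset whose size depends on the input.
    asset = "x" * int(args.get("size", 1024))

    # If `asset` is larger than the max payload limit, this inline result
    # cannot be piped to the next action and the activation fails.
    return {"asset": asset}
```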

...

The proposal is to transparently provide developers with a temporary way to store large assets in the OW cluster. This is probably the hardest problem to solve compared to the other ones because it involves persistence, state, and possibly handling large amounts of data. Below are a few possible options:

...

Action volumes

Allow developers to "attach" a persistent disk to an action. The programming model in this case assumes there is always a folder available on the disk at a well-known path defined by the developer. OpenWhisk's implementation could leverage solutions particular to a deployment:

Volumes could be local to a single host, or they could be backed by a distributed FS or by blob storage.

  • Local disk 
    • PROs: The most performant option for a FaaS model because activations are usually short, volumes can be destroyed when a sequence finishes, and the network is not used.
    • CONs: Limited by the maximum number of containers that can run in parallel on a single host, since every action that shares the volume must be scheduled on that host.
  • Distributed FS
    • PROs: sequences can be long, actions can run on any host. 
    • CONs: Dependent on network speed and subject to network congestion.
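
For illustration, a sketch of the programming model under this option; the mount path and parameter names below are assumptions made for the example, not something defined by the proposal:

```python
import os

# Hypothetical well-known mount point; in the proposal this path would be
# defined by the developer when the volume is attached to the action.
VOLUME_PATH = "/mnt/action-data"

def main(args):
    # A previous action in the sequence wrote the large asset to the volume;
    # only its file name travels through the (small) JSON payload.
    in_path = os.path.join(VOLUME_PATH, args["asset_file"])
    with open(in_path, "rb") as f:
        data = f.read()

    # Stand-in for the real processing step.
    processed = data.upper()

    out_name = "processed-" + args["asset_file"]
    with open(os.path.join(VOLUME_PATH, out_name), "wb") as f:
        f.write(processed)

    # Pipe only the reference to the next action, keeping the payload small.
    return {"asset_file": out_name}
```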

The wsk CLI could look like the example below:

...

TODO: Describe congestion scenarios: (1) wait time caused by a prewarmed action in the sequence processing another request on that node; (2) lack of resources to cold start a new container on the same host.

...

Provide developers with a caching solution in the cluster. Developers would still pass large assets by reference between actions, and they would still write code to upload or download an asset, but they would use a cache provided by OpenWhisk, inside the cluster. The cache could be exposed through the OpenWhisk SDK via two methods: write(asset_name, asset_value) and read(asset_name).
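
A minimal sketch of an action in the middle of a sequence using such a cache; the write/read method names come from this proposal, while the owcache module name and the processing step are hypothetical:

```python
import owcache  # hypothetical cache module exposed by the OpenWhisk SDK

def main(args):
    # The previous action stored the asset in the cluster cache and piped
    # only its name through the JSON payload.
    asset = owcache.read(args["asset_name"])

    # Stand-in for the real processing step.
    processed = asset.upper()

    # Store the result under a new name and pass that name downstream;
    # the payload stays small regardless of the asset size.
    out_name = args["asset_name"] + "-processed"
    owcache.write(out_name, processed)
    return {"asset_name": out_name}
```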

The implementation could use a distributed in-memory cache, a distributed FS such as GlusterFS, blob storage, or even an EBS-like volume attached to one machine in the cluster to store the cached items.

The problem with this approach is the network bottleneck: even if an action happens to land on the same host as other actions in the sequence, it would still consume network bandwidth to write or read an asset from the cluster cache, so it remains dependent on the speed of the network.

...