
Initial General Questions

  • How can we easily track the large epics of work that need to be accomplished?

    • How can we track which items are currently being worked on?

  • What is the current list of high-priority work items currently being addressed?

 

...

Areas/epics of work and Actions for each prioritized:

Short Term Epics

  • Epics/issues to make components better deployable / manageable on Kube.

 

"Dockerization" of OpenWhisk Components

  • Goal: Minimize reliance on Ansible (for building in configurations)

    • Goal would be to Dockerize as much as possible

  • Considerations:

    • Have Kube YAML files as the primary source of configuration

      • Utilize ENVIRONMENT vars

      • “bake” other configs into Docker images

      • Use shared storage via Kubernetes volumes (e.g., NFS mounts):

        • 3rd-party volumes are managed through YAML files (all images mount existing NFS volume names). Using NFS as the “disk” makes updates easy, since there is some NFS server that simply gets (un)mounted. Other persistent infrastructure mount options are also possible.

      • Determine whether we need better health checking to decide if a component should start; a process should not simply fail to run.
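
As a sketch of the considerations above, a Kube Deployment can pass deployment-specific settings as environment variables and mount an existing NFS volume declared in YAML. All names here (image, env var, NFS server/path) are hypothetical placeholders, not the actual OpenWhisk values:

```yaml
# Hypothetical sketch: configuration via env vars plus an NFS volume mount.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller            # hypothetical component name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: controller
  template:
    metadata:
      labels:
        app: controller
    spec:
      containers:
      - name: controller
        image: openwhisk/controller     # assumed image name
        env:
        - name: KAFKA_HOST              # deployment-specific property via env var
          value: kafka.openwhisk.svc.cluster.local
        volumeMounts:
        - name: shared-data
          mountPath: /data
      volumes:
      - name: shared-data
        nfs:                            # 3rd-party volume declared in YAML
          server: nfs.example.com       # hypothetical NFS server
          path: /exports/openwhisk
```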

  • Action Items:

    • Nginx:

      • Build Nginx with all OpenWhisk-specific requirements (wsk, blackbox) pre-built into the Docker image. The ‘nginx.conf’ file should be generated when the Docker image comes up. This image should accept environment variables to set up deployment-specific properties (e.g., the DNS address of the OpenWhisk Controller).

      • Help generate certificates
      • Create a Kube ConfigMap or Secret resource from those certs and a static nginx.conf file, where this nginx.conf file is specific to an environment
      • Have YAML file(s) for the Kube Deployment and Service which use the generated ConfigMap
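
One possible shape for the Nginx items above (resource names, image name, and env var are illustrative, and the certs could equally live in a Secret):

```yaml
# Hypothetical sketch: environment-specific nginx.conf held in a ConfigMap
# and mounted into the Nginx Pod.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |
    # environment-specific config would be generated/placed here
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: openwhisk/nginx          # assumed image with wsk/blackbox baked in
        env:
        - name: CONTROLLER_HOST         # deployment-specific property
          value: controller.openwhisk.svc.cluster.local
        volumeMounts:
        - name: conf
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: conf
        configMap:
          name: nginx-conf
```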
    • Controller:

      • Provide the ability for the controller to receive updates that new invoker instances are able to be used.

        • This could be having Invoker instances communicate directly with the Controller, <or> the Controller watches for updates to the key-value pairs it cares about in Consul (this already happens by default). Currently, Kafka can have new topics automatically created and used.

      • Need to make sure we use StatefulSets so that Controller instances have unique names.
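
A StatefulSet for the point above could look like the following sketch; it gives each Controller a stable, unique name (controller-0, controller-1, ...). The image and Service names are assumptions:

```yaml
# Hypothetical sketch: StatefulSet so each Controller has a unique, stable name.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: controller
spec:
  serviceName: controller       # headless Service assumed to exist
  replicas: 2
  selector:
    matchLabels:
      app: controller
  template:
    metadata:
      labels:
        app: controller
    spec:
      containers:
      - name: controller
        image: openwhisk/controller   # assumed image name
```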

    • Kafka:

      • On the initial startup, Kafka should register the “health” and “command” topics.

      • Ensure that Kafka is able to receive topic creation requests from Invoker instances
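
The initial topic registration could be handled by a one-shot Job that runs the Kafka CLI once the broker is up. This is only a sketch — the image, Zookeeper address, and flags are assumptions and would depend on the Kafka version in use:

```yaml
# Hypothetical sketch: a one-shot Job that registers the "health" and
# "command" topics at initial startup.
apiVersion: batch/v1
kind: Job
metadata:
  name: kafka-init-topics
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: init-topics
        image: wurstmeister/kafka       # assumed image that ships the Kafka CLI
        command: ["sh", "-c"]
        args:
        - >
          kafka-topics.sh --create --if-not-exists --topic health
          --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 &&
          kafka-topics.sh --create --if-not-exists --topic command
          --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1
```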

    • Zookeeper:

      • None?

    • Invoker:

      • Have the Invoker register its Kafka topics by interacting with Kafka directly.

      • Have the Invoker register itself with the Controller:

        • The Invoker must register itself directly to the controller <or>

        • The Invoker registers all key-value pair information about itself into Consul

      • Have only one Invoker instance deployed per Kube node, and ensure that no other Kube Pods run alongside it.
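
One-per-node placement could be expressed as a DaemonSet restricted to labeled nodes; tainting those nodes keeps other Pods from being scheduled alongside. The label/taint key-values and image name are hypothetical:

```yaml
# Hypothetical sketch: exactly one Invoker per labeled node; a matching
# NoSchedule taint on those nodes keeps other Pods off them.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: invoker
spec:
  selector:
    matchLabels:
      app: invoker
  template:
    metadata:
      labels:
        app: invoker
    spec:
      nodeSelector:
        openwhisk-role: invoker       # hypothetical node label
      tolerations:
      - key: openwhisk-role           # matches a taint applied to invoker nodes
        operator: Equal
        value: invoker
        effect: NoSchedule
      containers:
      - name: invoker
        image: openwhisk/invoker      # assumed image name
```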

    • Consul:

      • One of two options:

      • Have Consul exist as the central point where components register their information as they come up; all other components can then receive updates from Consul <or>

      • Remove Consul from the OpenWhisk deployment.
    • CouchDB:

      • Goal: Come up with a standardized way to setup and configure CouchDB as OW’s default document store.

      • Considerations:

        • This component is somewhat unique in the OpenWhisk deployment strategy, as its setup only has to be done once and it does not receive rolling updates

      • Questions:

        • How can I configure CouchDB with seed information for OpenWhisk?

        • Can we better leverage the public Docker image by wrapping it for our needs (config)?

        • How can I have the OpenWhisk components talk to CouchDB?

      • Assumptions:

        • Over time we are working towards a “pluggable” document store approach, but this is beyond short-term scope. Despite this approach we still need to “Dockerize” our init/config as the “default”.

      • Implementation:

        • Use a prebuilt CouchDB image plus an init script that edits the authentication settings for unique credentials, and also edits the entries within the database with those unique credentials.
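
The init step above could be a one-shot Job that sets unique admin credentials against the running CouchDB via its HTTP API. This is a sketch only — the Service name, database name, credential values, and the exact config endpoint (shown here for CouchDB 2.x) are assumptions:

```yaml
# Hypothetical sketch: one-shot Job that sets unique credentials and seeds
# an OpenWhisk database in a prebuilt CouchDB instance.
apiVersion: batch/v1
kind: Job
metadata:
  name: couchdb-init
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: init
        image: curlimages/curl          # any image with curl would do
        command: ["sh", "-c"]
        args:
        - >
          curl -X PUT http://couchdb:5984/_node/_local/_config/admins/whisk_admin
          -d '"some-unique-password"' &&
          curl -X PUT http://whisk_admin:some-unique-password@couchdb:5984/whisk_subjects
```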

CI and load testing of OpenWhisk on Kubernetes

  • Goal: stand up testing resources at Apache and utilize them for public CI and performance testing of OpenWhisk on Kubernetes
  • Needed resources:
    • A 5-worker-node Kubernetes cluster. Each worker node can be fairly modest (2-4 virtual cores; 4-8 GB of memory)
      • 2 nodes for control plane: controller, kafka, nginx
      • 1 node for couchdb
      • 2 nodes for invokers

Medium Term Epics

  • E.g., Work items / issues (by component) that improve component’s Clustering or HA enablement

  • E.g., Work items that allow for “pluggability”

 

Kube Deployment Variations

  • Goal:  Ensure that various Kubernetes configurations are supported.

...

  • Documentation

    • Make sure there are Kubernetes environment specific docs to help setup and use the following infrastructures:

      • Minikube

      • Kubeadm

      • Local-Up-Cluster

    • Ensure that all commands are copy-pastable to get an initial OpenWhisk deployment onto Kubernetes

  • Troubleshooting

    • Document a common set of problems that can be seen from any Kubernetes environment and their solutions

  • Tests and Integration

    • CIs will need to be created for each Kubernetes environment so that we ensure multiple configuration and deployment strategies work

    • These Integration tests should catch regressions where OpenWhisk could require an update to Kubernetes infrastructure

      • E.g., the Invoker mounts the host’s Docker socket, so there are minimum requirements or deployment configurations that must be met.

 

Kube Configuration Options

  • Goal: Make components more adaptable to running on a cloud platform

...

  • Action Items:

    • Invoker

      • Enable the Invoker to use all available system resources
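
One way to express "use all available system resources" in Kube is to set requests without limits, so the container may burst to whatever the node can offer. The request values and image name below are illustrative:

```yaml
# Hypothetical sketch: resource settings for the Invoker container.
# Requests reserve capacity; omitting limits lets the Invoker consume
# all remaining resources on its node.
apiVersion: v1
kind: Pod
metadata:
  name: invoker
spec:
  containers:
  - name: invoker
    image: openwhisk/invoker      # assumed image name
    resources:
      requests:
        cpu: "2"                  # reserve most of a modest node
        memory: 4Gi
      # no limits: the container may burst to all available node resources
```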

 

Long-Term Epics  

  • More generalized goals / more investigation needed (per-component and system-wide basis)

  • Investigations for making components HA (e.g., Kafka)

...

HA Epic:

  • Goal: All “core” OpenWhisk components should have HA considerations that work on a Kube deployment.  

  • Considerations:

    • Fixed vs. Pluggable components:

      • Over-time, some components that are provided as “defaults” may very well be replaced with services that are already “HA-enabled”.

      • Providers may choose to use replacements based upon their available services or preferences.  Pluggability is key.

      • Some components that fall under this category include:

        • CouchDB (document store)

        • NGINX (edge server)

      • Understanding this affects what we prioritize (i.e., prioritize components that are viewed as a “fixed” part of the architecture vs. pluggable).

    • Scaling

      • Assumption: unless otherwise noted, HA (cluster) enabling components will result in scalable components.

      • Can my (Docker) component be scaled (by Kube) based upon resource usage, i.e., memory/disk usage (not process/CPU)?

  • Questions (to ask of each component):

  • Clustering components

    • Can the component be started without any (Ansible-driven) configuration injection or dependencies on other components (boot order)?

      • I.e., In many cases this is asking how can we remove any Ansible specific setup and include those configurations into the Docker image?

      • Or do components have to register themselves with whom they wish to interact with?

    • Intra-component Communication: Can there be multiple instances of one component, and can they communicate with each other?

    • Inter-component Communication: How is DNS (routing, spraying) registered/handled?

    • Message Queueing: How are queue topics registered/handled (if they are used)?

  • Soft Kill (tested)

    • Rolling update (primary use case): Can the process handle rolling updates?

      • Example: a Kubernetes (etcd) version update:

        • e.g., Kube 1.5 to 1.6 update

        • Matching CLI kubectl (should not matter)

      • Example: Hypervisor / VM host update

    • What happens when soft killing a process? Can it recover?

      • I.e., is it able to recover any requests/messages (data “in motion”) upon restart?
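
The rolling-update and soft-kill concerns above map onto a Deployment update strategy plus a termination grace period. The values below are illustrative placeholders, not tested settings:

```yaml
# Hypothetical sketch: replace Pods one at a time, giving each instance a
# grace period (SIGTERM before SIGKILL) to drain data "in motion".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1     # take down at most one instance at a time
      maxSurge: 1
  selector:
    matchLabels:
      app: controller
  template:
    metadata:
      labels:
        app: controller
    spec:
      terminationGracePeriodSeconds: 60   # time to recover in-flight requests
      containers:
      - name: controller
        image: openwhisk/controller       # assumed image name
```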

  • Hard Kill (tested)

    • What happens when hard killing a process? Can it recover?

      • I.e., is it able to recover any requests/messages (data “in motion”) upon restart?