Attendees: Daniel Lavine, Matt Rutkowski, Tyson Norris, Kavitha Devara, Jeremias Werner, Ben Browning
Notes:
- Daniel - the current deployment is just a job that wraps up all the Ansible scripts and deploys containers under Kubernetes
- ideally, what we would like to get to is straight YAML files for all OW components, which can be deployed in a normal Kube workflow
- 0:05: Short-term epic:
- get components “dockerized”
- remove the need for Ansible; have people/containers more knowledgeable of the configs they need
- crawl through the components
- move as many properties as possible into ENV vars
- this approach should work for Compose or other ways of spinning up containers (not Kube specific)
- “bake” in the scripts or startup actions they need to perform
- any persistence should use Volumes
- 0:11 Nginx
- Can’t remove it currently: because of the HA Controller design, we cannot use pure DNS for routing; we have to have backup routes in Nginx (cannot do this in Kube?)
- i.e., try this DNS, then try the next DNS, etc.
- For now, keep Nginx around… come back and revisit this
- Instead, I suggest we create a script (see the sketch below) that:
- helps generate certificates
- creates a Kube ConfigMap from those certs and a static nginx.conf file
- have YAML file(s) for the Kube Deployment and Service which use the generated ConfigMap
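- A minimal sketch of what such a script could look like (the self-signed cert, names, and file paths are illustrative assumptions, not the actual deployment):

```sh
#!/usr/bin/env bash
# Hypothetical helper: generate certs, bundle them with a static nginx.conf
# into a ConfigMap, then deploy Nginx from plain YAML files.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=openwhisk-edge" \
  -keyout openwhisk-key.pem -out openwhisk-cert.pem

kubectl create configmap nginx-config \
  --from-file=nginx.conf \
  --from-file=openwhisk-cert.pem \
  --from-file=openwhisk-key.pem

# The Deployment/Service YAML mounts the "nginx-config" ConfigMap as a
# volume (e.g. under /etc/nginx), so the stock nginx image runs unmodified.
kubectl apply -f nginx-deployment.yml -f nginx-service.yml
```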
- BB: fine for first pass
- could use Ingress, selectors, and labels to help control routing; would have to dig into the HA code to comment further
- DL: everything is IP based (due to Ansible); the config indicates which IP addresses/routes
- traffic is directed to one or another
- DD: Ingress with Nginx: unless we have a technical reason we keep it, unless we have no load balancer in front and have the Controller take on more responsibility
- JW: had this in mind for scaling out the controller; PR as of yesterday (hot standby controller); would like “N” controllers where the edge proxy routes across all of them
- in OW we even have a router internally routing traffic between 2 different OW deployments; someone would like to merge the edge proxy with that router
- First have to figure out LB between the controllers.
- creating the custom certificates needed for Nginx is the hard part, and I don't think this problem can be easily solved in some generic way
- ConfigMap for the certs and a static Nginx config file
- BB: should use Kube “secrets” for items that need to be secure
- similar to ConfigMaps, which are namespaced but more generic
- DD: sync with a shared folder (S3 or something); we can simply change a file and it gets “picked up”
- DL: ConfigMap/Secret entries are like files; dependent containers would “pick them up” over time
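- For example (illustrative names, reusing the cert files from the sketch above), the TLS material could live in a Secret instead of a ConfigMap and still appear to the container as ordinary files:

```sh
kubectl create secret tls nginx-certs \
  --cert=openwhisk-cert.pem --key=openwhisk-key.pem
# A pod that mounts "nginx-certs" as a volume reads the certs as files;
# updates to the Secret show up in the mounted files over time.
```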
- 0:22 Controller
- BB: how do we get load information back to the Controller?
- some PR may have landed… the Controller now has more knowledge of “load” on invokers
- TN: it does not have more knowledge, it is just smarter about tracking the state of invokers
- this is a focal point of what we are looking at for Mesos also
- in the long run, with Mesos, Kube, Swarm, etc., how do you share the state of the broader cluster with 1 or more Controllers?
- BB: for the short term, we need not worry about this as a “first pass”
- TN: what is the focus? getting deployed in Kube and changing other things later on?
- DL: not relying on Ansible; being smarter about configurations
- TN: got it
- DL: need to revisit later on
- DL: how does it work now? we have 5 invokers, need 6 now? how do I reconfigure?
- BB: Kafka allows auto-creation of topics; a new invoker is spun up, a new topic is created, and somehow the invoker tells the controller it exists (I have seen this, but I am not using Consul either)
- perhaps using the health Kafka topic?
- JW: new health protocol using an extra channel where the Invoker sends pings to the Controller
- topics are configured at deployment time
- with Kube, is there a way to create topics automatically?
- BB: the Kafka image supports “auto creation of topics” out of the box
- BB: the health and completed topics I still need to create; otherwise it takes a long time and Kafka complains about it
- TN: you can't consume a topic that does not exist??
- JW: we configure them at deployment time… to resolve startup issues
- BB: I do not create the invoker topic at all; Kafka tries to decide on cluster/backup placement for things that are not pre-created; there are ways to make this work
- DL: perhaps the Controller does not need as much work as thought… if we are moving to get rid of Consul
- JW: currently evaluating; Christian is looking at Redis as a replacement; heads up that we are looking there now
- TN: looking at this for Mesos as well
- BB: the Alarms package was updated to use Redis recently as well
- JW: Jason and Christian (a Slack channel on the Redis work might be good)
- TN: when a ping message (sent by the invoker) is received, it will register the invoker in the Controller
- JW: the other way around; when the health ping is missing, it considers the invoker to be idle or broken
- TN: there may be an issue there: if you register an invoker with a specific name and it dies and re-appears someplace else, when it re-appears it should have the SAME name or messages get lost
- DL: StatefulSets in Kube guarantee that if “invoker 1” goes down, “invoker 1” gets restarted (same name); see the sketch below
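- A minimal sketch of that (image name and sizing are illustrative; apps/v1 syntax shown): a StatefulSet restarts a failed pod under its old ordinal name, so “invoker-1” always comes back as “invoker-1”:

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: invoker
spec:
  serviceName: invoker           # headless Service that provides the stable names
  replicas: 5
  selector:
    matchLabels: {app: invoker}
  template:
    metadata:
      labels: {app: invoker}
    spec:
      containers:
      - name: invoker
        image: openwhisk/invoker # illustrative image name
EOF
```
- This would also answer the “5 invokers, need 6” question above: `kubectl scale statefulset invoker --replicas=6`.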
- 0:33 Kafka
- the “health and command” topics NEED to be created
- DL: could wrap the Kafka image in an initialization SCRIPT, so that when Kafka comes up we ping until ready, initialize those 2 topics, and avoid using Ansible (sketched below)
- BB: I have some implementations that show how I use Kafka and do similar things
- BB: you throw Kube a YAML file (with simple wrappers like that); I can share them with you if you like
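- A sketch of such a wrapper (topic names, hosts, and the ZooKeeper-based CLI flags are assumptions that depend on the Kafka version in use):

```sh
#!/usr/bin/env bash
# Hypothetical init wrapper: block until Kafka/ZooKeeper answer, then seed
# the topics the controller expects, with no Ansible involved.
ZK="${ZOOKEEPER_HOST:-zookeeper}:2181"
until kafka-topics.sh --zookeeper "$ZK" --list >/dev/null 2>&1; do
  echo "waiting for kafka/zookeeper..."
  sleep 2
done
for topic in health completed; do
  kafka-topics.sh --zookeeper "$ZK" --create --if-not-exists \
    --topic "$topic" --partitions 1 --replication-factor 1
done
```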
- DL: how do we get proper credentials?
- BB: Apache Jenkins already publishes some official images
- DD: means to create
- BB: if you do not create the topics ahead of time, Kafka will have lots of errors trying to start; could be the replication factor (that Ansible sets up); using 3 (replication factor) causes issues
- perhaps some Kafka config can be set
- the current Kafka is NOT HA, so many replication factors/etc. need to be manually set (example settings sketched below)
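- Illustrative settings for the single-broker case (the property names are standard Kafka broker config, but whether they are settable depends on the Kafka version, per DL's point below; the config path varies by image):

```sh
cat >> /opt/kafka/config/server.properties <<'EOF'
auto.create.topics.enable=true
default.replication.factor=1
offsets.topic.replication.factor=1
EOF
```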
- TN: Kafka auto-creates topics only when a producer publishes a message to one
- consuming a topic that does NOT exist is treated as a failure
- DD: with Mesos, we use the community Kafka package, but do not see this
- BB: I see lots of errors; things eventually work, but it takes a long time
- TN: not a clean startup
- DL: ways to configure the replication count? we are not able to set that property as our Kafka version is too old; we would need to update the version to set that properly
- TN: does Kube not give you a way to do some pre/post setup?
- BB: sure, if we want them to just be Docker images; if you rely on Kube lifecycle hooks it does not work well in other environments
- DL: some of this is not guaranteed; startup hooks run as soon as the container is created… could run before the start command
- BB: Kafka has to be running to insert the “seed” data (create the topics you need)
- TN: then we need to maintain another Docker image and are no longer a “public” consumer of the Kafka image
- DL: should not be much to maintain
- BB: we do not have to; could use the public image and have a config command (script); create a small Dockerfile that adds your scripts (but it still has to be published)
- TN: the script would wait for Kafka to come alive, then publish the data
- BB: could have a 2nd container in the Kafka pod that waits for Kafka to start and configures it (sketched below)
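- Sketch of that pod shape (image and script path are illustrative; the seed script is the wait-and-create wrapper sketched earlier):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: kafka
spec:
  containers:
  - name: kafka
    image: wurstmeister/kafka       # any stock public Kafka image
  - name: seed-topics               # sidecar: waits for Kafka, then creates topics
    image: wurstmeister/kafka       # reused here only for its CLI tools
    command: ["sh", "/scripts/seed-topics.sh"]
    volumeMounts:
    - {name: scripts, mountPath: /scripts}
  volumes:
  - name: scripts
    configMap: {name: kafka-seed-scripts}
EOF
```
- One caveat: a pod's restartPolicy defaults to Always, so the sidecar script would need to sleep forever after seeding (or the seeding belongs in a separate run-once Job).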
- JW: have you thought of having the controller or invoker create the topics?
- BB: the only thing is… the Kafka topics scripts would need to be on the controller
- JW: no API for that?
- TN: does the controller wait until a Kafka broker is running? (harder to coordinate)
- BB: this is a general problem as we “dockerize”; all the Ansible assumes a sequencing of these components. I have had to “hack” my code to enforce these things
- TN: add logic to the controller that waits for specific services (Kafka, at least one broker) to start; this might eliminate all these coordination points (see the sketch below)
- JW: agree, especially from an operations perspective
- on initial startup we need more robustness to consider all possible dependencies
- DL: implies better (system) “health checking”
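- That could be as simple as a guard in the controller's entrypoint (host/port env var names and the binary path are hypothetical), which also works outside Kube, addressing BB's portability concern about lifecycle hooks:

```sh
#!/bin/sh
# Wait for at least one Kafka broker to accept connections before starting.
until nc -z "${KAFKA_HOST:-kafka}" "${KAFKA_PORT:-9092}"; do
  echo "waiting for a kafka broker..."
  sleep 2
done
exec ./bin/controller "$@"   # hand off to the real controller entrypoint
```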
- 0:45: Zookeeper
- 0:45: Invoker
- DL: sounds like it is already able to generate its own topics (and register itself with the controller via topic creation)
- DL: have 1 invoker run per Kube node… set up the YAML correctly with proper labels (where labels and nodes match together); see the sketch below
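- One way to express that (label, node name, and image are illustrative) is a DaemonSet restricted to labeled nodes, which places exactly one invoker on each matching node:

```sh
kubectl label node worker-1 openwhisk-role=invoker   # repeat per invoker node
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: invoker
spec:
  selector:
    matchLabels: {app: invoker}
  template:
    metadata:
      labels: {app: invoker}
    spec:
      nodeSelector:
        openwhisk-role: invoker   # only nodes labeled above get an invoker
      containers:
      - name: invoker
        image: openwhisk/invoker
EOF
```
- Note this trades away the stable ordinal names of the StatefulSet sketch above; which property matters more was not settled here.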
- 0:46 Consul
- DL: ideally, it should not be deployed
- DD: PR in progress to remove Consul…
- BB: I have not been using Consul at all
- I need env vars to say where Consul is, but after that, the Controller does not seem to use it
- JW: How do you do this?
- DD: Env. vars
- JW: there is also some monitoring info in there; we could get rid of it soon
- 0:48 CouchDB
- DL: no solution I like so far
- DL: like Kafka, it has to be up and running before you configure it
- have another pod to configure it
- DL: trying to get this dockerized, not sure of approach
- 1 shared script, given a URL and ENV vars
- wrap that script in a pod; it runs once and goes away (sketched below)
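- A Kubernetes Job fits that run-once shape (the image and URL are hypothetical placeholders for the shared script and its parameters):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: init-couchdb
spec:
  template:
    spec:
      restartPolicy: Never            # run the init script once, then go away
      containers:
      - name: init-couchdb
        image: openwhisk/couchdb-init # hypothetical image wrapping the shared script
        env:
        - name: COUCHDB_URL
          value: http://couchdb:5984  # illustrative in-cluster service URL
EOF
```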
- DD: started work on this (another guy on my team, talked to Carlos)
- if a cluster is running, there are concerns about loading an existing database in HA
- DL: if we want CouchDB to be pluggable, not sure if we want a snapshotted CouchDB image
- with the catalog installed to it or not
- how are initial accounts set up?
- TN: for that to work, the image would have to copy its data to a mount point so that it could be restarted
- DD: the image would be useful
- DD: once you have your first function, you will want backup (and restore)
- DL: not sure if we want to have a real CouchDB on real Kube
- BB: you can do persistence on Kube
- TN: if Kafka is there, why would you not want the DB there? what precludes you from doing backups?
- DL: not sure how you set up a clustered DB on Kube
- how does a rolling update work? will the rest of the system wait until it is done?
- TN: Mesos has the same concern… orchestrating updates in a cluster
- DD: we have a Python script to sync 2 DBs; it is not distributed, it is replicated
- hoping we could have active/passive configuration (with LB in front)
- still lose activations
- BB: in production, we would not use Couch; we would use CockroachDB, where this is already figured out
- TN: would all persistent components/pieces be managed inside the Mesos cluster?
- BB: Red Hat has some storage things in this space (I do not manage it); a 3rd-party API I can talk to… some other team manages HA datastores (services) outside of the project
- BB: for CouchDB here, I have an init script I can share (it's a hack): it installs Ansible on the image, boots up CouchDB on a different port, checks if it is a brand new DB (or already set up), restarts it on the “real port”, and injects auth credentials from env vars…
- BB: the current scripts are specific to Red Hat OpenShift, but I can make them vanilla for Kube (rough shape sketched below)
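- The rough shape of that flow as a vanilla sketch (CouchDB 1.x admin API; the marker database and all names are assumptions, not BB's actual script):

```sh
#!/bin/sh
# Seed CouchDB only if it is brand new, then inject admin credentials.
COUCH="${COUCHDB_URL:-http://localhost:5984}"
if curl -sf "$COUCH/whisk_subjects" >/dev/null; then   # hypothetical marker db
  echo "existing database detected; leaving it alone"
else
  curl -sf -X PUT "$COUCH/whisk_subjects"
  # CouchDB 1.x: writing under /_config/admins creates an admin user.
  curl -sf -X PUT "$COUCH/_config/admins/$COUCHDB_ADMIN" \
    -d "\"$COUCHDB_PASSWORD\""
fi
```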
- DL: thanks
- DL: do we want to build that into the controller? it connects, runs through the DB migrations (and inits the DB if it is not there)?
- TN: use a system like “Flyway”? we use it in another project and it works really well
- BB: could work well for the auth and docs; the initial catalog would work the same way
- the Flyway scripts would have the contents of the initial actions
- TN: it may be more complicated in some ways… in Flyway you can do Java-based migrations (run some logic); if it has not been run on a particular DB it will run once and not run again
- DL: as a first pass, would we want to do scripts?
- figure out what tools to use for these DB migrations
- DD: for now (Mesos speaking), fewer dependencies on DBs
- BB: leave it up to the provider of the DB (image)
- DD: yes, use Ansible or snapshot it (with scripts) and…
- BB: the pre-built problem is with auth (could simply inject the auth parts); the rest could be pre-built
- DD: have env vars control whisk.system and the guest (and the actual credentials for the DB)
- BB: both the auth to the DB, and the entries in the DB (Secrets would actually be the ideal way; see the sketch below)
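- For instance (key names and file paths are illustrative), one Secret could carry both the DB credentials and the subject keys, for pods to consume as env vars:

```sh
kubectl create secret generic whisk-auth \
  --from-literal=db-username=whisk_admin \
  --from-literal=db-password=changeme \
  --from-file=auth-whisk-system=auth.whisk.system \
  --from-file=auth-guest=auth.guest
# Containers reference these via env valueFrom.secretKeyRef entries.
```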
- DL: BB, can you share the scripts?
- BB: sure, some are OpenShift specific, but I will send you links