Attendees: Daniel Lavine, Matt Rutkowski, Tyson Norris, Kavitha Devara, Jeremias Werner, Ben Browning
Notes:
- Daniel - the current deployment is just a job that wraps up all the Ansible scripts and deploys containers under Kubernetes
- ideally, what we would like to get to is straight YAML files for all OW components, which can be deployed in a normal Kube workflow
- 0:05: Short-term epic:
- get components “dockerized”
- remove the need for Ansible; have people/containers more knowledgeable of the configs they need
- crawl through the components
- move as many properties as possible into ENV vars
- this approach should work for Compose or other ways of spinning up containers (not Kube specific)
- “bake” in the scripts or startup actions they need to perform
- any persistence should use Volumes
- 0:11 Nginx
- Can’t remove it currently: because of the HA Controller design, we cannot use pure DNS for routing; we have to have backup routes in Nginx (cannot do this in Kube?)
- i.e., try this DNS, then try the next DNS, etc.
- For now, keep Nginx around… come back and revisit this
- Instead, I suggest we create a script (see the sketch below) that:
- helps generate certificates
- creates a Kube ConfigMap from those certs and a static nginx.conf file
- have YAML file(s) for the Kube Deployment and Service which use the generated ConfigMap
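- A minimal sketch of what such a script could look like (the self-signed cert, names, and file paths are illustrative assumptions, not the actual deployment):

```sh
#!/usr/bin/env bash
# Hypothetical helper: generate certs, bundle them with a static nginx.conf
# into a ConfigMap, then deploy Nginx from plain YAML files.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=openwhisk-edge" \
  -keyout openwhisk-key.pem -out openwhisk-cert.pem

kubectl create configmap nginx-config \
  --from-file=nginx.conf \
  --from-file=openwhisk-cert.pem \
  --from-file=openwhisk-key.pem

# The Deployment/Service YAML mounts the "nginx-config" ConfigMap as a
# volume (e.g. under /etc/nginx), so the stock nginx image runs unmodified.
kubectl apply -f nginx-deployment.yml -f nginx-service.yml
```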
- BB: fine for first pass
- could use Ingress, selectors, and labels to help control routing; would have to dig into the HA code to comment further
- DL: everything is IP based (due to Ansible); the config indicates which IP addresses/routes
- traffic is directed to one or another
- DD: Ingress with Nginx: unless we have a technical reason we keep it, unless we have no load balancer in front and have the Controller take on more responsibility
- JW: had this in mind for scaling out the controller; PR as of yesterday (hot standby controller); would like “N” controllers where the edge proxy routes across all of them
- in OW we even have a router internally routing traffic between 2 different OW deployments; someone would like to merge the edge proxy with that router
- First have to figure out LB between the controllers.
- creating the custom certificates needed for Nginx is the hard part, and I don't think this problem can be easily solved in some generic way
- ConfigMap for the certs and a static Nginx config file
- BB: should use Kube “secrets” for items that need to be secure
- similar to ConfigMaps, which are namespaced but more generic
- DD: sync with a shared folder (S3 or something); we can simply change a file and it gets “picked up”
- DL: ConfigMap/Secret entries are like files; dependent containers would “pick them up” over time
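- For example (illustrative names, reusing the cert files from the sketch above), the TLS material could live in a Secret instead of a ConfigMap and still appear to the container as ordinary files:

```sh
kubectl create secret tls nginx-certs \
  --cert=openwhisk-cert.pem --key=openwhisk-key.pem
# A pod that mounts "nginx-certs" as a volume reads the certs as files;
# updates to the Secret show up in the mounted files over time.
```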
- 0:22 Controller
- BB: how do we get load information back to the Controller?
- some PR may have landed… the Controller now has more knowledge of “load” on invokers
- TN: it does not have more knowledge, it is just smarter about tracking the state of invokers
- this is a focal point of what we are looking at for Mesos also
- in the long run, with Mesos, Kube, Swarm, etc., how do you share the state of the broader cluster with 1 or more Controllers?
- BB: for the short term, we need not worry about this as a “first pass”
- TN: what is the focus? getting deployed in Kube and changing other things later on?
- DL: not relying on Ansible; being smarter about configurations
- TN: got it
- DL: need to revisit later on
- DL: how does it work now? we have 5 invokers, need 6 now? how do I reconfigure?
- BB: Kafka allows auto-creation of topics; a new invoker is spun up, a new topic is created, and somehow the invoker tells the controller it exists (I have seen this, but I am not using Consul either)
- perhaps using the health Kafka topic?
- JW: new health protocol using an extra channel where the Invoker sends pings to the Controller
- topics are configured at deployment time
- with Kube, is there a way to create topics automatically?
- BB: the Kafka image supports “auto creation of topics” out of the box
- BB: the health and completed topics I still need to create; otherwise it takes a long time and Kafka complains about it
- TN: you can't consume a topic that does not exist??
- JW: we configure them at deployment time… to resolve startup issues
- BB: I do not create the invoker topic at all; Kafka tries to decide on cluster/backup placement for things that are not pre-created; there are ways to make this work
- DL: perhaps the Controller does not need as much work as thought… if we are moving to get rid of Consul
- JW: currently evaluating; Christian is looking at Redis as a replacement; heads up that we are looking there now
- TN: looking at this for Mesos as well
- BB: the Alarms package was updated to use Redis recently as well
- JW: Jason and Christian (a Slack channel on the Redis work might be good)
- TN: when a ping message (sent by the invoker) is received, it will register the invoker in the Controller
- JW: the other way around; when the health ping is missing, it considers the invoker to be idle or broken
- TN: there may be an issue there: if you register an invoker with a specific name and it dies and re-appears someplace else, when it re-appears it should have the SAME name or messages get lost
- DL: StatefulSets in Kube guarantee that if “invoker 1” goes down, “invoker 1” gets restarted (same name); see the sketch below
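- A minimal sketch of that (image name and sizing are illustrative; apps/v1 syntax shown): a StatefulSet restarts a failed pod under its old ordinal name, so “invoker-1” always comes back as “invoker-1”:

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: invoker
spec:
  serviceName: invoker           # headless Service that provides the stable names
  replicas: 5
  selector:
    matchLabels: {app: invoker}
  template:
    metadata:
      labels: {app: invoker}
    spec:
      containers:
      - name: invoker
        image: openwhisk/invoker # illustrative image name
EOF
```
- This would also answer the “5 invokers, need 6” question above: `kubectl scale statefulset invoker --replicas=6`.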
- 0:33 Kafka
- the “health and command” topics NEED to be created
- DL: could wrap the Kafka image in an initialization SCRIPT, so that when Kafka comes up we ping until ready, initialize those 2 topics, and avoid using Ansible (sketched below)
- BB: I have some implementations that show how I use Kafka and do similar things
- BB: you throw Kube a YAML file (with simple wrappers like that); I can share them with you if you like
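- A sketch of such a wrapper (topic names, hosts, and the ZooKeeper-based CLI flags are assumptions that depend on the Kafka version in use):

```sh
#!/usr/bin/env bash
# Hypothetical init wrapper: block until Kafka/ZooKeeper answer, then seed
# the topics the controller expects, with no Ansible involved.
ZK="${ZOOKEEPER_HOST:-zookeeper}:2181"
until kafka-topics.sh --zookeeper "$ZK" --list >/dev/null 2>&1; do
  echo "waiting for kafka/zookeeper..."
  sleep 2
done
for topic in health completed; do
  kafka-topics.sh --zookeeper "$ZK" --create --if-not-exists \
    --topic "$topic" --partitions 1 --replication-factor 1
done
```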
- DL: how do we get proper credentials?
- BB: Apache Jenkins already publishes some official images
- DD: means to create
- BB: if you do not create the topics ahead of time, Kafka will have lots of errors trying to start; could be the replication factor (that Ansible sets up); using 3 (replication factor) causes issues
- perhaps some Kafka config can be set
- the current Kafka is NOT HA, so many replication factors/etc. need to be manually set (example settings sketched below)
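- Illustrative settings for the single-broker case (the property names are standard Kafka broker config, but whether they are settable depends on the Kafka version, per DL's point below; the config path varies by image):

```sh
cat >> /opt/kafka/config/server.properties <<'EOF'
auto.create.topics.enable=true
default.replication.factor=1
offsets.topic.replication.factor=1
EOF
```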
- TN: Kafka auto-creates topics only when a producer publishes a message to one
- consuming a topic that does NOT exist is treated as a failure
- DD: with Mesos, we use the community Kafka package, but do not see this
- BB: I see lots of errors; things eventually work, but it takes a long time
- TN: not a clean startup
- DL: ways to configure the replication count? we are not able to set that property as our Kafka version is too old; we would need to update the version to set that properly
- TN: does Kube not give you a way to do some pre/post setup?
- BB: sure, if we want them to just be Docker images; if you rely on Kube lifecycle hooks it does not work well in other environments
- DL: some of this is not guaranteed; startup hooks run as soon as the container is created… could run before the start command
- BB: Kafka has to be running to insert the “seed” data (create the topics you need)
- TN: then we need to maintain another Docker image and are no longer a “public” consumer of the Kafka image
- DL: should not be much to maintain
- BB: we do not have to; could use the public image and have a config command (script); create a small Dockerfile that adds your scripts (but it still has to be published)
- TN: the script would wait for Kafka to come alive, then publish the data
- BB: could have a 2nd container in the Kafka pod that waits for Kafka to start and configures it (sketched below)
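- Sketch of that pod shape (image and script path are illustrative; the seed script is the wait-and-create wrapper sketched earlier):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: kafka
spec:
  containers:
  - name: kafka
    image: wurstmeister/kafka       # any stock public Kafka image
  - name: seed-topics               # sidecar: waits for Kafka, then creates topics
    image: wurstmeister/kafka       # reused here only for its CLI tools
    command: ["sh", "/scripts/seed-topics.sh"]
    volumeMounts:
    - {name: scripts, mountPath: /scripts}
  volumes:
  - name: scripts
    configMap: {name: kafka-seed-scripts}
EOF
```
- One caveat: a pod's restartPolicy defaults to Always, so the sidecar script would need to sleep forever after seeding (or the seeding belongs in a separate run-once Job).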
- JW: have you thought of having the controller or invoker create the topics?
- BB: the only thing is… the Kafka topics scripts would need to be on the controller
- JW: no API for that?
- TN: does the controller wait until a Kafka broker is running? (harder to coordinate)
- BB: this is a general problem as we “dockerize”; all the Ansible assumes a sequencing of these components. I have had to “hack” my code to enforce these things
- TN: add logic to the controller that waits for specific services (Kafka, at least one broker) to start; this might eliminate all these coordination points (see the sketch below)
- JW: agree, especially from an operations perspective
- on initial startup we need more robustness to consider all possible dependencies
- DL: implies better (system) “health checking”
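- That could be as simple as a guard in the controller's entrypoint (host/port env var names and the binary path are hypothetical), which also works outside Kube, addressing BB's portability concern about lifecycle hooks:

```sh
#!/bin/sh
# Wait for at least one Kafka broker to accept connections before starting.
until nc -z "${KAFKA_HOST:-kafka}" "${KAFKA_PORT:-9092}"; do
  echo "waiting for a kafka broker..."
  sleep 2
done
exec ./bin/controller "$@"   # hand off to the real controller entrypoint
```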
- 0:45: Zookeeper
- 0:45: Invoker
- DL: sounds like it is already able to generate its own topics (and register itself with the controller via topic creation)
- DL: have 1 invoker run per Kube node… set up the YAML correctly with proper labels (where labels and nodes match together); see the sketch below
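- One way to express that (label, node name, and image are illustrative) is a DaemonSet restricted to labeled nodes, which places exactly one invoker on each matching node:

```sh
kubectl label node worker-1 openwhisk-role=invoker   # repeat per invoker node
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: invoker
spec:
  selector:
    matchLabels: {app: invoker}
  template:
    metadata:
      labels: {app: invoker}
    spec:
      nodeSelector:
        openwhisk-role: invoker   # only nodes labeled above get an invoker
      containers:
      - name: invoker
        image: openwhisk/invoker
EOF
```
- Note this trades away the stable ordinal names of the StatefulSet sketch above; which property matters more was not settled here.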
- 0:46 Consul
- DL: ideally, it should not be deployed
- DD: PR in progress to remove Consul…
- BB: I have not been using Consul at all
- I need env vars to say where Consul is, but after that, the Controller does not seem to use it
- JW: How do you do this?
- DD: Env. vars
- JW: there is also some monitoring info in there; we could get rid of it soon
- 0:48 CouchDB
- DL: no solution I like so far
- DL: like Kafka, it has to be up and running before you configure it
- have another pod to configure it
- DL: trying to get this dockerized, not sure of approach
- 1 shared script, given a URL and ENV vars
- wrap that script in a pod; it runs once and goes away (sketched below)
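- A Kubernetes Job fits that run-once shape (the image and URL are hypothetical placeholders for the shared script and its parameters):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: init-couchdb
spec:
  template:
    spec:
      restartPolicy: Never            # run the init script once, then go away
      containers:
      - name: init-couchdb
        image: openwhisk/couchdb-init # hypothetical image wrapping the shared script
        env:
        - name: COUCHDB_URL
          value: http://couchdb:5984  # illustrative in-cluster service URL
EOF
```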
- DD: started work on this (another guy on my team, talked to Carlos)
- if a cluster is running, there are concerns about loading an existing database in HA
- DL: if we want CouchDB to be pluggable, not sure if we want a snapshotted CouchDB image
- with the catalog installed to it or not
- how are initial accounts set up?
- TN: for that to work, the image would have to copy its data to a mount point so that it could be restarted
- DD: the image would be useful
- DD: once you have your first function, you will want backup (and restore)
- DL: not sure if we want to have a real CouchDB on real Kube
- BB: you can do persistence on Kube
- TN: if Kafka is there, why would you not want the DB there? what precludes you from doing backups?
- DL: not sure how you set up a clustered DB on Kube
- how does a rolling update work? will the rest of the system wait until it is done?
- TN: Mesos has the same concern… orchestrating updates in a cluster
- DD: we have a Python script to sync 2 DBs; it is not distributed, it is replicated
- hoping we could have active/passive configuration (with LB in front)
- still lose activations
- BB: in production, we would not use Couch; we would use CockroachDB, where this is already figured out
- TN: would all persistent components/pieces be managed inside the Mesos cluster?
- BB: Red Hat has some storage things in this space (I do not manage it); a 3rd-party API I can talk to… some other team manages HA datastores (services) outside of the project
- BB: for CouchDB here, I have an init script I can share (it's a hack): it installs Ansible on the image, boots up CouchDB on a different port, checks if it is a brand new DB (or already set up), restarts it on the “real port”, and injects auth credentials from env vars…
- BB: the current scripts are specific to Red Hat OpenShift, but I can make them vanilla for Kube (rough shape sketched below)
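- The rough shape of that flow as a vanilla sketch (CouchDB 1.x admin API; the marker database and all names are assumptions, not BB's actual script):

```sh
#!/bin/sh
# Seed CouchDB only if it is brand new, then inject admin credentials.
COUCH="${COUCHDB_URL:-http://localhost:5984}"
if curl -sf "$COUCH/whisk_subjects" >/dev/null; then   # hypothetical marker db
  echo "existing database detected; leaving it alone"
else
  curl -sf -X PUT "$COUCH/whisk_subjects"
  # CouchDB 1.x: writing under /_config/admins creates an admin user.
  curl -sf -X PUT "$COUCH/_config/admins/$COUCHDB_ADMIN" \
    -d "\"$COUCHDB_PASSWORD\""
fi
```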
- DL: thanks
- DL: do we want to build that into the controller? it connects, runs through the DB migrations (and inits the DB if it is not there)?
- TN: use a system like “Flyway”? we use it in another project and it works really well
- BB: could work well for the auth and docs; the initial catalog would work the same way
- the Flyway scripts would have the contents of the initial actions
- TN: it may be more complicated in some ways… in Flyway you can do Java-based migrations (run some logic); if it has not been run on a particular DB it will run once and not run again
- DL: as a first pass, would we want to do scripts?
- figure out what tools to use for these DB migrations
- DD: for now (Mesos speaking), fewer dependencies on DBs
- BB: leave it up to the provider of the DB (image)
- DD: yes, use Ansible or snapshot it (with scripts) and…
- BB: the pre-built problem is with auth (could simply inject the auth parts); the rest could be pre-built
- DD: have env vars control whisk.system and the guest (and the actual credentials for the DB)
- BB: both the auth to the DB, and the entries in the DB (Secrets would actually be the ideal way; see the sketch below)
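- For instance (key names and file paths are illustrative), one Secret could carry both the DB credentials and the subject keys, for pods to consume as env vars:

```sh
kubectl create secret generic whisk-auth \
  --from-literal=db-username=whisk_admin \
  --from-literal=db-password=changeme \
  --from-file=auth-whisk-system=auth.whisk.system \
  --from-file=auth-guest=auth.guest
# Containers reference these via env valueFrom.secretKeyRef entries.
```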
- DL: BB, can you share the scripts?
- BB: sure, some are OpenShift specific, but I will send you links