Attendees: Matt, Christian (moderator), Dragos, Carlos, Daniel Krook, Dave Grove, Duy Nguyen, Markus, Martin, Vadim, Tyson, James Dubee, Ben Browning
Notes:
Introductions of attendees
  • No new faces
Open comments on status 
  • Main/core OpenWhisk
    • Markus: ContainerFactory SPI
    • Tyson: merged, submitting a PR for Mesos impl.; will be looking for feedback for Kube version of API
      • delegate container mgmt. to mesos/kube etc.
    • Christian: no major PRs
    • Christian: optimizations/perf
    • Markus: profiling the controller in multiple environments shows a lot of thread contention, even in places that surprised us.
    • several PRs from Christian and me to remove this contention, e.g., the cache and UUID generation (on the JVM) were both causing contention; these should now be fixed/improved
    • Christian: are these all merged?
    • Markus: most are merged except caching change
    • Vadim: logging impact on system throughput. We have been working on logging in the controller because we find it costs a lot of throughput; turning off logging gives a factor of 4 improvement.
      • logging backend was log4j, replaced with logback; now we have too many log entries and must reduce them; sent email to the “dev” list to explain the proposal.  How to send to statsd
    • Dragos: there is a PR that Sandeep is working on for tracing support through Zipkin.  Working to change the libs to OpenTracing. Hope the intent is then to use this PR.
    • Vadim: have seen the PR; we need to differentiate the metrics part from the tracing part, e.g., have a counter or number of failures in the invoker
    • James: has findings on memory consumption.  Profiling and creating a datatype that does not contain a “code” property.  With a datastore, an improvement is seen for actions that are already cached.  Noticed that when an action is not cached we go from 50 MB of heap to about 500 MB of used heap; trying to figure out where the heap is going.
      • Move to attachments, so we do not have to get the code every time we retrieve the action; yet to get to that work.
      • working on code cleanup (to drop the code property from the datatype) to submit a PR and start doing the “attachment” work.
    • Christian: Akka clustering work? Vadim/Tyson?
    • Vadim: introducing Akka clustering to scale out controller instances; agreement not to make it the default (it defaults to false); if you need it, you must enable it.
      • Dragos and Tyson ran some tests on Mesos; more work is needed there
    • Ben: on Akka clustering, we are using Redis to manage shared state; when do we use Redis vs
    • Vadim: not sure we are using Redis
    • Ben: there is a PR outstanding; also use it for Alarms in HA
    • Markus: the open PR is to have invoker IDs dynamically assigned, which is latency insensitive (can take 10-100 msec); we are using a different mechanism to manage state (you're correct); now the concern is not having to stand up a Redis 
    • Ben: from an ops standpoint, a Redis cluster and an Akka cluster are 2 clusters that have overlapping functionality
    • Dragos: +1 on the ops aspect; can we make this optional (if we do not care whether the controller starts with the same ID)?
    • Markus: is impl. optional now?
    • Markus: if you hardcode the ID yourself, it may talk to Redis itself, 
    • Dave: in the current PR it is optional; are we committed to ZooKeeper or not?
    • Ben: yeah, ZooKeeper is a 3rd way
    • Dave: on the Kube side, we have DaemonSets; could do this with ZooKeeper instead because we do have Kafka
    • Dragos: why does having the same ID for the controller help? failover/resume?
    • what is the impact if this feature is not included?
    • Markus: the sole reason today is that Ansible works on a sequence of controllers (step sizes between indices) and the load balancer uses this information; we need unique integer IDs increasing in steps of 1 for it to work as intended; in theory, you may not care in the future, but today we do
    • Dragos: can we just use the IP address and derive an index from that IP (10.0.0.1 gets 1, 10.0.0.3 gets 3)? just want to remove dependencies
    • Ben: Dragos, you may have described how the PR works; it uses Redis because you may have more than 1 controller, and the load balancer needs to know this info between controller restarts
    • Dragos: once Akka clustering works, could we move to that instead?
    • Ben: that was my original question on Redis
    • Dave: Markus and I realized that, since we externalized state, the Akka cluster is not involved with controllers; we would have to have some way to message back to the controller
    • Markus: there is no design behind this; the PR opened by Dave Grove is simple enough not to be too harmful (it's optional anyway, you can keep hardcoding); an atomic/incremental counter is OK, otherwise we would need another way that may be more complex.
      • Do not have enough experience with Akka clustering yet to have it manage 200 machines, for example; perhaps in the future we could
    • Tyson: in the PR, is there an accommodation (if using dynamic ID gen.) if the ID the controller gets changes; can the invoker generate a new ID? how is that resolved?
    • if inv. goes offline (orphaned) controller
    • controller restarts, it gets a new topic
    • Ben: invoker has to maintain a new UUID, or pass in the old ID; that integer (or a stable UUID on invoker restart)
    • Christian: can we take this to the “dev” list? 
    • ALL: yes
    • Tyson: LogStore SPI, circulated idea on dev list, any questions?  Jeremias and I had small discussion on this.  Briefly, the email indi
      • 1. enable use of log drivers at the Docker run command for
      • 2. for collecting logs after an activation is processed
      • 3. retrieving logs for API calls for getting interesting details
    • Tyson: lots of combos possible. What's implemented in the PR is a simplistic log driver version that says I am collecting logs externally and PAI
    • and a Splunk extension of it, where there is a retrieval mechanism for getting logs from Splunk
    • Tyson: some implications for how logs are emitted from containers (Action containers); for example, if a container logs a stream of activations being processed and you are using Splunk or another log store, you will see a stream of these BUT will have no idea which log events correspond to which actions
    • if using an external log store, we are working on how to correlate; any Qs? comments?
    • Markus: we discussed, looks good… good idea to open up the log driver.  The only concern I had with the design was the log retrieval bit: reimplementing what Splunk has (whatever the backend might have to give you)
    • Markus: use the activation API as of today on OW; the interface for Splunk tries to get them out of Splunk in the same format as today
    • Tyson: gets them out of Splunk using the Splunk API… it's more of a wrapping but not a reimplementation
    • Markus: we would have to wrap every API that might be out there…
    • Tyson: it's a question of whether you intend to prolong the functionality of the CLI/API; if you do not care then you do not have to do anything
    • Markus: we should make this part of the discussion; if you put it in now, in future we would have to explain to user 
    • Markus: either logs externalized and OW just does not care
    • Tyson: using CLI you cannot fetch the logs…
    • Carlos: the user experience… we have to work to satisfy that need.  Perhaps provide a URL to another API to manage the logs externally…
    • Carlos: logs are good for the CLI (for debugging) as it's all we have to debug; want to retrieve logs based upon a time window… quite common… need to have that discussion
      • User may benefit more from going to an external system than from what they get today
    • Tyson: PR allows a message to be put in there (for the user)
      • we have a Q on this: if you have a UI like Splunk… do you really want to expose that API directly to the end user?  Can they cope with that Splunk UI?  What is the user experience/expectation (maybe too much)?
    • Carlos: operator can set/flow ext. logging reference… we would give them some IBM front end (not the actual log util directly)
    • Vadim: 2 different user experiences… one if you run a system for 1 year (provider), and the other for the developer (quick feedback, e.g., polling for activations)
    • Markus: Tyson, is your use case on your side with Splunk just storage, or to give users a search engine (more complex)?
    • Tyson: initially just to store logs… eventually an awesome experience, but do not want to rebuild the Splunk UI… but some ability to drill down into activations with simple filtering
    • Markus: diff use case for us… we want to surface the front end API to the backend system… do not bother with the simplistic things we now have in our current bmix UI.
    • Carlos: maybe way to store/cache enough for the 2nd use case (immediate history), but offload longterm 
    • Ben: can we just use Kafka to get some of the history?
    • Carlos: they don't flow to Kafka today…
    • Tyson: once log drivers are used, no OW components are involved.  The Q is do you want to expose logs to the CLI… if YES, then storing/caching locally will be a lot more work than writing a few APIs and hitting a real external log store (i.e., you would duplicate log storage system functionality)
    • Christian: we have the most important points.. move to next topic
    • Carlos: runtimes (moving them into their own repos.) shared slide (see video)
      • Runtimes & Kinds (slide title)
        • refactoring (Travis,etc) to allow for this work
        • Runtime is where action runs
        • Kind: in the system (CLI) it's a metadata field that associates an action with a runtime and version
      • proposal is… we try our best (knowing we have mult versions for some lang), we have an upstream (Git) for each major runtime (nodejs6, nodejs8, etc.)
      • Decouple Runtime (build/version) from Controller (version) as they are not linked
      • Goal: allow runtimes to be kept more easily up-to-date (security, library updates) and not be tied to controller/main OW releases.
    • Ben: will you change the name of the image for downstream?
    • Carlos: Downstream repos… we would offer derivative runtimes e.g., nodejs-ibm-8 that builds on the upstream images.  Use “tags” to tag the image (but not change name).
      • that is, OW ones would be “base” image and operators would extend (shows NodeJS8 example with “base” vs. “ibm” library set proposal).  For example, the IBM image would have DB2 (DashDB) and Watson libraries...
    • James: would have a NodeJS repo and sep. IBM repo?
    • Carlos: would do it with images… in our case we would have a repo to produce OW image and an IBM repo. to produce our image
    • Dragos: it would be a good idea to allow the community to weigh in on what modules to include in NodeJS
    • Tyson: yes, agree… the OW community image should have more than just the minimum… it should have enough to make it usable
    • Carlos: wanted to start simple… see what people want to add (start clean at first)… many mods; some are de facto standard, but others are marginal
    • Ben: in my case at Red Hat, we would build each Docker image from source, but not use the ones OW provides (as base); we would build our own runtimes
    • Carlos: doing same thing for IBM, looking at how to make proxy code more common now… 
    • Carlos: this is a proposal, looking for feedback.  Want to setup basic rules/guidelines as well on how we maintain/update language (major / minor versions) use common sense (as well as libs.)
    • Carlos: will post slides to mailing list for review. working on github and trying to keep history from old repo as well.
    • Carlos: new topic… release process
      • Release defns. - from ASF (needs to meet their legal reqs.) and go thru their process
      • Source release is min. consideration for a release (for graduation)
      • "Convenient binaries".. have a process, but 2ndary to source release process
    • Carlos: the community would want convenience binaries (Docker images, JAR files), but a release (from source code) has to be voted on and posted at Apache BEFORE any binaries are derived from it.
    • Carlos: process back and forth to Apache Incubator… using mailings lists with Votes (summary slide shown)
    • Carlos:
      • a set of OW components
      • each component (associated with a Git repo.)
      • source included
      • release notes
      • manifest
    • Carlos: shows list of proposed components (repos) that would be included in a 1.0 release of OW (Tag/hash would be recorded)
    • Matt: will we have (like other ASF projects) a repo. to help manage/automate process?
    • Carlos: yes… share the pain of being a “release manager”; Apache Cordova 
      • signing, git cloning etc.
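Illustrative sketch for the dynamic invoker ID discussion above (Markus, Ben, Dave): the notes mention an atomic/incremental counter that hands out unique integer IDs increasing in steps of 1, and an invoker passing in its old or stable identifier on restart. The Python below is only a minimal in-memory stand-in under those assumptions; it is not the code from the PR, and the class name, method name and host names are hypothetical. In a real deployment the counter and the mapping would live in an external store.

```python
import threading


class InvokerIdAssigner:
    """Hypothetical in-memory stand-in for an external atomic counter."""

    def __init__(self):
        self._lock = threading.Lock()  # makes check-and-increment atomic
        self._next_id = 0              # next unused integer ID
        self._ids = {}                 # stable invoker name -> assigned ID

    def id_for(self, unique_name):
        """Return the ID for this invoker, assigning the next one if new."""
        with self._lock:
            if unique_name not in self._ids:
                self._ids[unique_name] = self._next_id
                self._next_id += 1
            return self._ids[unique_name]


assigner = InvokerIdAssigner()
first = assigner.id_for("invoker-host-a")   # new invoker gets the next ID
second = assigner.id_for("invoker-host-b")  # IDs increase in steps of 1
again = assigner.id_for("invoker-host-a")   # restart with same name keeps its ID
```

Keying on a stable name is what lets a restarted invoker keep its integer ID, so the sequence the load balancer relies on stays dense.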
Solicit topics for next call from attendees
  • Docker for Mac demo (Carlos) postponed
Confirm moderator for next call (i.e., Wed. August 2nd)
  • Tyson volunteers.  
  • Matt: Serverlessconf NYC will conflict… Markus, Dragos, Matt (others?) will not attend next call so expect lower attendance.