
Attendees: Matt Rutkowski, Jeremias Werner, Markus Thoemmes, Dragos Dascalita, Tyson Norris, Michael Marth, Michael Behrendt, Rodric Rabbah
Notes:
  • Tyson: Topic: the Wiki page on CWIKI describing both the background on the use case we are trying to support and the approach.

  • Jeremias: just share the general idea and the core of the proposal
  • Tyson: ext. mechanism for API-driven apps.
    • with some simple examples on the Wiki, it becomes more appealing for devs. to use an intermediary system to provide a wrapper around APIs (vs. changing the client or server)
    • in many of the cases the APIs are being used in a browser; interested in the throughput changes that happen when OW comes under load with a large # of concurrent users
    • The diagram captures the flow as of today
    • Controller/invokers route actions to available containers based upon availability; never multiple actions running in the same container
    • the # of containers in the system bounds concurrency; that is OK for the most part, but it is also affected by the # of actions in the system at a given time
    • the # of containers starts to affect concurrent users, with latency going up dramatically
    • here we try to describe the problem and the proposed approach (see the sketch after this list)
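    A minimal sketch of the one-activation-per-container model described above (illustrative Scala; not actual OpenWhisk code, all names are hypothetical). It shows why requests queue, and latency climbs, once concurrent users exceed the available containers:

        import scala.collection.mutable

        // One warm container serves exactly one activation at a time.
        case class Container(id: Int, var busy: Boolean = false)

        class SingleTenantPool(size: Int) {
          private val containers = (1 to size).map(i => Container(i)).toBuffer
          private val waitQueue  = mutable.Queue[String]() // queued activation ids

          // Schedule an activation; queue it if every container is busy.
          def schedule(activationId: String): Option[Container] =
            containers.find(!_.busy) match {
              case Some(c) => c.busy = true; Some(c)
              case None    => waitQueue.enqueue(activationId); None // latency grows here
            }

          def complete(c: Container): Unit = { c.busy = false }
          def queuedCount: Int = waitQueue.size
        }

        object Demo extends App {
          val pool = new SingleTenantPool(size = 4)
          (1 to 10).foreach(i => pool.schedule(s"act-$i"))
          // 10 concurrent requests, 4 containers: 6 requests are left waiting.
          println(s"queued: ${pool.queuedCount}")
        }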
  • Rodric: can you give an idea of what scale of users? 1000s? 100Ks?
  • Tyson: low end 1000s, high end 10,000s of users
  • Rodric: 30,000 req. per sec?
  • Tyson: the # of req. per second depends on what they are doing…

    • there is not a fixed target. Our intention is to service APIs as if they are being routed directly, but instead service them via OW without impact on the user
    • a user's API may have endpoints that take 10 secs to respond; others are faster
    • the latency incurred from routing through OW should not be noticeable
  • Rodric: makes sense. What is the scale of deployment? 100 VMs? 50 VMs?
  • Tyson: we do not have anything in production yet. Scale is currently “up in the air”; not ready for production, because we can only run small loads so far. We have a ways to go before we get to production server allocation.

    • the goal is to increase density BEFORE we plan for prod.
  • Jeremias: is the increase of latency due to queueing in the system (in your tests)?
  • Tyson: yes. For example (have you seen Markus' perf. repo with his throughput tests?), using those tests it is easy to see the effects of the queueing when the # of conc. users goes over the # of avail. containers (for that action) in the system. The latency increases dramatically
  • Markus: that would be expected. You can even see this with moderate load due to queueing (to handle conc. load)
  • Jeremias: with no overload the latency using Kafka is not that high. It is not the Kafka service; it is simply the queueing waiting for a container.
  • Tyson: agree. Latency looks good with Kafka. Again, the issue is # users > # containers avail.
  • Tyson: in general, user activity can exceed what we would want to support; with > 10,000 users in the system it becomes cost prohibitive to spin up 10,000 containers

    • there is a reason (cost) that we want to avoid the 1-user-per-container model (rough arithmetic below)
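    Rough arithmetic behind that cost concern (illustrative: 256 MB per container, 16 GB VMs; a sketch, not a sizing recommendation):

        object CostOfOnePerUser extends App {
          val users             = 10000
          val memGbPerContainer = 0.256

          // One container per user: ~2,560 GB of RAM, i.e. 160 VMs of 16 GB each.
          val totalGb = users * memGbPerContainer
          println(totalGb)        // 2560.0
          println(totalGb / 16.0) // 160.0 VMs
        }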
  • MB: is it fair to say the “core” of the problem is that you want to go from single-threaded processing to multi-threaded processing?
  • Tyson: the core of the problem is throughput; our proposal would allow (as one option) concurrent processing as you have described.
  • Dragos: 18 users, 16 GB of RAM on a server… the question in OW is how close we want to get to the point where we can allow a single VM to take more users.

    • if we give 1 CPU to 1 container… density is 8; if we allow .8 CPU then we get higher density
  • Rodric: it is not enough to talk about that. You have CPU share options; you spin up a container with memory bound to it and CPU shared (part of the Serverless contract). The proposal changes the CPU “share” model

    • if you allow a 1/16 slice of a CPU, you could spin up 1000 containers on that VM (see the density math after this list)
    • would like to understand why standing up 10,000 containers is a problem. It is doable.
    • what is the CPU share you want to get to?
    • intra- vs. inter-container concurrency is an impl. detail
    • what share of the CPU does each activation take?
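    A back-of-the-envelope sketch of the density math above (illustrative numbers: an 8-core, 16 GB VM with 256 MB containers). Rodric's later point that memory becomes the limiting factor shows up in the last case:

        object DensityMath extends App {
          val cores = 8.0
          val memGb = 16.0

          // Containers per VM: the minimum of the CPU-limited and memory-limited counts.
          def density(cpuPerContainer: Double, memGbPerContainer: Double): Int =
            math.min(cores / cpuPerContainer, memGb / memGbPerContainer).toInt

          println(density(1.0, 0.256))      // 1 full CPU each: 8 containers (CPU-bound)
          println(density(0.8, 0.256))      // 0.8 CPU each: 10 containers (CPU-bound)
          println(density(1.0 / 16, 0.256)) // 1/16 CPU slice: memory caps it at 62
        }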
  • Dragos: ? (comment unclear)
  • Markus: limiting this via CPU share is not viable. The proposal on the table should be effective under load.
  • Tyson: yes
  • Rodric: that goes to the resource model. You are paying per GB-second; if you change that equation… what does it mean to be charged this way? What does the limit mean for memory/time? It comes back to what amount of resources you plan to allocate to any given action/activation
  • Markus: makes sense
  • Tyson: is what you are describing a billing issue, not a resource issue?
  • Rodric: still exploring the boundaries. Fundamentally you are bounded by container density; you can reduce that in invoker activations based upon CPU share. Some things are CPU bound, some are memory bound; we should not conflate the 2. You are slicing the resource. Some things need more compute, some more memory.
  • Dragos: makes sense, but what is not clear is the Docker CPU-sharing model; even if you allocate 1/16 of a CPU you are not guaranteed that resource, and it's unclear how Docker handles this.
  • Rodric: true. That is why memory is the limiting factor. How you implement this increased compute density will affect how you look at resource sharing. Trying to elucidate what we are talking about
  • MB: have you thought about what is exposed to the end user, if you specify memory as the resource constraint?
  • Tyson: my view is it's the same. Meaning… when you pick a # for your action today (suggesting an amount of CPU or RAM per container), whatever rules you are using still apply whether activations are single or concurrent.

    • the methodology you take must change based upon the # of conc. users in the system; you are always just “picking a number”
  • MB: 256 MB of memory today…
  • Tyson: how did we come to 256? Why not 28 MB?
  • MB: I picked it because it felt correct. I assume when I run the action that I have this amount. In a more dense model, if I pick less, what is the user guarantee?
  • Tyson: you will not get the guarantee in the same way, of course. What I am saying is: if there is a reason to pick this #, then the issue is problematic for you, because you have a particular workload in mind (that may not suit the use case of many conc. users)
  • Rodric: the # of conc. users should not be an issue. It still comes down to CPU share, which is orthogonal to the # of users
  • Tyson: may or may not be
  • Dragos: the kind of action you are writing affects this (based upon what resources it uses).
  • Tyson: yes. If it is just making network calls (little memory), there is no reason we need to allocate much
  • Tyson: of course if it is CPU intensive (or memory intensive) you need more resources for these things
  • Markus: NodeJS and other runtimes need a different # of resources per activation
  • Rodric: what are we guaranteeing the users in terms of resources?
  • Markus: we have a finite pool of resources that…
  • Dragos: the only thing we guarantee is memory; the rest is shared.
  • MB: assume we allow multi-threading (as a thought exercise).

    • the amount of resources per action cannot be arbitrary.
    • Do you have…
  • Tyson: not following the question. It is possible, when we enable conc. activations to the same container, to limit that concurrency: e.g. (by my tests) this action can only support 1000 conc. users with a mem. footprint of 128 MB, therefore I cap my conc. users at some # (2000 users) and ensure the load across containers matches the needs of the actions (see the capacity sketch below)
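    A minimal sketch of that capacity check (numbers from Tyson's example above; the function name is hypothetical):

        object CapacityPlan extends App {
          // Given a measured per-container concurrency cap for an action, how many
          // containers does an expected concurrent-user count require?
          def containersNeeded(expectedUsers: Int, perContainerCap: Int): Int =
            math.ceil(expectedUsers.toDouble / perContainerCap).toInt

          // Tests show ~1000 concurrent users per 128 MB container; expecting
          // 2000 users means spreading the load across 2 containers.
          println(containersNeeded(2000, 1000)) // 2
        }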
  • MB: would have to think this through. For us, as an org. that has a commercial offering, the concern is the billing model.

    • there is some capacity we give users (with a contract).
  • Tyson: the time span from the 1st to the last request could be used for billing instead of charging per single activation
  • Rodric: the billing model could be a measure of GB-seconds (see the sketch after this list), but an issue arises: the container that hosts that action only lives for the action, then is reclaimed

    • for conc. reqs., does that lifespan/lifecycle mgmt. change? It may end up being another “container pool” as Steve suggested (with a diff. lifetime, etc.)
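    A hedged sketch of the GB-second accounting under discussion (illustrative numbers; the notes above do not settle which way the charge should go):

        object Billing extends App {
          // GB-seconds: memory reservation (GB) multiplied by duration (seconds).
          def gbSeconds(memGb: Double, durationSec: Double): Double = memGb * durationSec

          // Today: one activation in a 256 MB container running for 2 seconds.
          println(gbSeconds(0.256, 2.0)) // 0.512 GB-s

          // The open question: if 100 concurrent activations share that container
          // for the same 2 seconds, is the charge per activation...
          println(100 * gbSeconds(0.256, 2.0)) // 51.2 GB-s
          // ...or per container lifespan (first request in to last request out)?
          println(gbSeconds(0.256, 2.0)) // 0.512 GB-s
        }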
  • Tyson: that is an interesting option, but not a pre-req for doing things with conc. activations.

    • the notion of the container becoming inactive for some time, then being GC'ed, holds true. What changes is the # of requests being serviced by the container (and cleanup is still affected by how long ago the last request left).
  • Rodric: measured when the last request “left” the container. Still, this is further slicing granularity (slicing the resource)

    • Inter-concurrency model / scheduling decisions become part of a diff. lifecycle mgmt. model
  • Tyson: YES, absolutely there has to be a sep. pool of containers to accommodate this new paradigm.

    • aside from the logic for when a container becomes eligible for cleanup, everything else can behave the SAME way (see the idle-tracking sketch after this list).
    • a container is launched if there is a request, and cleaned up when idle
    • same for the single or conc. activation model. The thresholds for calculating container inactivity would be very different.
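    A minimal sketch of that cleanup-eligibility rule (hypothetical class, not the invoker's actual pool logic): the GC clock starts only when the last in-flight request leaves, and the inactivity threshold can differ per pool:

        import java.util.concurrent.atomic.AtomicInteger

        class ContainerIdleTracker(idleGraceMs: Long) {
          private val inFlight = new AtomicInteger(0)
          @volatile private var lastRequestLeftAt = System.currentTimeMillis()

          def requestArrived(): Unit = inFlight.incrementAndGet()

          // When the in-flight count drops to zero, restart the idle clock.
          def requestLeft(): Unit =
            if (inFlight.decrementAndGet() == 0)
              lastRequestLeftAt = System.currentTimeMillis()

          // Eligible only when nothing is in flight and the grace period elapsed;
          // with concurrency of 1 this degenerates to today's behavior.
          def eligibleForCleanup(now: Long = System.currentTimeMillis()): Boolean =
            inFlight.get() == 0 && (now - lastRequestLeftAt) > idleGraceMs
        }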
  • MB: perhaps we should augment the proposal with this information, that is, what criteria the system has for pausing, deleting (GC), etc.
  • Tyson: sure. A working prototype would be useful, but I have not gotten there yet
  • Rodric: was there an early PR?
  • Tyson: yes, it roughly showed this, but other issues were seen. I earlier proposed integrating at the “invoker level”, which turns out to be problematic. Really this would have to be integrated at the Load Balancer level. That fits into the notion of a new container pool.
  • Rodric: at the LB level, treat these actions as new “kinds” and, based upon this type, use a different type of pool, or some key assoc. with the action to determine pool allocation, as per Steve's examples.
  • Rodric: since the system monitors each action, it could become more intelligent about how it handles these diff. kinds of actions
  • Tyson: yes; one way to look at it is that today's action containers (like NodeJS) enforce the single-activation processing workflow. There would have to be another container that enables concurrency (or some config that can turn conc. activations on/off); see the pool-selection sketch below.
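    A sketch of that pool-allocation key (all names hypothetical; whether the flag lives in the action's kind, an annotation, or container config is exactly what is being debated here):

        object PoolSelection {
          case class ActionMeta(name: String, kind: String, concurrencyEnabled: Boolean)

          sealed trait Pool
          case object SingleActivationPool extends Pool // today's one-at-a-time behavior
          case object ConcurrentPool extends Pool       // separate pool, per Steve's examples

          def selectPool(action: ActionMeta): Pool =
            if (action.concurrencyEnabled) ConcurrentPool else SingleActivationPool
        }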
  • Tyson: I do not see that behavior at first glance in any other container?
  • Markus: it is historical. In early prototypes, diff. layers tried to prevent “bad” stuff from happening; we could likely safely remove those “gates”
  • Rodric: NodeJS just happened to be the first runtime we built. The invoker enforces this invariant (single activation), but we could look at removing it at the container level.

    • Rodric: we could do complex reasoning about applications here, but Markus' point is true; it was historical
  • Markus: this type of action invocation needs 100% reuse? At the LB level we would need to implement it there as well, to split the diff. workloads, but is it specific to concurrency?
  • Tyson: repeat?
  • Markus: you were thinking of hooking in at the invoker level (orig.) but you realized you need to move to the LB?
  • Tyson: yes. As Dragos and I discussed, there is queueing enforced at various levels between the controller and the activation being received at the container. Kafka queueing… may or may not be a problem, but ultimately, if we are talking about conc. processing, Kafka could become a bottleneck. At some point it will become a problem

    • once an activation reaches the invoker, there is a single container pool (actor). For the container pool, there is a single message queue, which is another place where queueing occurs
    • the container proxy is a singleton actor, another place of problems
    • the issues are complex, in addition to the LB level.
    • the starting point is the Load Balancer, but we would need to look at these other places as well
  • Markus: each action is handled in a fraction of a msec.
  • Tyson: in the best case, milliseconds; in the worst case we see exponential growth once the container pool gets saturated by the # of conc. users
  • Markus: that is buffering in Kafka. Today the invoker takes X number of activations from the bus. If we were to handle concurrency like 1000, we would take 100+ CPU share from the bus… what I'm saying is, the bottleneck with the singleton container is the most pressing one (and one we can look at)

    • removing Kafka is more problematic; we should do this at the invoker level to start
  • Tyson: container pool and proxy are still singletons
  • Markus: in your proposal (the only enforcement is at the container proxy)
  • Jeremias: in the queueing scenario, if queueing starts (today), how would the LB behave?
  • Tyson: when container resources are exhausted then nothing can be done. BUT the point of exhaustion changes if you allow conc. requests.
  • Jeremias: not what I mean. The new model proposed would still use Kafka… once it passes queueing and is able to use the new pool, then we can look at other issues?
  • Tyson: not quite following. If containers are maxing out, of course there will be queueing. What changes is that the # of conc. requests is > 1

    • so the threshold at which conc. users experience latency increases
  • Jeremias: ok, I follow. First address the problems at the invoker/container level (before exploring other issues), as Steve said, using pools, but do not attempt (now) to change the overall flow (Kafka etc.). If containers have a density model implemented then we could see similar behaviors regardless of pool type.
  • Tyson: this changes the behavior of an activation when it enters the system

    • in a UI app which is delegating to API calls, it is useless to process a response “later”, because someone is listening now (queueing is useless)
    • the existing impl. supposes that if there is a timeout, the client will want the response at some point in the future, which is NOT true
    • the proposal needs diff. response handling when the request comes from a browser. A 200 will not be helpful unless the client knows it is using OW and starts polling for a response, by which point it has already exceeded the latency expected for an acceptable user response.
    • propose we have a response that returns within a timeout period or returns an error, but assumes NO FURTHER processing (nothing queued); see the timeout sketch after this list
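    A minimal sketch of that response handling (illustrative, not controller code): wait up to the timeout, then return an error and drop the result rather than queueing it for later polling:

        import java.util.concurrent.TimeoutException
        import scala.concurrent.{Await, Future}
        import scala.concurrent.ExecutionContext.Implicits.global
        import scala.concurrent.duration._

        object TimeoutResponse {
          // Note: a plain Scala Future is not truly interruptible; real cancellation
          // would need cooperative support in the invoker (the "cancelable" idea).
          def respond(invoke: => String, timeout: FiniteDuration): Either[String, String] = {
            val work = Future(invoke)
            try Right(Await.result(work, timeout))
            catch {
              case _: TimeoutException =>
                Left("error: no result within timeout; treat activation as canceled")
            }
          }
        }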
  • Rodric: sees benefit in cancelling blocking requests (marking them as cancelable)
  • MB: yes, could we leverage the timeout param we have today?
  • Rodric: it is conflated, based upon the awareness of the client (depending on whether the client is able to poll)

    • should separate this topic into a new thread of thought
  • Jeremias: can we solve the problem without changing the request path we have today?
  • Rodric: one thought: if you consider using OW to spin up a container that is long-running and handles all requests to that one container (and OW just spins it up/down), then OW may be fronting a PaaS

    • if we start changing these things, does OW still look like Serverless (and not some PaaS)?
  • MB: in my mind, what is critical is minimizing the “spin up time” (also when you scale out). Looking at Serverless, OW is most advanced. Outsourcing to Kubernetes…
  • Rodric: Fission has pre-warmed pods (of containers) where they hide the latency of spin up. Latency is the issue here
  • Markus: are we “off track” now?

    • Jeremias summed it up well. We discussed the concurrency thing; it will have logging, billing, and other issues. Per what Tyson said, we need a prototype that actually works without touching the load balancer, implementing it in invokers and containers, to test what we can and cannot do?
  • Tyson: I've done some experiments. Doing it at the invoker is problematic for the reasons I described. That work is not in a branch that could exhibit the issues
  • Markus: it's a perf. issue… Kafka, invoker. Should it be implemented just in the invoker?
  • Tyson: I disagree, but am not sure how to prove it to you
  • Markus: discuss more on Slack?
  • Jeremias: let's think about it and have a discussion on this point, to find out the details needed to test feasibility (at the invoker level), and have a call if we reach that point. Reduce the moving parts of the system today as much as possible.
  • Tyson: interested in using Mesos to launch containers. Either the invoker can delegate container launching to Mesos, or Mesos would plug in at the LB level…

    • fully appreciate it is a sep. discussion as well
  • Markus: all you said about the LB, delegating to Mesos, or skipping the “pool parts” and going right to containers could be beneficial (removing scheduling in multiple places, in the invoker and elsewhere)

    • fundamentally, that makes the issue clearer than just speaking about conc. activations
  • Dragos: ?
  • Markus: should be able to go a long way without changing the whole system
  • Dragos: first step at the invoker? Can we try these other paths (agreement)? Is it worth having a separate conversation for HTTP?
  • Markus: I'd say it's a sep. discussion
  • Rodric: need to break down into smaller issues we can work on.
  • Tyson: Next steps?
  • Jeremias: have a few minutes
  • Tyson: will add more detail to proposal on container lifecycle

    • using a diff. kind of container, or actions that support concurrency (what to do)? Just mention it?
    • as Rodric said, describe the cancellation option (response handling) so the action developer or client could declare their activations cancelable.
    • in terms of changing the invoker apart from the LB, can you provide feedback on whether that is possible?
  • Jeremias: also look at Steve’s proposal
  • Tyson: not sure how to start unraveling
  • Jeremias: do a big bang, or stepwise?
  • Tyson: looking at SPI support

    • would like to have a plug-in approach first to help with some of these things
    • on Monday I want to address the current comments (apart from Markus') and get that PR accepted.
    • want a mechanism to retrofit things to be more pluggable and allow switching out impls. of parts of the system (see the SPI sketch after this list)
    • should help us test some of these exotic ideas
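    A sketch of the SPI shape being suggested (hypothetical trait and provider names): one trait per swappable subsystem, with the concrete implementation chosen by configuration, which would also cover the Mesos idea above:

        object SpiSketch {
          trait ContainerFactoryProvider {
            def launch(image: String): String // returns a container id
          }

          class DockerProvider extends ContainerFactoryProvider {
            def launch(image: String): String = s"docker-container-for-$image"
          }

          class MesosProvider extends ContainerFactoryProvider {
            def launch(image: String): String = s"mesos-task-for-$image"
          }

          // Pick the implementation from a (hypothetical) config value.
          def provider(name: String): ContainerFactoryProvider = name match {
            case "mesos" => new MesosProvider
            case _       => new DockerProvider
          }
        }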
  • Rodric: small drips, not large changes please

    • sees us able to accept it, if comments are addressed
  • Jeremias: happy to try out some of this as well personally (after talking to Markus).
  • Adjourn: 11:08am US CDT



