Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

   2- Expose a REST endpoint on each application instance so that I can query each instance’s local state storage values (through Interactive Queries) and aggregate the results externally (using a database or something else).

    3- Create a scheduled Punctuate at the end of the Graph so that we can query (using getAllMetadata) all other instances's locally aggregated values and them aggregate them all and publish to another Kafka Topic from time to time.
          - For this to work we created a way so that only one application instance's Punctuate algorithm would perform the calculations (something like a master election through instance ids and metadata) and we had to create a REST proxy on each instance. We have to implement a bunch of things for each transversally aggregated view we need. In a web analytics case, for example, this might span dozens of views (and a lot of code) in a userid partitioned stream: global views per page, views per browser/device, average time in page, average user age, max time in page and so on...
                  
- It would be great if we could solve this problem without the need to add complexity to the system by adding an external database or tool. I would have to publish the aggregated values back to a Kafka topic if I have another Graph that needs that data too (what is common). Having KStreams to INPUT/OUTPUT its state through Kafka Topics is great because of the ecosystem that is already built.