Nexmark is a suite of queries (pipelines) used to measure performance and non-regression in Beam (see https://beam.apache.org/documentation/sdks/java/nexmark/).
Links to the original Nexmark papers:
Some queries can be very complex. To ease their maintenance, here is a presentation of the architecture along with pseudo-code of the queries:
Components of Nexmark
Generator:
generates timestamped events (bids, persons, auctions) that are correlated with each other
NexmarkLauncher:
creates sources that use the generator
launches and monitors the query pipelines
Output metrics:
Each query includes ParDos to update metrics
execution time, processing event rate, number of results, but also invalid auctions/bids, …
Modes:
Batch mode: test data is finite and uses a BoundedSource
Streaming mode: test data is finite but uses an UnboundedSource to trigger streaming mode in runners
...
Query 1: What are the bid values in euros? (Currency Conversion)
Illustrates a simple map
Filter + ParDo to extract bids out of events
ParDo that outputs Bid objects with price converted
SELECT Istream(auction, DOLTOEUR(price), bidder, datetime) FROM bid [ROWS UNBOUNDED];
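The three steps of Query 1 can be sketched in plain Java as follows. This is only an illustration, not the Beam implementation: `java.util.stream` stands in for the Filter and ParDo transforms, the `Bid` and `Event` records are hypothetical simplified versions of the Nexmark model classes, and the 0.89 conversion rate is an assumed value. It needs Java 16 or later for records.

```java
import java.util.List;

final class Query1Sketch {
    // Hypothetical, simplified stand-ins for the Nexmark model classes.
    record Bid(long auction, long bidder, long price, long dateTime) {}

    // An event that is not a bid (a person or an auction) has bid == null.
    record Event(Bid bid) {}

    // Assumed conversion rate (0.89), applied with integer arithmetic.
    static long dolToEur(long dollarPrice) {
        return dollarPrice * 89 / 100;
    }

    static List<Bid> query1(List<Event> events) {
        return events.stream()
            // Filter + ParDo: keep only bid events and extract the bid.
            .filter(e -> e.bid() != null)
            .map(Event::bid)
            // ParDo: output a new Bid with the price converted.
            .map(b -> new Bid(b.auction(), b.bidder(), dolToEur(b.price()), b.dateTime()))
            .toList();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(new Bid(1007, 1, 100, 10L)),
            new Event(null), // a non-bid event, dropped by the filter
            new Event(new Bid(1020, 2, 200, 20L)));
        query1(events).forEach(System.out::println);
    }
}
```

In a real pipeline each stream stage would be a separate transform applied to a PCollection of events, and the per-query metrics would be updated in additional ParDos.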
Query 2: Find bids with specific auction ids and show their bid price.
Illustrates a simple filter
Filter + ParDo to extract bids out of events
Filter to keep bids with correct auctionId
ParDo that outputs AuctionPrice(auction, price) objects
SELECT Rstream(auction, price) FROM Bid [NOW] WHERE auction = 1007 OR auction = 1020 OR auction = 2001 OR auction = 2019 OR auction = 2087;
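A minimal plain-Java sketch of the Query 2 steps, under the same caveats: `java.util.stream` stands in for the Beam transforms, and the `Bid`, `Event` and `AuctionPrice` records are hypothetical simplified versions of the model classes. The auction ids are the ones from the query above. It needs Java 16 or later for records.

```java
import java.util.List;
import java.util.Set;

final class Query2Sketch {
    // Hypothetical, simplified stand-ins for the Nexmark model classes.
    record Bid(long auction, long bidder, long price, long dateTime) {}

    // An event that is not a bid (a person or an auction) has bid == null.
    record Event(Bid bid) {}

    record AuctionPrice(long auction, long price) {}

    // Auction ids taken from the WHERE clause of the query.
    static final Set<Long> AUCTION_IDS = Set.of(1007L, 1020L, 2001L, 2019L, 2087L);

    static List<AuctionPrice> query2(List<Event> events) {
        return events.stream()
            // Filter + ParDo: keep only bid events and extract the bid.
            .filter(e -> e.bid() != null)
            .map(Event::bid)
            // Filter: keep bids whose auction id is one of the requested ids.
            .filter(b -> AUCTION_IDS.contains(b.auction()))
            // ParDo: project each remaining bid to (auction, price).
            .map(b -> new AuctionPrice(b.auction(), b.price()))
            .toList();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(new Bid(1007, 1, 100, 10L)),
            new Event(new Bid(42, 2, 150, 15L)), // auction id not selected
            new Event(null),                     // a non-bid event
            new Event(new Bid(2087, 3, 300, 30L)));
        query2(events).forEach(System.out::println);
    }
}
```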
Query 3: Who is selling in particular US states?
...