...
- Samza Job Owners: To make their Samza jobs robust
- Samza Dev Teams: To enable dev teams make their samza infra robust
This SEP targets two phase development of the Framework:
- PHASE I: Testing Samza's public apis targeting Samza job Owners
- PHASE II: Testing various component of Samza targeting testing Samza Dev team
Terminologies
- p1: refers to apis targeted for development for Phase I
- p2: refers to apis targeted for development for Phase II
Motivation
Addition of this Continuous Integration Test Framework will alleviate:
...
- In Memory Data Stream
Users have ability to produce to & consume from in memory system partitions after SEP-8. In Memory Data system alleviates the need of serialization/deserialization of data since it does not persist data. We take advantage of this and provide very succinct Stream classes to serve as input data sources, which are the following:- Collection Stream (p1)
Users can plug a collection (either List or Map) to create an in-memory input stream
e.g: CollectionStream.of(...,{1,2,3,4}) - Event Builder Stream (p2)
Event builder helps a user to mimic runtime samza processing environment in its tests, for example adding an exception in the steam, advancing time for window functions.
- Collection Stream (p1)
- File Stream
Users can create an input stream reading and writing from local file
e.g: FileStream.of("/path/to/file") - Local Kafka Stream
Users can also consume bounded streams from a kafka topic which serves as initial input. Samza already provide api's to consume from and produce to kafka. For kafka streams we will need serde configurations, however implementation of bounded streams in kafka (using Latche Message or EndOfStreamMessage) is beyond scope of this SEP
...
The test framework is designed in a way which asks users to do none or minimal Samza configs, in future we intend to use StreamDescriptors in the test framework to do Samza configs. With StreamDescriptor and High Level api, another wrapper can be provided over the job to run the configuration in a test environment
Test Matrix (Plan)
Shown below is a targeted test matrix plan to test various components of Samza in p2.
Samza API | Concurrency | Partitions | Container | Expected Result | |
---|---|---|---|---|---|
StreamTask | task.max.concurrency = 1 job.container.thread.pool.size = n | 1 <= p <= n | 1 / n | in order processing | |
task.max.concurrency > 1 job.container.thread.pool.size = n | 1 <= p <= n | 1 / n | out of order processing | ||
AsyncStreamTask | task.max.concurrency = 1 | 1 <= p <= n | 1 / n | in order processing | |
task.max.concurrency = n | 1 <= p <= n | 1 / n | out of order processing | ||
Windowable Task | N/A | 1 <= p <= n | 1 / n | expecting processing n messages for messages window of time t | |
Initiable Task | N/A | 1 <= p <= n | 1 / n | stateful testing (assertions on kv store) | |
Closable Task | N/A | 1 <= p <= n | 1 / n | ? | |
Map / Flatmap / Filter / Partition By / Merge / SendTo / Sink / Join / Window
| task.max.concurrency = 1 job.container.thread.pool.size = n | 1 <= p <= n | 1 / n | in order processing | |
task.max.concurrency > 1 job.container.thread.pool.size = n | out of order processing |
...