Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Samza Job Owners: To make their Samza jobs robust
  • Samza Dev Teams: To enable dev teams make their samza infra robust

This SEP targets two phase development of the Framework:

  • PHASE I: Testing Samza's public apis targeting Samza job Owners
  • PHASE II: Testing various component of Samza targeting testing  Samza Dev team

Terminologies

  • p1: refers to apis targeted for development for Phase I
  • p2: refers to apis targeted for development for Phase II

Motivation

Addition of this Continuous Integration Test Framework will alleviate:

...

  • In Memory Data Stream
    Users have ability to produce to & consume from in memory system partitions after SEP-8. In Memory Data system alleviates the need of serialization/deserialization of data since it does not persist data. We take advantage of this and provide very succinct Stream classes to serve as input data sources, which are the following:
    • Collection Stream (p1)
      Users can plug a collection (either List or Map) to create an in-memory input stream
      e.g:  CollectionStream.of(...,{1,2,3,4})
    • Event Builder Stream (p2)
      Event builder helps a user to mimic runtime samza processing environment in its tests, for example adding an exception in the steam, advancing time for window functions.
  • File Stream
    Users can create an input stream reading and writing from local file
    e.g:  FileStream.of("/path/to/file")
  • Local Kafka Stream
    Users can also consume bounded streams from a kafka topic which serves as initial input. Samza already provide api's to consume from and produce to kafka. For kafka streams we will need serde configurations, however implementation of bounded streams in kafka (using Latche Message or EndOfStreamMessage) is beyond scope of this SEP

...

The test framework is designed in a way which asks users to do none or minimal Samza configs, in future we intend to use StreamDescriptors in the test framework to do Samza configs. With StreamDescriptor and High Level api, another wrapper can be provided over the job to run the configuration in a test environment

Test Matrix (Plan)

Shown below is a targeted test matrix plan to test various components of Samza in p2.


Samza APIConcurrencyPartitionsContainerExpected Result
StreamTask
task.max.concurrency = 1
job.container.thread.pool.size = n
1 <= p <= n1 / nin order processing
   
task.max.concurrency > 1
job.container.thread.pool.size = n
1 <= p <= n1 / nout of order processing
   
AsyncStreamTask
task.max.concurrency = 11 <= p <= n1 / nin order processing
task.max.concurrency = n1 <= p <= n1 / nout of order processing
Windowable TaskN/A1 <= p <= n1 / nexpecting processing n messages for messages window of time t
Initiable TaskN/A1 <= p <= n1 / nstateful testing (assertions on kv store)
Closable TaskN/A1 <= p <= n1 / n?

Map / Flatmap / Filter /

Partition By / Merge /

SendTo / Sink / Join /

Window

 

task.max.concurrency = 1

job.container.thread.pool.size = n

1 <= p <= n 1 / nin order processing

task.max.concurrency > 1

job.container.thread.pool.size = n

out of order processing

...