...

For in-memory streams, the API initializes the stream by spinning up a Samza producer (an InMemorySystemProducer) to write to it; this is how a collection of data or events gets initialized as a stream. It also configures any output streams the user has added.
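The step above can be sketched in plain Java. This is an illustrative model only, not the framework's actual API: it shows how a collection of events could be materialized as a single-partition in-memory stream, in the spirit of what an InMemorySystemProducer does. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMemoryStreamSketch {
    // partition id -> ordered list of messages (the list index acts as the offset)
    private final Map<Integer, List<Object>> partitions = new HashMap<>();

    public void produce(int partition, Object message) {
        partitions.computeIfAbsent(partition, p -> new ArrayList<>()).add(message);
    }

    // Initialize a stream from a collection, writing every event to partition 0.
    public static InMemoryStreamSketch of(Collection<?> events) {
        InMemoryStreamSketch stream = new InMemoryStreamSketch();
        for (Object e : events) {
            stream.produce(0, e);
        }
        return stream;
    }

    public List<Object> readPartition(int partition) {
        return partitions.getOrDefault(partition, Collections.emptyList());
    }

    public static void main(String[] args) {
        InMemoryStreamSketch input = InMemoryStreamSketch.of(Arrays.asList(1, 2, 3));
        System.out.println(input.readPartition(0)); // prints [1, 2, 3]
    }
}
```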

Data Transformation:

This is the Samza job you write, using either the Low Level API or the fluent High Level API. The test framework provides APIs to set up tests for both cases, in both single-container and multi-container mode. For the Low Level API, users implement StreamTask or AsyncStreamTask exactly as they would for a production Samza job and pass it along to the framework. For the High Level API, users don't need a class implementing StreamApplication; they configure message streams of any type and apply operators on them directly (see the example below).
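As a plain-Java analogy (not the Samza API), applying operators directly on a typed message stream and asserting on the final result looks much like chaining operations on a java.util.stream.Stream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OperatorChainAnalogy {
    // Chain of operators over a bounded input, analogous to a high level
    // api pipeline: filter the events, then derive one output message each.
    public static List<String> transform(List<Integer> input) {
        return input.stream()
                .filter(n -> n % 2 == 0)   // keep even events
                .map(n -> "event-" + n)    // derive an output message per event
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(transform(Arrays.asList(1, 2, 3, 4)));
        // prints [event-2, event-4]
    }
}
```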

Data Validation: 

With the Low Level API, once the job runs, users can assert data from any intermediate stream the job produced to, as well as the final stream that contains the output. The fluent High Level API does job chaining, so in that case only the final expected output can be compared. TaskAssert spins up a consumer for the system under test, reads the messages in the stream, and compares the result with the expected value.
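The comparison step can be sketched as follows. This is a simplified model of what a TaskAssert-style check does after the consumer has drained the stream into a list; the helper names are hypothetical, and the in-any-order variant is included because message order across partitions is generally not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamAssertSketch {
    // Strict comparison: same messages in the same order.
    public static void assertStreamEquals(List<?> expected, List<?> actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    // Order-insensitive comparison: same multiset of messages.
    public static void assertContainsInAnyOrder(List<?> expected, List<?> actual) {
        List<Object> remaining = new ArrayList<>(actual);
        for (Object e : expected) {
            if (!remaining.remove(e)) {
                throw new AssertionError("missing expected message: " + e);
            }
        }
        if (!remaining.isEmpty()) {
            throw new AssertionError("unexpected messages: " + remaining);
        }
    }
}
```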

Data Types & Partitions:

Samza provides complete flexibility in the data types used for input streams, and the test framework preserves this flexibility for both primitive and derived types. Serdes are required for local Kafka streams and file streams, but in-memory streams don't require any Serde configuration. The framework provides APIs for initialization of input streams, both single- and multi-partition (data injection), and for verification of expected versus actual results on single-partition and multi-partition bounded streams (data validation).
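For intuition, keyed messages are typically spread over N partitions by hashing the key modulo the partition count; a multi-partition input API would distribute injected data along these lines. This is an illustrative sketch, not the framework's actual partitioning code, and the exact hash function may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch {
    public static int partitionFor(Object key, int numPartitions) {
        // Math.floorMod avoids a negative result for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Group a list of keyed messages by their computed partition.
    public static Map<Integer, List<String>> distribute(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> parts = new HashMap<>();
        for (String k : keys) {
            parts.computeIfAbsent(partitionFor(k, numPartitions), p -> new ArrayList<>()).add(k);
        }
        return parts;
    }
}
```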

Running Config

Traditionally we ask users to set up config for any Samza job. For test purposes, the framework sets up the basic config boilerplate for users while still providing a flexible option to add any custom config (rarely needed). The API exposes functions to configure single-container or multi-container mode (using ZooKeeper), as well as the concurrency semantics for the job.
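The config layering described above can be sketched as boilerplate defaults with user overrides layered on top. The keys shown here ("job.name", "job.coordinator.factory") are examples of Samza-style config keys used purely for illustration, not an authoritative default set.

```java
import java.util.HashMap;
import java.util.Map;

public class TestConfigSketch {
    public static Map<String, String> buildConfig(String jobName, Map<String, String> customConfig) {
        Map<String, String> config = new HashMap<>();
        // framework-provided boilerplate defaults
        config.put("job.name", jobName);
        config.put("job.coordinator.factory",
                "org.apache.samza.standalone.PassthroughJobCoordinatorFactory");
        // user-supplied custom config wins over the defaults (rarely needed)
        config.putAll(customConfig);
        return config;
    }
}
```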



Future Changes with Stream Descriptors:

The test framework is designed to require no or minimal Samza configuration from users. In the future we intend to use StreamDescriptors in the test framework to do the Samza configuration; this would slightly change the way users pass their custom configs (if any) to the test framework.

Public Interfaces

The two APIs for writing tests are the Low Level Test API (TestTask) and the High Level API (TestApplication).

...

Implementation and Test Plan

Implementation

Compatibility, Deprecation, and Migration Plan

...