Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

We propose to introduce the PatternProcessor ManagerPatternProcessorManager interface. The manager handles the pattern processor updates and provides information about current active pattern processors.

...

Add Key Generating and Pattern Matching & Processing Operator

In order to achive the functions described in pubic API, we propose to add two stream operators in the stream graph.

The first is a key generating operator. For each input data and each active PatternProcessor, the operator would generate a key according to the KeySelector in the PatternProcessor and attach the key to that data.

After that, the stream graph would do a keyBy() operation on the input data, using the generated key. In this way the downstream operator could have data with the same key aggregated on the same subtask, and that key can be dyncamically updated.

Then the second operator serves to match data to patterns and to process the matched results with the PatternProcessFunctions. The first duty is almost the same as that of CepOperator, except that it needs to handle multiple and dynamic patterns now. The second duty has been achieved by PatternStream.process() method, but now it needs to be achieved by inner implementations.

The general structure that supports dynamic pattern processors in a job graph is shown in the wireframe below.

...

In the wireframe above, Key Generating Operator and Pattern Matching & Processing Operator uses information from the pattern processors and thus need a PatternProcessorDiscoverer. Our design of collecting pattern processors and update them inside operators is shown in the wireframe below. In each of the two operators mentioned, there will be pattern processor managers discoverers inside each of their Operator Coordinators. The manager discoverer in the OperatorCoordinator serves to discover pattern processor changes and convert the information into pattern processors. Then the subtasks, after receiving the latest pattern processors from Operator Coordinator, will make the update.

...

Possible implementations of pattern processor discovery and serialization format

The public interfaces we propose above aims to yield the flexibility of discovering pattern processors and serializing them to users. It is also possible that we provide some built-in implementations for common uses. Some possible ways for the discoverer to discover pattern processors are as follows:

  • The discoverer periodically checks a file in filesystem or a table in database for updates

  • The discoverer listens on external commands, like HTTP requests, for updates

And the format to transmit serialized pattern processors to Flink jobs before it is converted to Java objects can be follows:

  • Directly serializing Java objects using Java serialization or Kryo

  • Using readable formats like JSON or XML

  • Using the standard MATCH_RECOGNIZE grammar in SQL proposed by ISO

...