...
We propose to introduce the PatternProcessorManager interface. The manager handles pattern processor updates and provides information about the currently active pattern processors.
...
In order to achieve the functions described in the public API, we propose to add two stream operators to the stream graph.
The first is a key generating operator. For each input record and each active PatternProcessor, the operator generates a key according to the KeySelector in the PatternProcessor and attaches the key to that record.
After that, the stream graph performs a keyBy() operation on the input data, using the generated key. In this way the downstream operator has all data with the same key aggregated on the same subtask, and that key can be dynamically updated.
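The key generating step described above can be sketched without any Flink dependency. This is a minimal illustration under stated assumptions, not the proposed implementation: `Processor`, `KeyedEvent`, and `KeyGeneratingSketch` are hypothetical names, a plain `Function` stands in for Flink's KeySelector, and attaching the processor id next to the key is an assumption made so that downstream code can tell which processor a key belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a PatternProcessor: only the id and the key extractor matter here.
final class Processor<T> {
    final String id;
    final Function<T, String> keySelector; // stands in for Flink's KeySelector

    Processor(String id, Function<T, String> keySelector) {
        this.id = id;
        this.keySelector = keySelector;
    }
}

// An input record with the generated key attached (the processor id is an assumption).
final class KeyedEvent<T> {
    final String processorId;
    final String key;
    final T event;

    KeyedEvent(String processorId, String key, T event) {
        this.processorId = processorId;
        this.key = key;
        this.event = event;
    }
}

final class KeyGeneratingSketch {
    // Emits one keyed copy of the event per active processor.
    static <T> List<KeyedEvent<T>> generate(T event, List<Processor<T>> active) {
        List<KeyedEvent<T>> out = new ArrayList<>();
        for (Processor<T> p : active) {
            out.add(new KeyedEvent<>(p.id, p.keySelector.apply(event), event));
        }
        return out;
    }
}
```

Because the list of active processors is passed in on every call, replacing that list is enough to change how later records are keyed, which is the property the keyBy() step relies on.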
The second operator then matches data to patterns and processes the matched results with the PatternProcessFunctions. The first duty is almost the same as that of CepOperator, except that it now needs to handle multiple, dynamic patterns. The second duty is currently achieved by the PatternStream.process() method, but now it needs to be achieved by inner implementations.
The general structure that supports dynamic pattern processors in a job graph is shown in the wireframe below.
...
In the wireframe above, the Key Generating Operator and the Pattern Matching & Processing Operator both use information from the pattern processors and thus need a PatternProcessorDiscoverer. Our design for collecting pattern processors and updating them inside operators is shown in the wireframe below. For each of the two operators mentioned, there will be a pattern processor discoverer inside its Operator Coordinator. The discoverer in the OperatorCoordinator serves to discover pattern processor changes and convert the information into pattern processors. The subtasks, after receiving the latest pattern processors from the Operator Coordinator, then apply the update.
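The update flow from coordinator to subtasks can be pictured with a small, Flink-free sketch. Everything here is an assumption made for illustration: `SubtaskSketch` and `CoordinatorSketch` are hypothetical classes, pattern processors are reduced to their ids as strings, and direct method calls stand in for whatever messages a real Operator Coordinator would send to its subtasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical subtask-side holder of the currently active pattern processors.
final class SubtaskSketch {
    private List<String> activeProcessors = Collections.emptyList();

    // Called when the coordinator delivers a new set; the old set is replaced wholesale.
    void onPatternProcessorsUpdated(List<String> latest) {
        activeProcessors = Collections.unmodifiableList(new ArrayList<>(latest));
    }

    List<String> activeProcessors() {
        return activeProcessors;
    }
}

// Hypothetical coordinator: receives discovered changes and forwards them to every subtask.
final class CoordinatorSketch {
    private final List<SubtaskSketch> subtasks;

    CoordinatorSketch(List<SubtaskSketch> subtasks) {
        this.subtasks = subtasks;
    }

    // A discoverer would invoke this after converting an external change into pattern processors.
    void onDiscovered(List<String> processors) {
        for (SubtaskSketch subtask : subtasks) {
            subtask.onPatternProcessorsUpdated(processors);
        }
    }
}
```

Replacing the whole set on each update, rather than applying incremental changes, is a simplification chosen for the sketch; it keeps every subtask's view identical after each delivery.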
...
The public interfaces we propose above aim to give users flexibility in how pattern processors are discovered and serialized. It is also possible for us to provide some built-in implementations for common use cases. Some possible ways for the discoverer to discover pattern processors are as follows:
The discoverer periodically checks a file in filesystem or a table in database for updates
The discoverer listens on external commands, like HTTP requests, for updates
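As an illustration of the first option, the following is a minimal sketch of a discoverer that polls a file. It is not a built-in implementation: `FilePollingDiscovererSketch` is a hypothetical class, change detection by comparing raw bytes is an assumption, and the `Consumer<String>` callback stands in for whatever step would deserialize the content and hand the resulting pattern processors to the manager.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical discoverer that polls a file and reports its content whenever it changes.
final class FilePollingDiscovererSketch implements AutoCloseable {
    private final Path file;
    private final Consumer<String> onUpdate; // would deserialize and forward pattern processors
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private byte[] lastSeen = new byte[0];

    FilePollingDiscovererSketch(Path file, Consumer<String> onUpdate) {
        this.file = file;
        this.onUpdate = onUpdate;
    }

    // Reads the file once; fires the callback and returns true only if the bytes changed.
    synchronized boolean checkOnce() throws IOException {
        byte[] current = Files.readAllBytes(file);
        if (Arrays.equals(current, lastSeen)) {
            return false;
        }
        lastSeen = current;
        onUpdate.accept(new String(current, StandardCharsets.UTF_8));
        return true;
    }

    // Starts periodic polling; I/O failures are ignored so the next poll can retry.
    void start(long periodMillis) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (IOException ignored) {
                // The file may be temporarily missing or unreadable.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

A caller would construct the discoverer with the path to watch, call `start` with a polling period in milliseconds, and call `close` when the job shuts down. A database-backed variant could keep the same shape and swap the file read for a query.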
The format used to transmit serialized pattern processors to Flink jobs, before they are converted to Java objects, can be one of the following:
Directly serializing Java objects using Java serialization or Kryo
Using readable formats like JSON or XML
Using the standard MATCH_RECOGNIZE grammar in SQL proposed by ISO
...