Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
As discussed above, Flink's HybridSource has been released, but it can currently be used only from the DataStream API. Many Table API and SQL users need the same capability,
so this FLIP proposes adding SQL support for it.
Basic Idea
Add a new built-in hybrid connector. First, in HybridTableSourceFactory, the 'sources' option lists the child sources to concatenate, in order.
Next, the indexed options that belong to each child source are extracted and passed to that child's table source factory to create the child table source instances.
Once the child table source instances are ready, the ScanRuntimeProvider of each one is used to obtain the actual child Source (the new FLIP-27 Source API).
Finally, the child sources are bound into a HybridSource.
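The option handling in the first two steps can be sketched in plain Java without any Flink dependency. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the "<index>." key prefix for child options is an assumed convention, not necessarily the one used by the prototype.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: splits hybrid table options into per-child option maps. */
class HybridOptionsSketch {

    /** Returns the ordered child connector identifiers from the 'sources' option. */
    static List<String> childConnectors(Map<String, String> options) {
        String sources = options.get("sources");
        if (sources == null || sources.trim().isEmpty()) {
            throw new IllegalArgumentException("'sources' option is required");
        }
        List<String> result = new ArrayList<>();
        for (String s : sources.split(",")) {
            result.add(s.trim());
        }
        return result;
    }

    /**
     * Collects the options of one child source.
     * Assumption: child options are keyed as "<index>.<option>", e.g. "0.path".
     */
    static Map<String, String> childOptions(Map<String, String> options, int index) {
        String prefix = index + ".";
        Map<String, String> child = new LinkedHashMap<>();
        // The child's connector identifier comes from its position in 'sources'.
        child.put("connector", childConnectors(options).get(index));
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                child.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return child;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hybrid");
        options.put("sources", "filesystem,kafka");
        options.put("0.path", "/tmp/a.csv");
        options.put("0.format", "csv");
        options.put("1.topic", "test");
        options.put("1.scan.startup.mode", "earliest-offset");
        // Each map would be handed to the matching child table source factory.
        System.out.println(childOptions(options, 0));
        System.out.println(childOptions(options, 1));
    }
}
```

In a real factory, each resulting map would be used to build the context passed to the child's table source factory; that part depends on Flink internals and is left out here.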
DDL (normal)
DDL (with different field names; this is an optional feature that may not be implemented in the end and still needs to be discussed)
Actual CSV field names: A,B,f2
Actual Kafka field names: f0,f1,f2
That is, the CSV columns A and B are mapped to the DDL fields, while the Kafka columns f0,f1,f2 need no mapping.
Users can take the DDL field names from the actual Kafka field names, from the CSV field names, or choose other names.
Options:
sources: A comma-delimited list of the child sources to concatenate, in order. The boundedness of the hybrid source is the boundedness of its last child source.
schema-field-mappings: A JSON key-value object that maps differing field names to the DDL fields (an extra feature; the draft PR below shows how it is implemented and how it works).
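The semantics of the two options can be illustrated with a small, dependency-free Java sketch. All names here are hypothetical. It assumes the mapping direction is from a child's actual field name to the DDL field name, that the DDL fields are f0,f1,f2, and that the JSON value has already been parsed into a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the semantics of the two hybrid options. */
class HybridOptionSemanticsSketch {

    /** The hybrid source is bounded exactly when its last child source is bounded. */
    static boolean isBounded(List<Boolean> childBoundedness) {
        if (childBoundedness.isEmpty()) {
            throw new IllegalArgumentException("at least one child source is required");
        }
        return childBoundedness.get(childBoundedness.size() - 1);
    }

    /**
     * Renames a child's actual field names to DDL field names.
     * Assumption: the mapping goes from actual name to DDL name; unmapped names are kept.
     */
    static List<String> toDdlFields(List<String> actualFields, Map<String, String> mapping) {
        List<String> ddlFields = new ArrayList<>();
        for (String field : actualFields) {
            ddlFields.add(mapping.getOrDefault(field, field));
        }
        return ddlFields;
    }

    public static void main(String[] args) {
        // CSV child: columns A and B differ from the assumed DDL fields f0 and f1.
        Map<String, String> csvMapping = new HashMap<>();
        csvMapping.put("A", "f0");
        csvMapping.put("B", "f1");
        System.out.println(toDdlFields(Arrays.asList("A", "B", "f2"), csvMapping));
        // Kafka child: names already match, so an empty mapping is enough.
        System.out.println(toDdlFields(Arrays.asList("f0", "f1", "f2"), new HashMap<>()));
        // Bounded file source followed by an unbounded Kafka source.
        System.out.println(isBounded(Arrays.asList(true, false)));
    }
}
```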
Start position conversion:
Currently, FileSource does not expose its end position, so that position cannot be passed to the next streaming source. Details:
With SQL, however, the start position of the next streaming source can be defined directly; for example, we can set the Kafka start position.
When the first, bounded source has been read to the end, the hybrid source switches to reading Kafka from the given start position or according to another startup strategy.
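The fixed start position can be modelled with a toy Java example. It is not Flink code: lists stand in for the bounded file and the Kafka topic, and the class, method, and parameter names are hypothetical. The point it illustrates is that the start offset of the second source is configured up front rather than derived from where the first source ended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of switching from a bounded source to a stream with a preconfigured start. */
class StaticStartPositionSketch {

    /**
     * Emits every bounded record first, then the stream records from the configured offset.
     * The offset is fixed in advance; it is not computed from the bounded source's end.
     */
    static List<String> readHybrid(
            List<String> boundedRecords, List<String> streamRecords, int configuredStartOffset) {
        List<String> out = new ArrayList<>(boundedRecords);
        out.addAll(streamRecords.subList(configuredStartOffset, streamRecords.size()));
        return out;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("a", "b");
        List<String> topic = Arrays.asList("k0", "k1", "k2");
        // Start the stream at offset 1, as if set through a start-position option.
        System.out.println(readHybrid(file, topic, 1));
    }
}
```

Because the offset is chosen independently of the file contents, the user is responsible for picking a position that neither skips nor duplicates records across the switch.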
Prototype implementation
HybridTableSource
HybridTableSource binds the accepted child sources, in the given order, into the final HybridSource.
HybridTableSourceFactory
The table source factory that is discovered through Java SPI to locate the hybrid source implementation.
HybridConnectorOptions
Options for creating HybridTableSource.
Draft PR:
Compatibility, Deprecation, and Migration Plan
This is a new feature, so no migration is currently required.
Test Plan
Add unit test cases for HybridTableSourceFactory and HybridTableSource.
Add integration test cases for the hybrid source SQL DDL.
Rejected Alternatives
To be added.