...
- All connector repositories will remain under the ASF, which means that code will still be hosted at https://github.com/apache and all ASF policies will be followed.
- Each connector will end up in its own connector repository. For example, https://github.com/apache/flink-connector-kafka for a Kafka connector, https://github.com/apache/flink-connector-elasticsearch for an Elasticsearch connector etc.
The following connectors will be moved out of Flink's main repository into individual repositories:

- Kafka
- Upsert-Kafka
- Cassandra
- Elasticsearch
- Firehose
- Kinesis
- RabbitMQ
- Google Cloud PubSub
- Pulsar
- JDBC
- HBase
- Hive
- AWS connectors:
  - Firehose
  - Kinesis/Kinesis Streams
  - DynamoDB
Only the following connectors will remain in Flink's main repository:
- Hybrid Source
- FileSystem
- DataGen
- Print
- BlackHole

- PRs for new connectors to Flink's main repository should not be merged, as these new connectors should also be hosted outside of Flink's main repository. If you have a connector that you would like to build or maintain, please reach out to the Flink Dev mailing list (https://flink.apache.org/community.html) for more information on getting started with the external connector repository setup.
- A dedicated FLIP connector template exists to help you come up with an initial proposal that can be presented on the mailing list.
- The discussion threads on these topics can be found in:
This document outlines common rules for connectors that are developed & released separately from Flink (otherwise known as "externalized").
...
How this is achieved is left to the connector, as long as it conforms to the rest of the proposal.
The flink.version that is set in the root pom.xml should be the lowest supported Flink version. The highest version cannot be used, because there is no guarantee that something that works in e.g. Flink 1.18 also works in Flink 1.17.
Since branches may not be specific to a particular Flink version, this may require the creation of dedicated modules for each supported Flink version.
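As an illustrative sketch (the version number is an example, not a recommendation), the root pom.xml would pin the lowest supported version like this:

Code Block |
---|
<properties>
    <!-- Lowest supported Flink version; compiling against it prevents the
         connector from accidentally using APIs that only exist in newer Flink versions. -->
    <flink.version>1.17.0</flink.version>
</properties> |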
...
[Table: for each of the changes below, the original page showed the initial and final state of the supported Connector/Flink version matrix; the example matrices are not preserved in this export.]

- New minor Connector version
- New major Connector version
- New major Connector version, where the last 2 major Connector versions do not cover all supported Flink versions
- New minor Flink version, where an older Connector version does not support any supported Flink version
Externalization guide
https://github.com/apache/flink-connector-elasticsearch/ is the most complete example of an externalized connector.
...
As an example, the externalization of the Cassandra connector required these commands to be run (in a fresh copy of the Flink repository!!!):
Code Block |
---|
python3 git-filter-repo --path docs/content/docs/connectors/datastream/cassandra.md --path docs/content.zh/docs/connectors/datastream/cassandra.md --path flink-connectors/flink-connector-cassandra/
python3 git-filter-repo --path-rename flink-connectors/flink-connector-cassandra:flink-connector-cassandra |
...
We have a parent pom that connectors should use.
Code Block |
---|
<parent>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-parent</artifactId>
    <version>1.0.0</version>
</parent> |
It handles various things, from setting up essential plugins (like the compiler plugin) to QA (including license checks!), testing, and Java 11/17 support.
...
See the bottom of the <properties> section for properties that sub-projects should define.
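For illustration only (the property names below are assumptions based on typical connector poms; consult the parent pom itself for the authoritative list), a sub-project might define something like:

Code Block |
---|
<properties>
    <!-- Assumed property: the Flink version to build and test against. -->
    <flink.version>1.17.0</flink.version>
    <!-- Assumed property: the previously released connector version that
         japicmp uses as the baseline for API-compatibility checks. -->
    <japicmp.referenceVersion>3.0.0-1.17</japicmp.referenceVersion>
</properties> |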
Making changes to the parent pom
Making changes to the parent pom requires releasing a new org.apache.flink:flink-connector-parent artifact. Before releasing it, the changes can be tested in CI with the test project hosted in the ci branch. As the two components are not hosted in the same branch, a workaround that lets the test project use the updated parent without releasing it is to:
- create a clone of the flink-connector-shared-utils repository with a custom io.github.user.flink:flink-connector-parent artifact in its parent_pom branch. This allows committing directly to the branch and installing the custom artifact in CI
- update the parent in the pom of the test project in the ci_utils branch so that it uses the custom io.github.user.flink:flink-connector-parent artifact
- add a custom CI step that does the cloning and mvn install in the ci.yml GitHub Actions script. That way, CI tests against the updated flink-connector-parent artifact. An example of such a script is below:
Code Block |
---|
steps:
  - name: Temp check out parent_pom code
    uses: actions/checkout@v3
    with:
      ref: "my_parent_pom_branch"
  - name: Temp install parent_pom
    run: mvn clean install |
CI utilities
We have a collection of CI utilities that connectors should use.
https://github.com/apache/flink-connector-shared-utils/tree/ci_utils
The CI utilities require maintainers to think about which Flink versions the connector should be tested against. Most likely this looks something like this:
- CI for PRs is tested against released versions of Flink, like 1.17.2 or 1.18.0. This checks whether the change in the code works against existing versions of Flink.
- CI for nightly/weekly builds is usually a combination of released versions of Flink and Flink snapshots.
- For branches of the connector that have been released (like v1.0/v3.0), it makes sense to test against Flink SNAPSHOT versions. That allows you to check in the nightly/weekly builds whether your connector still works as expected against unreleased versions of Flink.
- The main branch of the connector has most likely not been released yet. For it, it makes sense to test against supported SNAPSHOT versions. With PRs, you will have already tested against released versions; the nightly/weekly builds let you check whether your connector still works against unreleased versions of Flink.
- CI for PRs
The push_pr.yml workflow can be used like this:

Code Block |
---|
jobs:
  compile_and_test:
    strategy:
      matrix:
        flink: [ 1.17.2 ]
        jdk: [ '8, 11' ]
        include:
          - flink: 1.18.1
            jdk: '8, 11, 17'
    uses: apache/flink-connector-shared-utils/.github/workflows/ci.yml@ci_utils
    with:
      flink_version: ${{ matrix.flink }}
      jdk_version: ${{ matrix.jdk }}
  python_test:
    strategy:
      matrix:
        flink: [ 1.17.2, 1.18.1 ]
    uses: apache/flink-connector-shared-utils/.github/workflows/python_ci.yml@ci_utils
    with:
      flink_version: ${{ matrix.flink }} |
- CI for nightly/weekly checks
The weekly.yml workflow can be used like this:
Code Block |
---|
name: Nightly
on:
  schedule:
    - cron: "0 0 * * 0"
  workflow_dispatch:
jobs:
  compile_and_test:
    if: github.repository_owner == 'apache'
    strategy:
      matrix:
        flink_branches: [{
          flink: 1.17-SNAPSHOT,
          branch: main
        }, {
          flink: 1.18-SNAPSHOT,
          jdk: '8, 11, 17',
          branch: main
        }, {
          flink: 1.19-SNAPSHOT,
          jdk: '8, 11, 17, 21',
          branch: main
        }, {
          flink: 1.17.1,
          branch: v3.0
        }, {
          flink: 1.18.0,
          branch: v3.0
        }]
    uses: apache/flink-connector-shared-utils/.github/workflows/ci.yml@ci_utils
    with:
      flink_version: ${{ matrix.flink_branches.flink }}
      connector_branch: ${{ matrix.flink_branches.branch }}
      jdk_version: ${{ matrix.flink_branches.jdk || '8, 11' }}
      run_dependency_convergence: false |
Release utilities
We have a collection of release scripts that connectors should use.
...
The DockerImageVersions class is a central listing of the docker images used in Flink tests. Since connector-specific entries will be removed once the externalization is complete, connectors shouldn't rely on this class but should handle this on their own (either by creating a trimmed-down copy, hard-coding the version, or deriving it from a Maven property).
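As a sketch of the Maven-property approach (the property name, image tag, and connector are hypothetical), the image version can be defined once in the pom and passed to tests as a system property:

Code Block |
---|
<properties>
    <!-- Hypothetical property holding the docker image tag used in tests. -->
    <cassandra.docker.version>4.1</cassandra.docker.version>
</properties>
<!-- ...and in <build><plugins>, expose it to the test JVM: -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <systemPropertyVariables>
            <!-- Tests can read this via System.getProperty("cassandra.docker.version"). -->
            <cassandra.docker.version>${cassandra.docker.version}</cassandra.docker.version>
        </systemPropertyVariables>
    </configuration>
</plugin> |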
Bundling of flink-connector-base
Connectors should not bundle the connector-base module from Flink, but instead set it to provided, as the contained classes may rely on internal Flink classes.
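A minimal sketch of what this looks like in the connector's pom.xml:

Code Block |
---|
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-base</artifactId>
    <version>${flink.version}</version>
    <!-- Provided scope: compiled against, but not bundled into the connector jar,
         since the contained classes may rely on Flink internals. -->
    <scope>provided</scope>
</dependency> |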