Status

Discussion thread
Vote thread
JIRA
Release

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Brief Introduction to Redshift

Amazon Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). It is designed for large-scale data analytics workloads and can process petabytes of structured and semi-structured data, using a columnar storage format and massively parallel processing. Redshift supports SQL queries, so users can extract insights from their data with familiar tooling (see the Amazon Redshift documentation for details). It can scale elastically, adding or removing compute capacity as the workload changes, and it integrates with other AWS services such as S3, Kinesis, and Lambda, which lets users build end-to-end data processing pipelines.

Apache Flink is a popular stream processing framework that enables businesses to analyze and act on data as it arrives, in real time. Redshift complements it as a warehouse for fast, cost-effective analysis of large-scale data.

The Flink Redshift Connector will enable Flink users to seamlessly integrate Flink with Redshift, allowing them to perform real-time data analysis and write the results directly to Redshift. With the Flink Redshift Connector, Flink users can take advantage of the scalability, reliability, and cost-effectiveness of Redshift, while leveraging the real-time processing power of Flink.

The benefits of using the Flink Redshift Connector include:

Real-time data analysis: With Flink, businesses can analyze data as it arrives, enabling them to respond quickly to changes and make data-driven decisions in real time.
Scalability: The Flink Redshift Connector allows businesses to scale their data processing and analysis up or down as needed, depending on changes in workload.
Easy integration: The Flink Redshift Connector is easy to integrate with existing Flink and Redshift workflows, enabling users to get up and running quickly.
Community support: As an open source solution, the Flink Redshift Connector benefits from a vibrant community of developers and users who contribute to its development and provide support.

Overall, the Flink Redshift Connector lets businesses combine the real-time processing power of Flink with the scalability and cost-effectiveness of Redshift.

Scope

Phase 1

Integrate with the Flink Sink API (FLIP-143)
Build on Flink's new DynamicTableSink and DynamicTableSinkFactory interfaces (FLIP-95)
Sink streaming results with checkpointing enabled (at-least-once delivery semantics)
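
The at-least-once behavior in the last item can be illustrated with a minimal plain-Java sketch. It does not use any Flink or Redshift classes: BufferingWriterSketch and BatchClient are hypothetical names introduced only for this illustration, and the batch-size trigger is an assumption rather than part of the proposal. The idea is that rows are buffered, written out when the batch fills or when a checkpoint requests a flush, and kept in the buffer if the write fails so that they can be retried.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java sketch of at-least-once buffering.
 * Hypothetical types for illustration; not the Flink Sink API or a Redshift client.
 */
final class BufferingWriterSketch {

    /** Hypothetical stand-in for whatever component writes a batch of rows to Redshift. */
    interface BatchClient {
        void writeBatch(List<String> rows);
    }

    private final BatchClient client;
    private final int maxBatchSize;
    private final List<String> buffer = new ArrayList<>();

    BufferingWriterSketch(BatchClient client, int maxBatchSize) {
        this.client = client;
        this.maxBatchSize = maxBatchSize;
    }

    /** Buffers one row and flushes once the batch is full. */
    void write(String row) {
        buffer.add(row);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /**
     * Writes out everything that is buffered. A checkpoint would call this so that
     * no row received before the checkpoint is still held only in memory.
     * If the client throws, the buffer is left untouched and the rows can be retried.
     */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        client.writeBatch(new ArrayList<>(buffer));
        buffer.clear();
    }

    /** Number of rows accepted but not yet written. */
    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        BufferingWriterSketch writer = new BufferingWriterSketch(table::addAll, 3);
        writer.write("r1");
        writer.write("r2");
        writer.flush(); // simulated checkpoint
        System.out.println("rows written: " + table + ", pending: " + writer.pending());
    }
}
```

Because a failed or replayed batch is written again rather than dropped, rows are never lost but may be written more than once. That is the trade-off at-least-once delivery implies, and downstream tables may need to tolerate or deduplicate repeated rows.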

Phase 2

Integrate with the new Flink Source API (FLIP-27)
Integrate with the Table API

Proposed Change

We propose to introduce a Flink Redshift connector.
The Flink Redshift connector will be part of flink-connector-aws.

Design

Compatibility, Deprecation, and Migration Plan

The flink-connector-redshift module will be compatible with the Flink source and sink interfaces.
Because this is a new connector (a new feature), no compatibility, deprecation, or migration plan is needed.

Test Plan

Unit tests will cover the methods in the Redshift connector.
An E2E test suite will be added in flink-connector-aws-e2e-tests.

Rejected Alternatives

N/A
