Introduction
The DataStream API is one of the two main APIs that Flink provides for writing data processing programs. The API was introduced practically on day one of the project and has evolved for nearly a decade, and we are observing more and more problems with it. Fixing these problems requires significant breaking changes, which makes an in-place refactoring impractical. Therefore, we propose to introduce a new set of APIs, the DataStream API V2, to gradually replace the original DataStream API.
Introducing a whole new set of APIs is complex and involves massive changes. We plan to break the proposal down into multiple sub-FLIPs for incremental discussion. This FLIP serves only as an umbrella, focusing mainly on motivation, goals, and overall planning; design and implementation details will be discussed in the sub-FLIPs.
To make incremental progress in this direction, we propose to vote on the umbrella FLIP and the sub-FLIPs separately. We are aware, however, that decisions on some of these FLIPs may depend on others, so we are open to other opinions and suggestions.
Public Interfaces
Proposed Changes
Compatibility, Deprecation, and Migration Plan
As this is a completely new API, it is not compatible with the old DataStream API.
The evolution of the new API will go through the following process:
‒ Mark it as experimental until all functionality has been merged.
‒ Mark it as public evolving after two minor releases.
‒ Mark it as public after two minor releases, and deprecate the old API.
DataStream V1 can be removed only when all of the following conditions are met:
‒ It has been marked deprecated for at least two minor releases.
‒ Most users have already migrated, or are able to migrate, to DataStream API V2.
‒ The removal takes place in a major release.
This is neither an in-place nor a smooth replacement of the DataStream API. The new API will coexist with the old one for a considerable period of time. Once the old API is removed, all SDK-based jobs will need to be migrated.
Test Plan
The corresponding test plan will be given in the sub-FLIPs.