Status

Current state: "Under Discussion"

Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Released: 1.15

Motivation

Currently, the name of an operator in a SQL job contains the operator's computing logic, which can be helpful for debugging at runtime. However, the name of a single operator can become quite long, depending on the number of columns and the complexity of the computation. The name of a job vertex is much worse, because a vertex may chain tens of operators.

- The name becomes unreadable, and the operator topology can hardly be recognized from it.
- Logs are hard to read, and the long names waste a lot of IO.
- Some external systems, such as metrics systems, do not work well with such long names:

1. https://issues.apache.org/jira/browse/FLINK-20375
2. https://issues.apache.org/jira/browse/FLINK-21129

Public Interfaces

Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.

A public interface is any change to the following:

~~Binary log format~~
~~The network protocol and api behavior~~
~~Any class in the public packages under clientsConfiguration, especially client configuration~~
- ~~org/apache/kafka/common/serialization~~
- ~~org/apache/kafka/common~~
- ~~org/apache/kafka/common/errors~~
- ~~org/apache/kafka/clients/producer~~
- ~~org/apache/kafka/clients/consumer (eventually, once stable)~~
~~Monitoring~~
~~Command line tools and arguments~~
~~Anything else that will likely break existing users in some way when they upgrade~~

Proposed Changes

Separate the detailed description of an operator from its name.

1. For sources and sinks, the table name is used as the operator name. For other operators, the exec node class name is used as the operator name, with the common prefix StreamExec or BatchExec trimmed. The node id is appended as a suffix so that operators of the same class in one vertex can be distinguished. The resulting names would look something like Calc[1]/Deduplicate[2]/LocalGroupAggregate[3]/GlobalGroupAggregate[4].
  1. This is similar to operator names in the DataStream API, such as FlatMap or Map.
2. The current operator name would become the operator description. The description will be used to construct JobVertex#operatorPrettyName, which generates the job vertex description in the REST API and is displayed in the web UI.
  1. StreamNode and Transformation would need a new field, description. When no description is given, the name is used as the description.
  2. For SQL, the exec node id would be added as a prefix of the operator description, followed by the description of the exec node, so that each description can easily be matched to its node.
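A minimal Java sketch of the naming rules above, assuming exec node class names such as StreamExecCalc and integer node ids. The class and method names (OperatorNamingSketch, operatorName, operatorDescription, vertexName) are hypothetical and not part of Flink's API. The "[id]:" description prefix is an assumed format, and the "/" separator is taken from the example name above. Sources and sinks, which use the table name, are not covered.

```java
import java.util.List;

// Hypothetical helper illustrating the proposed naming rules; not Flink API.
final class OperatorNamingSketch {

    // Common class-name prefixes of exec nodes that are trimmed from the name.
    private static final String[] TRIMMED_PREFIXES = {"StreamExec", "BatchExec"};

    // Short operator name: exec node class name without its prefix, plus "[nodeId]".
    public static String operatorName(String execNodeClassName, int nodeId) {
        String shortName = execNodeClassName;
        for (String prefix : TRIMMED_PREFIXES) {
            if (shortName.startsWith(prefix)) {
                shortName = shortName.substring(prefix.length());
                break;
            }
        }
        return shortName + "[" + nodeId + "]";
    }

    // Operator description: node id prefix (assumed "[id]:" format) plus the detailed description.
    public static String operatorDescription(int nodeId, String detail) {
        return "[" + nodeId + "]:" + detail;
    }

    // Vertex name: the short names of the chained operators joined with "/".
    public static String vertexName(List<String> operatorNames) {
        return String.join("/", operatorNames);
    }

    public static void main(String[] args) {
        String name = vertexName(List.of(
                operatorName("StreamExecCalc", 1),
                operatorName("StreamExecDeduplicate", 2),
                operatorName("StreamExecLocalGroupAggregate", 3),
                operatorName("StreamExecGlobalGroupAggregate", 4)));
        // prints Calc[1]/Deduplicate[2]/LocalGroupAggregate[3]/GlobalGroupAggregate[4]
        System.out.println(name);
        // prints [1]:Calc(select=[a, b])
        System.out.println(operatorDescription(1, "Calc(select=[a, b])"));
    }
}
```

The short name stays stable as columns or expressions change, while the full logic moves into the description, which carries the same node id.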
Introduce a tree-mode detailed description for a vertex. It presents the detailed information in a better-formatted way in the web UI and can be used when debugging and analyzing SQL jobs at runtime.
1. We need to add a "description" field to the job vertex summary in the REST API and modify the web UI to use it.
2. We can also introduce an option to fall back to the old mode, for users who prefer it.
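The tree-mode description, together with a fallback to the old mode, could be rendered roughly as below. This is only a sketch. OpNode, DescriptionMode and describe are hypothetical names. The ":- " and "+- " markers borrow the style of Flink's SQL plan (EXPLAIN) output. The CASCADING branch only approximates the old chained " -> " format; it is not Flink's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the operators chained in one job vertex; not Flink API.
final class TreeDescriptionSketch {

    // Hypothetical switch between the new tree mode and the old cascading mode.
    public enum DescriptionMode { TREE, CASCADING }

    // One chained operator: its description and its upstream inputs inside the same vertex.
    public static final class OpNode {
        final String description;
        final List<OpNode> inputs;

        public OpNode(String description, List<OpNode> inputs) {
            this.description = description;
            this.inputs = inputs;
        }
    }

    // Builds the vertex description starting from the vertex's output (root) operator.
    public static String describe(OpNode root, DescriptionMode mode) {
        if (mode == DescriptionMode.CASCADING) {
            return cascading(root);
        }
        StringBuilder sb = new StringBuilder();
        tree(root, "", "", sb);
        return sb.toString();
    }

    // Tree mode: one operator per line; ":- " marks a non-last input, "+- " the last one.
    private static void tree(OpNode node, String linePrefix, String childPrefix, StringBuilder sb) {
        sb.append(linePrefix).append(node.description).append('\n');
        for (int i = 0; i < node.inputs.size(); i++) {
            boolean last = i == node.inputs.size() - 1;
            tree(node.inputs.get(i),
                    childPrefix + (last ? "+- " : ":- "),
                    childPrefix + (last ? "   " : ":  "),
                    sb);
        }
    }

    // Cascading mode (approximation of the old format): upstream first, joined with " -> ".
    private static String cascading(OpNode node) {
        if (node.inputs.isEmpty()) {
            return node.description;
        }
        List<String> upstream = new ArrayList<>();
        for (OpNode input : node.inputs) {
            upstream.add(cascading(input));
        }
        String prefix = upstream.size() == 1
                ? upstream.get(0)
                : "(" + String.join(", ", upstream) + ")";
        return prefix + " -> " + node.description;
    }

    public static void main(String[] args) {
        OpNode calc = new OpNode("[1]:Calc(select=[a, b])", List.of());
        OpNode dedup = new OpNode("[2]:Deduplicate(keep=[FirstRow])", List.of(calc));
        OpNode agg = new OpNode("[3]:LocalGroupAggregate(groupBy=[a])", List.of(dedup));
        System.out.print(describe(agg, DescriptionMode.TREE));
        System.out.println(describe(agg, DescriptionMode.CASCADING));
    }
}
```

For the chain in main, tree mode prints three lines with the output operator on top and each input indented under it. Cascading mode produces the same information on a single " -> " line, which is what becomes hard to read as the chain grows.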
In addition, some optimizations could be made to SQL operator descriptions:
1. Currently, string literals include their encoding and length, which are not necessary in the description.
2. The changelog mode could be shown in the description, so that we know what kind of records to expect.
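A sketch of the two description tweaks. It assumes string literals currently appear in a form like _UTF-16LE'abc':VARCHAR(3) CHARACTER SET "UTF-16LE". It also assumes the changelog mode is abbreviated as I/UB/UA/D. Both the regular expression and the hypothetical Kind enum are illustrations, not Flink's actual implementation.

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical post-processing of SQL operator descriptions; not Flink API.
final class DescriptionCleanupSketch {

    // Assumed current digest of a string literal, e.g.
    // _UTF-16LE'abc':VARCHAR(3) CHARACTER SET "UTF-16LE"
    // Group 1 captures the literal body (doubled quotes '' are kept as escapes).
    private static final Pattern STRING_LITERAL = Pattern.compile(
            "_[A-Za-z0-9-]+'((?:[^']|'')*)'"
                    + "(?::(?:VAR)?CHAR\\(\\d+\\) CHARACTER SET \"[A-Za-z0-9-]+\")?");

    // Drops the charset prefix and the type/length suffix, keeping only 'abc'.
    public static String simplifyLiterals(String description) {
        return STRING_LITERAL.matcher(description).replaceAll("'$1'");
    }

    // Hypothetical short codes for insert, update-before, update-after and delete.
    public enum Kind { I, UB, UA, D }

    // Adds the changelog mode as an extra attribute inside the trailing parentheses.
    public static String withChangelogMode(String description, Set<Kind> kinds) {
        String mode = "changelogMode=["
                + kinds.stream().sorted().map(Kind::name).collect(Collectors.joining(","))
                + "]";
        if (description.endsWith(")")) {
            return description.substring(0, description.length() - 1) + ", " + mode + ")";
        }
        return description + "(" + mode + ")";
    }

    public static void main(String[] args) {
        String raw = "Calc(where=[(name = _UTF-16LE'abc':VARCHAR(3) CHARACTER SET \"UTF-16LE\")])";
        // prints Calc(where=[(name = 'abc')], changelogMode=[I,UA])
        System.out.println(withChangelogMode(simplifyLiterals(raw), Set.of(Kind.UA, Kind.I)));
    }
}
```

Sorting the kinds by enum order keeps the rendered mode deterministic regardless of the Set implementation passed in.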

Compatibility, Deprecation, and Migration Plan

What impact (if any) will there be on existing users?
If we are changing behavior how will we phase out the older behavior?
If we need special migration tools, describe them here.
When will we remove the existing behavior?

Test Plan

Describe in few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.


FLIP-195: Improve the name and structure of job vertex and operator name for job