Status

Current state: "Under Discussion"

Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Released: 1.15

Motivation

Currently, the name of an operator in SQL contains the operator's logic, which can be helpful when debugging at runtime. However, the name of a single operator can be quite long, depending on the number of columns and the complexity of the computing logic. For a job vertex the situation is much worse, because a single vertex can contain tens of operators. This causes several problems:

  1. The name becomes unreadable, and the operator topology can hardly be recovered from it.
  2. Logs are hard to read and waste a lot of I/O.
  3. Some external systems, such as metrics systems, do not work well with such long names:
    1. https://issues.apache.org/jira/browse/FLINK-20375
    2. https://issues.apache.org/jira/browse/FLINK-21129

Public Interfaces

This FLIP proposes the following new interfaces and configuration options, so that SQL jobs can separate the name from the description of operators and job vertices:

  • Add an optional field description to Transformation to allow users to set detailed information about the operation.
    • If it is not set, the description defaults to the name of the transformation.
  • Add an execution config option table.optimizer.split-name-and-description to control whether the generated Transformation uses the proposed simplified name.
    • It is true by default.
    • When it is true, the generated Transformation has a simplified name and a corresponding detailed description, as proposed in this FLIP.
    • When it is false, the name and description of the Transformation are the same as before this FLIP.
  • Add a pipeline config option pipeline.tree-mode-vertex-description to control the style of the description.
    • It is false by default, so that there are no side effects on the DataStream API; the SQL planner sets it to true by default.
    • If you do not like the tree-mode description proposed by this FLIP, you can set it to false.
  • REST API/Web UI changes
    • "/jobs/${jobId}" will return both the name and the description of all vertices; a new field description, corresponding to the description of the vertex, will be added to JobVertexDetailsInfo.
    • "/jobs/${jobId}/plan" will return both the name and the description of all vertices; a new field name will be added to the vertex info in the JsonPlan of the job graph.
    • In the web UI, the details view will display the description of a job vertex instead of its name, and the topology view will display the name.
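
The fallback rule for the optional description field can be illustrated with a minimal sketch. The class NamedOperation below is a hypothetical stand-in written for this example, not Flink's Transformation class, and the description string is only an illustrative value; the sketch merely assumes that an unset description resolves to the name.

```java
import java.util.Objects;

/** Hypothetical stand-in for a transformation; NOT Flink's actual Transformation class. */
public class NamedOperation {
    private final String name;
    // Optional detailed description; null means "not set".
    private String description;

    public NamedOperation(String name) {
        this.name = Objects.requireNonNull(name, "name");
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public String getName() {
        return name;
    }

    /** Returns the description if one was set, otherwise falls back to the name. */
    public String getDescription() {
        return description != null ? description : name;
    }

    public static void main(String[] args) {
        NamedOperation op = new NamedOperation("Calc[1]");
        // No description set yet, so the name is used.
        System.out.println(op.getDescription());

        // Illustrative description text; the real format is defined by the planner.
        op.setDescription("[1]:Calc(select=[a, b])");
        System.out.println(op.getDescription());
    }
}
```

With this rule, callers that never set a description (for example, existing DataStream programs) observe the same text for name and description, which is consistent with the "no side effects" goal stated above.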

Proposed Changes

  1. Separate the detailed description of an operator from its name.
    1. For sources and sinks, we use the table name as the operator name, because the framework already adds a "Source:" or "Sink:" prefix to the operator name.
    2. For other operators, we use the node class name as the operator name, with the common prefix StreamExec or BatchExec trimmed. The node id is appended as a suffix so that operators of the same class within a vertex can be distinguished. The final operator name would look something like Calc[1]/Deduplicate[2]/LocalGroupAggregate[3]/GlobalGroupAggregate[4].
      1. This kind of name is similar to operator names in DataStream: FlatMap/Map etc.
    3. The current operator name is used as the operator description. The description is used to construct JobVertex#operatorPrettyName, which is used to generate the description of a job vertex in the REST API and is displayed in the web UI.
      1. StreamNode and Transformation need a new field, description. When no description is given, the name is used as the description.
      2. For SQL, we add the exec node id as a prefix to the operator description and append the description of the exec node, so that it is easy to match a description to its node.
  2. Introduce a tree-mode detailed description for a vertex, which provides better-formatted detail in the web UI and can be used when debugging and analyzing SQL jobs at runtime.
    1. We need to add a field "description" to the job vertex summary in the REST API and modify the UI to use it.
    2. We can also introduce an option to fall back to the old mode, in case people do not like it.
  3. In addition, some optimizations can be made to the SQL operator description:
    1. Currently, string literals include their encoding and length, which are not necessary in the description.
    2. The changelog mode can be shown in the description, so that we know what kind of records are expected.
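
The naming rule for non-source, non-sink operators described in item 1 can be sketched as follows. OperatorNames and simplify are hypothetical names chosen for this example and are not part of the planner; the sketch only assumes that the name is the node class name with a leading StreamExec or BatchExec removed, followed by the node id in square brackets.

```java
/** Hypothetical helper illustrating the proposed operator naming; not planner code. */
public class OperatorNames {
    // Common class-name prefixes that the proposal trims.
    private static final String[] COMMON_PREFIXES = {"StreamExec", "BatchExec"};

    /** Trims a leading StreamExec/BatchExec and appends the node id in brackets. */
    public static String simplify(String nodeClassName, int nodeId) {
        String shortName = nodeClassName;
        for (String prefix : COMMON_PREFIXES) {
            if (nodeClassName.startsWith(prefix)) {
                shortName = nodeClassName.substring(prefix.length());
                break;
            }
        }
        return shortName + "[" + nodeId + "]";
    }

    public static void main(String[] args) {
        // Input class names are illustrative examples.
        System.out.println(simplify("StreamExecCalc", 1));                 // Calc[1]
        System.out.println(simplify("StreamExecDeduplicate", 2));          // Deduplicate[2]
        System.out.println(simplify("BatchExecLocalGroupAggregate", 3));   // LocalGroupAggregate[3]
        System.out.println(simplify("StreamExecGlobalGroupAggregate", 4)); // GlobalGroupAggregate[4]
    }
}
```

Because the id is part of the name, two operators of the same class in one vertex (for example, two Calc nodes) still get distinct names, which is the stated reason for the suffix.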

Compatibility, Deprecation, and Migration Plan

  • This FLIP only changes the content of operator/vertex names and adds a new field to the REST API, so there is no compatibility issue for data processing or programming.
  • People whose external systems depend on the content of the REST API may need to adjust their own model definitions if they strictly match the current definition of the Flink REST API.

Test Plan

  1. Changes to the internal implementation will be verified by unit tests.
  2. Modifications to the web UI and REST API will be verified manually.

Rejected Alternatives

There are no rejected alternatives.
