State

[progress record]:

Proposed time: 2022/05/06

Discussion time:

Acceptance time:

Complete time:

[issues]:

[email]:  

[release]:

[proposer]:

Motivation & Background

In the current version, when an exception occurs while the Linkis client submits a task, the client decides whether to retry according to its parameter configuration. However, when an error occurs during task execution, the client has no retry mechanism. In particular, some tasks fail because of network, resource or other issues before they are ever submitted to the EC for execution. To further improve the fault tolerance of the system, the client adds a retry function for tasks that report errors before being submitted to the EC for execution.

Basic concept

Expect to achieve goals

  • The Linkis client adds a retry function for tasks that report an error before being submitted to the ECM for execution

Implementation plan

  • LinkisJob adds the attribute retryNums, of type Int;
  • The table linkis_ps_job_history_group_history adds a Boolean field, execByEcm, to indicate whether the task has entered the EC for execution. A related feature is already planned that adds task metrics to the record of EC information (requirement: https://github.com/apache/incubator-linkis/issues/2075); that record can be reused.
  • The client retry function is added to the isCompleted method of LinkisJob. This method has two implementation classes:

1. SimpleOnceJob

This class is mainly for one-time submitted tasks, such as datax, sqoop and other engines, which do not require data interaction. Retry is not considered for this type for the time being.

2. StorableLinkisJob

This class is mainly for once job type tasks, such as hive, spark and other engines, with data interaction.

    override def isCompleted: Boolean = getJobInfoResult.isCompleted

getJobInfoResult obtains the execution status from the table linkis_ps_job_history_group_history through the jobhistory interface.

  • Retry process
  • After the task enters the ECM, update the execByEcm field to true;
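
The retry decision described above can be sketched roughly as follows. This is a minimal, self-contained model and not the actual Linkis client code: JobStatus, SubmittedJob, RetryingJob and fixed are hypothetical names introduced only for illustration, while retryNums and execByEcm are the attribute and field proposed in this document. The sketch assumes a failed task is resubmitted only if it never entered the ECM and the retry budget is not exhausted.

```scala
// Hypothetical, simplified model; these are NOT the real Linkis client classes.
object RetrySketch {

  /** Snapshot of a job as it would be read from the job history record. */
  final case class JobStatus(
      completed: Boolean, // the task has reached a terminal state
      failed: Boolean,    // the terminal state is an error
      execByEcm: Boolean  // true once the task has entered the ECM
  )

  /** A submitted job whose status can be polled (stand-in for getJobInfoResult). */
  trait SubmittedJob {
    def status: JobStatus
  }

  /** Test helper: a job that always reports the given status. */
  def fixed(s: JobStatus): SubmittedJob = new SubmittedJob {
    def status: JobStatus = s
  }

  /**
   * Wraps submission with a client-side retry.
   * retryNums is the maximum number of resubmissions; 0 keeps the original behaviour.
   */
  final class RetryingJob(submit: () => SubmittedJob, retryNums: Int) {
    private var current: SubmittedJob = submit()
    private var retriesUsed: Int = 0

    def retries: Int = retriesUsed
    def lastStatus: JobStatus = current.status

    /** True only when the job is finished and no further retry will be attempted. */
    def isCompleted: Boolean = {
      val s = current.status
      if (!s.completed) false
      else if (s.failed && !s.execByEcm && retriesUsed < retryNums) {
        // Failed before reaching the ECM: resubmit and report "not completed yet".
        retriesUsed += 1
        current = submit()
        false
      } else true
    }
  }

  def main(args: Array[String]): Unit = {
    // First submission fails before the ECM, the second one succeeds.
    val outcomes = Iterator(
      JobStatus(completed = true, failed = true, execByEcm = false),
      JobStatus(completed = true, failed = false, execByEcm = true)
    )
    val job = new RetryingJob(() => fixed(outcomes.next()), retryNums = 1)
    while (!job.isCompleted) {}
    println(s"retries used: ${job.retries}, failed: ${job.lastStatus.failed}")
  }
}
```

In this sketch a failure that happens after the task has entered the ECM (execByEcm is true) is never retried, and a job created with retryNums = 0 behaves exactly like a job without the retry function.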


Things to Consider & Note:

  • Does compatibility with the original parameter method need to be considered? Retry is only supported when the retry parameter is added; if it is not added, the original logic still applies.

Changes


Modification Detail
1. Modification of maven module
2. Modification of HTTP interface
3. Modification of the client interface
4. Modification of database table structure
5. Modification of configuration item
6. Modification of error code
7. Modification of third-party dependencies

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior, how will we phase out the older behavior?
  • If we require special migration tools, describe them here.
  • When will we remove the existing behavior?