State

[progress record]: Has been claimed by Xu Jie

Proposed time: 2022/05/06

Discussion time:

Acceptance time:

Completion time:

[issues]: 

[email]:   

[release]: 

[proposer]: 

Motivation & Background

Add a distcp engine to enhance Linkis's cross-cluster copy capability.

Basic concept

  • distcp: a tool for copying data within and between Hadoop clusters. It uses MapReduce for file distribution, error handling and recovery, and report generation. It expands a list of files and directories into the input for map tasks, each of which copies a subset of the files in the source list.
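To make the idea of "each map task copies a subset of the source list" concrete, here is a minimal sketch of splitting a file list across a fixed number of map tasks by total size. It is only an illustration of the concept: the function name `split_copy_list` and the greedy balancing rule are assumptions made for this example and are not taken from the distcp implementation.

```python
from typing import Dict, List


def split_copy_list(files: Dict[str, int], num_maps: int) -> List[List[str]]:
    """Assign files (path -> size in bytes) to num_maps buckets.

    Illustrative greedy rule: take the largest file first and give it to
    the bucket that currently holds the fewest bytes.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")

    buckets: List[List[str]] = [[] for _ in range(num_maps)]
    loads = [0] * num_maps

    # Sort by size descending, then by path so the result is deterministic.
    for path, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))  # least-loaded bucket so far
        buckets[target].append(path)
        loads[target] += size

    return buckets


if __name__ == "__main__":
    sample = {"/data/a": 100, "/data/b": 60, "/data/c": 50, "/data/d": 10}
    for i, bucket in enumerate(split_copy_list(sample, 2)):
        print(f"map {i}: {bucket}")
```

Each bucket stands for the work handed to one map task; a failure in one bucket only requires retrying that bucket's files.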

Expected goals

  • Add a Linkis distcp engine that provides all the functions of the distcp tool and supports task status monitoring, task logs, and engine kill.

Implementation plan

  • This engine is a Once Job type engine. For the implementation, refer to the Linkis Sqoop engine.
  • The original distcp tool works on source and destination addresses and has no notion of tables, whereas users mostly work with tables. The table supplied by the user therefore has to be converted into the corresponding path, which requires introducing metadata management functions.
  • Add a mapping function that converts the user-supplied parameters into the parameters required by distcp.
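The table-to-path conversion and the parameter mapping described above could look roughly like the sketch below. Everything Linkis-specific here is an assumption made for illustration: the parameter keys (`source.table`, `target.path`, `max.maps`, and so on), the function names, and the `table_locations` dictionary that stands in for the metadata management lookup. Only the distcp options `-update`, `-overwrite`, `-m`, and `-bandwidth` are real distcp flags.

```python
from typing import Dict, List, Mapping


def resolve_location(params: Mapping[str, str], side: str,
                     table_locations: Mapping[str, str]) -> str:
    """Return the full URI for side ("source" or "target").

    An explicit "<side>.path" wins, which keeps the original path-based
    usage working. Otherwise "<side>.table" is looked up in
    table_locations, a hypothetical stand-in for the metadata service.
    """
    cluster = params[f"{side}.cluster"].rstrip("/")
    path = params.get(f"{side}.path")
    if path is None:
        table = params.get(f"{side}.table")
        if table is None:
            raise ValueError(f"either {side}.path or {side}.table is required")
        if table not in table_locations:
            raise KeyError(f"unknown table: {table}")
        path = table_locations[table]
    return cluster + "/" + path.lstrip("/")


def build_distcp_args(params: Mapping[str, str],
                      table_locations: Mapping[str, str]) -> List[str]:
    """Map user-facing parameters (hypothetical keys) to a distcp command line."""
    update = params.get("update") == "true"
    overwrite = params.get("overwrite") == "true"
    if update and overwrite:
        # The two options request conflicting behaviour, so reject the pair.
        raise ValueError("update and overwrite cannot be combined")

    args: List[str] = ["hadoop", "distcp"]
    if update:
        args.append("-update")
    if overwrite:
        args.append("-overwrite")
    if "max.maps" in params:
        args += ["-m", params["max.maps"]]
    if "bandwidth.mb" in params:
        args += ["-bandwidth", params["bandwidth.mb"]]

    args.append(resolve_location(params, "source", table_locations))
    args.append(resolve_location(params, "target", table_locations))
    return args


if __name__ == "__main__":
    locations: Dict[str, str] = {"db1.orders": "/warehouse/db1.db/orders"}
    user_params = {
        "source.cluster": "hdfs://nn1:8020",
        "source.table": "db1.orders",
        "target.cluster": "hdfs://nn2:8020",
        "target.path": "/backup/orders",
        "max.maps": "20",
        "update": "true",
    }
    print(" ".join(build_distcp_args(user_params, locations)))
```

Accepting either a table or an explicit path on each side is one possible way to stay compatible with the original path-based parameters while adding table-based input.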


Things to Consider & Note:

  • Does the engine need to remain compatible with the original distcp parameter style?

Changes


Modification Detail

1. Modification of maven module
2. Modification of HTTP interface
3. Modification of the client interface
4. Modification of database table structure
5. Modification of configuration item
6. Modification of error code
7. Modification of third-party dependencies

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior, how will we phase out the older behavior?
  • If we require special migration tools, describe them here.
  • When will we remove the existing behavior?