...

Proposers

Approvers

Status

Current State

Under Discussion (tick)
In Progress
ABANDONED
Completed
INactive


...

So, I am proposing to abstract the common hudi-sync logic so that other services, such as AWS Glue, Aliyun Glue, and Aliyun Data Lake Analytics, can implement it.

...

Option1 gives a clearer module organization and clearer user dependencies, at the cost of adding one more module.
Option2 leaves the current module organization unchanged, but places the general interface in the hudi-hive-sync module. User-defined implementations would then depend on hudi-hive-sync even when they have nothing to do with Hive, which is semantically odd.

I personally prefer Option1.

2.2 Code (class) structure

[Image: image.png]

AbstractHoodieSyncClient is the abstract synchronization client. Its default implementation is HoodieHiveClient, and users can provide their own client by extending it.
The abstract methods in AbstractHoodieSyncClient are listed below:

public abstract void createTable(String tableName, MessageType storageSchema,
    String inputFormatClass, String outputFormatClass, String serdeClass);

public abstract boolean doesTableExist(String tableName);

public abstract Option<String> getLastCommitTimeSynced(String tableName);

public abstract void updateLastCommitTimeSynced(String tableName);

public abstract void addPartitionsToTable(String tableName, List<String> partitionsToAdd);

public abstract void updatePartitionsToTable(String tableName, List<String> changedPartitions);

If Option1 is used, AbstractHoodieSyncClient will be placed in the hudi-common-sync module; if Option2 is used, it will be placed in hudi-hive-sync.
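
To illustrate what a user-defined client could look like, here is a minimal sketch that extends the abstract client with an in-memory implementation. It is not the proposed implementation: the abstract class is redeclared locally so the sketch compiles without Hudi on the classpath, String stands in for Parquet's MessageType, java.util.Optional stands in for Hudi's Option, and InMemorySyncClient, its constructor argument, and partitionsOf are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Local stand-in for the proposed abstract client (assumption: simplified types).
abstract class AbstractHoodieSyncClient {
    public abstract void createTable(String tableName, String storageSchema,
        String inputFormatClass, String outputFormatClass, String serdeClass);
    public abstract boolean doesTableExist(String tableName);
    public abstract Optional<String> getLastCommitTimeSynced(String tableName);
    public abstract void updateLastCommitTimeSynced(String tableName);
    public abstract void addPartitionsToTable(String tableName, List<String> partitionsToAdd);
    public abstract void updatePartitionsToTable(String tableName, List<String> changedPartitions);
}

// Hypothetical custom client that keeps all table metadata in memory.
class InMemorySyncClient extends AbstractHoodieSyncClient {
    private final Map<String, List<String>> partitions = new HashMap<>();
    private final Map<String, String> lastSynced = new HashMap<>();
    // Assumption: a real client would read this from the Hudi timeline.
    private final String latestCommitTime;

    InMemorySyncClient(String latestCommitTime) {
        this.latestCommitTime = latestCommitTime;
    }

    @Override
    public void createTable(String tableName, String storageSchema,
        String inputFormatClass, String outputFormatClass, String serdeClass) {
        // Register the table with no partitions; formats are ignored in this sketch.
        partitions.putIfAbsent(tableName, new ArrayList<>());
    }

    @Override
    public boolean doesTableExist(String tableName) {
        return partitions.containsKey(tableName);
    }

    @Override
    public Optional<String> getLastCommitTimeSynced(String tableName) {
        return Optional.ofNullable(lastSynced.get(tableName));
    }

    @Override
    public void updateLastCommitTimeSynced(String tableName) {
        requireTable(tableName);
        lastSynced.put(tableName, latestCommitTime);
    }

    @Override
    public void addPartitionsToTable(String tableName, List<String> partitionsToAdd) {
        requireTable(tableName).addAll(partitionsToAdd);
    }

    @Override
    public void updatePartitionsToTable(String tableName, List<String> changedPartitions) {
        // Nothing to rewrite in memory; just make sure each changed partition is registered.
        List<String> known = requireTable(tableName);
        for (String p : changedPartitions) {
            if (!known.contains(p)) {
                known.add(p);
            }
        }
    }

    // Helper for inspection in this sketch only.
    List<String> partitionsOf(String tableName) {
        return new ArrayList<>(requireTable(tableName));
    }

    private List<String> requireTable(String tableName) {
        List<String> known = partitions.get(tableName);
        if (known == null) {
            throw new IllegalStateException("Table does not exist: " + tableName);
        }
        return known;
    }
}
```

A client for an external catalog service would follow the same shape, replacing the in-memory maps with calls to that service.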

AbstractSyncTool is the abstract synchronization tool. Every synchronization tool must extend this class; the default implementation is HiveSyncTool.
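
The relationship between AbstractSyncTool and a custom tool can be sketched as follows. This is only an illustration under stated assumptions: the text above does not give the members of AbstractSyncTool, so the Properties-based constructor, the syncHoodieTable() method, the RecordingSyncTool class, and the "sync.table.name" property key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Assumed shape of the abstract tool: it holds configuration and exposes one sync entry point.
abstract class AbstractSyncTool {
    protected final Properties props;

    protected AbstractSyncTool(Properties props) {
        this.props = props;
    }

    // Hypothetical entry point that each concrete tool implements.
    public abstract void syncHoodieTable();
}

// Hypothetical custom tool that only records which table it was asked to sync.
class RecordingSyncTool extends AbstractSyncTool {
    final List<String> syncedTables = new ArrayList<>();

    RecordingSyncTool(Properties props) {
        super(props);
    }

    @Override
    public void syncHoodieTable() {
        // "sync.table.name" is an illustrative key, not a real Hudi config.
        String table = props.getProperty("sync.table.name", "unknown_table");
        syncedTables.add(table);
    }
}
```

A real tool would typically create its own sync client inside syncHoodieTable() and use it to create the table and register partitions.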

...