Status
Current state: ["Under Discussion"]
Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)
Released: 1.18
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The CREATE TABLE AS SELECT (CTAS) statement has been supported since FLIP-218, but it is not atomic. The table is created before the job runs, and if the job fails or is cancelled, the table is not dropped.
We want Flink to support atomic CTAS, where the table is created only when the job succeeds.
Building on FLIP-218: Support SELECT clause in CREATE TABLE(CTAS), we use the existing JobStatusHook mechanism and extend the Catalog API to implement atomic CTAS.
Public Interfaces
Introduce twoPhaseCreateTable API for Catalog.
@PublicEvolving
public interface Catalog {

    /**
     * Create a {@link TwoPhaseCatalogTable} that provides a transaction abstraction.
     * TwoPhaseCatalogTable will be combined with {@link JobStatusHook} to achieve atomicity
     * support in the Flink framework. The default implementation returns empty, indicating that
     * atomic operations are not supported and the non-atomic implementation is used.
     *
     * <p>The framework will make sure to call this method with fully validated {@link
     * ResolvedCatalogTable}.
     *
     * @param tablePath path of the table to be created
     * @param table the table definition
     * @param ignoreIfExists flag to specify behavior when a table or view already exists at the
     *     given path: if set to false, it throws a TableAlreadyExistException, if set to true, do
     *     nothing.
     * @param isStreamingMode a flag that tells whether the current table is in stream mode;
     *     different modes can have different implementations of atomicity support.
     * @return {@link TwoPhaseCatalogTable} that can be serialized and provides atomic
     *     operations
     * @throws TableAlreadyExistException if table already exists and ignoreIfExists is false
     * @throws DatabaseNotExistException if the database in tablePath doesn't exist
     * @throws CatalogException in case of any runtime exception
     */
    default Optional<TwoPhaseCatalogTable> twoPhaseCreateTable(
            ObjectPath tablePath,
            CatalogBaseTable table,
            boolean ignoreIfExists,
            boolean isStreamingMode)
            throws TableAlreadyExistException, DatabaseNotExistException, CatalogException {
        return Optional.empty();
    }
}
Introduce the TwoPhaseCatalogTable interface, which supports atomic operations.
/**
 * A {@link CatalogTable} for atomic semantics using a two-phase commit protocol, combined with
 * {@link JobStatusHook} for atomic CTAS. {@link TwoPhaseCatalogTable} will be a member
 * variable of CtasJobStatusHook and can be serialized.
 *
 * <p>CtasJobStatusHook#onCreated will call the begin method of TwoPhaseCatalogTable;
 * CtasJobStatusHook#onFinished will call the commit method of TwoPhaseCatalogTable;
 * CtasJobStatusHook#onFailed and CtasJobStatusHook#onCanceled will call the abort method of
 * TwoPhaseCatalogTable.
 */
@PublicEvolving
public interface TwoPhaseCatalogTable extends CatalogTable, Serializable {

    /**
     * This method will be called when the job is started. Similar to opening a transaction in a
     * relational database. In Flink's atomic CTAS scenario, it is used to do some initialization
     * work, for example initializing the client of the underlying service, the tmp path of the
     * underlying storage, or even calling the start transaction API of the underlying service.
     */
    void begin();

    /**
     * This method will be called when the job succeeds. Similar to committing a transaction in a
     * relational database. In Flink's atomic CTAS scenario, it is used to do some data visibility
     * related work, for example moving the underlying data to the target directory, writing
     * buffered data to the underlying storage service, or even calling the commit transaction API
     * of the underlying service.
     */
    void commit();

    /**
     * This method will be called when the job has failed or been canceled. Similar to rolling
     * back a transaction in a relational database. In Flink's atomic CTAS scenario, it is used to
     * do some data cleaning, for example deleting the data in the tmp directory, deleting the
     * temporary data in the underlying storage service, or even calling the rollback transaction
     * API of the underlying service.
     */
    void abort();
}
Proposed Changes
First, we need a table abstraction that can be combined with transaction-like capabilities, so we introduce TwoPhaseCatalogTable, which can begin, commit, and abort a transaction.
TwoPhaseCatalogTable exposes three corresponding APIs:
begin : Similar to opening a transaction; we can do some prep work, such as initializing the client, initializing the data, initializing the directory, etc.
commit : Similar to committing a transaction; we can do some data writing, data visibility, table creation, etc.
abort : Similar to aborting a transaction; we can do some data cleaning, data restoration, etc.
Note: TwoPhaseCatalogTable must be serializable, because it is used on the JobManager (JM).
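To make the begin/commit/abort contract and the serialization requirement concrete, here is a minimal sketch based on a staging directory. It is not part of the proposal: StagedTableSketch, StagedTableDemo, and the directory names are hypothetical, and the class deliberately does not implement the Flink interfaces so that it stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical staging-directory table; mirrors begin/commit/abort but is not a Flink class. */
class StagedTableSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Paths are stored as strings so the object serializes without custom code.
    private final String stagingDir;
    private final String targetDir;

    StagedTableSketch(String stagingDir, String targetDir) {
        this.stagingDir = stagingDir;
        this.targetDir = targetDir;
    }

    /** Prepare the staging location that the job writes into. */
    void begin() {
        try {
            Files.createDirectories(Paths.get(stagingDir));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Publish the staged data by renaming the staging directory to the target. */
    void commit() {
        try {
            Files.move(Paths.get(stagingDir), Paths.get(targetDir));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Delete whatever was staged; the target is never touched. */
    void abort() {
        Path staging = Paths.get(stagingDir);
        if (!Files.exists(staging)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(staging)) {
            // Reverse order deletes children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Java-serialization round trip, standing in for shipping the object with the job. */
    StagedTableSketch roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(this);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StagedTableSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

class StagedTableDemo {
    /** Runs begin, writes one file, then commits or aborts, and reports what is left on disk. */
    static String outcome(boolean jobSucceeds) {
        try {
            Path base = Files.createTempDirectory("ctas-sketch");
            Path staging = base.resolve(".staging");
            Path target = base.resolve("final_table");
            StagedTableSketch table =
                    new StagedTableSketch(staging.toString(), target.toString()).roundTrip();
            table.begin();
            Files.write(staging.resolve("part-0"), "row".getBytes());
            if (jobSucceeds) {
                table.commit();
            } else {
                table.abort();
            }
            return "target=" + Files.exists(target.resolve("part-0"))
                    + ",staging=" + Files.exists(staging);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(true));
        System.out.println(outcome(false));
    }
}
```

The sketch keeps only plain strings as fields, so the deserialized copy behaves like the original. A real implementation would more likely hold a transient client and re-create it in begin.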
Next, we need a place to create the TwoPhaseCatalogTable. Different catalogs need to perform different operations to implement atomic CTAS: for example, HiveCatalog needs to access the Hive Metastore, and JdbcCatalog needs to access the back-end database. We therefore introduce the twoPhaseCreateTable API on the Catalog interface.
The definition of the twoPhaseCreateTable API is similar to that of the createTable API, extended with the isStreamingMode parameter so that a Catalog can provide a different atomicity implementation in each mode.
Integrate atomic CTAS
Introduce CtasJobStatusHook (which implements the JobStatusHook interface), with TwoPhaseCatalogTable as a member variable.
The hook calls the TwoPhaseCatalogTable APIs as follows:
/**
 * This Hook is used to implement the Flink CTAS atomicity semantics, calling the corresponding API
 * of {@link TwoPhaseCatalogTable} at different stages of the job.
 */
public class CtasJobStatusHook implements JobStatusHook {

    private final TwoPhaseCatalogTable twoPhaseCatalogTable;

    public CtasJobStatusHook(TwoPhaseCatalogTable twoPhaseCatalogTable) {
        this.twoPhaseCatalogTable = twoPhaseCatalogTable;
    }

    @Override
    public void onCreated(JobID jobId) {
        twoPhaseCatalogTable.begin();
    }

    @Override
    public void onFinished(JobID jobId) {
        twoPhaseCatalogTable.commit();
    }

    @Override
    public void onFailed(JobID jobId, Throwable throwable) {
        twoPhaseCatalogTable.abort();
    }

    @Override
    public void onCanceled(JobID jobId) {
        twoPhaseCatalogTable.abort();
    }
}
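The effect of this delegation can be checked without a cluster. The following is a rough, self-contained simulation: the interfaces are hypothetical stand-ins for TwoPhaseCatalogTable and JobStatusHook (a String replaces JobID, serialization is omitted), and runJob is an assumed, simplified model of how a job drives the hook. It is not how Flink schedules hooks internally.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for TwoPhaseCatalogTable (serialization omitted for brevity). */
interface TwoPhaseTableStandIn {
    void begin();
    void commit();
    void abort();
}

/** Hypothetical stand-in for JobStatusHook, using a String instead of JobID. */
interface JobStatusHookStandIn {
    void onCreated(String jobId);
    void onFinished(String jobId);
    void onFailed(String jobId, Throwable cause);
    void onCanceled(String jobId);
}

/** Same delegation as CtasJobStatusHook, written against the stand-ins. */
class CtasHookStandIn implements JobStatusHookStandIn {
    private final TwoPhaseTableStandIn table;

    CtasHookStandIn(TwoPhaseTableStandIn table) {
        this.table = table;
    }

    @Override public void onCreated(String jobId) { table.begin(); }
    @Override public void onFinished(String jobId) { table.commit(); }
    @Override public void onFailed(String jobId, Throwable cause) { table.abort(); }
    @Override public void onCanceled(String jobId) { table.abort(); }
}

/** Registers the table in an in-memory "catalog" only on commit and records each call. */
class RecordingTable implements TwoPhaseTableStandIn {
    final List<String> calls = new ArrayList<>();
    private final Set<String> catalog;
    private final String name;

    RecordingTable(Set<String> catalog, String name) {
        this.catalog = catalog;
        this.name = name;
    }

    @Override public void begin() { calls.add("begin"); }
    @Override public void commit() { calls.add("commit"); catalog.add(name); }
    @Override public void abort() { calls.add("abort"); }
}

class CtasLifecycleDemo {
    /** Assumed job model: created first, then either finished or failed. */
    static void runJob(JobStatusHookStandIn hook, Runnable jobBody) {
        hook.onCreated("job-1");
        try {
            jobBody.run();
        } catch (RuntimeException e) {
            hook.onFailed("job-1", e);
            return;
        }
        hook.onFinished("job-1");
    }

    public static void main(String[] args) {
        Set<String> catalog = new HashSet<>();
        RecordingTable ok = new RecordingTable(catalog, "t_ok");
        runJob(new CtasHookStandIn(ok), () -> {});
        RecordingTable bad = new RecordingTable(catalog, "t_bad");
        runJob(new CtasHookStandIn(bad), () -> { throw new IllegalStateException("boom"); });
        System.out.println(ok.calls + " " + bad.calls + " " + catalog);
    }
}
```

In this model the failing job never reaches commit, so its table never appears in the catalog, which is the behaviour the hook is meant to guarantee.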
Compatibility with existing non-atomic CTAS
The return value of Catalog#twoPhaseCreateTable is Optional, and we can determine whether atomicity semantics are supported based on whether the return value is empty:
empty : it means that atomicity semantics are not supported and the existing code logic is used;
not empty : it means that atomicity semantics are supported, then create a CtasJobStatusHook and use the JobStatusHook mechanism to implement atomicity semantics, as described in the code in the previous section.
Optional<TwoPhaseCatalogTable> twoPhaseCatalogTableOptional =
        ctasCatalog.twoPhaseCreateTable(
                objectPath,
                catalogTable,
                createTableOperation.isIgnoreIfExists(),
                isStreamingMode);

if (twoPhaseCatalogTableOptional.isPresent()) {
    // use TwoPhaseCatalogTable for atomic CTAS statements
    TwoPhaseCatalogTable twoPhaseCatalogTable = twoPhaseCatalogTableOptional.get();
    CtasJobStatusHook ctasJobStatusHook = new CtasJobStatusHook(twoPhaseCatalogTable);
    mapOperations.add(
            ctasOperation.toSinkModifyOperation(
                    createTableOperation.getTableIdentifier(),
                    createTableOperation.getCatalogTable(),
                    twoPhaseCatalogTable,
                    ctasCatalog,
                    catalogManager));
    jobStatusHookList.add(ctasJobStatusHook);
} else {
    // execute CREATE TABLE first for non-atomic CTAS statements
    executeInternal(ctasOperation.getCreateTableOperation());
    mapOperations.add(ctasOperation.toSinkModifyOperation(catalogManager));
}
Atomicity support in streaming and batch mode
We usually think of streaming jobs as long-running, i.e. they never stop, so atomicity semantics do not apply to them. However, Flink is now a unified stream and batch computing engine, so we introduce the isStreamingMode parameter in Catalog#twoPhaseCreateTable and let the Catalog decide whether to provide atomicity semantics.
In production, some user-defined streaming sources do finish (no more input), and the job finishes with them. Using the atomic implementation in that case improves the user experience; whether to do so is decided by the Catalog implementation.
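As an illustration of how a Catalog might make that decision, here is a small sketch. Everything in it is an assumption made for the example: the class names, the simplified method signature, and in particular the atomicInStreamingMode switch, which is not an existing Flink option.

```java
import java.util.Optional;

/** Hypothetical handle standing in for TwoPhaseCatalogTable. */
class AtomicTableHandle {
    final String tablePath;

    AtomicTableHandle(String tablePath) {
        this.tablePath = tablePath;
    }
}

/** Hypothetical catalog: atomic CTAS always in batch mode, and in streaming mode only on opt-in. */
class ModeAwareCatalogSketch {
    // Assumed opt-in switch for bounded streaming jobs; not an existing Flink option.
    private final boolean atomicInStreamingMode;

    ModeAwareCatalogSketch(boolean atomicInStreamingMode) {
        this.atomicInStreamingMode = atomicInStreamingMode;
    }

    /** Simplified signature; the proposed API also takes the table definition and ignoreIfExists. */
    Optional<AtomicTableHandle> twoPhaseCreateTable(String tablePath, boolean isStreamingMode) {
        if (isStreamingMode && !atomicInStreamingMode) {
            // Empty tells the caller to fall back to the non-atomic CTAS path.
            return Optional.empty();
        }
        return Optional.of(new AtomicTableHandle(tablePath));
    }
}

class CtasPlanningSketch {
    /** Mirrors the caller-side branch: atomic path if a handle is returned, otherwise fallback. */
    static String choosePath(
            ModeAwareCatalogSketch catalog, String tablePath, boolean isStreamingMode) {
        return catalog.twoPhaseCreateTable(tablePath, isStreamingMode)
                .map(handle -> "atomic")
                .orElse("non-atomic");
    }

    public static void main(String[] args) {
        ModeAwareCatalogSketch batchOnly = new ModeAwareCatalogSketch(false);
        ModeAwareCatalogSketch optedIn = new ModeAwareCatalogSketch(true);
        System.out.println(choosePath(batchOnly, "db.t", false));
        System.out.println(choosePath(batchOnly, "db.t", true));
        System.out.println(choosePath(optedIn, "db.t", true));
    }
}
```

Returning Optional.empty() keeps the decision entirely inside the Catalog, so the caller needs no mode-specific logic beyond passing the flag.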
Atomic CTAS demo
Implementing atomic CTAS for a catalog then requires only two steps:
- Catalog implements the twoPhaseCreateTable method;
- Introduce the implementation class of the TwoPhaseCatalogTable interface.
HiveCatalog demo
HiveCatalog implements the twoPhaseCreateTable API:
@Override
public Optional<TwoPhaseCatalogTable> twoPhaseCreateTable(
        ObjectPath tablePath,
        CatalogBaseTable table,
        boolean ignoreIfExists,
        boolean isStreamingMode)
        throws TableAlreadyExistException, DatabaseNotExistException, CatalogException {

    if (isStreamingMode) {
        // HiveCatalog does not support atomicity semantics in stream mode
        return Optional.empty();
    }

    ... ...

    TwoPhaseCatalogTable twoPhaseCatalogTable =
            new HiveTwoPhaseCatalogTable(
                    getHiveVersion(),
                    new JobConfWrapper(JobConfUtils.createJobConfWithCredentials(hiveConf)),
                    hiveTable,
                    ignoreIfExists);

    return Optional.of(twoPhaseCatalogTable);
}
HiveTwoPhaseCatalogTable implements the core logic
public class HiveTwoPhaseCatalogTable implements TwoPhaseCatalogTable {

    private static final long serialVersionUID = 1L;

    @Nullable private final String hiveVersion;
    private final JobConfWrapper jobConfWrapper;

    private final Table table;
    private final boolean ignoreIfExists;

    private transient HiveMetastoreClientWrapper client;

    public HiveTwoPhaseCatalogTable(
            String hiveVersion,
            JobConfWrapper jobConfWrapper,
            Table table,
            boolean ignoreIfExists) {
        this.hiveVersion = hiveVersion;
        this.jobConfWrapper = jobConfWrapper;
        this.table = table;
        this.ignoreIfExists = ignoreIfExists;
    }

    @Override
    public void begin() {
        // init hive metastore client
        client =
                HiveMetastoreClientFactory.create(
                        HiveConfUtils.create(jobConfWrapper.conf()), hiveVersion);
    }

    @Override
    public void commit() {
        try {
            client.createTable(table);
        } catch (AlreadyExistsException alreadyExistsException) {
            if (!ignoreIfExists) {
                throw new FlinkHiveException(alreadyExistsException);
            }
        } catch (Exception e) {
            throw new FlinkHiveException(e);
        } finally {
            client.close();
        }
    }

    @Override
    public void abort() {
        // may clear staging dir
        client.close();
    }
}
JdbcCatalog demo
JdbcCatalog implements the twoPhaseCreateTable API:
@Override
public Optional<TwoPhaseCatalogTable> twoPhaseCreateTable(
        ObjectPath tablePath,
        CatalogBaseTable table,
        boolean ignoreIfExists,
        boolean isStreamingMode)
        throws TableAlreadyExistException, DatabaseNotExistException, CatalogException {

    ... ...

    TwoPhaseCatalogTable twoPhaseCatalogTable =
            new JdbcTwoPhaseCatalogTable(
                    // tmp table path: the final table name plus a timestamp suffix
                    new ObjectPath(
                            tablePath.getDatabaseName(),
                            tablePath.getObjectName() + "_" + System.currentTimeMillis()),
                    tablePath,
                    tableSchema,
                    jdbcUrl,
                    jdbcUserName,
                    jdbcPassword);

    return Optional.of(twoPhaseCatalogTable);
}
JdbcTwoPhaseCatalogTable implements the core logic
/** An implementation of {@link TwoPhaseCatalogTable} for Jdbc to support atomic ctas. */
public class JdbcTwoPhaseCatalogTable implements TwoPhaseCatalogTable {

    private final ObjectPath tmpTablePath;
    private final ObjectPath finalTablePath;
    private final Map<String, String> schema;
    private final String jdbcUrl;
    private final String userName;
    private final String password;

    public JdbcTwoPhaseCatalogTable(
            ObjectPath tmpTablePath,
            ObjectPath finalTablePath,
            Map<String, String> schema,
            String jdbcUrl,
            String userName,
            String password) {
        this.tmpTablePath = tmpTablePath;
        this.finalTablePath = finalTablePath;
        this.schema = schema;
        this.jdbcUrl = jdbcUrl;
        this.userName = userName;
        this.password = password;
    }

    @Override
    public void begin() {
        // create tmp table, writing data to the tmp table
        execute("create table " + tmpTablePath.getFullName() + "( ... ... )");
    }

    @Override
    public void commit() {
        // Rename the tmp table to the final table name
        execute(
                "rename table "
                        + tmpTablePath.getFullName()
                        + " to "
                        + finalTablePath.getFullName());
    }

    @Override
    public void abort() {
        // drop tmp table
        execute("drop table " + tmpTablePath.getFullName());
    }

    private void execute(String sql) {
        // open a connection per call and close it afterwards; the checked SQLException
        // is wrapped because begin/commit/abort do not declare it
        try (Connection connection = getConnection();
                PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.execute();
        } catch (SQLException e) {
            throw new CatalogException("Failed to execute: " + sql, e);
        }
    }

    private Connection getConnection() throws SQLException {
        // get jdbc connection
        return DriverManager.getConnection(jdbcUrl, userName, password);
    }
}
Compatibility, Deprecation, and Migration Plan
It is a new feature with no implication for backwards compatibility.
Test Plan
Changes will be verified by unit tests (UT).