...
CatalogDatabase
In the InMemoryCatalog scenario, the database might not exist when the table is created on the JM side. To avoid this, the CatalogDatabase must be serialized together with the InMemoryCatalog, so CatalogDatabase needs to extend Serializable. Currently, InMemoryCatalog is the only catalog whose serialization requires serializing CatalogDatabase.
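As an illustration of why the Serializable bound matters, here is a minimal sketch that round-trips a map of databases through plain Java serialization. The nested `CatalogDatabase` interface and `SimpleCatalogDatabase` class are hypothetical stand-ins written for this example, not Flink's real classes; only the idea that the interface extends `Serializable` comes from the text above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CatalogDatabaseSerializationSketch {

    /** Hypothetical stand-in for CatalogDatabase; the point is the Serializable bound. */
    public interface CatalogDatabase extends Serializable {
        Map<String, String> getProperties();

        String getComment();
    }

    /** Hypothetical implementation whose fields are all serializable. */
    public static final class SimpleCatalogDatabase implements CatalogDatabase {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, String> properties;
        private final String comment;

        public SimpleCatalogDatabase(Map<String, String> properties, String comment) {
            this.properties = new HashMap<>(properties);
            this.comment = comment;
        }

        @Override
        public Map<String, String> getProperties() {
            return properties;
        }

        @Override
        public String getComment() {
            return comment;
        }
    }

    /** Writes any serializable value to a byte array, as a client would before shipping it. */
    public static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Reads the value back, as the receiving side would. */
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Database name -> database metadata, the only state that has to travel.
        HashMap<String, CatalogDatabase> databases = new HashMap<>();
        databases.put(
                "default",
                new SimpleCatalogDatabase(
                        Collections.singletonMap("owner", "flink"), "default database"));

        byte[] payload = serialize(databases);

        @SuppressWarnings("unchecked")
        Map<String, CatalogDatabase> restored =
                (Map<String, CatalogDatabase>) deserialize(payload);
        System.out.println(restored.get("default").getComment());
    }
}
```

Without `extends Serializable` on the interface, `serialize(databases)` would fail with a `NotSerializableException` unless every implementation opted in on its own.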
...
Implementation Plan
The overall execution process is shown in the following figure.
...
Key points for making catalogs serializable:
Built-in catalogs:
- InMemoryCatalog: This is a special case. CatalogDatabase, CatalogBaseTable, and similar objects cannot be serialized directly by the Java serialization mechanism, so InMemoryCatalog needs special handling. Because the tables in InMemoryCatalog already exist in the external system, the metadata in InMemoryCatalog is used only by the job itself and is stored only in memory. The database-related information in InMemoryCatalog needs to be serialized and passed to the JM; otherwise the database may not exist when the JM creates the table. Other objects do not need to be serialized. The CatalogDatabase interface needs to extend Serializable.
- JdbcCatalog: The member variables required to construct the Catalog object, such as username, password, and base URL, are directly serializable. The JdbcDialectTypeMapper interface needs to extend Serializable.
- HiveCatalog: All member variables can be serialized directly except for the HiveConf object. We can follow the approach of JobConfWrapper to solve the serialization problem of HiveConf.
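The wrapper idea mentioned for HiveConf can be sketched as follows. This is not JobConfWrapper's actual code: `PlainConf` is a hypothetical stand-in for a configuration class that does not implement `Serializable`, and `ConfWrapper` is an illustrative name. The sketch only shows the general technique of marking the field `transient` and writing its contents by hand in `writeObject`/`readObject`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfWrapperSketch {

    /** Hypothetical stand-in for a configuration class that is NOT Serializable. */
    public static final class PlainConf {
        private final Map<String, String> entries = new LinkedHashMap<>();

        public void set(String key, String value) {
            entries.put(key, value);
        }

        public String get(String key) {
            return entries.get(key);
        }

        public Map<String, String> entries() {
            return entries;
        }
    }

    /** Serializable holder that writes the wrapped configuration entry by entry. */
    public static final class ConfWrapper implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default serialization skips this field, so we handle it ourselves.
        private transient PlainConf conf;

        public ConfWrapper(PlainConf conf) {
            this.conf = conf;
        }

        public PlainConf conf() {
            return conf;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(conf.entries().size());
            for (Map.Entry<String, String> entry : conf.entries().entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            conf = new PlainConf();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                conf.set(key, value);
            }
        }
    }

    /** Serializes and deserializes the wrapper in memory. */
    public static ConfWrapper roundTrip(ConfWrapper wrapper)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(wrapper);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ConfWrapper) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlainConf conf = new PlainConf();
        conf.set("example.key", "example-value"); // illustrative key, not a real Hive option
        ConfWrapper restored = roundTrip(new ConfWrapper(conf));
        System.out.println(restored.conf().get("example.key"));
    }
}
```

A catalog holding such a wrapper instead of the raw configuration object can then rely on default Java serialization for its remaining fields.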
...
- HiveCatalog:
  - If hive-conf-dir is specified: because hive-conf-dir is a local path, make sure that every node in the cluster has the Hive configuration file under the same path; otherwise the JM will not find the file and the job will fail. This problem also exists in Flink's current application mode.
  - If hive-conf-dir is not specified: HiveCatalog looks for hive-site.xml on the Java classpath, so we have to solve the hive-site.xml upload problem and make sure that, in all modes, both the Flink Client and the JM classpath can find the file; otherwise the job will fail.
- We need to additionally serialize the database information that already exists in the InMemoryCatalog, otherwise the operation to create a table on the JM side may fail because the corresponding database cannot be found.
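The two HiveCatalog lookup cases above can be sketched as a small pre-flight check. This is an assumption-laden illustration, not HiveCatalog's actual resolution logic: the class name `HiveSiteLookupSketch` and the method `locateHiveSite` are hypothetical, and the sketch only mirrors the rule described in the text (use the local directory if one is given, otherwise search the classpath).

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class HiveSiteLookupSketch {

    private static final String HIVE_SITE = "hive-site.xml";

    /**
     * Returns the location of hive-site.xml as seen from the current process, or empty if it
     * cannot be found. hiveConfDir may be null, meaning "not specified".
     */
    public static Optional<String> locateHiveSite(String hiveConfDir) {
        if (hiveConfDir != null) {
            // Case 1: a local directory was configured; the file must exist on THIS node.
            Path candidate = Paths.get(hiveConfDir, HIVE_SITE);
            return Files.isRegularFile(candidate)
                    ? Optional.of(candidate.toUri().toString())
                    : Optional.empty();
        }
        // Case 2: no directory configured; fall back to the classpath of THIS process.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = HiveSiteLookupSketch.class.getClassLoader();
        }
        URL url = loader.getResource(HIVE_SITE);
        return Optional.ofNullable(url).map(URL::toString);
    }

    public static void main(String[] args) {
        String hiveConfDir = args.length > 0 ? args[0] : null;
        Optional<String> location = locateHiveSite(hiveConfDir);
        System.out.println(location.orElse("hive-site.xml not found"));
    }
}
```

Running the same check on the client and on the JM host would show whether both sides resolve the file, which is the condition both bullets require.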
References
- Support SELECT clause in CREATE TABLE(CTAS)
- MySQL CTAS syntax
- Microsoft Azure Synapse CTAS
- LanguageManual DDL#Create/Drop/ReloadFunction
- Spark Create Table Syntax
...