...

The purpose of doing so is:

  1. Use a single mechanism to handle serialization/deserialization both for catalogs created via DDL and for catalogs registered via TableEnvironment#registerCatalog.
  2. Reduce the cost of implementing user-defined catalogs, so that authors do not need to think much about serialization (if the CREATE TABLE AS SELECT (CTAS) feature is to be supported, the catalog must be serializable).

Key points for making catalogs serializable:

  • InMemoryCatalog: this is a special case. Because the tables in InMemoryCatalog already exist in the external system, the metadata in InMemoryCatalog is used only by the job itself and is stored only in memory. The database-related information in InMemoryCatalog must be serialized and passed to the JM; otherwise the database may not exist when the JM creates the table. Other objects do not need to be serialized. The CatalogDatabase interface needs to extend Serializable.
  • JdbcCatalog: The main member variables, such as username, password and base url, are directly serializable. The JdbcDialectTypeMapper interface needs to extend Serializable.
  • HiveCatalog: All member variables can be serialized directly except for the HiveConf object. The approach used by JobConfWrapper can be followed to solve the serialization problem of HiveConf.
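
As an illustration of the wrapper approach mentioned for HiveConf, here is a minimal sketch using only the JDK. PlainConf is a hypothetical stand-in for a configuration class that does not implement Serializable, and SerializableConfWrapper is an assumed name; neither is the actual HiveCatalog or JobConfWrapper code. The wrapper marks the field transient and writes and reads its entries by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ConfWrapperSketch {

    /** Hypothetical stand-in for a config class that is NOT Serializable (the role HiveConf plays). */
    static final class PlainConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }

        Map<String, String> asMap() { return props; }
    }

    /** Serializable holder: the wrapped config is transient and is written entry by entry. */
    static final class SerializableConfWrapper implements Serializable {
        private static final long serialVersionUID = 1L;

        private transient PlainConf conf;

        SerializableConfWrapper(PlainConf conf) { this.conf = conf; }

        PlainConf conf() { return conf; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            // Write the entry count, then each key/value pair.
            out.writeInt(conf.asMap().size());
            for (Map.Entry<String, String> e : conf.asMap().entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Rebuild the non-serializable object from the pairs written above.
            conf = new PlainConf();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                conf.set(key, value);
            }
        }
    }

    /** Serializes the wrapper to bytes and reads it back, as would happen when shipping it to the JM. */
    static SerializableConfWrapper roundTrip(SerializableConfWrapper wrapper)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(wrapper);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerializableConfWrapper) in.readObject();
        }
    }
}
```

A catalog would then hold the wrapper as a regular field instead of holding the configuration object directly, so the catalog itself can stay Serializable.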

The CREATE TABLE AS SELECT (CTAS) feature depends on the serializability of the catalog. To detect early whether a catalog supports CTAS, the planner tries to serialize the catalog; if serialization fails, an exception is thrown telling the user that the catalog does not support CTAS because it cannot be serialized.
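
The planner-side check described above could look roughly like the following sketch. It uses plain JDK serialization; the class name CtasSerializabilityCheck, the method checkSupportsCtas, the sample catalog classes and the choice of UnsupportedOperationException are all assumptions for illustration, not the planner's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CtasSerializabilityCheck {

    /** Hypothetical catalog whose state is fully serializable. */
    static final class GoodCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        final String baseUrl = "jdbc:example://host/db";
    }

    /** Hypothetical catalog that declares Serializable but holds a non-serializable member. */
    static final class BadCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object client = new Object(); // java.lang.Object is not Serializable
    }

    /**
     * Tries to serialize the catalog and fails fast with a user-facing message if that is
     * not possible. The serialized bytes are discarded; only success or failure matters.
     */
    static void checkSupportsCtas(String catalogName, Object catalog) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(catalog);
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            throw new UnsupportedOperationException(
                    "Catalog '" + catalogName
                            + "' does not support CTAS because it cannot be serialized.",
                    e);
        }
    }
}
```

Running the check at planning time means the user sees the failure when the statement is compiled, rather than later when the job is already running on the cluster.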

Runtime

Provide a job status change hook mechanism on the JM side.

...