...

To pass the resource path information from the table environment to the execution environment, we will use the pipeline.jars option. Currently, the StreamExecutionEnvironment#configure method does not override the pipeline.jars option from the dynamic configuration when generating the JobGraph. As a result, the user jars referenced by the query are not uploaded to the blob store, and a ClassNotFoundException is thrown at distributed runtime. Therefore, this method needs to override the pipeline.jars option as well.
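The override can be illustrated with a small, self-contained sketch. It deliberately does not use Flink's real Configuration or StreamExecutionEnvironment classes: EnvConfigSketch is a hypothetical stand-in that maps option keys to list values, and only the option key pipeline.jars comes from Flink. The point it shows is that configure copies pipeline.jars from the dynamic configuration instead of ignoring it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a Flink configuration: option key -> list value.
final class EnvConfigSketch {
    static final String PIPELINE_JARS = "pipeline.jars";

    private final Map<String, List<String>> options = new HashMap<>();

    Optional<List<String>> get(String key) {
        return Optional.ofNullable(options.get(key));
    }

    void set(String key, List<String> value) {
        options.put(key, new ArrayList<>(value));
    }

    // Sketch of the proposed behavior: if the dynamic configuration carries
    // pipeline.jars, it overrides the environment's value, so the jars are
    // part of the generated job and get shipped to the cluster.
    void configure(EnvConfigSketch dynamic) {
        dynamic.get(PIPELINE_JARS).ifPresent(jars -> set(PIPELINE_JARS, jars));
    }
}
```

When the dynamic configuration does not set pipeline.jars, the environment keeps its existing value.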

The precondition for using the pipeline.jars option is that every resource, whether local or remote, is downloaded to the local file system during the client compilation phase.
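A minimal sketch of that precondition as a check, assuming resources are represented as java.net.URI values. PipelineJarsPrecondition and requireAllLocal are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.util.List;

// Hypothetical helper: every entry written to pipeline.jars must already
// point at the local file system (file: scheme).
final class PipelineJarsPrecondition {
    static void requireAllLocal(List<URI> jars) {
        for (URI jar : jars) {
            // getScheme() returns null for scheme-less URIs, which are rejected too.
            if (!"file".equalsIgnoreCase(jar.getScheme())) {
                throw new IllegalArgumentException(
                        "Resource must be downloaded locally before being added to pipeline.jars: " + jar);
            }
        }
    }
}
```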

Implementation Plan

Supported Resource Types

...

Remote resources can be stored under different schemes, such as HTTP, HDFS, S3, etc. During query compilation, a remote resource is first downloaded to a local temporary directory based on its path, and the local path is then added to the user class loader. From that point on, the behavior is the same as described in the local resource section. If the remote resource is stored in HDFS, users need to configure the Hadoop environment on the machine that runs the query and add the Hadoop common jars to the JVM classpath. We do not package Hadoop dependencies into the uber jar of the table-api-java module, so as to avoid class conflicts.

We will use Flink's FileSystem abstraction to download remote resources. It can support various file systems, including HTTP, S3, HDFS, etc., rather than binding to a specific implementation. In the first version, we will prioritize HDFS as a resource provider because it is widely used.
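The download-then-register flow can be sketched with the JDK alone. RemoteOpener is a hypothetical stand-in for Flink's FileSystem abstraction (anything that can open a stream for a URI), and MutableUrlClassLoader and RemoteResourceDownloader are hypothetical names; this is not Flink's actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for a file system that can read HDFS, S3, HTTP, ...
interface RemoteOpener {
    InputStream open(URI uri) throws IOException;
}

// Widens the protected URLClassLoader#addURL so jars can be appended later.
final class MutableUrlClassLoader extends URLClassLoader {
    MutableUrlClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }
}

final class RemoteResourceDownloader {
    private final RemoteOpener opener;
    private final Path localDir;

    RemoteResourceDownloader(RemoteOpener opener, Path localDir) {
        this.opener = opener;
        this.localDir = localDir;
    }

    // Copies the remote resource into the local temporary directory,
    // keeping its file name, and returns the local path.
    Path download(URI remote) throws IOException {
        String path = remote.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Path target = localDir.resolve(fileName);
        try (InputStream in = opener.open(remote)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Downloads the resource and adds the local copy to the user class loader;
    // from here on it is handled like a local resource.
    Path downloadAndRegister(URI remote, MutableUrlClassLoader loader) throws IOException {
        Path local = download(remote);
        loader.addURL(local.toUri().toURL());
        return local;
    }
}
```

Keeping the opener pluggable mirrors the design choice above: the download code does not bind to a specific storage implementation.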

Note: Currently, the ADD JAR syntax only supports adding local resources. With the release of this advanced function DDL feature, the ADD JAR syntax will also support adding remote resources.
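One way to decide how an ADD JAR path is handled is to inspect its URI scheme, sketched below under the assumption of POSIX-style paths without spaces (URI.create rejects illegal characters such as backslashes). AddJarPaths and Kind are hypothetical names.

```java
import java.net.URI;

// Hypothetical classification of the path passed to ADD JAR: plain paths and
// file: URIs are local and used directly; any other scheme (hdfs, s3, http, ...)
// is treated as remote and must be downloaded first.
final class AddJarPaths {
    enum Kind { LOCAL, REMOTE }

    static Kind classify(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null || "file".equalsIgnoreCase(scheme)) {
            return Kind.LOCAL;
        }
        return Kind.REMOTE;
    }
}
```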

...

  1. Introduce an internal interface, UserClassLoaderContext, which keeps track of used/loaded resources. It should also provide a UserClassLoader that inherits from the class loader given via EnvironmentSettings. UserClassLoaderContext is responsible for managing all used resources.
  2. The table environment refers to UserClassLoaderContext directly instead of to a ClassLoader object, so that the latest available ClassLoader can be obtained whenever the ClassLoader object changes.
  3. Objects in the table module that need the custom ClassLoader, such as DataTypeFactoryImpl, will hold UserClassLoaderContext directly instead of holding a ClassLoader as before.
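The three points above can be sketched as follows. Only the name UserClassLoaderContext comes from this proposal; its methods, SimpleUserClassLoaderContext, and ContextAwareFactory (standing in for a holder such as DataTypeFactoryImpl) are hypothetical. The sketch rebuilds the class loader on every new jar to show why holders should keep the context rather than a ClassLoader reference.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical shape of the internal interface described above.
interface UserClassLoaderContext {
    ClassLoader getUserClassLoader();     // loader that sees all registered jars
    void registerJar(URL jar);            // bookkeeping + make the jar visible
    Set<URL> getRegisteredJars();
}

final class SimpleUserClassLoaderContext implements UserClassLoaderContext {
    private final ClassLoader parent;     // the loader given via EnvironmentSettings
    private final Set<URL> jars = new LinkedHashSet<>();
    private URLClassLoader userClassLoader;

    SimpleUserClassLoaderContext(ClassLoader parent) {
        this.parent = parent;
        this.userClassLoader = new URLClassLoader(new URL[0], parent);
    }

    @Override
    public ClassLoader getUserClassLoader() {
        return userClassLoader;
    }

    @Override
    public void registerJar(URL jar) {
        if (jars.add(jar)) {
            // Replace the loader with one covering every registered jar.
            // The previous loader is not closed here, for brevity.
            userClassLoader = new URLClassLoader(jars.toArray(new URL[0]), parent);
        }
    }

    @Override
    public Set<URL> getRegisteredJars() {
        return Collections.unmodifiableSet(jars);
    }
}

// Holds the context, not a ClassLoader, so each lookup uses the latest loader.
final class ContextAwareFactory {
    private final UserClassLoaderContext context;

    ContextAwareFactory(UserClassLoaderContext context) {
        this.context = context;
    }

    Class<?> loadClass(String name) throws ClassNotFoundException {
        return Class.forName(name, true, context.getUserClassLoader());
    }
}
```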

...