...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

[FLIP-109] Improve Hive dependencies out-of-box experience

Motivation

We want to improve the out-of-box experience of the Hive integration.

...

Spark and Presto already ship with built-in Hive dependencies, so getting started is very simple for their users.

This was discussed in [1].

We have documented the dependencies in detail [2], but several inconveniences remain:

  • Too many versions: users need to pick one version out of 8.
  • Too many versions are not friendly to our developers either: when there is a problem or exception, we need to look at eight different versions of the Hive client code, which often differ.
  • Too many jars: for example, users need to download 4+ jars for Hive 1.x.
  • The version in Yaml/HiveCatalog needs to be consistent with the dependency version. A version appears in three places (the metastore, the dependencies, and Yaml/HiveCatalog), so users can easily make mistakes.

Public Interfaces

Setting up the Hive integration should only require:

  • Providing Hadoop classes. [3]
  • Downloading one pre-bundled Hive jar from flink-web. (Only a few pre-bundled Hive versions are offered.)
  • Providing the Hive conf directory path. (Users should not have to provide any Hive version information.)

Users can then enjoy their Flink-Hive trip.

...

Primary key support was added in Hive 2.1.0; unique and not null support were added in Hive 3.0.0.

Solution: 

  • We provide a new version of the bundled jar to support these.

Alter table statistics

In Hive versions before 1.2.1, altering table stats is not effective. In Flink 1.10, we can refuse to do this operation on old versions by detecting the Hive version. But after unifying into one version, we can no longer detect it this way. As a result, we may invoke the HMS API without it taking effect.

Solution: 

  • We need to document this.
  • We can also check the stats after updating.
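
Checking the stats after updating could look roughly like the sketch below. It is only an illustration under stated assumptions: `TableStatsStore`, `InMemoryStatsStore`, and `VerifiedStatsUpdater` are hypothetical names, not Flink or Hive APIs, and the `ignoresUpdates` flag merely simulates an old HMS that silently drops the update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical abstraction over the metastore calls; not a real Flink or Hive interface. */
interface TableStatsStore {
    void alterTableStats(String table, Map<String, String> stats);

    Map<String, String> getTableStats(String table);
}

/** In-memory stand-in; ignoresUpdates simulates an old HMS that silently drops the update. */
final class InMemoryStatsStore implements TableStatsStore {
    private final Map<String, Map<String, String>> stats = new HashMap<>();
    private final boolean ignoresUpdates;

    InMemoryStatsStore(boolean ignoresUpdates) {
        this.ignoresUpdates = ignoresUpdates;
    }

    @Override
    public void alterTableStats(String table, Map<String, String> newStats) {
        if (!ignoresUpdates) {
            stats.put(table, new HashMap<>(newStats));
        }
    }

    @Override
    public Map<String, String> getTableStats(String table) {
        return stats.getOrDefault(table, new HashMap<>());
    }
}

final class VerifiedStatsUpdater {
    /** Updates the stats, reads them back, and fails loudly if the update did not take effect. */
    static void alterAndVerify(TableStatsStore store, String table, Map<String, String> expected) {
        store.alterTableStats(table, expected);
        Map<String, String> actual = store.getTableStats(table);
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (!Objects.equals(e.getValue(), actual.get(e.getKey()))) {
                throw new IllegalStateException(
                        "Altering statistics of table '" + table + "' had no effect for key '"
                                + e.getKey() + "'. The Hive metastore may be older than 1.2.1.");
            }
        }
    }
}
```

The read-back costs one extra metastore round trip per update, but it turns a silent no-op into an explicit error without needing to know the Hive version.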

...

Date type statistics are supported from Hive 1.2.0. If we pass this data to an older version of HMS, will it report an error? Can we tell from the error information whether it is due to the date statistics?

Solution:

  • We need to document this.
  • An exception will be thrown due to the missing DateColumnStatsData thrift class; we can improve the exception message.
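
One way to improve the exception message is sketched below. `DateStatsErrorTranslator` is a hypothetical helper, and it assumes that the name of the missing class shows up somewhere in the messages of the exception's cause chain; how the failure actually surfaces would need to be confirmed against a real old HMS.

```java
/** Sketch: turn a low-level failure caused by the missing thrift class into a clearer error. */
final class DateStatsErrorTranslator {
    // Class name taken from the text above; that it appears in the error message is an assumption.
    private static final String MISSING_CLASS = "DateColumnStatsData";

    /** Returns true if any throwable in the cause chain mentions the missing class. */
    static boolean causedByMissingDateStats(Throwable t) {
        int depth = 0;
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(MISSING_CLASS)) {
                return true;
            }
        }
        return false;
    }

    /** Wraps the original failure in an exception with a more helpful message. */
    static RuntimeException translate(Throwable t) {
        if (causedByMissingDateStats(t)) {
            return new UnsupportedOperationException(
                    "DATE column statistics are not supported by this Hive metastore version "
                            + "(supported from Hive 1.2.0).",
                    t);
        }
        return new RuntimeException("Failed to update column statistics.", t);
    }
}
```

The original throwable is kept as the cause, so the low-level stack trace remains available for debugging.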

...

This is not better than the pre-bundled approach.

Discussion in Google doc: https://docs.google.com/document/d/14C4933qfPXeHK34rERPCJ514MTrk_MC2beNUSgQO_Uo/edit?usp=sharing

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html

[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies

[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/hadoop.html#providing-hadoop-classes
