Gluten Proposal

Abstract

Gluten is a middle layer responsible for offloading Apache Spark SQL queries to native engines. The project aims to address the CPU computational bottleneck in Apache Spark-based data loading scenarios by offloading Apache Spark SQL operators to native engines. With advancements in IO technologies, especially the widespread use of SSDs and NICs of 10GbE or higher bandwidth, CPU computation has gradually become the primary limiting factor for performance. However, optimizing at the CPU instruction level is harder on the JVM than in native languages such as C++, because the JVM provides fewer optimization capabilities.

Proposal

The Gluten project utilizes Apache Spark's plugin mechanism to intercept and send query plans to native engines for execution, bypassing Apache Spark's less efficient execution path. The project supports multiple native engines as backends, including Velox, ClickHouse, and Apache Arrow. For operations that the native engines cannot handle, Gluten falls back to Spark's normal execution path. In terms of thread models, Gluten utilizes JNI (Java Native Interface) library calls to directly invoke native code within Spark executor task threads, avoiding the introduction of complex thread models.

Background

Apache Spark is a stable, mature project that has been under development for many years. The project has proven to be one of the best frameworks for processing petabyte-scale datasets. However, the Spark community has had to address performance challenges that required various optimizations over time. A key optimization introduced in Spark 2.0 replaced the Volcano model with whole-stage code generation to achieve a 2x speedup. Most of this optimization work happens at the query plan level.

However, there is a need to address query performance more broadly. The industry is aware of the performance bottleneck in the existing Spark engine. Databricks created Photon as a high-performance native vectorized query engine, but it is closed-source commercial software. This motivated Intel and Kyligence to initiate the Gluten project to unleash the power of Advanced Vector Extensions (AVX) technology using SIMD instructions within a vectorized SQL engine, which enables Apache Spark to break through the limitations of row-based data processing and the JVM.

You can find more information on Gluten at the existing open-source website:

https://oap-project.github.io/gluten/

Rationale

The Gluten project aims to bridge the gap between Spark SQL's scalability and native libraries' performance benefits. By reusing Spark's control flow and JVM code while offloading compute-intensive data processing to native code, we seek to significantly improve performance without requiring changes to existing SparkSQL jobs. This approach involves transforming Spark's physical plan into a Substrait plan and passing it to native libraries, enabling the seamless execution of SparkSQL jobs with enhanced performance.

Multiple Native Backend Support

There are numerous mature open-source native SQL engines and libraries available, including Velox, ClickHouse, and Apache Arrow, among others. Gluten currently supports Velox and ClickHouse as backends but remains open to supporting other established open-source native SQL engines.

Meta has launched Velox (https://github.com/facebookincubator/velox), an open-source unified execution engine designed to enhance data management system efficiency and simplify development.

ClickHouse (https://clickhouse.com/) is an open-source column-oriented database management system designed for high-performance analytics and data warehousing, capable of handling massive amounts of data with lightning-fast query processing.

Plan Conversion

Gluten uses Substrait (https://github.com/substrait-io/substrait) to build a unified query plan tree and connect to an individual backend engine. Gluten converts Spark’s physical plan to a Substrait plan for each backend, then shares the Substrait plan over JNI to trigger the execution pipeline in the native library.
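The conversion step can be illustrated with a minimal Python sketch. It is not Gluten's implementation: `PhysicalNode`, `REL_NAMES`, `to_plan_tree`, and `serialize_plan` are hypothetical names, the relation names are only loosely modeled on Substrait, and JSON stands in for the real Substrait serialization. The sketch only shows the shape of the idea: walk the operator tree, map each operator to a backend-neutral relation, and produce bytes that could be handed across a native boundary.

```python
from dataclasses import dataclass, field
from typing import List
import json


@dataclass
class PhysicalNode:
    """Hypothetical stand-in for one operator in a Spark physical plan."""
    op: str                                   # e.g. "Scan", "Filter", "Project"
    args: dict = field(default_factory=dict)  # operator-specific parameters
    children: List["PhysicalNode"] = field(default_factory=list)


# Assumed, simplified mapping from operator names to relation names.
REL_NAMES = {"Scan": "read", "Filter": "filter", "Project": "project"}


def to_plan_tree(node: PhysicalNode) -> dict:
    """Recursively translate the operator tree into a backend-neutral dict."""
    if node.op not in REL_NAMES:
        raise ValueError(f"no conversion rule for operator {node.op!r}")
    rel = {"rel": REL_NAMES[node.op], **node.args}
    if node.children:
        rel["inputs"] = [to_plan_tree(child) for child in node.children]
    return rel


def serialize_plan(node: PhysicalNode) -> bytes:
    """Serialize the converted tree to bytes (JSON here, purely for illustration)."""
    return json.dumps(to_plan_tree(node), sort_keys=True).encode("utf-8")


# A three-operator plan: Project <- Filter <- Scan.
plan = PhysicalNode("Project", {"columns": ["a"]}, [
    PhysicalNode("Filter", {"condition": "a > 1"}, [
        PhysicalNode("Scan", {"table": "t"}),
    ]),
])
payload = serialize_plan(plan)
```

Because the converted tree does not mention any particular engine, the same payload could in principle be consumed by different backends, which is the point of using a common plan representation.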

Memory Management

Gluten leverages Spark’s existing memory management system. It calls the Spark memory registration API for every native memory allocation/deallocation action. Spark manages the memory for each task thread. If the thread needs more memory than is available, it can call the spill interface for operators that support this capability. Spark’s memory management system protects against memory leaks and out-of-memory issues.
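The accounting and spill behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Spark's or Gluten's API: `TaskMemoryPool`, `SpillableSort`, and their methods are hypothetical, sizes are plain integers, and "spilling" only resets a counter instead of writing to disk.

```python
class TaskMemoryPool:
    """Hypothetical per-task memory accounting with spill support."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.spillers = []  # callbacks: spill(bytes_needed) -> bytes actually freed

    def register_spiller(self, spill) -> None:
        self.spillers.append(spill)

    def acquire(self, size: int) -> None:
        """Record an allocation; ask spillable operators to free memory if short."""
        shortfall = self.used + size - self.capacity
        for spill in self.spillers:
            if shortfall <= 0:
                break
            freed = spill(shortfall)
            self.used -= freed
            shortfall -= freed
        if shortfall > 0:
            raise MemoryError(f"cannot reserve {size} bytes")
        self.used += size

    def release(self, size: int) -> None:
        """Record a deallocation."""
        self.used = max(0, self.used - size)


class SpillableSort:
    """Hypothetical operator that buffers data and can spill all of it."""

    def __init__(self, pool: TaskMemoryPool):
        self.pool = pool
        self.buffered = 0
        pool.register_spiller(self.spill)

    def add(self, size: int) -> None:
        self.pool.acquire(size)
        self.buffered += size

    def spill(self, needed: int) -> int:
        freed = self.buffered  # simulated: everything buffered goes to disk
        self.buffered = 0
        return freed


pool = TaskMemoryPool(capacity=100)
sort = SpillableSort(pool)
sort.add(80)       # fits within the 100-byte budget
pool.acquire(50)   # exceeds the budget, so the sort is asked to spill first
```

After the last call the sort holds no buffered data and the pool accounts only for the 50 newly reserved bytes. A request that cannot be satisfied even after spilling raises `MemoryError` in this sketch.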

Columnar Shuffle

Shuffle itself is a crucial factor affecting Spark performance. It involves multiple steps such as serialization/deserialization, network transmission, and disk I/O, so careful design is needed to keep it from becoming a bottleneck. Because the native engine stores data in a columnar structure, simply adopting Spark's row-based data model for shuffle would introduce a column-to-row conversion in the shuffle write phase and a row-to-column conversion in the shuffle read phase. These conversions would be needed to keep data flowing, but both come at a cost. Therefore, Gluten must provide a comprehensive Columnar Shuffle mechanism to bypass these conversion overheads. The implementation of columnar shuffle can be broadly divided into two parts: shuffle data writing and shuffle data reading.
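The write and read halves can be sketched as follows. This is an illustrative Python model only: a batch is a plain dict of column lists, keys are assumed to be integers, a simple modulo stands in for the real partitioning hash, and `split_batch` and `merge_batches` are hypothetical names. What it shows is that data can be partitioned and reassembled column by column without ever materializing rows.

```python
from typing import Dict, List

ColumnBatch = Dict[str, list]  # column name -> values (toy columnar batch)


def split_batch(batch: ColumnBatch, key: str, num_partitions: int) -> Dict[int, ColumnBatch]:
    """Write side: split one columnar batch into per-partition columnar batches."""
    # One partition id per row, derived from the (integer) key column.
    pids = [k % num_partitions for k in batch[key]]
    out: Dict[int, ColumnBatch] = {}
    for name, values in batch.items():
        # Each column is scattered independently; no row objects are built.
        for pid, value in zip(pids, values):
            out.setdefault(pid, {n: [] for n in batch})[name].append(value)
    return out


def merge_batches(batches: List[ColumnBatch]) -> ColumnBatch:
    """Read side: concatenate received columnar batches, column by column."""
    merged: ColumnBatch = {}
    for b in batches:
        for name, values in b.items():
            merged.setdefault(name, []).extend(values)
    return merged


batch = {"id": [1, 2, 3, 4], "v": ["a", "b", "c", "d"]}
parts = split_batch(batch, key="id", num_partitions=2)
# parts[0] holds the even ids, parts[1] the odd ids, both still columnar.
received = merge_batches([parts[0], parts[1]])
```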

Gluten is also integrated with Apache Celeborn (incubating) (https://celeborn.apache.org), a mature general-purpose Remote Shuffle Service that can effectively address the stability, performance, and elasticity issues present in local shuffling of big data engines. The Apache Celeborn community and the Gluten community have been cooperating for some time and have successfully integrated Celeborn into Gluten. This integration allows Spark to better embrace the Cloud Native approach.

Shim Layer

To seamlessly integrate with Spark, Gluten incorporates a Shim Layer to effectively manage diverse API versions across different Spark releases, enabling seamless extension for supporting multiple Spark versions. Presently, Gluten offers support for Spark 3.2 and 3.3, with additional support for further Spark versions in the pipeline.
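A shim layer of this kind is commonly built as a version-keyed registry behind one shared interface. The Python sketch below illustrates that pattern only; `SparkShim`, `Spark32Shim`, `Spark33Shim`, `describe_scan`, and `load_shim` are hypothetical names and do not correspond to Gluten's actual classes.

```python
from abc import ABC, abstractmethod


class SparkShim(ABC):
    """Hypothetical interface hiding version-specific Spark API differences."""

    @abstractmethod
    def describe_scan(self, table: str) -> str:
        ...


class Spark32Shim(SparkShim):
    def describe_scan(self, table: str) -> str:
        return f"scan[3.2-api]({table})"


class Spark33Shim(SparkShim):
    def describe_scan(self, table: str) -> str:
        return f"scan[3.3-api]({table})"


# Registry keyed by major.minor version; supporting a new release means adding one entry.
SHIMS = {"3.2": Spark32Shim, "3.3": Spark33Shim}


def load_shim(spark_version: str) -> SparkShim:
    """Pick the shim matching the running Spark version, e.g. '3.3.1' -> Spark33Shim."""
    major_minor = ".".join(spark_version.split(".")[:2])
    try:
        return SHIMS[major_minor]()
    except KeyError:
        raise ValueError(f"unsupported Spark version: {spark_version}") from None


shim = load_shim("3.2.2")
description = shim.describe_scan("t")
```

The rest of the code base would call only the `SparkShim` interface, so version-specific differences stay confined to the individual shim classes.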

Fallback Mechanism

Gluten validates on the Spark JVM side whether each operator is supported by the native library. When an operator is not supported, Gluten falls back to the existing Spark JVM-based operator. However, this fallback mechanism carries a performance cost because it requires columnar-to-row and row-to-column data conversions.
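The cost of fallback becomes visible when the plan is drawn with its conversions. The Python sketch below is a simplified illustration, not Gluten's planner: `Op`, `tag`, `insert_transitions`, and `render` are hypothetical, the supported-operator set is an assumption, and `PythonUDF` is used only as an example of an operator that stays on the JVM.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed set of operators the native backend can execute.
NATIVE_SUPPORTED = {"Scan", "Filter", "Project", "HashAggregate"}


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)
    native: bool = False  # True: runs natively and produces columnar batches


def tag(op: Op) -> Op:
    """Validation pass: mark every operator as native or fallback."""
    return Op(op.name, [tag(c) for c in op.children], op.name in NATIVE_SUPPORTED)


def insert_transitions(op: Op) -> Op:
    """Insert a conversion wherever parent and child use different data layouts."""
    new_children = []
    for child in op.children:
        child = insert_transitions(child)
        if op.native and not child.native:
            child = Op("RowToColumnar", [child], native=True)
        elif not op.native and child.native:
            child = Op("ColumnarToRow", [child], native=False)
        new_children.append(child)
    return Op(op.name, new_children, op.native)


def render(op: Op) -> str:
    inner = ", ".join(render(c) for c in op.children)
    return f"{op.name}({inner})" if inner else op.name


plan = Op("Project", [Op("PythonUDF", [Op("Filter", [Op("Scan")])])])
final_plan = insert_transitions(tag(plan))
print(render(final_plan))
# prints: Project(RowToColumnar(PythonUDF(ColumnarToRow(Filter(Scan)))))
```

A single unsupported operator in the middle of the plan introduces two conversions, one on each side, which is why reducing fallbacks matters for performance.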

Spark Metrics Extension

Gluten greatly enhances Spark’s Metrics functionality by seamlessly integrating with it. While the default Spark metrics are tailored for Java row-based data processing, Project Gluten takes it a step further. We enrich this functionality with a specialized column-based API and introduce supplementary metrics. This augmentation not only optimizes the use of Gluten but also offers developers valuable tools for debugging these native libraries effectively.

Initial Goals

Implement a robust mechanism to transform Spark's physical plan into a Substrait plan.
Develop a seamless integration of native libraries for offloading performance-critical data processing.
Define clear JNI interfaces for efficient communication between SparkSQL and native libraries.
Enable easy switching between available native backends to enhance flexibility and performance optimization.
Implement a data-sharing mechanism between JVM and native code to manage data effectively.
Extend support to native accelerators for enhanced performance gains in specific use cases.
Provide detailed documentation and guides for users to seamlessly configure and utilize Gluten within their SparkSQL environments.
Expand support to encompass a broader range of big data frameworks, including Flink, Trino, and more.
Cultivate an active and vibrant Apache community, one that empowers development teams and fortifies the project's strength.

Current Status

Gluten released v1.1.0 in November 2023 with the following major features:

20% performance improvement in Decision Support Benchmarks compared to v1.0.0
Support Spark 3.2 and Spark 3.3
Support Spark 3.4 (experimental)
Pass all Velox unit tests and Spark 3.2/3.3 SQL-related unit tests
Support Ubuntu 20.04/22.04, CentOS 7/8, alinux 3, Anolis 7/8
Support File System: localfs, HDFS, S3, OSS(via s3a), GCS
Support File Format: Parquet, ORC
Support Data Lake: deltalake (experimental)
Support Data Types: Primitive Type, Decimal, Date, Timestamp, Array (partial), Map (partial), Struct (partial)
Support 28 common Spark operators
Support 199 common Spark functions
Support Dynamic Memory Pool and Spill
Support Velox UDF
Support Gluten UI to print fallback event in History Server
Support Hadoop HA and Kerberos
Velox code updated to 20231123 (commit-id: aff0cdec613d26294fb98b89ef292bc3c1a2e82e)
Documentation improvements

Meritocracy:

This proposal aims to cultivate a diverse developer and user community around Gluten, following the Apache Software Foundation's meritocracy model. Since Gluten was open-sourced, numerous enterprises have adopted it to seamlessly integrate with their existing SparkSQL services. Consequently, the Gluten project has received a significant influx of issue reports and enhancements from these companies. The project is currently hosted and supported by Intel and Kyligence accounts on GitHub and maintains close associations with various big data projects within the ASF.

Due to our project's alignment with ASF's values and integration potential with its ecosystem, we have been approached multiple times by our users regarding the possibility of Gluten being incubated under ASF. Presently, the codebase is primarily overseen by a collaborative group of developers from Intel, Kyligence, BIGO, Alibaba, NetEase, Meituan and more. We also warmly welcome individual developers to join as core contributors to Gluten. Our commitment is to foster an environment that promotes and recognizes meritocracy within the project.

Community:

Over the past year, Gluten has dedicated itself to nurturing a thriving community of contributors and users for its framework. As of now, Gluten has achieved a remarkable milestone with 800 stars and 289 forks on GitHub. We are confident that we can continue to leverage the support and expertise of the Apache Spark community to further enhance our efforts.

Core Developers:

Binwei Yang <binwei.yang at intel dot com>: A major contributor to this project from Intel
Weiting Chen <weiting.chen at intel dot com>: A major contributor to this project from Intel
Yuan Zhou <yuan.zhou at intel dot com>: A major contributor to this project from Intel
Rui Mo <rui.mo at intel dot com>: A major contributor to this project from Intel
Hongze Zhang <Hongze.Zhang at intel dot com>: A major contributor to this project from Intel
Jia Ke <ke.a.jia at intel dot com>: A major contributor to this project from Intel
Feilong He <Feilong.He at intel dot com>: A major contributor to this project from Intel
Marin Ma <rong.ma at intel dot com>: A major contributor to this project from Intel
Chang Chen <chang.chen at kyligence dot io>: A major contributor to this project from Kyligence
Hongbin Ma <mahongbin at apache dot org>: A major contributor to this project from Kyligence, Apache Kylin Committer & PMC Member
Zhichao Zhang <zhangzc at apache dot org>: A major contributor to this project from Kyligence, Apache Kylin Committer, Apache CarbonData Committer & PMC Member
Neng Liu <neng.liu at kyligence dot io>: A major contributor to this project from Kyligence
Shuai Li <shuai.li at kyligence dot io>: A major contributor to this project from Kyligence
Yang Li <liyang910910 at gmail dot com>: A major contributor to this project from BIGO
Jiabiao Liang <lgbo.ustc at gmail dot com>: A major contributor to this project from BIGO
Zhibiao Zhang <zhanglinuxstar at gmail dot com>: A major contributor to this project from BIGO
Chunwei Zuo <zuochunwei at meituan dot com>: A major contributor to this project from Meituan
Kuo Zhao <zhaokuo_game at 163 dot com>: A major contributor to this project from Meituan
Zhen Li <zhli at microsoft dot com>: A major contributor to this project from Microsoft
Jacky Lee <qcsd2011 at gmail dot com>: A major contributor to this project from Baidu
Xiduo You <ulyssesyou at apache dot org>: A major contributor to this project from NetEase, Apache Spark Committer
Keyong Zhou <zky.zhoukeyong at alibaba-inc dot com>: A major contributor to this project from Alibaba, Apache Celeborn (incubating) Committer & PPMC Member
Chuan Yang <yangchuan.zy at alibaba-inc dot com>: A major contributor to this project from Alibaba

Alignment:

Gluten is built on Apache Spark and incorporates several other Apache projects, including Hadoop and YARN. The codebase of Gluten is already licensed under the Apache License, Version 2.0. Moreover, our team includes core developers with significant experience contributing to diverse Apache projects. Leveraging these community connections, we prioritize development practices that emphasize community engagement and align with the Apache Software Foundation's meritocratic model.

Known Risks

Project Name

“Gluten” is Latin for glue. The main goal of the Gluten project is to “glue” Spark SQL and native libraries together, so that we can benefit from both the high scalability of the Spark SQL framework and the high performance of native libraries.

Orphaned products

There is a certain level of risk associated with the potential abandonment of the Gluten project, particularly given its status as a young and relatively small community. It is imperative that we address and mitigate this risk promptly during the Apache Incubation phase. Numerous organizations rely on Gluten to construct vital big data pipelines, making it crucial to engage and encourage their involvement in nurturing the Gluten community, especially if it transitions into an Apache Software Foundation (ASF) project.

Inexperience with Open Source

Numerous Gluten contributors possess extensive experience in collaborating on open-source projects. Additionally, they actively contribute and serve as committers to various other Apache projects.

Homogenous Developers

The present contributors are affiliated with diverse organizations such as Intel, Kyligence, and more. We remain dedicated to recruiting additional committers based on their significant contributions to the project. The Gluten project is inherently polyglot, supporting development in a diverse array of languages such as Scala, Java, C++, Python, and Shell Script. This versatility appeals to developers with a broad spectrum of language skills, encouraging their active contributions to the Gluten project.

Reliance on Salaried Developers

Salaried engineers from companies such as Intel and Kyligence have made valuable contributions to the Gluten project, dedicating both their salaried work hours and volunteer time. Their enthusiasm for the project is palpable, and we remain steadfast in our commitment to expanding our team, welcoming more members from various backgrounds, including non-salaried developers. Our goal is to foster a more diverse Gluten user and contributor base as we move forward.

Relationships with Other Apache Products

Apache Spark (https://spark.apache.org/): Gluten's endorsement of Spark as its primary big data framework of choice stems from Spark's reputation as a potent, open-source distributed computing framework, integral to the core of big data analytics.
Apache Arrow (https://arrow.apache.org/): Gluten utilizes Apache Arrow as a data format to empower high-performance data interchange across diverse programming languages, frameworks, and backends.
Apache Celeborn (incubating) (https://celeborn.apache.org/): Gluten is closely integrated with Apache Celeborn for remote shuffle service support. The design goal of integrating Gluten with Celeborn is to simultaneously preserve the core designs of Gluten Columnar Shuffle and Celeborn Remote Shuffle, allowing the advantages of both to be combined.
Apache Uniffle (incubating) (https://uniffle.apache.org/): Uniffle, a project offering high performance remote shuffle service capabilities, represents another promising integration opportunity that Gluten is considering. Gluten will be supported in the Apache Uniffle v0.8 release.
Apache Flink (https://flink.apache.org/): Apache Flink emerges as another promising big data framework that Gluten aims to incorporate as an intermediary layer, facilitating the seamless offloading of data processing to the native engine.

An Excessive Fascination with the Apache Brand

The main objective behind submitting Gluten to the ASF is to cultivate a robust and diverse community while fostering stability for sustainable development. Additionally, we aspire to promote the widespread adoption of Gluten by diverse organizations, encouraging their contributions without any apprehensions regarding ownership or licensing.

Documentation

Documentation can be found at:

Gluten Doc Website (https://oap-project.github.io/gluten)
Gluten GitHub README (https://oap-project.github.io/gluten)
Gluten v0.5.0 release (https://github.com/oap-project/gluten/releases/tag/0.5.0)
Gluten v1.0.0 release (https://github.com/oap-project/gluten/releases/tag/v1.0.0)
Gluten v1.1.0 release (https://github.com/oap-project/gluten/releases/tag/v1.1.0)

You can find the specific version of Gluten documentation listed below:

main
branch-0.5.0
branch-1.0
branch-1.1

Initial Source

Gluten Source Code (https://github.com/oap-project/gluten)

Initial Source and Intellectual Property Submission Plan

Upon Gluten's approval to join the Apache Incubator, our initial committers will promptly submit their SGA, CCLA(s), and ICLAs. The codebase is already licensed under the Apache License 2.0, ensuring compliance.

External Dependencies

The list is very long, so it is given in Table 1 at the end of this page.

Cryptography

Gluten does not currently include any cryptography-related code.

Required Resources

Mailing lists:

private@gluten.incubator.apache.org (PPMC)
dev@gluten.incubator.apache.org (dev mailing list)
commits@gluten.incubator.apache.org

Git Repositories:

Upon entering incubation, we want to move the existing repo to the Apache Software Foundation:

Issue Tracking:

We request the creation of an Apache-hosted JIRA.
Jira ID: GLUTEN

Initial Committers

Binwei Yang <binwei.yang at intel dot com>
Weiting Chen <weiting.chen at intel dot com>
Yuan Zhou <yuan.zhou at intel dot com>
Rui Mo <rui.mo at intel dot com>
Hongze Zhang <Hongze.Zhang at intel dot com>
Jia Ke <ke.a.jia at intel dot com>
Feilong He <Feilong.He at intel dot com>
Marin Ma <rong.ma at intel dot com>
Chang Chen <chang.chen at kyligence dot io>
Hongbin Ma <mahongbin at apache dot org>
Zhichao Zhang <zhangzc at apache dot org>
Neng Liu <neng.liu at kyligence dot io>
Shuai Li <shuai.li at kyligence dot io>
Yang Li <liyang910910 at gmail dot com>
Jiabiao Liang <lgbo.ustc at gmail dot com>
Zhibiao Zhang <zhanglinuxstar at gmail dot com>
Chunwei Zuo <zuochunwei at meituan dot com>
Kuo Zhao <zhaokuo_game at 163 dot com>
Zhen Li <zhli at microsoft dot com>
Jacky Lee <qcsd2011 at gmail dot com>
Xiduo You <ulyssesyou at apache dot org>
Keyong Zhou <zky.zhoukeyong at alibaba-inc dot com>
Chuan Yang <yangchuan.zy at alibaba-inc dot com>

Table 1: External dependencies

Apache 1.1

oro:oro:jar:2.0.8

Apache 2.0

237 dependencies in total, as listed below:

cglib:cglib-nodep:jar:2.1_3
com.clearspring.analytics:stream:jar:2.9.6
com.fasterxml.jackson.core:jackson-annotations:jar:2.13.5
com.fasterxml.jackson.core:jackson-core:jar:2.13.5
com.fasterxml.jackson.core:jackson-databind:jar:2.13.5
com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:jar:2.13.4:runtime
com.fasterxml.jackson.datatype:jackson-datatype-jdk8:jar:2.13.4:runtime
com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.13.5
com.github.ben-manes.caffeine:caffeine:jar:2.9.3
com.github.joshelser:dropwizard-metrics-hadoop-metrics2-reporter:jar:0.1.2
com.google.code.findbugs:jsr305:jar:3.0.0
com.google.code.findbugs:jsr305:jar:3.0.0
com.google.code.findbugs:jsr305:jar:3.0.0:runtime
com.google.code.gson:gson:jar:2.2.4
com.google.code.gson:gson:jar:2.8.6
com.google.code.gson:gson:jar:2.8.9
com.google.crypto.tink:tink:jar:1.6.0
com.google.errorprone:error_prone_annotations:jar:2.10.0
com.google.guava:guava:jar:11.0.2
com.google.guava:guava:jar:26.0-jre
com.google.guava:guava:jar:32.0.1-android
com.google.j2objc:j2objc-annotations:jar:2.8
com.jolbox:bonecp:jar:0.8.0.RELEASE
com.madgag:animated-gif-lib:jar:1.4
com.ning:compress-lzf:jar:1.0.3
com.tdunning:json:jar:1.8
com.twitter:chill_2.12:jar:0.10.0
com.twitter:chill-java:jar:0.10.0
com.univocity:univocity-parsers:jar:2.9.1
com.zaxxer:HikariCP:jar:2.5.1
commons-beanutils:commons-beanutils:jar:1.7.0
commons-beanutils:commons-beanutils-core:jar:1.8.0
commons-cli:commons-cli:jar:1.2
commons-codec:commons-codec:jar:1.15
commons-codec:commons-codec:jar:1.4
commons-collections:commons-collections:jar:3.2.2
commons-configuration:commons-configuration:jar:1.6
commons-dbcp:commons-dbcp:jar:1.4
commons-digester:commons-digester:jar:1.8
commons-httpclient:commons-httpclient:jar:3.1
commons-io:commons-io:jar:2.11.0
commons-io:commons-io:jar:2.4
commons-io:commons-io:jar:2.8.0
commons-lang:commons-lang:jar:2.6
commons-logging:commons-logging:jar:1.1.3
commons-logging:commons-logging:jar:1.2
commons-net:commons-net:jar:3.1
commons-pool:commons-pool:jar:1.5.4
de.rototor.pdfbox:graphics2d:jar:0.27
io.airlift:aircompressor:jar:0.21
io.dropwizard.metrics:metrics-core:jar:4.2.0
io.dropwizard.metrics:metrics-graphite:jar:4.2.0
io.dropwizard.metrics:metrics-jmx:jar:4.2.0
io.dropwizard.metrics:metrics-json:jar:4.2.0
io.dropwizard.metrics:metrics-jvm:jar:4.2.0
io.jsonwebtoken:jjwt-api:jar:0.10.5
io.jsonwebtoken:jjwt-impl:jar:0.10.5
io.jsonwebtoken:jjwt-jackson:jar:0.10.5
io.netty:netty-all:jar:4.0.23.Final
io.netty:netty-all:jar:4.1.68.Final
io.substrait:core:jar:0.5.0
io.trino.tpcds:tpcds:jar:1.4
io.trino.tpch:tpch:jar:1.1
jakarta.validation:jakarta.validation-api:jar:2.0.2
javax.inject:javax.inject:jar:1
javax.jdo:jdo-api:jar:3.0.1
javax.xml.stream:stax-api:jar:1.0-2
joda-time:joda-time:jar:2.10.10
log4j:log4j:jar:1.2.17
net.bytebuddy:byte-buddy:jar:1.9.3
net.bytebuddy:byte-buddy-agent:jar:1.9.3
net.sf.opencsv:opencsv:jar:2.3
net.sourceforge.cssparser:cssparser:jar:0.9.16
net.sourceforge.htmlunit:htmlunit:jar:2.18
net.sourceforge.htmlunit:htmlunit-core-js:jar:2.17
net.sourceforge.nekohtml:nekohtml:jar:1.9.22
org.apache.avro:avro:jar:1.10.2
org.apache.avro:avro:jar:1.7.4
org.apache.avro:avro-ipc:jar:1.10.2
org.apache.avro:avro-mapred:jar:1.10.2
org.apache.commons:commons-compress:jar:1.20
org.apache.commons:commons-compress:jar:1.4.1
org.apache.commons:commons-compress:jar:1.9
org.apache.commons:commons-crypto:jar:1.1.0
org.apache.commons:commons-exec:jar:1.3
org.apache.commons:commons-lang3:jar:3.12.0
org.apache.commons:commons-math3:jar:3.1.1
org.apache.commons:commons-math3:jar:3.4.1
org.apache.commons:commons-text:jar:1.6
org.apache.curator:curator-client:jar:2.7.1
org.apache.curator:curator-framework:jar:2.7.1
org.apache.curator:curator-recipes:jar:2.7.1
org.apache.derby:derby:jar:10.14.2.0
org.apache.directory.api:api-asn1-api:jar:1.0.0-M20
org.apache.directory.api:api-util:jar:1.0.0-M20
org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15
org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15
org.apache.hadoop:hadoop-annotations:jar:2.7.4
org.apache.hadoop:hadoop-auth:jar:2.7.4
org.apache.hadoop:hadoop-client:jar:2.7.4
org.apache.hadoop:hadoop-common:jar:2.7.4
org.apache.hadoop:hadoop-hdfs:jar:2.7.4
org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.7.4
org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.7.4
org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.4
org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.7.4
org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.7.4
org.apache.hadoop:hadoop-yarn-api:jar:2.7.4
org.apache.hadoop:hadoop-yarn-client:jar:2.7.4
org.apache.hadoop:hadoop-yarn-common:jar:2.7.4
org.apache.hadoop:hadoop-yarn-server-common:jar:2.7.4
org.apache.hive.shims:hive-shims-0.23:jar:2.3.9
org.apache.hive.shims:hive-shims-common:jar:2.3.9
org.apache.hive.shims:hive-shims-scheduler:jar:2.3.9
org.apache.hive:hive-common:jar:2.3.9
org.apache.hive:hive-exec:jar:core:2.3.9
org.apache.hive:hive-llap-client:jar:2.3.9
org.apache.hive:hive-llap-common:jar:2.3.9
org.apache.hive:hive-metastore:jar:2.3.9
org.apache.hive:hive-serde:jar:2.3.9
org.apache.hive:hive-shims:jar:2.3.9
org.apache.hive:hive-storage-api:jar:2.7.2
org.apache.hive:hive-vector-code-gen:jar:2.3.9
org.apache.htrace:htrace-core:jar:3.1.0-incubating
org.apache.httpcomponents:httpclient:jar:4.2.5
org.apache.httpcomponents:httpclient:jar:4.5.13
org.apache.httpcomponents:httpcore:jar:4.2.4
org.apache.httpcomponents:httpcore:jar:4.4.13
org.apache.httpcomponents:httpmime:jar:4.5
org.apache.ivy:ivy:jar:2.5.0
org.apache.orc:orc-core:jar:1.6.14
org.apache.orc:orc-mapreduce:jar:1.6.14
org.apache.orc:orc-shims:jar:1.6.14
org.apache.parquet:parquet-column:jar:1.12.2
org.apache.parquet:parquet-common:jar:1.12.2
org.apache.parquet:parquet-encoding:jar:1.12.2
org.apache.parquet:parquet-format-structures:jar:1.12.2
org.apache.parquet:parquet-hadoop:jar:1.12.2
org.apache.parquet:parquet-jackson:jar:1.12.2
org.apache.pdfbox:fontbox:jar:2.0.19
org.apache.pdfbox:pdfbox:jar:2.0.19
org.apache.spark:spark-catalyst_2.12:jar:3.2.2
org.apache.spark:spark-catalyst_2.12-jars:3.2.2
org.apache.spark:spark-core_2.12:jar:3.2.2
org.apache.spark:spark-core_2.12-jars:3.2.2
org.apache.spark:spark-hive_2.12:jar:3.2.2
org.apache.spark:spark-kvstore_2.12:jar:3.2.2
org.apache.spark:spark-launcher_2.12:jar:3.2.2
org.apache.spark:spark-network-common_2.12:jar:3.2.2
org.apache.spark:spark-network-shuffle_2.12:jar:3.2.2
org.apache.spark:spark-sketch_2.12:jar:3.2.2
org.apache.spark:spark-sql_2.12:jar:3.2.2
org.apache.spark:spark-sql_2.12-jars:3.2.2
org.apache.spark:spark-tags_2.12:jar:3.2.2
org.apache.spark:spark-unsafe_2.12:jar:3.2.2
org.apache.thrift:libfb303:jar:0.9.3
org.apache.thrift:libthrift:jar:0.12.0
org.apache.velocity:velocity:jar:1.5
org.apache.xbean:xbean-asm9-shaded:jar:4.20
org.apache.yetus:audience-annotations:jar:0.5.0
org.apache.zookeeper:zookeeper:jar:3.4.6
org.apache.zookeeper:zookeeper:jar:3.6.2
org.apache.zookeeper:zookeeper-jute:jar:3.6.2
org.codehaus.jackson:jackson-core-asl:jar:1.9.13
org.codehaus.jackson:jackson-jaxrs:jar:1.9.13
org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13
org.codehaus.jackson:jackson-xc:jar:1.9.13
org.datanucleus:datanucleus-api-jdo:jar:4.2.4
org.datanucleus:datanucleus-core:jar:4.1.17
org.datanucleus:datanucleus-rdbms:jar:4.1.19
org.datanucleus:javax.jdo:jar:3.2.0-m3
org.eclipse.jetty.websocket:websocket-api:jar:9.2.12.v20150709
org.eclipse.jetty.websocket:websocket-client:jar:9.2.12.v20150709
org.eclipse.jetty.websocket:websocket-common:jar:9.2.12.v20150709
org.eclipse.jetty:jetty-io:jar:9.2.12.v20150709
org.eclipse.jetty:jetty-util:jar:9.2.12.v20150709
org.glassfish.hk2.external:aopalliance-repackaged:jar:2.6.1
org.glassfish.hk2.external:jakarta.inject:jar:2.6.1
org.glassfish.hk2:hk2-api:jar:2.6.1
org.glassfish.hk2:hk2-locator:jar:2.6.1
org.glassfish.hk2:hk2-utils:jar:2.6.1
org.glassfish.hk2:osgi-resource-locator:jar:1.0.3
org.glassfish.jersey.inject:jersey-hk2:jar:2.34
org.jetbrains:annotations:jar:17.0.0
org.json4s:json4s-ast_2.12:jar:3.7.0-M11
org.json4s:json4s-core_2.12:jar:3.7.0-M11
org.json4s:json4s-jackson_2.12:jar:3.7.0-M11
org.json4s:json4s-scalap_2.12:jar:3.7.0-M11
org.knowm.xchart:xchart:jar:3.6.5
org.lz4:lz4-java:jar:1.7.1
org.mortbay.jetty:jetty-sslengine:jar:6.1.26
org.mortbay.jetty:jetty-util:jar:6.1.26
org.objenesis:objenesis:jar:2.5.1
org.objenesis:objenesis:jar:2.6
org.roaringbitmap:RoaringBitmap:jar:0.9.0
org.roaringbitmap:shims:jar:0.9.0
org.scalactic:scalactic_2.12:jar:3.2.3
org.scala-lang.modules:scala-parser-combinators_2.12:jar:1.1.2
org.scala-lang.modules:scala-xml_2.12:jar:1.2.0
org.scala-lang:scala-library:jar:2.12.12
org.scala-lang:scala-library:jar:2.12.15
org.scala-lang:scala-reflect:jar:2.12.12
org.scala-lang:scala-reflect:jar:2.12.15
org.scalatest:scalatest_2.12:jar:3.2.3
org.scalatest:scalatest-compatible:jar:3.2.3
org.scalatest:scalatest-core_2.12:jar:3.2.3
org.scalatest:scalatest-diagrams_2.12:jar:3.2.3
org.scalatest:scalatest-featurespec_2.12:jar:3.2.3
org.scalatest:scalatest-flatspec_2.12:jar:3.2.3
org.scalatest:scalatest-freespec_2.12:jar:3.2.3
org.scalatest:scalatest-funspec_2.12:jar:3.2.3
org.scalatest:scalatest-funsuite_2.12:jar:3.2.3
org.scalatest:scalatest-matchers-core_2.12:jar:3.2.3
org.scalatest:scalatest-mustmatchers_2.12:jar:3.2.3
org.scalatest:scalatest-propspec_2.12:jar:3.2.3
org.scalatest:scalatest-refspec_2.12:jar:3.2.3
org.scalatest:scalatest-shouldmatchers_2.12:jar:3.2.3
org.scalatest:scalatest-wordspec_2.12:jar:3.2.3
org.scalatestplus:scalatestplus-mockito_2.12:jar:1.0.0-M2
org.scalatestplus:scalatestplus-scalacheck_2.12:jar:3.1.0.0-RC2
org.seleniumhq.selenium:selenium-api:jar:2.52.0
org.seleniumhq.selenium:selenium-htmlunit-driver:jar:2.52.0
org.seleniumhq.selenium:selenium-remote-driver:jar:2.52.0
org.seleniumhq.selenium:selenium-support:jar:2.52.0
org.slf4j:jcl-over-slf4j:jar:1.7.30
org.slf4j:jul-to-slf4j:jar:1.7.30
org.slf4j:slf4j-api:jar:1.7.30
org.slf4j:slf4j-log4j12:jar:1.7.30
org.spark-project.spark:unused:jar:1.0.0
org.xerial.snappy:snappy-java:jar:1.0.4.1
org.xerial.snappy:snappy-java:jar:1.1.8.4
org.yaml:snakeyaml:jar:1.31:runtime
stax:stax-api:jar:1.0.1
xalan:serializer:jar:2.7.2
xalan:xalan:jar:2.7.2
xerces:xercesImpl:jar:2.9.1
xml-apis:xml-apis:jar:1.3.04

Apache 2.0 with dual license

With Apache-2.0, BSD-2-Clause, BSD-3-Clause, EDL-1.0, EPL-2.0, GPL-2.0-with-classpath-exception, MIT, Public-Domain, W3C:

org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.34
org.glassfish.jersey.core:jersey-client:jar:2.34
org.glassfish.jersey.core:jersey-common:jar:2.34
org.glassfish.jersey.core:jersey-server:jar:2.34
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.34

With Apache-2.0 and GPL-2.0:

org.rocksdb:rocksdbjni:jar:6.20.3
net.java.dev.jna:jna:jar:4.1.0
net.java.dev.jna:jna-platform:jar:4.1.0
org.javassist:javassist:jar:3.25.0-GA

BSD-2-Clause

com.github.luben:zstd-jni:jar:1.5.0-4
javolution:javolution:jar:5.5.1
jline:jline:jar:2.12
org.jodd:jodd-core:jar:3.5.2

BSD-3-Clause

com.esotericsoftware:kryo-shaded:jar:4.0.2
com.esotericsoftware:minlog:jar:1.3.0
com.google.protobuf:protobuf-java:jar:2.5.0
com.google.protobuf:protobuf-java:jar:3.23.4
com.thoughtworks.paranamer:paranamer:jar:2.3
com.thoughtworks.paranamer:paranamer:jar:2.8
io.glutenproject:protobuf-java:jar:3.23.4-0
io.glutenproject:protobuf-java-util:jar:3.23.4-0
net.sf.py4j:py4j:jar:0.10.9.5
org.abego.treelayout:org.abego.treelayout.core:jar:1.0.3
org.antlr:antlr4:jar:4.9.2
org.antlr:antlr4-runtime:jar:4.8
org.antlr:antlr4-runtime:jar:4.8
org.antlr:antlr-runtime:jar:3.5.2
org.antlr:antlr-runtime:jar:3.5.2
org.antlr:ST4:jar:4.0.4
org.antlr:ST4:jar:4.3
org.codehaus.janino:commons-compiler:jar:3.0.16
org.codehaus.janino:janino:jar:3.0.16
org.fusesource.leveldbjni:leveldbjni-all:jar:1.8
org.hamcrest:hamcrest-core:jar:1.3
org.scalacheck:scalacheck_2.12:jar:1.13.5
org.scala-sbt-interface:jar:1.0
org.threeten:threeten-extra:jar:1.5.0

CDDL-1.1

javax.activation:activation:jar:1.1
javax.activation:activation:jar:1.1.1
javax.servlet:servlet-api:jar:2.5
javax.transaction:jta:jar:1.1
javax.transaction:transaction-api:jar:1.1

CDDL-1.1 with dual license

With CDDL-1.1 and GPL-2.0:

com.sun.jersey:jersey-client:jar:1.9
com.sun.jersey:jersey-core:jar:1.9
javax.servlet.jsp:jsp-api:jar:2.1
javax.xml.bind:jaxb-api:jar:2.2.2
org.glassfish:javax.json:jar:1.0.4
javax.xml.bind:jaxb-api:jar:2.2.11

EPL-1.0

junit:junit:jar:4.13.1

EPL-2.0 with dual license

With EPL-2.0 and GPL-2.0-with-classpath-exception:

jakarta.annotation:jakarta.annotation-api:jar:1.3.5
jakarta.servlet:jakarta.servlet-api:jar:4.0.3
jakarta.ws.rs:jakarta.ws.rs-api:jar:2.1.6

ICU

com.ibm.icu:icu4j:jar:61.1

MIT

net.razorvine:pyrolite:jar:4.30
org.checkerframework:checker-qual:jar:3.19.0
org.codehaus.mojo:animal-sniffer-annotations:jar:1.14
org.kohsuke:github-api:jar:1.117
org.mockito:mockito-core:jar:2.23.4
xmlenc:xmlenc:jar:0.52

Public Domain

org.tukaani:xz:jar:1.0
org.tukaani:xz:jar:1.8

W3C

org.w3c.css:sac:jar:1.3
