...

Gluten is a middle layer responsible for offloading Apache Spark SQL queries to native engines. The project aims to address the CPU computational bottleneck in Spark-based data loading scenarios by offloading Apache Spark SQL operators to native engines. With advancements in IO technologies, especially the widespread use of SSDs and NICs of 10GbE or higher bandwidth, CPU computation has gradually become the primary limiting factor for performance. However, optimizing at the CPU instruction level on the JVM is relatively challenging compared with native languages such as C++, because the JVM exposes fewer optimization capabilities.

...

The Gluten project uses Apache Spark's plugin mechanism to intercept query plans and send them to native engines for execution, bypassing Apache Spark's less efficient execution path. The project supports multiple native engines as backends, including Velox, ClickHouse, and Apache Arrow. For operations that the native engines cannot handle, Gluten falls back to Spark's normal execution path. For its threading model, Gluten uses JNI (Java Native Interface) calls to invoke native code directly within Spark executor task threads, which avoids introducing a more complex thread model.
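
The offload-with-fallback idea described above can be illustrated with a minimal sketch. This is not Gluten's actual code or API: the `PlanNode` class, the `assign_engines` function, the operator names, and the `NATIVE_SUPPORTED` set are all hypothetical and exist only to show how each operator in a plan might be routed to a native backend when supported, and left on Spark's normal path otherwise.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical set of operators a native backend can execute (illustration only).
NATIVE_SUPPORTED: Set[str] = {"Scan", "Filter", "Project", "HashAggregate"}


@dataclass
class PlanNode:
    """A simplified query plan operator with child operators."""
    op: str
    children: List["PlanNode"] = field(default_factory=list)


def assign_engines(node: PlanNode, supported: Set[str]) -> List[Tuple[str, str]]:
    """Walk the plan bottom-up and tag each operator as 'native' or 'spark'.

    Operators outside the supported set fall back to 'spark'.
    """
    tagged: List[Tuple[str, str]] = []
    for child in node.children:
        tagged.extend(assign_engines(child, supported))
    engine = "native" if node.op in supported else "spark"
    tagged.append((node.op, engine))
    return tagged


# A toy plan: Scan -> Filter -> PythonUDF -> HashAggregate (leaf listed first).
plan = PlanNode(
    "HashAggregate",
    [PlanNode("PythonUDF", [PlanNode("Filter", [PlanNode("Scan")])])],
)
result = assign_engines(plan, NATIVE_SUPPORTED)
for op, engine in result:
    print(f"{op}: {engine}")
```

In this toy plan only the hypothetical `PythonUDF` operator is tagged for fallback; the remaining operators are tagged for the native backend. A real implementation would also need to convert data between the two execution paths at each boundary, which this sketch omits.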

...

  • Binwei Yang <binwei.yang at intel dot com>: A major contributor to this project from Intel
  • Weiting Chen <weiting.chen at intel dot com>: A major contributor to this project from Intel
  • Yuan Zhou <yuan.zhou at intel dot com>: A major contributor to this project from Intel
  • Rui Mo <rui.mo at intel dot com>: A major contributor to this project from Intel
  • Hongze Zhang <Hongze.Zhang at intel dot com>: A major contributor to this project from Intel
  • Jia Ke <ke.a.jia at intel dot com>: A major contributor to this project from Intel
  • Feilong He <Feilong.He at intel dot com>: A major contributor to this project from Intel
  • Marin Ma <rong.ma at intel dot com>: A major contributor to this project from Intel
  • Chang Chen <chang.chen at kyligence dot io>: A major contributor to this project from Kyligence
  • Hongbin Ma <mahongbin at apache dot org>: A major contributor to this project from Kyligence, Apache Kylin committer & PMC member
  • Zhichao Zhang <zhangzc at apache dot org>: A major contributor to this project from Kyligence, Apache Kylin committer, Apache CarbonData committer & PMC member
  • Neng Liu <neng.liu at kyligence dot io>: A major contributor to this project from Kyligence
  • Shuai Li <shuai.li at kyligence dot io>: A major contributor to this project from Kyligence
  • Yang Li <liyang910910 at gmail dot com>: A major contributor to this project from BIGO
  • Jiabiao Liang <lgbo.ustc at gmail dot com>: A major contributor to this project from BIGO
  • Zhibiao Zhang <zhanglinuxstar at gmail dot com>: A major contributor to this project from BIGO
  • zuochunwei <zuochunwei at meituan dot com>: A major contributor to this project from Meituan
  • kecookier <zhaokuo_game at 163 dot com>: A major contributor to this project from Meituan
  • zhli1142015 <zhli at microsoft dot com>: A major contributor to this project from Microsoft
  • Jacky Lee <qcsd2011 at gmail dot com>: A major contributor to this project from Baidu
  • Xiduo You <ulyssesyou at apache dot org>: A major contributor to this project from NetEase, Apache Spark committer
  • Keyong Zhou <zky.zhoukeyong at alibaba-inc dot com>: A major contributor to this project from Alibaba, Apache Celeborn (incubating) committer
  • Chuan Yang <yangchuan.zy at alibaba-inc dot com>: A major contributor to this project from Alibaba

...

Relationships with Other Apache Products:

  • Apache Spark (https://spark.apache.org/): Gluten targets Spark as its primary big data framework because Spark is a powerful, open-source distributed computing framework at the core of big data analytics.
  • Apache Arrow (https://arrow.apache.org/): Gluten uses Apache Arrow as a data format for high-performance data interchange across programming languages, frameworks, and backends.
  • Apache Celeborn (incubating) (https://celeborn.apache.org/): Gluten is closely integrated with Apache Celeborn for remote shuffle service support. The design goal of the integration is to preserve the core designs of both Gluten Columnar Shuffle and Celeborn Remote Shuffle, so that the advantages of both can be combined.
  • Apache Uniffle (incubating) (https://uniffle.apache.org/): Uniffle, a project offering high-performance remote shuffle service capabilities, is another integration that Gluten is considering. Gluten will be supported in the Apache Uniffle v0.8 release.
  • Apache Flink (https://flink.apache.org/): Apache Flink is another big data framework that Gluten aims to support, acting as an intermediary layer that offloads data processing to the native engine.

...

  • main
  • branch-0.5.0
  • branch-1.0
  • branch-1.1 

Initial Source

Gluten Source Code (https://github.com/oap-project/gluten)

...

  • Binwei Yang <binwei.yang at intel dot com>
  • Weiting Chen <weiting.chen at intel dot com>
  • Yuan Zhou <yuan.zhou at intel dot com>
  • Rui Mo <rui.mo at intel dot com>
  • Hongze Zhang <Hongze.Zhang at intel dot com>
  • Jia Ke <ke.a.jia at intel dot com>
  • Feilong He <Feilong.He at intel dot com>
  • Marin Ma <rong.ma at intel dot com>
  • Chang Chen <chang.chen at kyligence dot io>
  • Hongbin Ma <mahongbin at apache dot org>
  • Zhichao Zhang <zhangzc at apache dot org>
  • Neng Liu <neng.liu at kyligence dot io>
  • Shuai Li <shuai.li at kyligence dot io>
  • Yang Li <liyang910910 at gmail dot com>
  • Jiabiao Liang <lgbo.ustc at gmail dot com>
  • Zhibiao Zhang <zhanglinuxstar at gmail dot com>
  • zuochunwei <zuochunwei at meituan dot com>
  • kecookier <zhaokuo_game at 163 dot com>
  • zhli1142015 <zhli at microsoft dot com>
  • Jacky Lee <qcsd2011 at gmail dot com>
  • Xiduo You <ulyssesyou at apache dot org>
  • Keyong Zhou <zky.zhoukeyong at alibaba-inc dot com>
  • Chuan Yang <yangchuan.zy at alibaba-inc dot com>

...