Abstract

Gluten is a middle layer responsible for offloading the execution of JVM-based SQL engines to native engines. This project aims to address the CPU computational bottleneck by offloading Apache Spark SQL's JVM operators to native engines, in data loading and various other scenarios based on Apache Spark. With advancements in IO technologies, especially the widespread use of SSDs and NICs with 10GbE or higher bandwidth, CPU computation has gradually become the primary limiting factor for performance. However, optimizing CPU instructions on the JVM is relatively challenging compared to native languages like C++, because the JVM provides fewer optimization capabilities. At this moment, Apache Spark is the first engine Gluten can plug into. Support for other engines, such as Trino and Apache Flink, is on the roadmap.

Proposal

The Gluten project utilizes the plugin mechanism of JVM-based SQL engines (like Apache Spark's) to intercept query plans and send them to native engines for execution, bypassing the original engine's less efficient execution path. The project supports multiple native engines as backends, including Velox, ClickHouse, and Apache Arrow. For operations that the native engines cannot handle, Gluten falls back to the SQL engine's normal execution path. In terms of thread models, Gluten uses JNI (Java Native Interface) calls to invoke native code directly within the original engine's executor task threads, avoiding the introduction of complex thread models.
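To make the intercept-and-fall-back flow more concrete, here is a minimal Java sketch under stated assumptions. It models a physical plan as a tree, treats a hypothetical `SUPPORTED` set as the operators a native backend can run, marks each operator as native or JVM (fallback), and inserts hypothetical `ColumnarToRow` / `RowToColumnar` transition nodes wherever the two execution paths meet. None of these names are Gluten's actual classes or API, the JNI call into native code is not modeled, and a real engine would likely weigh more than the operator name when deciding what to offload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: operator names, SUPPORTED, and the transition node
// names are illustrative assumptions, not Gluten's real classes or API.
final class OffloadSketch {

    // Operators the (hypothetical) native backend is assumed to support.
    static final Set<String> SUPPORTED = Set.of("Scan", "Filter", "Project", "HashAggregate");

    // A simplified physical plan node: operator name, where it runs, and its inputs.
    static final class Node {
        final String op;
        final boolean isNative;
        final List<Node> children;

        Node(String op, boolean isNative, List<Node> children) {
            this.op = op;
            this.isNative = isNative;
            this.children = children;
        }
    }

    // Builds an un-annotated plan node, as the original engine would produce it.
    static Node plan(String op, Node... children) {
        return new Node(op, false, List.of(children));
    }

    // Marks each operator native or JVM (fallback) bottom-up and inserts a
    // transition wherever a child's output format differs from its parent's.
    static Node offload(Node node) {
        boolean isNative = SUPPORTED.contains(node.op);
        List<Node> newChildren = new ArrayList<>();
        for (Node child : node.children) {
            Node c = offload(child);
            if (c.isNative && !isNative) {
                // Native columnar output feeding a JVM operator: convert to rows.
                c = new Node("ColumnarToRow", false, List.of(c));
            } else if (!c.isNative && isNative) {
                // JVM row output feeding a native operator: convert to columnar.
                c = new Node("RowToColumnar", true, List.of(c));
            }
            newChildren.add(c);
        }
        return new Node(node.op, isNative, newChildren);
    }

    // Renders a plan as Op[native|jvm](child, ...) for inspection.
    static String render(Node n) {
        StringBuilder sb = new StringBuilder(n.op).append(n.isNative ? "[native]" : "[jvm]");
        if (!n.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(render(n.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "CustomUdf" stands in for any operator the native backend cannot run.
        Node query = plan("HashAggregate", plan("CustomUdf", plan("Filter", plan("Scan"))));
        System.out.println(render(offload(query)));
    }
}
```

Running `main` prints `HashAggregate[native](RowToColumnar[native](CustomUdf[jvm](ColumnarToRow[jvm](Filter[native](Scan[native])))))`. In this sketch, transitions appear only at the boundaries, so one unsupported operator in the middle of a plan costs two format conversions rather than sending the whole query back to the JVM path.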

...

However, there is a need to address query performance more broadly, and the industry has recognized this CPU bottleneck. This motivated Intel and Kyligence to initiate the Gluten project to unleash the power of Advanced Vector Extensions (AVX) technology, using SIMD instructions within a vectorized SQL engine. This enables Apache Spark (as well as other engines in the future) to move beyond the limitations of row-based data processing and the JVM.

...

We expect to enter incubation in two months and to graduate within 18 months.

Homogenous Developers

...