...

Discussion Thread: [...]

JIRA: ZEPPELIN-2019

2. Motivation

Apache Zeppelin provides valuable features for table manipulation, such as built-in visualizations, pivoting, and CSV download. However, these features are limited by table size: they currently run in the browser, and the number of rows is capped (configurable, 1000 rows by default). Moving these computations from the browser to the backend is therefore a starting point for handling large data and for improving pivoting, filtering, full CSV download, pagination, and other functionality.

...

For more future work tasks, please refer to section 6. Potential Future Work.


3. Proposed Changes

3.1. Overview: Sharing a table resource between different interpreters

This diagram shows how the Spark interpreter can query a table generated by the JDBC interpreter.

  1. A table result newly created by an interpreter (`A`) can be registered as a resource.

  2. Since every resource registered in an interpreter's resource pool can be searched via `DistributedResourcePool` and supports remote method invocation, other interpreters (`B`) can use it.

  3. Let’s say JDBCInterpreter creates a table result and keeps it (`JDBCTableData`) in its resource pool.

  4. Then, SparkInterpreter can fetch rows and columns via remote method invocation. If Zeppelin registers the distributed resource pool as a Spark Data Source, SparkInterpreter can use all table resources in Zeppelin seamlessly (e.g. querying the table in SparkSQL like a normal table).

 

[Gliffy diagram: overview1]
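The four steps above can be illustrated with a minimal, in-process sketch. All names in it (`InMemoryTableData`, `LocalPool`, `SharedPoolRegistry`, and the shape of the `TableData` interface) are hypothetical and chosen for illustration only; they are not Zeppelin's actual resource pool API. Plain method calls stand in for remote method invocation, so this only shows the register-then-look-up flow, not the cross-process transport.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified table abstraction (not Zeppelin's real interface).
interface TableData {
    List<String> columns();
    List<List<Object>> rows();
}

// Hypothetical table result kept by the interpreter that produced it.
final class InMemoryTableData implements TableData {
    private final List<String> columns;
    private final List<List<Object>> rows;

    InMemoryTableData(List<String> columns, List<List<Object>> rows) {
        this.columns = columns;
        this.rows = rows;
    }

    @Override public List<String> columns() { return columns; }
    @Override public List<List<Object>> rows() { return rows; }
}

// Resource pool owned by a single interpreter (steps 1 and 3).
final class LocalPool {
    private final Map<String, Object> resources = new HashMap<>();

    void put(String name, Object value) { resources.put(name, value); }
    Object get(String name) { return resources.get(name); }
}

// Stand-in for a distributed lookup: searches every registered pool (step 2).
// Assumption: in Zeppelin this lookup would cross process boundaries.
final class SharedPoolRegistry {
    private final List<LocalPool> pools = new ArrayList<>();

    void register(LocalPool pool) { pools.add(pool); }

    TableData findTable(String name) {
        for (LocalPool pool : pools) {
            Object resource = pool.get(name);
            if (resource instanceof TableData) {
                return (TableData) resource;
            }
        }
        return null; // no pool holds a table under that name
    }
}

public class TableSharingSketch {
    public static void main(String[] args) {
        // Interpreter A (e.g. JDBC) registers its table result.
        LocalPool jdbcPool = new LocalPool();
        jdbcPool.put("users", new InMemoryTableData(
                Arrays.asList("id", "name"),
                Arrays.asList(
                        Arrays.<Object>asList(1, "alice"),
                        Arrays.<Object>asList(2, "bob"))));

        // Interpreter B (e.g. Spark) has its own, empty pool.
        LocalPool sparkPool = new LocalPool();

        SharedPoolRegistry registry = new SharedPoolRegistry();
        registry.register(sparkPool);
        registry.register(jdbcPool);

        // Interpreter B fetches columns and rows through the shared lookup (step 4).
        TableData table = registry.findTable("users");
        System.out.println(table.columns());
        System.out.println(table.rows().size());
    }
}
```

Keeping the table behind an interface means the consuming interpreter depends only on `columns()` and `rows()`, which is what would let different interpreters supply their own implementations.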

3.2. Overview: How an interpreter can handle table resources

Here is a more detailed view of how one interpreter can handle its `TableData` implementation with the resource pool.