2. Motivation

Apache Zeppelin provides valuable features for table manipulation, such as built-in visualizations, pivoting, and CSV download. However, these features are limited by table size: they currently run in the browser, and the number of rows is capped (configurable, 1000 rows by default). Moving these computations from the browser to the backend is therefore a starting point for handling large data and for improving pivoting, filtering, full CSV download, pagination, and other functionality.

Furthermore, tables currently cannot be shared across interpreter processes. For example, a table from the JDBC interpreter is not accessible from the SparkSQL or Python interpreters. The idea here is to extend the existing Zeppelin resource pool so that Table resources can be shared across interpreters. This would also allow one central Table menu for accessing and viewing information about registered Table resources.

...

For interpreters that use SQL:

provide an interpreter option: create TableData whenever a paragraph is executed
or provide a new interpreter magic for it: %spark.sql_share, %jdbc.mysql_share, …
or automatically put all table results into the resource pool if they are not heavy (e.g. keeping only the query, or just a reference to the RDD)
If the interpreter supports runtime interpreter parameters, we can use this syntax: %jdbc(share=true) to specify whether to share the table result
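
The %jdbc(share=true) syntax above could be recognized with a small parser. The following is only a sketch under stated assumptions: MagicParams, parse and shouldShare are hypothetical names that do not exist in Zeppelin, and the accepted grammar (comma-separated key=value pairs inside parentheses) is a guess at what such runtime parameters might look like.

```scala
// Hypothetical helper, not part of Zeppelin: extracts key=value options
// from the first line of a paragraph, e.g. "%jdbc(share=true)".
object MagicParams {
  // Interpreter name, then an optional parenthesised option list.
  private val Magic = """^%([\w.]+)(?:\(([^)]*)\))?""".r

  // Returns the interpreter name and its options, or None if the line
  // does not start with an interpreter magic.
  def parse(firstLine: String): Option[(String, Map[String, String])] =
    Magic.findFirstMatchIn(firstLine.trim).map { m =>
      // group(2) is null when no parentheses were given.
      val rawParams = Option(m.group(2)).getOrElse("")
      val params = rawParams
        .split(",")
        .map(_.trim)
        .filter(_.contains("="))
        .map { kv =>
          val Array(k, v) = kv.split("=", 2)
          k.trim -> v.trim
        }
        .toMap
      (m.group(1), params)
    }

  // Sharing is assumed to be opt-in: only an explicit share=true enables it.
  def shouldShare(firstLine: String): Boolean =
    parse(firstLine).exists(_._2.get("share").contains("true"))
}
```

With this sketch, a paragraph starting with %jdbc(share=true) would be shared, while plain %jdbc or %spark.sql would not. Whether sharing should default to off is a design choice the proposal leaves open.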

For interpreters that use a programming language (e.g. Python):

provide an API like z.put()

// infer the instance type and convert it to a predefined `TableData` subclass such as `SparkDataFrameTableData`
z.put("myTable01", myDataFrame01)

// or force the user to put a `TableData` subclass
val myTableData01 = new SparkRDDTableData(myRdd01)
z.put("myTable01", myTableData01)
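
The two options above (inferring the wrapper from the instance type, or requiring a `TableData` subclass) can coexist behind a single put method. The sketch below is one possible shape, not the proposed implementation: apart from the name `TableData`, everything here (its columns and rows members, InMemoryTableData, AsTableData, ResourcePoolSketch) is a hypothetical stand-in, and a sequence of column-name to value maps plays the role of a Spark DataFrame so that the example runs without Spark.

```scala
import scala.collection.mutable

// Assumed minimal shape of TableData; the real interface is not defined here.
trait TableData {
  def columns: Seq[String]
  def rows: Iterator[Seq[Any]]
}

// Stand-in for a backend-specific subclass such as SparkDataFrameTableData.
final case class InMemoryTableData(columns: Seq[String], data: Seq[Seq[Any]])
    extends TableData {
  def rows: Iterator[Seq[Any]] = data.iterator
}

// Hypothetical type class: how to wrap a value of type A as TableData.
trait AsTableData[-A] {
  def wrap(value: A): TableData
}

object AsTableData {
  // Second option in the text: a TableData subclass is stored unchanged.
  implicit val passThrough: AsTableData[TableData] =
    new AsTableData[TableData] {
      def wrap(value: TableData): TableData = value
    }

  // First option in the text: infer the wrapper from the instance type.
  implicit val fromRecords: AsTableData[Seq[Map[String, Any]]] =
    new AsTableData[Seq[Map[String, Any]]] {
      def wrap(records: Seq[Map[String, Any]]): TableData = {
        // Column names are taken from the first record only.
        val cols = records.headOption.map(_.keys.toSeq).getOrElse(Seq.empty)
        InMemoryTableData(cols, records.map(r => cols.map(c => r.getOrElse(c, null))))
      }
    }
}

// Stand-in for the part of the resource pool that z.put() would delegate to.
class ResourcePoolSketch {
  private val tables = mutable.Map.empty[String, TableData]

  // Compiles only when a conversion to TableData is known for A.
  def put[A](name: String, value: A)(implicit ev: AsTableData[A]): Unit =
    tables(name) = ev.wrap(value)

  def get(name: String): Option[TableData] = tables.get(name)
}
```

Resolving the conversion at compile time means an unsupported type is rejected before the paragraph runs. A runtime pattern match inside put would be the alternative if values arrive untyped, for example from a Python interpreter.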

...

The issues discussed above can be implemented in the following order of priority:

ZEPPELIN-TBD: Adding pivot, filter methods to TableData
ZEPPELIN-TBD: ResourceRegistry
ZEPPELIN-TBD: Rest API for resource pool
ZEPPELIN-TBD: UI for Table page
ZEPPELIN-TBD: Apply pivot, filter methods for built-in visualizations
ZEPPELIN-TBD: SparkTableData, SparkSQLTableData, JDBCTableData, etc.
ZEPPELIN-2029: ACL for ResourcePool
ZEPPELIN-2022: Zeppelin resource pool as a Spark Data Source

...
