...

  1. A newly created table result can be registered as a resource in an interpreter.

  2. Since every resource registered in an interpreter's resource pool can be searched via `DistributedResourcePool` and supports remote method invocation, other interpreters can use it.

  3. Let’s say JDBCInterpreter creates a table result and keeps it (as a JDBCTableData) in its resource pool.

  4. Then SparkInterpreter can fetch its rows and columns via remote method invocation. If Zeppelin registers the distributed resource pool as a Spark Data Source, SparkInterpreter can use all table resources in Zeppelin smoothly (e.g. querying the table in SparkSQL like a normal table). See the sketch below.
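
A rough sketch of steps 1–4, assuming the proposed `JDBCTableData` class from section 4.1 and a simplified pool API; the resource name `tableResult1` and the exact method signatures are illustrative assumptions, not a final API.

Code Block
languagejava
themeEclipse
// Step 3: inside JDBCInterpreter -- keep the table result in its resource
// pool. JDBCTableData stores only the query, so registering it is cheap.
resourcePool.put("tableResult1", new JDBCTableData("SELECT name, value FROM sales"));

// Step 4: inside SparkInterpreter (another process) -- search the
// distributed pool; the lookup and the value fetch below go over
// Zeppelin's remote-invocation layer.
Resource resource = distributedResourcePool.get("tableResult1");
TableData shared = (TableData) resource.get();

// Row and column access on `shared` is then served by JDBCInterpreter,
// which reproduces the table by re-running its stored query.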

 

Gliffy Diagram: overview1

...

Here is a more detailed view explaining how one interpreter can handle its TableData implementation with the resource pool.

Gliffy Diagram: overview2
 

4. Public Interfaces

4.1. Interfaces for TableData related classes

...

| Class | How it can get table data |
| InterpreterResultTableData | Contains the actual data in memory |
| Interpreter-specific TableData (e.g. SparkTableData, SparkSQLTableData, …) | Knows how to reproduce the original table data (e.g. keeps the query in the case of JDBC, SparkSQL) |

...


Gliffy Diagram: tabledata-class
 

4.1.1. Additional methods for TableData

...

Code Block
languagejava
themeEclipse
linenumberstrue
public interface TableData {

    …
    /**
     * Filter the input `TableData`, keeping only the given columns.
     */
    public TableData filter(List<String> columnNames);

    /**
     * Pivot the input `TableData` for visualizations.
     */
    public TableData pivot(List<String> keyColumns,
                           List<String> groupColumns,
                           List<String> valueColumns);

    …
}

...

  • SparkInterpreter can have a SparkTableData which

    • points to the RDD backing the table result

    • implements filter and pivot using Spark RDD APIs

  • JDBCInterpreter can have a JDBCTableData which

    • keeps the query needed to reproduce the table result

    • implements filter and pivot by rewriting that query with additional `where` and `group by` clauses (see the sketch below)
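
As a concrete illustration of the JDBC case, `filter` and `pivot` can become pure query rewriting. This is a minimal sketch assuming the proposed `JDBCTableData` keeps only its query string; the class and constructor come from this design rather than existing code, and the pivot SQL is a rough approximation (it simply sums every value column).

Code Block
languagejava
themeEclipse
import java.util.List;

public class JDBCTableData implements TableData {

  private final String query;  // kept so the table result can be reproduced

  public JDBCTableData(String query) {
    this.query = query;
  }

  @Override
  public TableData filter(List<String> columnNames) {
    // Column filtering becomes a projection over the stored query; no rows
    // are materialized until the result is actually read.
    return new JDBCTableData(
        "SELECT " + String.join(", ", columnNames) + " FROM (" + query + ") t");
  }

  @Override
  public TableData pivot(List<String> keyColumns,
                         List<String> groupColumns,
                         List<String> valueColumns) {
    // Pivot becomes an aggregating GROUP BY over the stored query.
    String keys = String.join(", ", keyColumns);
    String groups = String.join(", ", groupColumns);
    StringBuilder values = new StringBuilder();
    for (String v : valueColumns) {
      values.append(", sum(").append(v).append(")");
    }
    return new JDBCTableData(
        "SELECT " + keys + ", " + groups + values
        + " FROM (" + query + ") t GROUP BY " + keys + ", " + groups);
  }
}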

...

Some interpreters (e.g. ShellInterpreter) might not be connected to external storage. In this case, those interpreters can use the InterpreterResultTableData class, which keeps the actual data in memory.
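
For example, wrapping an existing table-typed paragraph output could look like the following. `InterpreterResultMessage` is Zeppelin's existing output-message class; `InterpreterResultTableData` is the class proposed above, so the constructor shown here is an assumption of this design.

Code Block
languagejava
themeEclipse
import org.apache.zeppelin.interpreter.InterpreterResult;
import org.apache.zeppelin.interpreter.InterpreterResultMessage;

// The table output the interpreter already produced (tab-separated rows,
// as Zeppelin's TABLE result type expects).
InterpreterResultMessage message = new InterpreterResultMessage(
    InterpreterResult.Type.TABLE, "name\tvalue\na\t1\nb\t2");

// Wrap it; all rows stay in memory since there is no external storage
// to reproduce them from.
TableData tableData = new InterpreterResultTableData(message);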

...

Code Block
languagejava
themeEclipse
linenumberstrue
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.RelationProvider;
import org.apache.spark.sql.sources.SchemaRelationProvider;
import org.apache.spark.sql.types.StructType;
import org.apache.zeppelin.interpreter.InterpreterResultMessage;
import org.apache.zeppelin.resource.Resource;
import org.apache.zeppelin.resource.ResourcePool;
import org.apache.zeppelin.resource.WellKnownResourceName;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import scala.collection.immutable.Map;

// TableData, InterpreterResultTableData and TableDataRelation are the
// classes proposed in section 4.1.
public class DefaultSource implements RelationProvider, SchemaRelationProvider {

  private static final Logger logger = LoggerFactory.getLogger(DefaultSource.class);
  public static ResourcePool resourcePool;

  public DefaultSource() {
  }

  @Override
  public BaseRelation createRelation(SQLContext sqlContext, Map<String, String> parameters) {
    return createRelation(sqlContext, parameters, null);
  }

  @Override
  public BaseRelation createRelation(
      SQLContext sqlContext,
      Map<String, String> parameters,
      StructType schema) {

    // The "path" option carries "<noteId>|<paragraphId>".
    String path = parameters.get("path").get();
    String[] noteIdAndParagraphId = path.split("\\|");

    // Look the table result up in the resource pool.
    Resource resource = resourcePool.get(
        noteIdAndParagraphId[0],
        noteIdAndParagraphId[1],
        WellKnownResourceName.ZeppelinTableResult.toString());

    InterpreterResultMessage message = (InterpreterResultMessage) resource.get();
    TableData tableData = new InterpreterResultTableData(message);

    return new TableDataRelation(sqlContext, tableData);
  }
}
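
From the user's side, the relation above could then be queried like any other data source. This is a hedged usage sketch against the Spark 1.x DataFrame API; the data source package name is an assumption, and "note1|paragraph1" is just an illustrative noteId/paragraphId pair.

Code Block
languagejava
themeEclipse
// Load a Zeppelin table resource as a DataFrame. The format string must
// match the package containing DefaultSource (the name here is assumed).
DataFrame df = sqlContext.read()
    .format("org.apache.zeppelin.spark.datasource")
    .option("path", "note1|paragraph1")   // split on "|" by createRelation()
    .load();

// Query it in SparkSQL like a normal table.
df.registerTempTable("zeppelin_table");
sqlContext.sql("SELECT * FROM zeppelin_table").show();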

...


4.3. ResourceRegistry Class

...

  • For interpreters that use SQL

    • provide an interpreter option: create TableData whenever a paragraph is executed

    • or provide a new interpreter magic for it: %spark.sql_share, %jdbc.mysql_share, …

    • or automatically put all table results into the resource pool if they are not heavy (e.g. keeping only the query, or just a reference to the RDD)

    • if the interpreter supports runtime options, we can use syntax like %jdbc(share=true) to specify whether or not to share the table result

  • For interpreters that use a programming language (e.g. Python)

    • provide an API like z.put() (see the sketch after this list)

      Code Block
      languagescala
      themeEclipse
      linenumberstrue
      // infer the instance type and convert it to a predefined `TableData` subclass such as `SparkDataFrameTableData`
      z.put("myTable01", myDataFrame01)
      
      // or require the user to put a `TableData` subclass explicitly
      val myTableData01 = new SparkRDDTableData(myRdd01)
      z.put("myTable01", myTableData01)
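
A sketch of the type inference mentioned above, on the interpreter side. Everything except `ResourcePool` is a proposed or hypothetical name; the point is only that z.put() can wrap known runtime types in the matching `TableData` subclass before registering them.

Code Block
languagejava
themeEclipse
// Hypothetical ZeppelinContext.put(): infer the runtime type and wrap it.
public void put(String name, Object value) {
  TableData tableData;
  if (value instanceof TableData) {
    // The user already supplied a TableData subclass.
    tableData = (TableData) value;
  } else if (value instanceof DataFrame) {
    // Proposed wrapper keeping a reference to the Spark 1.x DataFrame.
    tableData = new SparkDataFrameTableData((DataFrame) value);
  } else if (value instanceof RDD) {
    // Proposed wrapper keeping a reference to the RDD.
    tableData = new SparkRDDTableData((RDD<?>) value);
  } else {
    throw new IllegalArgumentException(
        "Cannot convert " + value.getClass().getName() + " to TableData");
  }
  resourcePool.put(name, tableData);
}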

       

...

  • ZEPPELIN-TBD: Adding `pivot`, `filter` to TableData

  • ZEPPELIN-TBD: ResourceRegistry

  • ZEPPELIN-TBD: REST API for resource pool

  • ZEPPELIN-TBD: UI for `Table` page

  • ZEPPELIN-TBD: Apply `pivot`, `filter` methods for built-in visualizations

  • ZEPPELIN-TBD: SparkTableData, SparkSQLTableData, JDBCTableData, etc.

  • ZEPPELIN-2029: ACL for `ResourcePool`

  • ZEPPELIN-2022: Zeppelin resource pool as Spark DataSource

...

7. Potential Future Work

  • Watch / Unwatch: automatic paragraph updates for streaming data representation.

  • Ability to construct a table result from the resource pool in language interpreters (e.g. Python)

    • Let’s assume that we can build a pandas DataFrame using TableData

      Code Block
      languagepy
      linenumberstrue
      # in the python interpreter
      
      t = z.get("tableResourceName")  # returns an object that has `hasNext` and `next`
      p = PandasTableData(t)
      
      # use p.pandasInstance …
  • ZEPPELIN-1494: Bind JDBC result to a dataset on the Zeppelin context