Helium - Brings Zeppelin to data analytics application platform

 

Motivation


Zeppelin provides a pluggable Interpreter architecture, which results in a wide variety of supported backend systems.
Each interpreter abstracts the complexity of its underlying computing framework (e.g. SparkInterpreter abstracts a Spark cluster) behind its own interface (e.g. SparkInterpreter provides scala/sql/python as the interface).

Zeppelin also has a powerful feature called "Angular Display system" that enables a user to create their own front-end interface that interacts with the interpreter.
And there is a "dependency loader" that enables them to load libraries from a remote repository.


Putting it all together, one could imagine a full application platform on top of Apache Zeppelin.
So what I propose is a framework, code-named Helium, that turns Zeppelin into a data analytics application platform by:

- Leveraging computing resources provided by Interpreters
- Generalizing dependency loader
- Providing SDK on top of Angular Display system
- Adding a package repository

 

What is Helium Application?

The idea is simple: instead of the user writing code and displaying the result on the notebook, the user runs packaged code and gets the result on the notebook.

Packaged code will be able to access Zeppelin-provided resources through the Resource Pool, and to use the Display System to display any output.

Helium Application = View + Algorithm + Access to Resources

 

...

Anything you want to display inside of a Zeppelin notebook.
It can be any standard HTML, CSS, or JavaScript.
Your view and algorithm can interact.

How does an application display output?

Each paragraph has an output message, angular objects, and dynamic forms. A single paragraph can have multiple applications, and each of them has its own output message and angular objects, just like a paragraph output.

...

The code you want to run, which can be any code that runs on the JVM.

Resource

Provided by an interpreter or by another Helium Application.

Every interpreter automatically provides the result of its last run.
Additionally, interpreters can provide their own resources (e.g. SparkContext).
Also, any user code in a Helium Application can provide any resource it wants.


The resource can be any Java object.
So it can be data, it can be an abstraction of computing (e.g. SparkContext), it can be anything.
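
As a rough illustration of "a resource can be any Java object", here is a minimal sketch of a name-to-object pool. The class SketchResourcePool and its put/get methods are hypothetical names invented for this example; they are not the Zeppelin API, and the real Resource Pool may look quite different.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the resource pool described above; not the Zeppelin API.
class SketchResourcePool {
    private final Map<String, Object> resources = new ConcurrentHashMap<>();

    // An interpreter or an application registers any (non-null) Java object under a name.
    void put(String name, Object resource) {
        resources.put(name, resource);
    }

    // Look up a resource by name; empty if it is missing or has a different type.
    <T> Optional<T> get(String name, Class<T> type) {
        Object value = resources.get(name);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }
}

class SketchResourcePoolDemo {
    public static void main(String[] args) {
        SketchResourcePool pool = new SketchResourcePool();
        pool.put("lastResult", "a\tb\n1\t2");   // plain data, e.g. the result of a last run
        pool.put("rowCount", Integer.valueOf(1));

        System.out.println(pool.get("lastResult", String.class).isPresent()); // prints true
        System.out.println(pool.get("rowCount", String.class).isPresent());   // prints false
    }
}
```

The typed lookup is one possible design choice: since the pool holds arbitrary objects, the consumer states which type it expects and gets an empty result instead of a ClassCastException when the resource is something else.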

...

How Helium Application runs

Applications are packaged into Jars and published into a maven repository.
A spec file in the package registry is also required.

Then, depending on the resources that the resource pool has, Zeppelin automatically suggests possible Applications that the user can run.
When the user selects an Application, that application is downloaded and run on the interpreter process where the resource exists.
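
The suggestion step could be sketched roughly as follows, assuming each spec lists the names of the resources its application requires and that an application is suggested only when all of them are present in the pool. SketchAppSpec, SketchSuggester and the resource names used here are hypothetical, and the matching rule is an assumption rather than the proposal's defined behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified spec: an application name plus the resource names it needs.
final class SketchAppSpec {
    final String name;
    final Set<String> requiredResources;

    SketchAppSpec(String name, Set<String> requiredResources) {
        this.name = name;
        this.requiredResources = requiredResources;
    }
}

class SketchSuggester {
    // Returns the specs whose required resources are all available in the pool.
    static List<SketchAppSpec> suggest(List<SketchAppSpec> registry, Set<String> available) {
        List<SketchAppSpec> runnable = new ArrayList<>();
        for (SketchAppSpec spec : registry) {
            if (available.containsAll(spec.requiredResources)) {
                runnable.add(spec);
            }
        }
        return runnable;
    }
}
```

With a pool that only contains a hypothetical "lastResult" resource, an application requiring "lastResult" would be suggested, while one requiring "sparkContext" would not.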

 

 


SDK

A user Application extends the org.apache.zeppelin.helium.Application class in the SDK.


 


The SDK provides a development mode, so you can actually run an application inside of Zeppelin without full deployment.
In development mode, an application automatically re-reads its view as its html/css/javascript resources change, without a restart.

Here's a short video of how the SDK works:
http://youtube.com/watch?v=Ya_UQMRnl8U

 

Package Repository and spec file

A Helium Application is packaged into a standard Jar file, so it can be distributed through a maven repository.
The Package Repository is actually a collection of spec files. Each spec file provides the following information:

- Name of Application
- Artifact name in maven repository
- Resources this application requires
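
To make the three fields concrete, here is a minimal in-memory sketch of one spec entry. The class SketchPackageSpec, its field names, and the assumption that the artifact is written as a Maven coordinate of the form groupId:artifactId:version are all illustrative; the proposal does not define the spec file format.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory form of one spec file; field names are illustrative only.
final class SketchPackageSpec {
    final String name;            // name of the application
    final String artifact;        // assumed form: groupId:artifactId:version
    final List<String> requires;  // resources this application requires

    SketchPackageSpec(String name, String artifact, List<String> requires) {
        // Reject artifacts that do not have exactly three colon-separated parts.
        if (artifact.split(":").length != 3) {
            throw new IllegalArgumentException(
                "expected groupId:artifactId:version, got " + artifact);
        }
        this.name = name;
        this.artifact = artifact;
        this.requires = Collections.unmodifiableList(requires);
    }

    String groupId()    { return artifact.split(":")[0]; }
    String artifactId() { return artifact.split(":")[1]; }
    String version()    { return artifact.split(":")[2]; }
}
```

For example, a spec created with the made-up artifact "com.example:demo-app:0.1.0" would report "com.example" as its group, "demo-app" as its artifact id and "0.1.0" as its version.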


The package repository is going to be maintained as a separate git repo with its own homepage (like spark-packages.org for Spark packages), so any user can add their applications there without PMC review, which scales well.
There will be a bot that automatically merges pull requests with specfiles into the master branch of the repo.

I propose the repository
https://github.com/zeppelin-project/helium-packages 

Implementation

There is a proof-of-concept implementation.
https://github.com/Leemoonsoo/incubator-zeppelin/tree/helium

Actual implementation is in progress.

Jira (ASF JIRA): ZEPPELIN-533
 

Application examples

I have created some example applications based on the PoC implementation.

...

SparkMon - application that accesses Spark
https://github.com/Leemoonsoo/zeppelin-sparkmon

Video

Here's a video of three example applications:
https://www.youtube.com/watch?v=8Wdc70e6QVI

...
