Helium - Brings Zeppelin to data analytics application platform

Motivation

Zeppelin provides a pluggable Interpreter architecture, which results in a wide variety of backend system support.
Each interpreter abstracts the underlying computing framework's complexity (e.g. SparkInterpreter abstracts a Spark cluster) behind its own interface (e.g. SparkInterpreter provides scala/sql/python as the interface).

There is also a powerful feature called the "Angular Display system" that enables users to create their own front-end interface that interacts with the interpreter.
And there is a "dependency loader" that enables them to load libraries from a remote repository.

Putting it all together, one could imagine a full application platform on top of Apache Zeppelin.
So what I propose is a framework, code-named Helium, that turns Zeppelin into a data analytics application platform by:

- Leveraging computing resources provided by Interpreters
- Generalizing dependency loader
- Providing SDK on top of Angular Display system
- Adding a package repository

What is Helium Application?

The idea is simple: instead of the user writing code and displaying the result on the notebook, the user runs packaged code and gets the result on the notebook.

Packaged code will be able to access Zeppelin-provided resources through the Resource Pool, and to use the Display System to display any output.

Helium Application = View + Algorithm + Access to Resources

...

Anything you want to display inside a Zeppelin notebook.
It can be any standard HTML, CSS, or JavaScript.
Your view and algorithm can interact.

How does an application display output?

Each paragraph has an output message, angular objects, and dynamic forms. A single paragraph can have multiple applications, and each of them has its own output message and angular objects, just like a paragraph output.

...

The code you want to run, which can be any code that runs on the JVM.

Resource

Provided by an interpreter or by another Helium Application.

Every interpreter automatically provides the result of its last run.
Additionally, interpreters can provide their own resources (e.g. SparkContext).
Any user code in a Helium Application can also provide any resource it wants.

A resource can be any Java object.
So it can be data, it can be an abstraction of computing (e.g. SparkContext), it can be anything.
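To make the idea concrete, here is a minimal sketch of a resource pool as a named collection of arbitrary Java objects. The class `SketchResourcePool`, its methods, and the resource names `last-result` and `greeting` are hypothetical and chosen only for illustration; they are not the actual Zeppelin API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a resource pool; not the actual Zeppelin class.
class SketchResourcePool {
    private final Map<String, Object> resources = new LinkedHashMap<>();

    // An interpreter or an application publishes any Java object under a name.
    void put(String name, Object value) {
        resources.put(name, value);
    }

    // Look up a resource by name; empty if nobody has provided it.
    Optional<Object> get(String name) {
        return Optional.ofNullable(resources.get(name));
    }
}

class ResourcePoolDemo {
    public static void main(String[] args) {
        SketchResourcePool pool = new SketchResourcePool();
        // Resource names below are assumptions made for this sketch.
        pool.put("last-result", Arrays.asList(1, 2, 3)); // data
        pool.put("greeting", "hello");                   // any other object
        System.out.println(pool.get("greeting").orElse("missing"));
    }
}
```

Because values are plain `Object`s, the same pool could hold data and handles to computing frameworks side by side; a consumer would cast to the type it expects.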

...

How Helium Application runs

Applications are packaged into Jars and published to a maven repository.
A spec file in the package registry is also required.

Then, depending on the resources that the resource pool has, Zeppelin automatically suggests the Applications that the user can run.
When the user selects an Application, that application is downloaded and run on the interpreter process where the resource exists.
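One way to picture the suggestion step is as a filter over the registry: an application is suggested when every resource it requires is present in the pool. The sketch below assumes that rule; the classes `SketchAppSpec` and `SketchSuggester`, the application name `TableChart`, and the resource names are hypothetical and not taken from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical spec entry: an application name plus the resources it requires.
class SketchAppSpec {
    final String name;
    final Set<String> requiredResources;

    SketchAppSpec(String name, String... required) {
        this.name = name;
        this.requiredResources = new HashSet<>(Arrays.asList(required));
    }
}

class SketchSuggester {
    // Suggest every application whose required resources are all available.
    static List<String> suggest(List<SketchAppSpec> registry, Set<String> available) {
        List<String> suggestions = new ArrayList<>();
        for (SketchAppSpec spec : registry) {
            if (available.containsAll(spec.requiredResources)) {
                suggestions.add(spec.name);
            }
        }
        return suggestions;
    }
}
```

With a registry holding `SparkMon` (requiring `SparkContext`) and `TableChart` (requiring `last-result`), a pool that only contains `last-result` would yield just `TableChart`.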

SDK

A user Application extends the org.apache.zeppelin.helium.Application class in the SDK.
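The methods of the real base class are not described here, so the following is only a rough sketch of the "View + Algorithm + Access to Resources" shape using a hypothetical stand-in, `SketchApplication`. The `run` signature, the `WordCountApp` example, and the resource name `last-result` are all assumptions; the actual org.apache.zeppelin.helium.Application class may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the SDK base class; NOT the real
// org.apache.zeppelin.helium.Application API.
abstract class SketchApplication {
    // Algorithm: JVM code that reads resources and returns the view as HTML.
    abstract String run(Map<String, Object> resources);
}

class WordCountApp extends SketchApplication {
    @Override
    String run(Map<String, Object> resources) {
        // "last-result" is an assumed name for the interpreter's last output.
        Object last = resources.get("last-result");
        String text = last == null ? "" : last.toString().trim();
        int words = text.isEmpty() ? 0 : text.split("\\s+").length;
        // View: plain HTML that the notebook could render.
        return "<div>Word count: " + words + "</div>";
    }
}

class WordCountDemo {
    public static void main(String[] args) {
        Map<String, Object> resources = new HashMap<>();
        resources.put("last-result", "to be or not to be");
        System.out.println(new WordCountApp().run(resources));
    }
}
```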


The SDK provides a development mode, so you can run an application inside Zeppelin without a full deployment.
In development mode, an application automatically re-reads its view as its html/css/javascript resources change, without a restart.

Here's a short video showing how the SDK works:
http://youtube.com/watch?v=Ya_UQMRnl8U

Package Repository and spec file

A Helium Application is packaged into a standard Jar file, therefore it can be distributed through a maven repository.
The Package Repository is actually a collection of spec files. Each spec file provides the following information:

- Name of Application
- Artifact name in maven repository
- Resources this application requires
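The spec file format itself is not defined here, so the sketch below only models the three fields listed above as a small Java value class. It assumes the artifact is written as standard Maven coordinates (`groupId:artifactId:version`); the class `SketchSpec` and the example coordinates are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory form of one spec file entry.
class SketchSpec {
    final String name;            // name of the application
    final String artifact;        // assumed form: groupId:artifactId:version
    final List<String> requires;  // resources this application requires

    SketchSpec(String name, String artifact, List<String> requires) {
        this.name = name;
        this.artifact = artifact;
        this.requires = Collections.unmodifiableList(new ArrayList<>(requires));
    }

    // Split the artifact into its three Maven coordinate parts.
    String[] coordinates() {
        String[] parts = artifact.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected groupId:artifactId:version but got " + artifact);
        }
        return parts;
    }
}

class SketchSpecDemo {
    public static void main(String[] args) {
        // The coordinates below are made up for illustration.
        SketchSpec spec = new SketchSpec(
            "SparkMon", "com.example:zeppelin-sparkmon:0.1.0",
            Arrays.asList("SparkContext"));
        System.out.println(spec.name + " -> " + spec.coordinates()[1]);
    }
}
```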

The package repository is going to be maintained as a separate git repo with its own homepage (like spark-packages.org for Spark packages), so any user can add their applications there without PMC review, which scales well.
There will be a bot that automatically merges pull requests with specfiles into the master branch of the repo.

I propose the repository
https://github.com/zeppelin-project/helium-packages

Implementation.

There is a proof-of-concept implementation:
https://github.com/Leemoonsoo/incubator-zeppelin/tree/helium

Actual implementation is in progress.

Jira

ZEPPELIN-533 (ASF JIRA)

Application examples

I have created some example applications based on the PoC implementation.

...

SparkMon - an application that accesses Spark
https://github.com/Leemoonsoo/zeppelin-sparkmon

Video

Here's a video of three example applications:
https://www.youtube.com/watch?v=8Wdc70e6QVI
