Helium - Brings Zeppelin to a data analytics application platform

 

Motivation


Zeppelin provides a pluggable Interpreter architecture, which results in support for a wide variety of backend systems.
Each interpreter abstracts an underlying computing framework (e.g. SparkInterpreter abstracts a Spark cluster) and exposes it through its own interface (e.g. SparkInterpreter provides Scala, SQL, and Python).

Zeppelin also has a powerful feature called the Angular Display system, which lets users create their own front-end interface that interacts with an interpreter.
In addition, the Dependency loader loads libraries from a remote repository.


Putting this all together, I can imagine an application platform on top of Apache Zeppelin.
So I propose Helium, a framework that brings Zeppelin to a data analytics application platform, by

- Leveraging computing resources provided by Interpreters
- Generalizing dependency loader
- Providing SDK on top of Angular Display system
- Adding a package repository

 

What is Helium Application?


Helium Application = View + Algorithm + Access to Resource

 


View

Anything you want to display inside a Zeppelin notebook.
It can be any standard HTML, CSS, and JavaScript.
Your view and algorithm can interact.


Algorithm

The code you want to run, which can be any code that runs on the JVM.


Resource

Provided by an interpreter or by another Helium Application.

Every interpreter automatically provides the result of its last run.
Additionally, an interpreter can provide its own resources (e.g. SparkContext).
Any user code in a Helium Application can also provide any resource it wants.


A resource can be any Java object.
So it can be data, it can be an abstraction of computing (e.g. SparkContext), it can be anything.
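
To make the resource idea concrete, here is a minimal, language-neutral sketch written in Python. It is not the Helium API: the `ResourcePool` class, its methods, and the resource names (`last_result`, `spark_context`, `git_commits`) are all hypothetical and exist only to illustrate a named pool that holds arbitrary objects.

```python
from typing import Any, Dict, Optional, Set


class ResourcePool:
    """Hypothetical in-memory pool mapping resource names to arbitrary objects."""

    # Assumed key for the automatically published result; not from the proposal.
    LAST_RESULT = "last_result"

    def __init__(self) -> None:
        self._resources: Dict[str, Any] = {}

    def put(self, name: str, value: Any) -> None:
        """Register (or overwrite) a resource under the given name."""
        self._resources[name] = value

    def get(self, name: str) -> Optional[Any]:
        """Return the resource, or None if nothing is registered under name."""
        return self._resources.get(name)

    def names(self) -> Set[str]:
        """Return the names of all resources currently in the pool."""
        return set(self._resources)


pool = ResourcePool()
# An interpreter publishes the result of its last run.
pool.put(ResourcePool.LAST_RESULT, [("word", 3), ("cloud", 5)])
# An interpreter may also expose its own objects, such as a computing handle.
pool.put("spark_context", object())
# Application code can add resources of its own.
pool.put("git_commits", ["a1b2c3", "d4e5f6"])
```

Because the pool stores plain objects, data and computing abstractions are handled the same way, which is the property the proposal relies on.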

 


How Helium Application runs


An Application is packaged into a jar and published to a Maven repository.
Its spec file is also added to the package registry.

Then, depending on the resources available in the resource pool, Zeppelin automatically suggests the Applications the user can run.
When the user selects an Application, it is downloaded and runs in the interpreter process where the resource exists.
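
The suggestion step can be sketched as a simple set comparison: an Application is a candidate when every resource it requires is present in the pool. The sketch below is in Python for brevity and assumes this subset rule; the `AppSpec` type, the `suggest` function, the resource names, and the `org.example` artifact coordinates are hypothetical and not taken from the PoC.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical registry entry for one Application."""
    name: str
    artifact: str                 # Maven-style coordinates (illustrative only)
    requires: FrozenSet[str]      # names of the resources the Application needs


def suggest(specs: Iterable[AppSpec], available: Iterable[str]) -> List[str]:
    """Return names of Applications whose required resources are all available."""
    have = set(available)
    return [spec.name for spec in specs if spec.requires <= have]


registry = [
    AppSpec("wordcloud", "org.example:wordcloud:0.1.0", frozenset({"last_result"})),
    AppSpec("sparkmon", "org.example:sparkmon:0.1.0", frozenset({"spark_context"})),
]

# Only a paragraph result is in the pool, so only the visualization qualifies.
suggested = suggest(registry, ["last_result"])
```

With only `last_result` available, `suggested` contains just `wordcloud`; once an interpreter also exposes `spark_context`, `sparkmon` becomes a candidate as well.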

 

 


SDK

A user application extends the org.apache.zeppelin.helium.Application class in the SDK.
The SDK provides a development mode, so you can run an application inside Zeppelin without deploying it.
In development mode, an application automatically refreshes its view when its HTML/CSS/JavaScript resources change, without a restart.

 

Package Repository and spec file

A Helium Application is packaged into a jar, so it can be distributed through a Maven repository.
The Package Repository is actually a collection of spec files. Each spec file provides the following information:

- Name of Application
- Artifact name in maven repository
- Resources this application requires
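
As an illustration of the three items above, a spec file could look like a small JSON document. The proposal does not fix a file format, so the JSON layout, the field names (`name`, `artifact`, `resources`), and the example values are assumptions made for this sketch only.

```python
import json

# Hypothetical spec file content; field names and values are illustrative.
spec_text = """
{
  "name": "wordcloud",
  "artifact": "org.example:zeppelin-wordcloud:0.1.0",
  "resources": ["last_result"]
}
"""

REQUIRED_FIELDS = ("name", "artifact", "resources")


def load_spec(text: str) -> dict:
    """Parse a spec and check that the three expected fields are present."""
    spec = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in spec]
    if missing:
        raise ValueError(f"spec is missing fields: {missing}")
    return spec


spec = load_spec(spec_text)
```

A check like this is the kind of validation a merge bot could run on each submitted spec file before accepting it.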


The package repository is going to be maintained as a separate git repo with its own homepage (like spark-packages.org for Spark packages), so any user can add their spec file without PMC review.
A bot will automatically merge pull requests for spec files into the master branch.

I propose the repository
https://github.com/zeppelin-project/helium-packages

 

Implementation

There is a proof-of-concept implementation:
https://github.com/Leemoonsoo/incubator-zeppelin/tree/helium

 

Application examples

I have created some example applications based on the PoC implementation.

Git commit data - data source
https://github.com/Leemoonsoo/zeppelin-gitcommitdata

Wordcloud - visualizes the paragraph's table result
https://github.com/Leemoonsoo/zeppelin-wordcloud

SparkMon - application that accesses Spark
https://github.com/Leemoonsoo/zeppelin-sparkmon

Video
https://www.youtube.com/watch?v=8Wdc70e6QVI&feature=youtu.be
