Title/Summary: Develop a 'NoSQL' Datastore component for Apache Cassandra, CouchDB, Hadoop/Hbase

Student: Eranda Sooriyabandara

Student e-mail: 070468d AT gmail DOT com

Student Major: Computer Science

Student Degree: Undergraduate

Student Graduation: October 2011

Organization: Apache Software Foundation

Assigned Mentor: Jean-Sebastien Delfino

Abstract:

Apache Tuscany provides a comprehensive infrastructure to simplify the task of developing and managing Service Oriented Architecture (SOA) solutions based on the Service Component Architecture (SCA) standard. SCA abstracts business functions as components and encourages business people and solution providers to use them as building blocks to create a business solution without knowing much about the underlying infrastructure.

'NoSQL' (Not Only SQL) databases are a modern class of databases that differ from classic relational database management systems in many ways: they may not require fixed table schemas, they avoid join operations, and they scale horizontally. These databases also do not use Structured Query Language (SQL) to manipulate data; instead they expose an API. Apache Cassandra, CouchDB, Hadoop/Hbase and AppEngine Datastore are some examples of 'NoSQL' databases.

In this project my ultimate goal is to create portable SCA datastore components over a number of 'NoSQL' databases such as Apache Cassandra, CouchDB, Hadoop/Hbase and AppEngine Datastore, using Java. The main idea of creating these components is to hide the database API of each 'NoSQL' database and provide a REST datastore interface which can be used by different people without worrying about the underlying database.

Implementation Plan:

In the implementation of the SCA datastore components we need to consider the following attributes:

Service
Reference
Property
Intent Policies
Implementation

So my task in this project is to identify and get a clear idea of those attributes and implement them as SCA components. There are two components per database: the first is the REST datastore interface component and the other is the wrapped database component.

Service:

The major functionality of the REST datastore interface component is to give the user access to a 'NoSQL' database without worrying about the underlying database. The 'service' of this component describes a generic service interface to store and manipulate the data of all the 'NoSQL' databases. Before implementing the interface we need to clarify the REST datastore interface services which we use in all the datastore components. This needs to be done carefully since some concepts are specific to one database, for example SuperColumnFamily in Apache Cassandra.

The 'service' of each wrapper component describes a database-specific service interface to store and manipulate the data of the related 'NoSQL' database. These interfaces vary with the database APIs.
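As a rough illustration of the service idea, the sketch below shows what a generic datastore service interface could look like in Java, with an in-memory class standing in for a wrapped database component. The names `DatastoreService`, `InMemoryDatastore`, and the `put`/`get`/`delete` operations are assumptions made for this example; the actual interface still has to be agreed with the community, and a real wrapper would call the client API of its database instead of a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical generic service interface shared by all datastore components. */
interface DatastoreService {
    void put(String collection, String key, String value);
    Optional<String> get(String collection, String key);
    boolean delete(String collection, String key);
}

/** In-memory stand-in for a wrapped database component (assumption: no real database is contacted). */
class InMemoryDatastore implements DatastoreService {
    // collection name -> (key -> value)
    private final Map<String, Map<String, String>> collections = new HashMap<>();

    @Override
    public void put(String collection, String key, String value) {
        collections.computeIfAbsent(collection, c -> new HashMap<>()).put(key, value);
    }

    @Override
    public Optional<String> get(String collection, String key) {
        Map<String, String> rows = collections.get(collection);
        return rows == null ? Optional.empty() : Optional.ofNullable(rows.get(key));
    }

    @Override
    public boolean delete(String collection, String key) {
        Map<String, String> rows = collections.get(collection);
        // True only when a value was actually removed.
        return rows != null && rows.remove(key) != null;
    }
}

class DatastoreServiceDemo {
    public static void main(String[] args) {
        DatastoreService store = new InMemoryDatastore();
        store.put("users", "42", "Eranda");
        System.out.println(store.get("users", "42").orElse("not found"));
        System.out.println(store.delete("users", "42"));
    }
}
```

Database-specific concepts such as SuperColumnFamily do not fit this simple key/value shape, which is one reason the generic interface needs careful design.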

Reference:

In the reference we need to create an interface which describes the dependencies. The reference of the REST datastore interface component will be directed to the wrapped database component's service interface. Wrapped database components do not have references.
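The relationship between the two components can be sketched in plain Java as follows. This is only an illustration under assumptions: `WrappedDatabase`, `StubDatabase`, `RestDatastoreComponent`, and the status strings are hypothetical names, and the reference is passed through a constructor here, whereas in an SCA runtime it would be wired by the container.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical service interface exposed by a wrapped database component. */
interface WrappedDatabase {
    void store(String key, String value);
    String fetch(String key);      // returns null when the key is absent
    boolean remove(String key);
}

/** Map-backed stub so the sketch runs without a real database. */
class StubDatabase implements WrappedDatabase {
    private final Map<String, String> rows = new HashMap<>();
    public void store(String key, String value) { rows.put(key, value); }
    public String fetch(String key) { return rows.get(key); }
    public boolean remove(String key) { return rows.remove(key) != null; }
}

/** REST datastore interface component: its only reference points at a wrapper's service. */
class RestDatastoreComponent {
    // Assumption: in SCA this field would be an injected reference.
    private final WrappedDatabase database;

    RestDatastoreComponent(WrappedDatabase database) {
        this.database = database;
    }

    /** Maps an HTTP-style verb and resource key to a wrapper call; returns a status, plus the value for GET. */
    String handle(String method, String key, String body) {
        switch (method) {
            case "PUT":
                database.store(key, body);
                return "204";
            case "GET":
                String value = database.fetch(key);
                return value == null ? "404" : "200 " + value;
            case "DELETE":
                return database.remove(key) ? "204" : "404";
            default:
                return "405";
        }
    }
}
```

Because the REST component depends only on the wrapper's interface, swapping one database for another would mean wiring the reference to a different wrapper, with no change to the REST component itself.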

Property:

This defines the configuration parameters of the components, which can be used to describe the behaviour of the datastore components, for example concurrency controls. These parameters can be set in a configuration file, which is an XML or a text file. This configuration file may differ between 'NoSQL' datastores. A deep analysis of each DBMS is needed to find the configuration parameters.
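For the text-file case, one possible approach is to read the parameters with `java.util.Properties`. The sketch below assumes hypothetical property names (`datastore.host`, `datastore.port`, `datastore.maxConcurrentRequests`) and illustrative defaults; the real parameters depend on the analysis of each DBMS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical configuration holder for one datastore component. */
class DatastoreConfig {
    final String host;
    final int port;
    final int maxConcurrentRequests;

    DatastoreConfig(Properties props) {
        // Property names and default values are illustrative assumptions.
        this.host = props.getProperty("datastore.host", "localhost");
        this.port = Integer.parseInt(props.getProperty("datastore.port", "9160"));
        this.maxConcurrentRequests =
                Integer.parseInt(props.getProperty("datastore.maxConcurrentRequests", "8"));
    }

    /** Parses key=value text in the java.util.Properties format. */
    static DatastoreConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DatastoreConfig(props);
    }
}

class DatastoreConfigDemo {
    public static void main(String[] args) {
        DatastoreConfig config = DatastoreConfig.parse("datastore.host=db1\ndatastore.port=5984\n");
        System.out.println(config.host + ":" + config.port
                + " maxConcurrentRequests=" + config.maxConcurrentRequests);
    }
}
```

Missing keys fall back to the defaults, so each 'NoSQL' datastore can ship a configuration file that lists only the parameters it needs.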

Intent Policies:

Implementation policies:

This will be a transaction-based implementation and needs to keep a log of each transaction. A logging function may be included in the DBMS itself, but here we need a separate log to see whether each and every transaction which invokes the service interface ends up as a successful transaction.
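One way such a separate log could work is to wrap every service call and record whether it completed or failed. The sketch below is an assumption about the design, not a decided implementation: `TransactionLog`, `Entry`, and `record` are hypothetical names, and entries are kept in memory rather than written to a file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical component-level log of service invocations and their outcome. */
class TransactionLog {
    static final class Entry {
        final String operation;
        final boolean success;
        final String detail;

        Entry(String operation, boolean success, String detail) {
            this.operation = operation;
            this.success = success;
            this.detail = detail;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Runs the call, records its outcome, and rethrows any failure to the caller. */
    <T> T record(String operation, Supplier<T> call) {
        try {
            T result = call.get();
            entries.add(new Entry(operation, true, ""));
            return result;
        } catch (RuntimeException e) {
            entries.add(new Entry(operation, false, String.valueOf(e.getMessage())));
            throw e;
        }
    }

    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }

    /** True when every recorded invocation completed without an exception. */
    boolean allSucceeded() {
        for (Entry entry : entries) {
            if (!entry.success) {
                return false;
            }
        }
        return true;
    }
}
```

A service method would then run its database call inside `record(...)`, so a failure stays visible in the component's own log even when the DBMS log is not accessible.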

Interaction policies:

The datastore components need to have a user-level authentication system to ensure the confidentiality of the data.

Implementation:

The components will be implemented using Java. The logical tasks of the components are:

The REST datastore interface component mediates the transaction to the wrapped database component and returns the results to the user.
The wrapped database component wraps the 'NoSQL' database as an SCA component.

Here is a sample of how the components work together:

[Image: sample of how the components work together]

Everything described above is based on my own knowledge and the ideas Jean-Sebastien came up with. It needs further discussion to resolve the conflicts in the components.

Deliverables:

The REST interface component.
Components which wrap the Apache Cassandra, CouchDB and Hadoop/Hbase databases, and a composite datastore component.
Functionality testing framework.
Documentation and a tutorial for the new components.

Time-line:

April 25 - May 23

Continue studying
- How Tuscany works
- How to create SCA components by reading and implementing sample SCA components.
Discuss the problems, ideas and conflicts with the mentor and other Tuscany community members.
Understand the APIs of the NoSQL DBMSs
Define a sample scenario for the implementation over the various databases
Use that sample scenario to identify the APIs of the databases.
Put database independent parts of the scenario in Tuscany and mock up the database access (identify the different commands).
Contact the Apache Cassandra, CouchDB and Hadoop/Hbase communities if there is a problem of understanding.

...

Decide the API to access and manipulate data in the NoSQL datastore components.
Start implementation of the datastore components
- Stage 1: Implement the REST interface component (abstract model).
- Stage 2: Implement the component for Apache Cassandra and modify the REST interface component to support Apache Cassandra.
  - Do functional tests for the component using the REST interface component.

July 11

Mid-term evaluation of the project.

July 12 - August 15

Continue implementation of the datastore components
- Stage 3: Implement the component for CouchDB and modify the REST interface component to support CouchDB.
  - Do functional tests for the component using the REST interface component.
- Stage 4: Implement the component for Hadoop/Hbase and modify the REST interface component to support Hadoop/Hbase.
  - Do functional tests for the component using the REST interface component.
Create an SCA composite out of all the components.

Write documentation and a tutorial for the new components using a well-known use-case scenario.

...

Community Interactions:

Working with an open-source model of communication, I would like to interact with the community via:

JIRA issue tracking system
Apache Tuscany mailing-list
irc channel (#tuscany)
private chats on gtalk or Skype

Using these mediums I would like to keep my project fully open to the community and take in the valuable ideas of each and every community member.

...

I am Eranda Sooriyabandara, a final-year student of the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. As I am very much interested in databases, I have experience working with databases like Apache Derby, MySQL, PostgreSQL and Oracle, and with Apache Cassandra as a 'NoSQL' database. I also have knowledge of Service Oriented Architecture and related topics like web services and SOAP, since I did a 6-month internship at a SOA middleware company.

The reason I am getting involved in this project is that it is a great chance to learn about 'NoSQL' databases like Apache Cassandra, CouchDB, Hadoop/Hbase and AppEngine Datastore, and to gain experience with the Service Component Architecture, which is a fairly new technology to me but one I would like to learn further while making my contribution to Apache Tuscany. Also, working with an experienced community is a big opportunity for me to learn new technologies from the best.
