Title/Summary: Develop a 'NoSQL' Datastore component for Apache Cassandra, CouchDB, Hadoop/Hbase

Student: Eranda Sooriyabandara

Student e-mail: 070468d AT gmail DOT com

Student Major: Computer Science

Student Degree: Undergraduate

Student Graduation: October 2011

Organization: Apache Software Foundation

Assigned Mentor: Jean-Sebastien Delfino

Abstract: 

Apache Tuscany provides a comprehensive infrastructure to simplify the task of developing and managing Service Oriented Architecture (SOA) solutions based on the Service Component Architecture (SCA) standard. SCA abstracts business functions as components and encourages business people and solution providers to use them as building blocks to create a business solution without knowing much about the underlying infrastructure.

'NoSQL' (Not Only SQL) databases are a modern class of databases which differ from classic relational database management systems in many ways: they may not require fixed table schemas, avoid join operations and scale horizontally. These databases also do not use Structured Query Language (SQL) to manipulate the data; instead they use an API. Apache Cassandra, CouchDB, Hadoop/Hbase and AppEngine Datastore are some examples of 'NoSQL' databases.

In this project my ultimate goal is to create SCA portable datastore components over a number of 'NoSQL' databases like Apache Cassandra, CouchDB, Hadoop/Hbase and AppEngine Datastore using Java. The main idea of creating these components is to hide the database API of each 'NoSQL' database and create a REST datastore interface which can be used by different people without worrying about the underlying database.

...

Implementation Plan:

  • NoSQL Datastore components for Apache Cassandra, CouchDB and Hadoop/Hbase databases and a composite Datastore component.
  • Documentation and a tutorial for the new components.
Time-line:

...

In the implementation of the SCA datastore components, the following attributes need to be considered:

  • Service
  • Reference
  • Property
  • Intent Policies
  • Implementation

So my task in this project is to identify and gain a clear understanding of those attributes and implement them as SCA components. There are two components for each database. The first is the REST datastore interface component and the other is the wrapped database component.

Service: 

The major functionality of the REST datastore interface component is to give the user access to a 'NoSQL' database without worrying about the underlying database. The 'service' of this component describes a generic service interface to store and manipulate the data of all the 'NoSQL' databases. Before implementing the interface we need to clarify the REST datastore interface services which we will use in all the datastore components. This needs to be done carefully since some concepts are specific to one database, for example SuperColumnFamily in Apache Cassandra.

The 'service' of the wrapper components describes a database-specific service interface to store and manipulate the data of the related 'NoSQL' database. These interfaces vary with the database APIs.
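
As a rough sketch of what such a generic service interface could look like, the Java below defines a small key/value style contract and an in-memory stand-in for a wrapper component. The names Datastore, InMemoryDatastore, put, get and delete are my own placeholders and are not taken from Tuscany or from any of the database APIs; the real interface still has to be agreed on with the community.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical generic service interface. All names are illustrative
// assumptions, not part of Tuscany or any 'NoSQL' database API.
interface Datastore {
    void put(String collection, String key, String value);
    Optional<String> get(String collection, String key);
    boolean delete(String collection, String key);
}

// In-memory stand-in for a database-specific wrapper component.
// A real wrapper would translate these calls to the database's own API.
class InMemoryDatastore implements Datastore {
    private final Map<String, Map<String, String>> data = new ConcurrentHashMap<>();

    @Override
    public void put(String collection, String key, String value) {
        // Create the collection on first use, then store the value.
        data.computeIfAbsent(collection, c -> new ConcurrentHashMap<>()).put(key, value);
    }

    @Override
    public Optional<String> get(String collection, String key) {
        Map<String, String> rows = data.get(collection);
        return rows == null ? Optional.empty() : Optional.ofNullable(rows.get(key));
    }

    @Override
    public boolean delete(String collection, String key) {
        Map<String, String> rows = data.get(collection);
        // True only when an entry was actually removed.
        return rows != null && rows.remove(key) != null;
    }
}
```

Database-specific concepts such as SuperColumnFamily would stay behind the wrapper's own interface, so that the generic contract remains the same for every database.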

Reference:

In the reference we need to create an interface which describes the dependencies. The reference of the REST datastore interface component will be directed to the service interface of the wrapped database component. Wrapped database components do not have references.

Property:

This defines the configuration parameters of the components that can be used to describe the behaviour of the datastore components, for example concurrency controls. These parameters can be set in a configuration file, which is an XML or a text file. This configuration file may differ between 'NoSQL' datastores. A deep analysis of each DBMS is needed to find the configuration parameters.
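
To illustrate the text-file option, here is a minimal sketch that reads configuration parameters with java.util.Properties. The key names (datastore.host, datastore.port, datastore.maxConcurrentRequests) and the default values are assumptions made up for this example; the actual parameters will only be known after analysing each DBMS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical configuration holder. Key names and defaults are
// placeholders, not real settings of any 'NoSQL' database.
class DatastoreConfig {
    final String host;
    final int port;
    final int maxConcurrentRequests;

    private DatastoreConfig(String host, int port, int maxConcurrentRequests) {
        this.host = host;
        this.port = port;
        this.maxConcurrentRequests = maxConcurrentRequests;
    }

    // Parses "key=value" lines; missing keys fall back to arbitrary defaults.
    static DatastoreConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DatastoreConfig(
                props.getProperty("datastore.host", "localhost"),
                Integer.parseInt(props.getProperty("datastore.port", "8080")),
                Integer.parseInt(props.getProperty("datastore.maxConcurrentRequests", "4")));
    }
}
```

In a real SCA component these values would more likely be supplied by the runtime as component properties than parsed by hand, but the idea of one small set of named parameters per datastore is the same.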

Intent Policies:

  • Implementation policies:

This will be a transaction-based implementation and needs to keep a log of each transaction. A logging function may be included in the DBMS itself, but here we need a separate log to see whether each and every transaction which invokes the service interface ends up as a successful transaction.
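
One possible shape for such a separate log is sketched below: every invocation is run through a small helper which records whether it succeeded or failed. The class name TransactionLog, the method record and the entry format are hypothetical and only serve to show the idea; a real implementation would probably write to a file or a logging framework instead of a list in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical transaction log kept outside the DBMS.
class TransactionLog {
    private final List<String> entries = new ArrayList<>();

    // Runs the action and records its outcome under the given operation name.
    synchronized <T> T record(String operation, Supplier<T> action) {
        try {
            T result = action.get();
            entries.add(operation + " SUCCESS");
            return result;
        } catch (RuntimeException e) {
            entries.add(operation + " FAILURE: " + e.getMessage());
            throw e; // the caller still sees the original failure
        }
    }

    // Read-only snapshot of everything recorded so far.
    synchronized List<String> entries() {
        return Collections.unmodifiableList(new ArrayList<>(entries));
    }
}
```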

Implementation:

The components will be implemented using Java. The logical task of each component is as follows:

  • The REST datastore interface component mediates the transaction to the wrapped database component and returns the results to the user.
  • The wrapped database component wraps the 'NoSQL' database as an SCA component.
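
The division of work between the two components can be sketched in plain Java as follows. This is only an illustration under my own assumptions: the interface WrappedDatabase, the classes MapBackedDatabase and RestDatastoreComponent, the handle method and the status strings it returns are all hypothetical, and the wrapped database is passed in through the constructor here, whereas in SCA the runtime would wire it in as a reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service interface of a wrapped database component.
interface WrappedDatabase {
    String read(String key);               // returns null when the key is absent
    void write(String key, String value);
    boolean remove(String key);
}

// Stand-in for a real wrapper around Cassandra, CouchDB or Hbase.
class MapBackedDatabase implements WrappedDatabase {
    private final Map<String, String> rows = new HashMap<>();

    public String read(String key) { return rows.get(key); }
    public void write(String key, String value) { rows.put(key, value); }
    public boolean remove(String key) { return rows.remove(key) != null; }
}

// Mediates REST-style requests to whichever wrapped database it refers to.
class RestDatastoreComponent {
    private final WrappedDatabase database;

    RestDatastoreComponent(WrappedDatabase database) {
        this.database = database;
    }

    // Maps an HTTP-like method to a database call and returns a status string.
    String handle(String method, String key, String body) {
        switch (method) {
            case "GET":
                String value = database.read(key);
                return value == null ? "404" : "200 " + value;
            case "PUT":
                database.write(key, body);
                return "204";
            case "DELETE":
                return database.remove(key) ? "204" : "404";
            default:
                return "405";
        }
    }
}
```

Because RestDatastoreComponent depends only on the WrappedDatabase interface, supporting another database means adding another wrapper without changing the REST side.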

Here is a sample of how the components work together:

[Image: sample of how the components work together]

All the implementation details mentioned above are based on my knowledge and the ideas Jean-Sebastien came up with. Further discussion is needed to resolve the conflicts in the component design.

Deliverables:
  1. The REST interface component.
  2. Components which wrap the Apache Cassandra, CouchDB and Hadoop/Hbase databases.
  3. Functionality testing framework.
  4. Documentation and a tutorial for the new components.
Time-line:

April 25 - May 23

  • Continue studying on 
    • How Tuscany works 
    • How to create SCA components by reading and implementing sample SCA components.
  • Discuss the problems, ideas and the conflicts with the mentor and other Tuscany community members.
  • Understand the APIs of the NoSQL DBMSs
  • Define a sample scenario for the implementation over the various databases
  • Use that sample scenario to identify the APIs of the databases.
  • Put database independent parts of the scenario in Tuscany and mock up the database access (identify the different commands).
  • Contact the Apache Cassandra, CouchDB and Hadoop/Hbase communities if there is a problem of understanding.

...

  • Decide the API to access and manipulate data in the NoSQL datastore components.
  • Start implementation of the datastore components
    • Stage 1: Implement the REST interface component (abstract model).
    • Stage 2: Implement the component for Apache Cassandra and modify the REST interface component to support Apache Cassandra.
      • Do functional tests for the component using the REST interface component.

July 11

  • Mid-term evaluation of the project.

July 12 - August 15

  • Continue implementation of the datastore components
    • Stage 3: Implement the component for CouchDB and modify the REST interface component to support CouchDB.
      • Do functional tests for the component using the REST interface component.
    • Stage 4: Implement the component for Hadoop/Hbase and modify the REST interface component to support Hadoop/Hbase.
      • Do functional tests for the component using the REST interface component.
    • Create an SCA composite out of all the components.
  • Write documentation and a tutorial for the new components using a well-known use-case scenario.

August 16 - August 22

  • Make the final adjustments to all the deliverables for the submission.

August 26

  • Final evaluation deadline.
Community Interactions:

Working with an open source model of communication, I would like to interact with the community via:

  • JIRA issue tracking system
  • Apache Tuscany mailing-list
  • IRC channel (#tuscany)
  • private chats on gtalk or Skype

Using these media I would like to keep my project fully open to the community and take in the valuable ideas of each and every community member.

Biography:

I am Eranda Sooriyabandara, a final year student in the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. As I am very interested in databases, I have experience working with databases such as Apache Derby, MySQL, PostgreSQL and Oracle, and with Apache Cassandra as a 'NoSQL' database. I also have knowledge of Service Oriented Architecture and related topics like web services and SOAP, since I had a 6-month internship at a SOA middleware company.

The reason I am involved in this project is that it is a great chance to learn about 'NoSQL' databases like Apache Cassandra, CouchDB, Hadoop/Hbase and AppEngine Datastore, and to gain experience with the Service Component Architecture, which is a fairly new technology to me but one I would like to learn further while making my contribution to Apache Tuscany. Also, working with an experienced community is a big opportunity for me to learn new technologies from the best.