Apache Airavata

Project Title

Small Molecule Ionic Isolation Lattices (SMILES) Data Models.

Abstract

The goal of this project is to design and implement the Airavata Data Catalog and the data parsers that analyze metadata extracted from literature, experimental, and computational records in support of Small Molecule Ionic Isolation Lattices (SMILES) data. In particular, this includes synchronizing the data with the SEAGrid Data Analysis Portal and the gateway users.

Proposal Content

Problem Definition

Science gateways use Airavata as a platform to create, submit, execute, and monitor different types of scientific jobs and workflows on scientific grids. Airavata currently uses three separate databases to store the metadata of a particular chemical compound. The current architecture has a few drawbacks in how this data is represented in the SEAGrid Data Analysis Portal.

Chemical compounds are currently represented with missing fields and unstructured keys. Representing the compounds with the Scientific Data Model (SDM) and its related ontology (SDMO) would therefore make it easier to analyze the behavior of a compound. Google Protocol Buffers (Protobuf) is a good choice for structuring the schema of a compound.
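To illustrate what a fixed schema buys over missing fields and unstructured keys, here is a minimal sketch in plain Python. It is a stand-in for the kind of structure a Protobuf message would enforce, not the project's actual schema: the `Compound` fields, the `KEY_ALIASES` table, and the `normalize` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from inconsistent source keys to canonical field names.
# The real records may use entirely different keys.
KEY_ALIASES = {
    "smiles": "smiles",
    "SMILES": "smiles",
    "inchi": "inchi",
    "InChI": "inchi",
    "mol_formula": "formula",
    "Formula": "formula",
}


@dataclass
class Compound:
    """Assumed minimal schema; every field is explicit, missing values are None."""
    smiles: Optional[str] = None
    inchi: Optional[str] = None
    formula: Optional[str] = None


def normalize(raw: dict) -> Compound:
    """Map a loosely keyed record onto the fixed schema, dropping unknown keys."""
    fields = {}
    for key, value in raw.items():
        canonical = KEY_ALIASES.get(key)
        # Skip keys we do not recognize and values that are effectively missing.
        if canonical is not None and value not in ("", None):
            fields[canonical] = value
    return Compound(**fields)
```

With this sketch, `normalize({"SMILES": "CCO", "Formula": "C2H6O", "junk": 1})` yields a `Compound` whose `smiles` and `formula` are set and whose `inchi` is explicitly `None`, so downstream code can rely on the same field names for every record.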

Solution Overview

Data Modeling

Implementing a functional database

Once the data has been modeled successfully, it is passed to the functional database, where the fixed and finest parameters of a particular chemical compound are rendered in the implemented dashboard. This functional database is implemented in MongoDB. Scientific strings such as SMILES and InChI serve as primary keys, and the data is accessed using these strings.
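The idea of keying records by a scientific string can be sketched as follows. This is an assumption-laden illustration rather than the project's implementation: the document layout, the `to_document`, `upsert`, and `find_by_inchi` helpers, and the choice of InChI (rather than SMILES) as the key are all hypothetical. A plain dictionary stands in for the MongoDB collection so the sketch runs without a server; in MongoDB the `_id` field plays the role of the primary key.

```python
def to_document(smiles: str, inchi: str, properties: dict) -> dict:
    """Build a record whose primary key (`_id`) is the InChI string."""
    if not inchi:
        raise ValueError("an InChI string is required as the primary key")
    return {"_id": inchi, "smiles": smiles, "properties": dict(properties)}


def upsert(store: dict, doc: dict) -> None:
    """Insert or overwrite the record stored under the document's key.

    With pymongo, a comparable call would be
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True).
    """
    store[doc["_id"]] = doc


def find_by_inchi(store: dict, inchi: str):
    """Look a record up by its key; returns None when it is absent.

    With pymongo, a comparable call would be collection.find_one({"_id": inchi}).
    """
    return store.get(inchi)


# Usage: ethanol, keyed by its standard InChI.
store = {}
ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
upsert(store, to_document("CCO", ethanol_inchi, {"formula": "C2H6O"}))
record = find_by_inchi(store, ethanol_inchi)
```

Because the key is derived from the compound itself, writing the same compound twice overwrites the existing record instead of creating a duplicate.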

Main Components of the Solution

The main components of the solution are identified as:

  1. Airavata Portal
    • Custom Django UI 
    • Apache Airavata Data Lake
  2. Data Modeling
    • Protobuf files

Deliverables

  • Redesigning the Data Models.
  • Creating a robust database to reduce the latency.
  • Synchronizing the data with the Dashboard.

Timeline

Divided into two-week sprints

Sprint timeline    Sprint plan
13th June











