Apache Airavata

Project Title

Small Molecule Ionic Isolation Lattices (SMILES) Data Models

Abstract

The goal of this project is to design and implement the Airavata Data Catalog and the data parsers that analyze metadata extracted from literature, experimental, and computational records in support of Small Molecule Ionic Isolation Lattices (SMILES) data. In particular, this includes synchronizing the data with the SEAGrid Data Analysis Portal and its gateway users.

Proposal Content

Problem Definition

Science gateways use Airavata as a platform to create, submit, execute, and monitor different types of scientific jobs and workflows on scientific computing grids. Airavata currently uses three separate databases to store the metadata of a given chemical compound. This architecture has several drawbacks when the data is presented in the SEAGrid Data Analysis Portal.

Chemical compounds are currently stored with missing fields and unstructured keys. Representing them with the Scientific Data Model (SDM) and its related ontology (SDMO) would make it much easier to analyze the behavior of a compound. Google Protocol Buffers (protobuf) are a good choice for structuring the schema of a compound, because a protobuf schema declares every field and its type explicitly.
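To make the idea of a structured schema concrete, here is a minimal Python sketch, assuming a hypothetical compound record with four fields. The dataclass stands in for the class protobuf would generate from a .proto file; the field names, the KEY_ALIASES mapping, and the missing_fields error flag are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping from the inconsistent keys seen in raw records to the
# fixed field names of the structured schema (illustrative only).
KEY_ALIASES = {
    "smiles": "smiles",
    "smiles_string": "smiles",
    "inchi": "inchi",
    "inchi_string": "inchi",
    "formula": "molecular_formula",
    "molecular_formula": "molecular_formula",
    "mw": "molecular_weight",
    "molecular_weight": "molecular_weight",
}

REQUIRED_FIELDS = ("smiles", "inchi", "molecular_formula", "molecular_weight")


@dataclass
class Compound:
    """Python stand-in for a hypothetical protobuf `Compound` message."""
    smiles: Optional[str] = None
    inchi: Optional[str] = None
    molecular_formula: Optional[str] = None
    molecular_weight: Optional[float] = None
    # Error flag: required fields that were absent or empty in the raw record.
    missing_fields: List[str] = field(default_factory=list)


def normalize(raw: dict) -> Compound:
    """Map unstructured keys onto the fixed schema and flag empty values.

    Keys that have no entry in KEY_ALIASES are dropped.
    """
    values = {}
    for key, value in raw.items():
        canonical = KEY_ALIASES.get(key.strip().lower())
        if canonical is None or value is None or value == "":
            continue
        if canonical == "molecular_weight":
            value = float(value)
        values[canonical] = value
    compound = Compound(**values)
    compound.missing_fields = [f for f in REQUIRED_FIELDS if f not in values]
    return compound
```

For example, normalize({"SMILES": "CCO", "Formula": "C2H6O", "MW": ""}) returns a Compound whose missing_fields is ["inchi", "molecular_weight"], so gaps in a record become explicit instead of silently absent.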

Solution Overview

Data Modeling


Implementing a Functional Database

Once the data has been modeled, it will be passed to a functional database that stores the fixed, fine-grained parameters of each chemical compound; the dashboard then renders these parameters. The functional database will be implemented in MongoDB. Each compound will be keyed by a line-notation string such as SMILES (Simplified Molecular-Input Line-Entry System) or InChI (IUPAC International Chemical Identifier), and the data will be accessed using these strings.
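The keying scheme described above can be sketched as follows. This is a minimal, dependency-free illustration, assuming SMILES is the primary key and InChI a secondary lookup key; the FunctionalStore class and its methods are hypothetical and only mimic the behavior of a MongoDB collection. They are not part of Airavata or of MongoDB.

```python
from typing import Dict, Optional


class FunctionalStore:
    """In-memory stand-in for a MongoDB collection keyed by SMILES.

    Each document stores its SMILES string in `_id`, mirroring MongoDB's use
    of `_id` as the primary key; a secondary index maps InChI to `_id`.
    """

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        self._by_inchi: Dict[str, str] = {}

    def upsert(self, smiles: str, fields: dict) -> dict:
        """Insert a compound, or merge new fields into its existing document."""
        doc = self._docs.setdefault(smiles, {"_id": smiles})
        # Never let incoming data overwrite the primary key.
        doc.update({k: v for k, v in fields.items() if k != "_id"})
        inchi = doc.get("inchi")
        if inchi:
            self._by_inchi[inchi] = smiles
        return doc

    def by_smiles(self, smiles: str) -> Optional[dict]:
        """Look up a compound by its primary key."""
        return self._docs.get(smiles)

    def by_inchi(self, inchi: str) -> Optional[dict]:
        """Look up a compound through the secondary InChI index."""
        smiles = self._by_inchi.get(inchi)
        return self._docs.get(smiles) if smiles is not None else None
```

With pymongo, the upsert would roughly correspond to collection.update_one({"_id": smiles}, {"$set": fields}, upsert=True), and the InChI lookup to an index created with collection.create_index("inchi"). One design caveat: the same molecule can be written as several different SMILES strings (for example "CCO" and "OCC" for ethanol), so a real implementation would need to store canonical SMILES, whereas InChI is canonical by design.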

Main Components of the Solution

The main components of the solution are:

  1. Airavata Portal
    • Custom Django UI 
    • Apache Airavata Data Lake
  2. Data Modeling
    • Protobuf files

Deliverables

  • Redesigning the data models.
  • Creating a robust database to reduce query latency.
  • Synchronizing the data with the dashboard.

Timeline


Task: Study the Airavata Django Portal Framework (ADPF)
Timeline: May 27, 2022
Deliverables:
  • Set up the Django portal locally.
  • Study the procedure for running a computational experiment.
  • Understand the input file format required to run the experiment.
  • Understand the Gaussian log output.
  • Create a customized Django application.

Task: Initializing the Databases
Timeline: June 1, 2022
Deliverables:
  • Set up the databases locally.
  • Understand the schema.

Task: Data Modeling
Timeline: June 15, 2022
Deliverables:
  • Configure the data model.
  • Design the logical schema in protobuf format.
  • Detect empty values and mark them with a defined error flag.

Task: Functional Database
Timeline: July 1, 2022
Deliverables:
  • Create a functional database.
  • Define a schema for the functional database.
  • Ingest data from the primary databases.

Task: Testing and Validation
Timeline: July 12, 2022
Deliverables:
  • Build a test plan for data validation.
  • Execute tests and validate the data flow process.

Task: Project Report and Final Documentation
Timeline: July 29, 2022
Deliverables:
  • Describe the results, observations, and insights in the final documentation.
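The ingestion deliverable of the Functional Database task could look roughly like the sketch below. It assumes that the primary databases export the literature, experimental, and computational records named in the abstract as plain dictionaries, each carrying a (canonical) smiles key; the record shape, the ingest function, and the missing_sources flag are hypothetical and only illustrate one way to merge per-source data without losing its origin.

```python
from collections import defaultdict
from typing import Dict, Iterable

SOURCES = ("literature", "experimental", "computational")


def ingest(sources: Dict[str, Iterable[dict]]) -> Dict[str, dict]:
    """Merge records from each primary source into one document per compound.

    Records are grouped by their (assumed canonical) SMILES string; each
    source's records are kept under their own key so that values from
    different origins never silently overwrite one another.
    """
    merged: Dict[str, dict] = defaultdict(dict)
    for source in SOURCES:
        for record in sources.get(source, []):
            key = record.get("smiles")
            if not key:
                continue  # a real pipeline would log or flag keyless records
            payload = {k: v for k, v in record.items() if k != "smiles"}
            merged[key].setdefault(source, []).append(payload)
    # Error flag: which sources have no data for each compound.
    for key, doc in merged.items():
        doc["_id"] = key
        doc["missing_sources"] = [s for s in SOURCES if s not in doc]
    return dict(merged)
```

Keeping each source under its own key also gives the Testing and Validation task something concrete to assert on: every compound either has data from all three sources or lists the gaps in missing_sources.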
