...

  1. Emerging data sets made available by the European Union
    The European Union provides data sets on a number of subjects, such as transportation, education, communication, population, economy, and health.

  2. Health-related data sets for tracking the health of people and for genomic sequencing. MIRAGE (Minimum Information Required for A Glycomics Experiment) is another area for which public data is available.

  3. World financial and economic data, such as the Balance of Payments, made available by the International Monetary Fund.

  4. Data sets made available by Stanford University on topics such as online community interaction.

  5. Data sets made available by the World Bank on various subjects such as poverty, income, population, growth (in GDP), environment (CO2 emissions), and disease patterns across the world.

  6. Amazon Web Services public data sets provide a large collection of data, such as the Common Crawl data set, which can be analyzed for almost any information on the web using tools such as the Warcbase project.

...

At the end of the project, the deliverables would include:

  • A set of at least 6 notebooks for the data sets proposed above, with at least 2 using Helium functionality.

  • Documentation for the notebooks.

  • Results of any tests, and fixes for any bugs encountered during the development phase.

...

The timeline breakdown for the second half of the project is the same as for the first half, although I expect that, once the basics are learned and grasped, work in the second half will go faster than in the first. As mentioned above, six notebooks is not an upper bound: if time permits, I would not hesitate to create more notebooks on other data sets.

July 30 – Pencils Down: This period would be used for testing, bug fixing, improving the existing documentation, and making other enhancements to the notebooks created.

...