...

My main objective in the project would be to take up as many of the above-mentioned datasets as possible and create a notebook for each of them, using the existing support for the various interpreters. This would involve first examining each dataset to decide which interpreter suits it best, and then writing the notebook. The main interpreters that I propose to use in the project are Spark and Flink. Spark has several features that make it suitable for the analysis of datasets. Its MLlib machine learning library may be used to fit a model on the training data and predict the values of the test data. For example, the built-in class 'LogisticRegressionWithSGD' in MLlib fits a logistic regression model by stochastic gradient descent. Helium functionality may also be added to enhance the notebooks.
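The train-then-predict workflow described above can be sketched in plain Python. This is only an illustration of the idea behind logistic regression trained with stochastic gradient descent; it is not the MLlib API, and in an actual notebook the MLlib class named above would be used through the Spark interpreter instead. The function names, the step size, the number of epochs and the toy data are all hypothetical, chosen only to keep the sketch self-contained.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_sgd(rows, labels, epochs=200, step=0.1, seed=0):
    """Fit weights and a bias by stochastic gradient descent on the log-loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the training rows in a new order each epoch
        for i in order:
            z = bias + sum(w * x for w, x in zip(weights, rows[i]))
            error = sigmoid(z) - labels[i]  # gradient of the log-loss w.r.t. z
            weights = [w - step * error * x for w, x in zip(weights, rows[i])]
            bias -= step * error
    return weights, bias


def predict(weights, bias, row):
    # Classify as 1 when the predicted probability is at least 0.5.
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy data: one feature, label 1 when the feature is large.
train_rows = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_rows = [[0.5], [9.5]]

weights, bias = train_logistic_sgd(train_rows, train_labels)
predictions = [predict(weights, bias, row) for row in test_rows]
print(predictions)
```

In a notebook, the same three steps (load and split the data, fit on the training part, predict on the test part) would each map naturally onto a separate paragraph.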

...

> Alex: can you please have 2 sections here, for mid-term and final deliverables? I.e. "min number of N notebooks before mid-term (at least K using Helium) and N after (at least L packaged through Helium)"

Deliverables before the mid-term would include:

  • A set of at least 2 notebooks for the above proposed datasets with at least 1 using Helium functionality.

At the end of the project, the deliverables would include:

  • A set of at least 4 notebooks for the above proposed datasets with at least 2 using Helium functionality.

  • Documentation for the notebooks.

  • Results of any tests, and fixes for any bugs encountered during the development phase.

...

I would have my end-of-semester exams from April 25 to May 9, so I would only be able to begin the community engagement period from May 10 onwards. For the rest of the summer, I would be completely free, with no commitments other than this project. I would easily be able to give 8 hours each day to the project to ensure its completion.

...