Now that we've heard of a few university courses basing classes around Apache Bigtop, here are some exercises you can do to get started with it.


Exercise 1: Deploy your own Big Data distribution from zero.

 

1) Download bigtop: git clone https://github.com/apache/bigtop

2) Go to the bigtop-deploy directory.

3) Install VirtualBox and Vagrant.

4) Change to the bigtop-deploy/vagrant-puppet-vm directory

5) Read the README.md thoroughly.

6) Edit the vagrantconfig.yaml file: for example, configure it to spin up a 2-node cluster, and edit the components list to include spark.

6b) Run "vagrant up" and wait for the process to finish.  Time how long it takes, and copy down the final output.

6c) Run "vagrant global-status" to list the machines (e.g. bigtop-x-1), then "vagrant ssh bigtop-x-1" to ssh into one of them.

6d) Run "ls /", and then run "hadoop fs -ls /". Are there any similarities or differences worth noting?
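The steps above can be sketched as a small shell script. This is only a rough outline, not an official Bigtop script: the directory path and the VM name bigtop-x-1 are copied from the steps above and may differ in your checkout, and RUN_EXERCISE is a made-up opt-in variable so that pasting the script does nothing until you ask it to.

```shell
#!/bin/sh
# Sketch of Exercise 1, steps 1 to 6d. Paths and the VM name come from the
# steps above and are assumptions about your checkout, not guaranteed values.

VM_NAME="${VM_NAME:-bigtop-x-1}"   # placeholder machine name from step 6c

deploy_cluster() {
    git clone https://github.com/apache/bigtop &&
    cd bigtop/bigtop-deploy/vagrant-puppet-vm || return 1
    # Edit vagrantconfig.yaml here (step 6) before bringing the VMs up.
    start=$(date +%s)
    vagrant up || return 1
    end=$(date +%s)
    echo "vagrant up took $((end - start)) seconds"
    vagrant global-status
}

compare_filesystems() {
    # Root of the VM's local filesystem, then the root of HDFS (step 6d).
    vagrant ssh "$VM_NAME" -c 'ls /; hadoop fs -ls /'
}

# RUN_EXERCISE is a hypothetical opt-in switch for this sketch.
if [ "${RUN_EXERCISE:-0}" = "1" ]; then
    deploy_cluster && compare_filesystems
else
    echo "Set RUN_EXERCISE=1 to clone Bigtop and start the VMs."
fi
```

Run it as "RUN_EXERCISE=1 sh exercise1.sh" (the file name is arbitrary). Timing with "date +%s" rather than the shell's "time" keyword keeps the sketch portable across sh implementations.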

 

Exercise 2: Build, run and test the bigpetstore blueprint application.

7) Install Gradle (https://gradle.org/).

8) Use Gradle to build the bigpetstore-mapreduce and bigpetstore-spark applications.  Try to run them in the VM from Exercise 1.

9) Watch the original BigPetStore video.

10) Find the jars built by (8).

11) Contrast the amount of code in the two applications.  What does the majority of the Hadoop code do? 

12) Read the dependencies in build.gradle, and notice that the bigpetstore-spark application depends on an external data generator library.

13) Run the bigpetstore-spark or bigpetstore-mapreduce application locally on your computer, and record the final results and the generated data.
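For steps 10 and 11, a couple of small shell helpers can locate the jars and give a rough size comparison of the two applications. This is a sketch under two assumptions: that the applications live under bigtop-bigpetstore/ in your checkout, and that Gradle writes jars to its default build/libs directory. The function names are made up for illustration.

```shell
#!/bin/sh
# count_source_lines DIR: total lines across all .java and .scala files in DIR.
count_source_lines() {
    find "$1" -type f \( -name '*.java' -o -name '*.scala' \) -exec cat {} + |
        wc -l | tr -d ' '
}

# list_built_jars DIR: jars under any build/libs directory below DIR
# (build/libs is Gradle's default output location, assumed here).
list_built_jars() {
    find "$1" -type f -path '*/build/libs/*.jar'
}

# Assumed layout: run this from the top of the Bigtop checkout.
for app in bigpetstore-mapreduce bigpetstore-spark; do
    dir="bigtop-bigpetstore/$app"
    if [ -d "$dir" ]; then
        echo "$app: $(count_source_lines "$dir") lines of source"
        list_built_jars "$dir"
    fi
done
```

Line counts are a crude measure, but they are enough to start the comparison in step 11; reading the code is still needed to see what the extra lines actually do.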

 

Exercise 3: Build the Bigtop RPM packages and verify them.

0) Look in the bigtop puppet code - find where the yum repository URL is referenced.

1) Pick a tool in the Bigtop distribution (e.g. spark, pig, zookeeper, ...).

2) Install the bigtop dependencies, by using the bigtop toolchain (find it in the README).

3) Build the packages. What directory did they go to?

4) Reread the Vagrant instructions from Exercise 1. Is it possible to deploy Bigtop packages from your local RPM repository?
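The search and build steps above might look roughly like the following. Treat it as a sketch: the function names are invented for illustration, the puppet code is assumed to sit under bigtop-deploy/puppet, and the Gradle task names "toolchain" and "<component>-rpm" are assumptions to confirm against the README in your checkout.

```shell
#!/bin/sh
# find_repo_urls DIR: lines under DIR that mention "repo" and contain a URL.
find_repo_urls() {
    grep -rn -i 'repo' "$1" | grep -E 'https?://'
}

# find_rpms DIR: every RPM file below DIR, to see where a build put them.
find_rpms() {
    find "$1" -type f -name '*.rpm'
}

# build_component_rpm NAME: install build dependencies, then build one package.
build_component_rpm() {
    # Assumption: the tasks are named "toolchain" and "<component>-rpm".
    ./gradlew toolchain && ./gradlew "$1-rpm"
}

# Assumed location of the puppet code, relative to the top of the checkout.
if [ -d bigtop-deploy/puppet ]; then
    find_repo_urls bigtop-deploy/puppet
fi
```

For example, from the top of the checkout, "build_component_rpm spark" followed by "find_rpms ." would answer step 3 for spark, if the assumed task names hold.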

 

 
