What is Bigtop Sandbox?

A handy tool to build and run big data pseudo-clusters on top of Docker.

How to run

Make sure you have Docker installed. We've tested this using Docker for Mac.

Currently supported OS list:

  • debian-8
  • ubuntu-16.04

Run Hadoop HDFS

BIGTOP=$(docker run -d -p 50070:50070 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs)
For HDFS, provisioning takes around 30 seconds. You can use docker logs $BIGTOP to see whether it has finished:
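If you script against the sandbox, polling the mapped port is more reliable than sleeping for a fixed 30 seconds. A minimal sketch in bash (the `wait_for_port` helper is hypothetical, not part of Bigtop; it relies on bash's built-in /dev/tcp pseudo-device, so no extra tools are needed):

```shell
# Hypothetical helper: poll a TCP port until it accepts connections,
# so scripts can block until the sandbox has finished provisioning.
wait_for_port() {
  local host=$1 port=$2 retries=${3:-30}
  local i
  for ((i = 0; i < retries; i++)); do
    # The subshell exits successfully only if the connection is accepted;
    # /dev/tcp/<host>/<port> is handled by bash itself.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage against the HDFS sandbox started above:
# wait_for_port localhost 50070 60 && echo "NameNode UI is up"
```

Note this only tells you the port is listening; the web UI may still take a moment to render fully.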

...

Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Bool instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Array instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Notice: Scope(Class[Node_with_components]): Roles to deploy: [namenode, datanode]
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Pattern[]. There is further documentation for validate_legacy function in the README.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Numeric instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions.
   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Notice: Compiled catalog for 9c26fcceafad.local in environment production in 1.45 seconds
Notice: Baseurl: http://repos.bigtop.apache.org/releases/1.2.1/ubuntu/16.04/x86_64
Notice: /Stage[main]/Bigtop_repo/Notify[Baseurl: http://repos.bigtop.apache.org/releases/1.2.1/ubuntu/16.04/x86_64]/message: defined 'message' as 'Baseurl: http://repos.bigtop.apache.org/releases/1.2.1/ubuntu/16.04/x86_64'
Notice: /Stage[main]/Bigtop_repo/Exec[bigtop-apt-update]/returns: executed successfully
Notice: /Stage[main]/Hadoop::Common_hdfs/File[/etc/hadoop/conf/core-site.xml]/content: content changed '{md5}71506958747641d1a5def83b021e7f75' to '{md5}ce32af59eb015a3bb3774d375be10f11'
Notice: /Stage[main]/Hadoop::Common_hdfs/File[/etc/hadoop/conf/hdfs-site.xml]/content: content changed '{md5}784883dd654527ae577de19ecdec0992' to '{md5}ddc0a621878650832f30eb9690aa7565'
Notice: /Stage[main]/Hadoop::Namenode/Service[hadoop-hdfs-namenode]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage[main]/Hadoop::Datanode/File[/data/1/hdfs]/mode: mode changed '0700' to '0755'
Notice: /Stage[main]/Hadoop::Datanode/File[/data/2/hdfs]/mode: mode changed '0700' to '0755'
Notice: /Stage[main]/Hadoop::Datanode/Service[hadoop-hdfs-datanode]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage[main]/Hadoop::Init_hdfs/Exec[init hdfs]/returns: executed successfully
Notice: Finished catalog run in 29.46 seconds
Once provisioning completes, go to http://localhost:50070 and you'll see the NameNode web UI is ready.
To destroy the container:
docker stop $BIGTOP
docker rm $BIGTOP

Run Hadoop HDFS + HBase

BIGTOP=$(docker run -d -p 50070:50070 -p 16010:16010 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_hbase)
docker exec -ti $BIGTOP hbase shell
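As a quick sanity check you can pipe a short script into the HBase shell instead of typing interactively. A hedged sketch (the helper function and the 'smoke' table are made up for illustration):

```shell
# Hypothetical helper: emit a small HBase shell script that creates a
# table, writes one cell, and scans it back.
hbase_smoke_script() {
  cat <<'EOF'
create 'smoke', 'cf'
put 'smoke', 'row1', 'cf:greeting', 'hello'
scan 'smoke'
exit
EOF
}

# Usage (pipe into the container; -i keeps stdin open, no TTY needed):
# hbase_smoke_script | docker exec -i $BIGTOP hbase shell
```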

Run Hadoop HDFS + Spark Standalone

BIGTOP=$(docker run -d -p 50070:50070 -p 8080:8080 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_spark-standalone)
docker exec -ti $BIGTOP spark-shell
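For non-interactive checks you can feed a single Scala expression to spark-shell over stdin. A sketch with a hypothetical wrapper (not part of Bigtop); DOCKER is made overridable so the wrapper can be exercised without a running Docker daemon:

```shell
# Hypothetical wrapper: evaluate one line of Scala in the sandbox's
# spark-shell. DOCKER defaults to the real CLI but can be overridden.
DOCKER=${DOCKER:-docker}
spark_eval() {
  printf '%s\n' "$1" | "$DOCKER" exec -i "$BIGTOP" spark-shell
}

# Usage (assumes $BIGTOP from the command above):
# spark_eval 'println(sc.parallelize(1 to 100).sum)'
```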

 

Run Hadoop HDFS + YARN + Hive + Pig

BIGTOP=$(docker run -d -p 50070:50070 -p 8088:8088 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_yarn_hive_pig)
docker exec -ti $BIGTOP hive
docker exec -ti $BIGTOP pig
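Both CLIs also accept a single statement via -e, which is handy for smoke tests. A sketch with hypothetical wrappers (not part of Bigtop); DOCKER is overridable so the wrappers can be dry-tested without Docker:

```shell
# Hypothetical wrappers for one-shot queries against the sandbox.
DOCKER=${DOCKER:-docker}
run_hive() { "$DOCKER" exec -i "$BIGTOP" hive -e "$1"; }
run_pig()  { "$DOCKER" exec -i "$BIGTOP" pig -e "$1"; }

# Usage (assumes $BIGTOP from the command above):
# run_hive 'SHOW DATABASES;'
# run_pig  'fs -ls /'
```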

How to build

Download Bigtop

Go to http://bigtop.apache.org/download.html#releases and download the latest Bigtop release. After downloading:

tar zxvf bigtop-1.2.1-project.tar.gz
cd bigtop-1.2.1/docker/sandbox

Build a Hadoop HDFS sandbox image

./build.sh -a bigtop -o ubuntu-16.04 -c hdfs

Build a Hadoop HDFS, Hadoop YARN, and Spark on YARN sandbox image

./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark"

Build a Hadoop HDFS and HBase sandbox image

./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, hbase"

Use --dryrun to skip the build and generate only the Dockerfile and configuration files

./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, hbase" --dryrun

Change the repository of packages

  • Change the repository to Bigtop's 1.2.1 release repo for debian-8
export REPO=http://repos.bigtop.apache.org/releases/1.2.1/debian/8/x86_64
./build.sh -a bigtop -o debian-8 -c "hdfs, yarn, spark, ignite"

Customize your Big Data Stack

  • Edit site.yaml.template.debian-8_hadoop to create your own preferred stack
cp site.yaml.template.debian-8_hadoop site.yaml.template.debian-8_hadoop_ignite
vim site.yaml.template.debian-8_hadoop_ignite
  • Add ignite to the hadoop_cluster_node::cluster_components array and leave the others unchanged
...
hadoop_cluster_node::cluster_components: [hdfs, yarn, ignite]
...

...

# Configure your own stack
./build.sh -a bigtop -o debian-8 -f site.yaml.template.debian-8_hadoop_ignite -t my_ignite_stack

Known issues

Daemons fail to start under systemd

Since systemd requires CAP_SYS_ADMIN, daemons currently cannot be started successfully during image build time on any OS that uses systemd.

Daemons can be brought up only if --privileged is specified with the docker run command. Please read the Docker documentation on runtime privilege for details.

Reference

Available Sandboxes: https://hub.docker.com/r/bigtop/sandbox/tags/

Build status: https://ci.bigtop.apache.org/view/Docker/job/Docker-Sandbox/

DataWorks Summit 2017 slide: https://www.slideshare.net/saintya/leveraging-docker-for-hadoop-build-automation-and-big-data-stack-provisioning