What is Bigtop Sandbox?
A handy tool to run big data pseudo clusters on Docker.
How to run
Make sure you have Docker installed. We've tested this using Docker for Mac.
Currently supported OS list:
- debian-8
- ubuntu-16.04
Run Hadoop HDFS
docker run -d -p 50070:50070 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs
Provisioning HDFS takes around 30 seconds. Use docker logs to check whether provisioning has finished:
BIGTOP=$(docker run -d -p 50070:50070 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs)
docker logs -f $BIGTOP
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Bool instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Array instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Notice: Scope(Class[Node_with_components]): Roles to deploy: [namenode, datanode]
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Pattern[]. There is further documentation for validate_legacy function in the README. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Numeric instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions. (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Notice: Compiled catalog for 9c26fcceafad.local in environment production in 1.45 seconds
Notice: Baseurl: http://repos.bigtop.apache.org/releases/1.2.1/ubuntu/16.04/x86_64
Notice: /Stage[main]/Bigtop_repo/Notify[Baseurl: http://repos.bigtop.apache.org/releases/1.2.1/ubuntu/16.04/x86_64]/message: defined 'message' as 'Baseurl: http://repos.bigtop.apache.org/releases/1.2.1/ubuntu/16.04/x86_64'
Notice: /Stage[main]/Bigtop_repo/Exec[bigtop-apt-update]/returns: executed successfully
Notice: /Stage[main]/Hadoop::Common_hdfs/File[/etc/hadoop/conf/core-site.xml]/content: content changed '{md5}71506958747641d1a5def83b021e7f75' to '{md5}ce32af59eb015a3bb3774d375be10f11'
Notice: /Stage[main]/Hadoop::Common_hdfs/File[/etc/hadoop/conf/hdfs-site.xml]/content: content changed '{md5}784883dd654527ae577de19ecdec0992' to '{md5}ddc0a621878650832f30eb9690aa7565'
Notice: /Stage[main]/Hadoop::Namenode/Service[hadoop-hdfs-namenode]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage[main]/Hadoop::Datanode/File[/data/1/hdfs]/mode: mode changed '0700' to '0755'
Notice: /Stage[main]/Hadoop::Datanode/File[/data/2/hdfs]/mode: mode changed '0700' to '0755'
Notice: /Stage[main]/Hadoop::Datanode/Service[hadoop-hdfs-datanode]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage[main]/Hadoop::Init_hdfs/Exec[init hdfs]/returns: executed successfully
Notice: Finished catalog run in 29.46 seconds
Once provisioning has finished, go to http://localhost:50070 to see the HDFS web UI.
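Instead of watching the log by hand, a script can poll it. The following is a minimal sketch, assuming the last line of the Puppet run always contains "Finished catalog run" as in the sample log above. wait_for_sandbox is a hypothetical helper name, not something shipped with Bigtop, and the default timeout of 120 seconds is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: wait_for_sandbox is a hypothetical helper, not part of Bigtop.
# Usage: wait_for_sandbox CONTAINER [TIMEOUT_SECONDS]
# Returns 0 once the container log contains "Finished catalog run",
# or 1 if that line has not appeared after TIMEOUT_SECONDS (default 120).
wait_for_sandbox() {
    container=$1
    timeout=${2:-120}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Read both stdout and stderr of the container log.
        if docker logs "$container" 2>&1 | grep -q 'Finished catalog run'; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "sandbox $container not provisioned after ${timeout}s" >&2
    return 1
}

# Example (not executed here):
#   BIGTOP=$(docker run -d -p 50070:50070 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs)
#   wait_for_sandbox "$BIGTOP" && echo "web UI ready at http://localhost:50070"
```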
To destroy the container:
docker stop $BIGTOP
docker rm $BIGTOP
Run Hadoop HDFS + HBase
BIGTOP=$(docker run -d -p 50070:50070 -p 16010:16010 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_hbase)
docker exec -ti $BIGTOP hbase shell
Run Hadoop HDFS + Spark Standalone
BIGTOP=$(docker run -d -p 50070:50070 -p 8080:8080 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_spark-standalone)
docker exec -ti $BIGTOP spark-shell
Run Hadoop HDFS + YARN + Hive + Pig
BIGTOP=$(docker run -d -p 50070:50070 -p 8088:8088 bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_yarn_hive_pig)
docker exec -ti $BIGTOP hive
docker exec -ti $BIGTOP pig
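The image tags used in the commands above look like they follow the pattern version, OS, and component list, with the components joined by underscores. The sketch below composes a tag that way. This pattern is inferred only from the four tags shown in this document, and sandbox_image is a hypothetical helper name, so check the list of available sandboxes before relying on a generated tag.

```shell
#!/bin/sh
# Sketch only: sandbox_image is a hypothetical helper, not part of Bigtop.
# Usage: sandbox_image VERSION OS COMPONENT [COMPONENT...]
# Prints an image name such as bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs_hbase.
sandbox_image() {
    # At least one component is required.
    [ "$#" -ge 3 ] || return 1
    version=$1
    os=$2
    shift 2
    # Join the remaining arguments with underscores.
    components=$(IFS=_; echo "$*")
    echo "bigtop/sandbox:${version}-${os}-${components}"
}

# Example (not executed here):
#   docker run -d -p 50070:50070 "$(sandbox_image 1.2.1 ubuntu-16.04 hdfs)"
```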
How to build
Download Bigtop
Go to http://bigtop.apache.org/download.html#releases and download the latest Bigtop release. After downloading, extract it and change into the sandbox directory:
tar zxvf bigtop-1.2.1-project.tar.gz
cd bigtop-1.2.1/docker/sandbox
Build a Hadoop HDFS sandbox image
./build.sh -a bigtop -o ubuntu-16.04 -c hdfs
Build a Hadoop HDFS, Hadoop YARN, and Spark on YARN sandbox image
./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark"
Build a Hadoop HDFS and HBase sandbox image
./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, hbase"
Use --dryrun to skip the build and get the Dockerfile and configuration
./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, hbase" --dryrun
Change the repository of packages
export REPO=http://repos.bigtop.apache.org/releases/1.2.1/debian/8/x86_64
./build.sh -a bigtop -o debian-8 -c "hdfs, yarn, ignite"
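The two release repository URLs that appear in this document (the debian/8 one in this section and the ubuntu/16.04 Baseurl in the provisioning log) suggest a common layout. The sketch below derives the URL from an OS name so that REPO and the -o option stay in step. The layout is an assumption drawn from those two URLs only, and bigtop_repo_url is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: bigtop_repo_url is a hypothetical helper, not part of Bigtop.
# Usage: bigtop_repo_url VERSION OS   (OS such as ubuntu-16.04 or debian-8)
# Assumes the release repo layout releases/VERSION/DISTRO/RELEASE/x86_64.
bigtop_repo_url() {
    version=$1
    os=$2
    distro=${os%%-*}    # text before the first hyphen, e.g. "ubuntu"
    release=${os#*-}    # text after the first hyphen, e.g. "16.04"
    echo "http://repos.bigtop.apache.org/releases/${version}/${distro}/${release}/x86_64"
}

# Example (not executed here):
#   OS=debian-8
#   export REPO=$(bigtop_repo_url 1.2.1 "$OS")
#   ./build.sh -a bigtop -o "$OS" -c "hdfs, yarn, ignite"
```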
Customize your Big Data Stack
vim site.yaml.template.debian-8_hadoop # Configure your own stack
./build.sh -a bigtop -o debian-8 -f site.yaml.template.debian-8_hadoop -t my_hadoop_stack
Known issues
Fail to start daemons using systemd
Because systemd requires CAP_SYS_ADMIN, images for any OS that uses systemd currently cannot start daemons at image build time.
On such images, daemons can be brought up only when the container is started with docker run --privileged.
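As an illustration of the workaround, the sketch below prints a docker run command line that includes --privileged and publishes each given port on the same host port. privileged_run_cmd is a hypothetical helper name and my_systemd_sandbox is a placeholder image name, since this document does not name a systemd-based sandbox image. Note that --privileged grants the container broad access to the host, so it is best kept to local experiments.

```shell
#!/bin/sh
# Sketch only: privileged_run_cmd is a hypothetical helper, not part of Bigtop.
# Usage: privileged_run_cmd IMAGE [PORT...]
# Prints (does not run) a docker run command with --privileged and one
# -p PORT:PORT mapping per port argument.
privileged_run_cmd() {
    image=$1
    shift
    cmd="docker run -d --privileged"
    for port in "$@"; do
        cmd="$cmd -p ${port}:${port}"
    done
    echo "$cmd $image"
}

# Example (not executed here; my_systemd_sandbox is a placeholder image name):
#   privileged_run_cmd my_systemd_sandbox 50070
#   prints: docker run -d --privileged -p 50070:50070 my_systemd_sandbox
```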
Reference
Available Sandboxes: https://hub.docker.com/r/bigtop/sandbox/tags/
Build status: https://ci.bigtop.apache.org/view/Docker/job/Docker-Sandbox/
DataWorks Summit 2017 slide: https://www.slideshare.net/saintya/leveraging-docker-for-hadoop-build-automation-and-big-data-stack-provisioning