This Quick Start guide is for readers who are new to virtual machines, Apache Ambari, and/or the Apache Hadoop component stack and who would like to install and use a small local Hadoop cluster. The instructions are for a local host machine running OS X.
The following instructions cover four main steps for installing Ambari and HDP using VirtualBox and Vagrant:
...
Note: These steps were most recently tested on OS X 10.11.6 (El Capitan).
Terminology
...
To log on to a virtual machine, use the vagrant ssh command on your host machine and specify the hostname; for example:

```
LMBP:centos7.0 lkg$ vagrant ssh c7001
Last login: Tue Jan 12 11:20:28 2016
[vagrant@c7001 ~]$
```
From this point onward, this terminal window is connected to the virtual machine until you exit the virtual machine. All commands go to the VM, not to your Mac.
Recommendation: Open a second terminal window for your Mac. This is useful when accessing the Ambari Web UI. To distinguish between the two, terminal windows typically list the computer name or VM hostname on each command-line prompt and at the top of the terminal window.

When you first access the VM, you are logged in as user vagrant. Switch to the root user; be sure to include the space between "su" and "-":

```
[vagrant@c7001 ~]$ sudo su -
Last login: Sun Sep 25 01:34:28 AEST 2016 on pts/0
root@c7001:~#
```
...
rpm
curl
wget
pdsh
On CentOS, to check if a package is installed, run yum info <package-name>. To install a package, run yum install <package-name>.
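To check all four prerequisites at once, the following bash sketch prints each package that is not yet installed. It uses rpm -q rather than yum info because rpm -q exits non-zero for a package that is not installed; the function name missing_prereqs is hypothetical, and the sketch assumes an RPM-based VM such as CentOS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named package that is NOT installed,
# one per line. Assumes an RPM-based VM (e.g. CentOS), where "rpm -q <pkg>"
# exits non-zero when the package is missing.
missing_prereqs() {
  local pkg
  for pkg in "$@"; do
    # Discard rpm's own messages; only the exit status matters here.
    rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}
```

For example, as root on a CentOS VM, run missing=$(missing_prereqs rpm curl wget pdsh) and, if $missing is not empty, install the listed packages with yum install -y $missing.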
To install Ambari, complete the following steps.
To obtain Ambari, you can build it yourself from source (see Ambari Development) or use published binaries.
Because this is a Quick Start guide, it references ready-made, publicly available binaries. These binaries were built and published by Hortonworks, a commercial Hadoop vendor, and are referenced here for convenience. Using them makes HDP, Hortonworks' distribution, available for installation through Apache Ambari. The instructions should still work with Ambari binaries from any other vendor, organization, or individual, or with binaries you build yourself; only the repo URLs need to change. Contributions that extend these instructions to other ready-made, publicly accessible binaries are welcome.
From the terminal window on the VM where you want to run the main Ambari service, download the Ambari repository. The following commands download Ambari version 2.5.1.0 and install ambari-server. To install a different version of Ambari, specify the appropriate repo URL. Choose the appropriate commands for the operating system on your VMs:
```
# CentOS 6 (for CentOS 7, replace centos6 with centos7 in the repo URL)
#
# to test public release 2.5.1
wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.1.0/ambari.repo
yum install ambari-server -y

# Ubuntu 14 (for Ubuntu 16, replace ubuntu14 with ubuntu16 in the repo URL)
# to test public release 2.5.1
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.1.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y

# SUSE 11 (for SUSE 12, replace suse11 with suse12 in the repo URL)
# to test public release 2.5.1
wget -O /etc/zypp/repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.5.1.0/ambari.repo
zypper install ambari-server -y
```
On an early-2013 MacBook Pro (2.7 GHz Core i7, 16 GB RAM), this step takes about seven minutes. Timing also depends on internet download speed.
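The repo commands for each operating system differ only in the OS tag (centos6, centos7, ubuntu14, ubuntu16, suse11, suse12) and the Ambari version. The following bash sketch builds the download URL and the local destination file from those two values. The function names ambari_repo_url and ambari_repo_dest are hypothetical, and the sketch assumes that other Ambari releases keep the same public-repo-1.hortonworks.com layout used for 2.5.1.0.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: every release follows the layout
#   http://public-repo-1.hortonworks.com/ambari/<os>/2.x/updates/<version>/<file>
# where <file> is ambari.list for Ubuntu (apt) and ambari.repo otherwise.
ambari_repo_url() {
  local os="$1" version="$2" file="ambari.repo"
  case "$os" in
    ubuntu*) file="ambari.list" ;;
  esac
  echo "http://public-repo-1.hortonworks.com/ambari/${os}/2.x/updates/${version}/${file}"
}

# Where each package manager expects the repo definition file.
ambari_repo_dest() {
  case "$1" in
    centos*) echo "/etc/yum.repos.d/ambari.repo" ;;
    ubuntu*) echo "/etc/apt/sources.list.d/ambari.list" ;;
    suse*)   echo "/etc/zypp/repos.d/ambari.repo" ;;
    *)       echo "unsupported OS tag: $1" >&2; return 1 ;;
  esac
}
```

For example, on a CentOS 7 VM: wget -O "$(ambari_repo_dest centos7)" "$(ambari_repo_url centos7 2.5.1.0)", followed by yum install ambari-server -y.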
To install Ambari with default settings, set up and start ambari-server:
```
ambari-server setup -s
ambari-server start
```
To check Ambari Server status, issue the following command: ambari-server status
After Ambari Server has started, launch a browser on your host machine (Mac) and access the Ambari Web UI at http://<hostname>.ambari.apache.org:8080. The <hostname> part of the URL specifies the VM where you installed Ambari; for example:
```
http://c7001.ambari.apache.org:8080
```
Note: The Ambari Server can take some time to launch and be ready to accept connections. Keep trying the URL until you see the login page.
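Instead of reloading the page by hand, you can poll the URL from a terminal on your Mac. The following bash sketch retries until the server answers with HTTP status 200. The function name wait_for_ambari_ui and its defaults (60 attempts, 10 seconds apart) are hypothetical, and it assumes the login page is served with status 200 once Ambari is ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the Ambari Web UI until it returns HTTP 200.
# Usage: wait_for_ambari_ui <url> [attempts] [seconds-between-attempts]
wait_for_ambari_ui() {
  local url="$1" tries="${2:-60}" delay="${3:-10}" code i
  for ((i = 1; i <= tries; i++)); do
    # -s: silent; -o /dev/null: discard the page; -w: print only the status
    # code ("000" if the connection is refused while the server starts).
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "200" ]; then
      echo "Ambari Web UI is up at $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "Ambari Web UI not reachable at $url after $tries attempts" >&2
  return 1
}
```

For example: wait_for_ambari_ui http://c7001.ambari.apache.org:8080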
At this point, you can take snapshots of the VMs to preserve a cluster with Ambari installed and return to it later if desired. This is especially helpful when installing Apache Ambari and the HDP stack for the first time: if you encounter errors, you can back out to fresh VMs running Ambari and reinstall a fresh HDP stack. For more information about snapshots, see the vagrant snapshot command in "Basic Vagrant Commands," later in this Quick Start.
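Snapshots are taken per VM, so with several VMs it helps to loop over them. The following bash sketch runs vagrant snapshot save or vagrant snapshot restore for every listed VM, from the directory that contains the Vagrantfile. The function name snapshot_vms, the snapshot name, and the VM names other than c7001 are hypothetical; it assumes a Vagrant version that includes the snapshot command (1.8 or later) and a provider that supports snapshots, such as VirtualBox.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save or restore one named snapshot on each VM.
# Usage: snapshot_vms save|restore <snapshot-name> <vm> [vm ...]
snapshot_vms() {
  local action="$1" name="$2"
  shift 2
  case "$action" in
    save|restore) ;;
    *) echo "usage: snapshot_vms save|restore <name> <vm>..." >&2; return 2 ;;
  esac
  local vm
  for vm in "$@"; do
    # Stop at the first VM that fails so a partial snapshot set is obvious.
    vagrant snapshot "$action" "$vm" "$name" || return 1
  done
}
```

For example, snapshot_vms save ambari-installed c7001 c7002 c7003 after Ambari is running, and snapshot_vms restore ambari-installed c7001 c7002 c7003 to back out to that state.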
Install the HDP Stack
The following instructions describe basic steps for using Ambari to install HDP components.
...
On the Host Checks window, a warning about ntpd indicates that you need to start ntpd on each host.
To do this, open a terminal window for each VM (on your Mac, run vagrant ssh <VM-name>) and issue the following commands:
service ntpd start
service ntpd status
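If you prefer not to open a terminal per VM, vagrant ssh can also run a single command non-interactively with its -c option. The following bash sketch, run on the Mac from the Vagrantfile directory, executes one command as root on each listed VM. The function name run_on_vms and the VM names other than c7001 are hypothetical; the command is quoted with printf %q and run through sudo bash -c, so compound commands run entirely as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one shell command as root on each listed VM.
# Usage: run_on_vms "<command>" <vm> [vm ...]
run_on_vms() {
  local cmd="$1"
  shift
  local vm
  for vm in "$@"; do
    echo "== $vm"
    # printf %q escapes the command so the remote shell passes it to
    # "bash -c" as a single argument.
    vagrant ssh "$vm" -c "sudo bash -c $(printf '%q' "$cmd")" || return 1
  done
}
```

For example: run_on_vms "service ntpd start && service ntpd status" c7001 c7002 c7003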
...
Run the following commands under the root account on each VM:
yum remove -y snappy-1.1.0-3.el7.x86_64
yum install snappy-devel -y
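The two commands above can be wrapped so that rerunning them on a VM is harmless. In the following bash sketch, which is meant to run as root on each VM, the rpm -q check is an addition not in the original steps: it skips the removal when that exact snappy package is not installed. The function name fix_snappy is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two commands above (run as root on each VM).
fix_snappy() {
  local old_pkg="snappy-1.1.0-3.el7.x86_64"
  # Remove the old package only if rpm reports it as installed.
  if rpm -q "$old_pkg" >/dev/null 2>&1; then
    yum remove -y "$old_pkg" || return 1
  fi
  yum install snappy-devel -y
}
```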
Stopping and Restarting Virtual Machines
Hadoop is a complex ecosystem with many status checks and cross-component messages. This can make it challenging to halt several VMs and restore them later without warnings or errors.
Recommendations
If you want to save cluster state while you stop using your Mac for a period of time, put the Mac to sleep; the cluster should continue from where it left off when you wake the Mac.
When stopping a set of VMs, if you don't need to save cluster state, it can be helpful to stop all services first, then stop Ambari Server (ambari-server stop), and then issue a vagrant halt or vagrant suspend command.
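The shutdown order above can be sketched as a single bash function run on the Mac from the Vagrantfile directory. It assumes all cluster services were already stopped in the Ambari Web UI; the function name stop_cluster and the VM names other than c7001 are hypothetical. The first argument is the VM that runs Ambari Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Prerequisite: all services already stopped in Ambari.
# Usage: stop_cluster <ambari-vm> [other-vm ...]
stop_cluster() {
  local ambari_vm="$1"
  # 1. Stop Ambari Server on the VM that runs it.
  vagrant ssh "$ambari_vm" -c "sudo ambari-server stop" || return 1
  # 2. Halt every listed VM, including the Ambari VM.
  #    Replace "halt" with "suspend" to save the VMs' running state instead.
  vagrant halt "$@"
}
```

For example: stop_cluster c7001 c7002 c7003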
When restarting a cluster after halting it or taking a snapshot, check the Ambari Server status and restart the server if necessary:
...
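The status check can be combined with a conditional start, as in the following bash sketch to run as root on the Ambari VM after the VMs are back up (for example, after vagrant up on the Mac). The function name ensure_ambari_running is hypothetical, and the sketch assumes that ambari-server status exits with a non-zero status when the server is stopped; if your version always exits 0, match on the printed status text instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper (run as root on the Ambari VM).
# Assumption: "ambari-server status" exits non-zero when the server is down.
ensure_ambari_running() {
  if ambari-server status >/dev/null 2>&1; then
    echo "Ambari Server is already running"
  else
    ambari-server start
  fi
}
```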
After logging in to the Ambari Web UI, expect to see alert warnings or errors caused by timeout conditions. Check the associated messages to determine whether they might affect your use of the virtual cluster. If so, it can be helpful to stop and restart one or more associated components.
Reference: Basic Vagrant Commands
...
More information: https://www.vagrantup.com/docs/getting-started/teardown.html
If you have favorite ways of starting and stopping VMs running a Hadoop cluster, please feel free to share them in the Comments section. Thanks!