The following instructions cover four main steps for installing Ambari and HDP using VirtualBox and Vagrant:
- Install VirtualBox and Vagrant. (Installation needs to be done only once, unless you want to upgrade VirtualBox and/or Vagrant.)
- Start one or more Linux virtual machines. Each machine represents a node in a cluster.
- On one of the virtual machines, download, install, and deploy the version of Ambari you wish to use.
- Using Ambari, deploy the version of HDP you wish to use.
Once VirtualBox and Vagrant are installed, steps 2 through 4 can be done multiple times to change versions, create a larger cluster, and so on.
Note: these steps have been tested on MacOS 10.11.6.
Terminology
A virtual machine, or VM, is a software program that exhibits the behavior of a separate computer and is capable of running applications and programs within its own environment.
A virtual machine is usually known as a guest. It runs within another computing environment, usually known as a host. Multiple virtual machines can exist within a single host at one time.
In the following examples, one or more virtual machines run on a host machine running OS X. OS X is the primary operating system. The virtual machines (guests) are installed under OS X. The virtual machines run Linux in separate environments on OS X. Thus, your Mac is the "host" machine, and the virtual machines that run Ambari and Hadoop are called "guest" machines.
Install VirtualBox and Vagrant
VirtualBox is a software virtualization package that installs on an operating system as an application. It allows you to run multiple virtual machines at the same time. In this Quick Start, you will use VirtualBox to run Linux nodes on OS X.
Vagrant is a tool that makes it easier to work with virtual machines. It helps automate the work of setting up, running, and removing virtual machine environments. Using Vagrant, you can install and run a preconfigured cluster environment with Ambari and the HDP stack.
- Download and install VirtualBox from https://www.virtualbox.org/wiki/Downloads. Note: as of 4/25/16 Vagrant doesn't work with the latest version of VirtualBox. We recommend installing an older (4.x) version of VirtualBox. This Quick Start has been tested on 4.3.34.
- Download and install Vagrant from https://www.vagrantup.com/downloads.html.
Clone the ambari-vagrant GitHub repository into a convenient folder on your Mac. Navigate to the folder, and enter the following command from the terminal:
git clone https://github.com/u39kun/ambari-vagrant.git
The repository contains scripts for setting up Ambari virtual machines on several Linux distributions.
Add virtual machine hostnames and addresses to the /etc/hosts file on your computer. The following command copies a set of host names and addresses from ambari-vagrant/append-to-etc-hosts.txt to the end of the /etc/hosts file:
sudo -s 'cat ambari-vagrant/append-to-etc-hosts.txt >> /etc/hosts'
Use the vagrant command to create a private key to use with Ambari:
vagrant
The vagrant command displays Vagrant command information, and then it creates a private key in the file ~/.vagrant.d/insecure_private_key.
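The /etc/hosts step earlier in this section appends the entries unconditionally, so running it twice would duplicate them. The following is a minimal sketch of a guard against that, assuming the first non-blank line of the source file is a good enough marker for "already appended". The function name append_once is hypothetical and is not part of ambari-vagrant.

```shell
#!/bin/sh
# Sketch only: append SRC to DEST unless DEST already contains the first
# non-blank line of SRC. The name "append_once" is an illustrative assumption.
append_once() {
    src=$1
    dest=$2
    # Use the first non-blank line of the source file as a duplicate marker.
    marker=$(grep -v '^[[:space:]]*$' "$src" | head -n 1)
    if [ -n "$marker" ] && grep -qxF -- "$marker" "$dest" 2>/dev/null; then
        echo "entries already present in $dest; nothing appended"
        return 0
    fi
    cat "$src" >> "$dest"
}

# Example (must run as root to write /etc/hosts):
#   append_once ambari-vagrant/append-to-etc-hosts.txt /etc/hosts
```

Because the function only reads and appends, it can be tried against a scratch copy of /etc/hosts before it is run as root.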
Start Linux Virtual Machines
The ambari-vagrant directory (cloned from GitHub) contains several subdirectories, each for a specific Linux distribution. Each subdirectory has scripts and configuration files for running Ambari and HDP on that version of Linux.
To start one or more virtual machines:
Change your current directory to ambari-vagrant:
cd ambari-vagrant
If you run an ls command on the ambari-vagrant directory, you will see subdirectories for several different operating systems and operating system versions.
Change (cd) into the subdirectory for the OS you wish to use. CentOS is recommended, because it is quicker to launch than other operating systems.
The remainder of this example uses CentOS 7.0. (To install and use a different version or distribution of Linux, specify the other directory name in place of centos7.0.)
cd centos7.0
Important: All VM vagrant commands operate within your current directory. Be sure to run them from the local (Mac) subdirectory associated with the VM operating system that you have chosen to use. If you attempt to run a vagrant command from another directory, it will not find the VM.
Copy the private key into the directory associated with the chosen operating system.
For this example, which uses centos7.0, issue the following command:
cp ~/.vagrant.d/insecure_private_key .
(Optional) If you have at least 16 GB of memory on your Mac, consider increasing the amount of memory allocated to the VMs.
Edit the following line in Vagrantfile, increasing allocated memory from 3072 to 4096 or more; for example:
vb.customize ["modifyvm", :id, "--memory", 4096] # RAM allocated to each VM
Note: Every virtual machine will have a directory called /vagrant inside the VM. This corresponds to the ambari-vagrant/<os> directory on your local computer, making it easy to transfer files back and forth between your host Mac and the virtual machine. If you have any files to access from within the VM, you can place them in this shared directory.
Start one or more VMs, using the ./up.sh command. Each VM will run one HDP node. Recommendation: if you have at least 16 GB of RAM on your Mac and wish to run a small cluster, start with three nodes.
./up.sh <# of VMs to launch>
Additional notes:
- The default Vagrantfile (in each OS subdirectory) can create up to 10 virtual machines.
- The fully-qualified domain name (FQDN) for each VM has the format <os-code>[01-10].ambari.apache.org, where <os-code> is c59 (CentOS 5.9), c64 (CentOS 6.4), etc. For example, c5901.ambari.apache.org will be the FQDN for node 01 running CentOS 5.9.
- The IP address for each VM has the format 192.168.<os-subnet>.1[01-10], where <os-subnet> is 64 for CentOS 6.4, 70 for CentOS 7.0, and so on. For example, 192.168.70.101 will be the IP address for CentOS 7.0 node c7001.
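The naming rules in the notes above can be worked through mechanically. The following sketch derives the FQDN and IP address of each node from the OS code, the OS subnet, and the node number. The function names are hypothetical helpers for illustration and are not part of ambari-vagrant. Node numbers should be passed as plain integers (1, not 01).

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# FQDN format from the notes: <os-code><NN>.ambari.apache.org
vm_fqdn() {
    printf '%s%02d.ambari.apache.org\n' "$1" "$2"
}

# IP format from the notes: 192.168.<os-subnet>.1<NN>
vm_ip() {
    printf '192.168.%s.1%02d\n' "$1" "$2"
}

# Print "IP FQDN" for nodes 1..COUNT of one OS.
list_nodes() {
    os_code=$1
    subnet=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s %s\n' "$(vm_ip "$subnet" "$i")" "$(vm_fqdn "$os_code" "$i")"
        i=$((i + 1))
    done
}

# Three CentOS 7.0 nodes; the first line printed is
# "192.168.70.101 c7001.ambari.apache.org".
list_nodes c70 70 3
```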
For example, the following command starts 3 VMs:
./up.sh 3
For CentOS 7.0, the associated hostnames will be c7001, c7002, and c7003. Note that the ./up.sh 3 command is equivalent to vagrant up c700{1..3}.
Check the status of your VM(s). The following example shows the output of vagrant status after running ./up.sh 3 for three VMs with CentOS 7.0:
LMBP:centos7.0 lkg$ vagrant status
Current machine states:
c7001 running (virtualbox)
c7002 running (virtualbox)
c7003 running (virtualbox)
c7004 not created (virtualbox)
c7005 not created (virtualbox)
c7006 not created (virtualbox)
c7007 not created (virtualbox)
c7008 not created (virtualbox)
c7009 not created (virtualbox)
c7010 not created (virtualbox)
Your virtual machines are now installed and running.
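If you prefer to check the status from a script rather than by eye, the sketch below counts the VMs that vagrant status reports as running. It assumes the output layout shown in the example above, with the VM name in the first column and the state in the second. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: parses `vagrant status` text read from standard input.

# Count lines whose second field is "running".
count_running() {
    awk '$2 == "running" { n++ } END { print n + 0 }'
}

# Compare the running count against an expected number of VMs.
check_running() {
    expected=$1
    actual=$(count_running)
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual VM(s) running"
    else
        echo "expected $expected running VM(s), found $actual" >&2
        return 1
    fi
}

# Example, run from the OS subdirectory (for instance centos7.0):
#   vagrant status | check_running 3
```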
Access Virtual Machines
Use the following steps when you want to access a running virtual machine:
To log on to a virtual machine, use the vagrant ssh command; for example:
vagrant ssh c7001
LMBP:centos7.0 lkg$ vagrant ssh c7001
Last login: Tue Jan 12 11:20:28 2016
[vagrant@c7001 ~]$
From this point onward, this terminal window operates within the virtual machine until you exit the virtual machine. All commands go to the VM, not to your Mac.
Recommendation: Open a second terminal window for your Mac. This is useful when accessing the Ambari Web UI. To distinguish between the two, terminal windows typically list the computer name or VM hostname on each command-line prompt and at the top of the terminal window.
When you first access the VM you will be logged in as user vagrant. Switch to the root user:
[vagrant@c7001 ~]$ sudo su -
Last login: Sun Sep 25 01:34:28 AEST 2016 on pts/0
root@c7001:~#
When you are finished using the VM:
- Use the logout command to log out of root.
- Use the exit command to return to your host machine (Mac).
At this point, the VMs are still running in the background. You can either suspend or remove the virtual machines; for more information, see the Vagrant and snapshot commands described later in this post. (Note: http://help.skytap.com/VM_Sequencing.html, and best practices?)
Install Ambari on the Virtual Machines
Prerequisites: Before installing Ambari, the following software packages must be installed on your VM:
- rpm
- curl
- wget
- pdsh
- ntpd?
- scp?
On CentOS: to check if a package is installed, run yum info <package-name>. To install a package, run yum install <package-name>.
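As a quick first pass before querying yum, the sketch below reports which of the listed tools are not on the PATH. This checks for commands rather than packages, so treat it as a convenience only; yum info remains the way to confirm a package is installed. The function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools named in the prerequisites list.
missing=$(missing_tools rpm curl wget pdsh)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All listed tools are available."
fi
```

Any tool reported as missing can then be installed with yum install <package-name> as described above.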
To install Ambari, complete the following steps.
From the terminal window on the VM where you want to run the main Ambari service, download the Ambari repository. The following commands download Ambari version 2.4.1.0 and install ambari-server. To install a different version of Ambari, specify the appropriate repo URL. Choose the appropriate commands for the operating system on your VMs:
CentOS 6:
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.4.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum install ambari-server -y
CentOS 7:
wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum install ambari-server -y
Ubuntu 12:
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.4.1.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y
Ubuntu 14:
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.4.1.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y
To install Ambari with default settings, set up and start ambari-server:
ambari-server setup -s
ambari-server start
For more information about installation options and settings, see Apache Ambari Installation.
After Ambari Server has started, launch a browser on your host machine (Mac). Access the Ambari Web UI at http://<hostname>.ambari.apache.org:8080. The <hostname> part of the URL specifies the VM where you installed Ambari; for example:
http://c7001.ambari.apache.org:8080
Note: The Ambari Server can take some time to launch and be ready to accept connections. Keep trying the URL until you see the login page.
Log in using the default username admin and password admin.
Choose "Launch Install Wizard."
Specify a name for your cluster.
On the Install Options page, list the FQDNs of the virtual machines. For example:
c7001.ambari.apache.org c7002.ambari.apache.org c7003.ambari.apache.org
Alternatively, you can use a range expression:
c70[01-03].ambari.apache.org
Upload the insecure_private_key file that you created earlier.
Specify non-root SSH user vagrant.
Continue stepping through the Installation Wizard, completing the onscreen instructions to install your cluster.
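As noted earlier in this section, Ambari Server can take some time to accept connections after ambari-server start. Instead of reloading the browser by hand, you could poll the URL from your host machine. The following is a minimal sketch that assumes curl is available on the host. The function names, the retry count, and the delay are illustrative choices, and the PROBE variable exists only so that the check can be swapped out.

```shell
#!/bin/sh
# Sketch only: polls a URL until it answers or the attempts run out.

# Default probe: succeeds when the server returns a non-error HTTP status.
probe_url() {
    curl -s -f -o /dev/null --max-time 5 "$1"
}

# wait_for_url URL [TRIES] [DELAY_SECONDS]
wait_for_url() {
    url=$1
    tries=${2:-30}
    delay=${3:-10}
    probe=${PROBE:-probe_url}
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$probe" "$url"; then
            echo "reachable after $attempt attempt(s): $url"
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "gave up after $tries attempt(s): $url" >&2
    return 1
}

# Example, run on the host (Mac):
#   wait_for_url http://c7001.ambari.apache.org:8080
```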
Install the HDP Stack
Next, install HDP on your cluster.
In the Install Wizard, choose the HDP version, choose services, assign master and slave processes, and customize services (in this example, admin/admin was assigned to Hive and Oozie).
Timeline for installing HDP, and mac hardware
Troubleshooting
- yum failed? Install yum on your VMs.
- Ambari message about THPs? Fix the transparent huge page setting in /etc/rc.local (CentOS 7).
- ntpd not installed?
Basic Vagrant Commands
The following table lists several common Vagrant commands. For more information, see Vagrant Command-Line Interface documentation.
Command | Description |
---|---|
vagrant up <vm-name> | Starts a specific VM. ( Example: Note: if you do not specify the |
vagrant status [<vm-name>] | Shows which VMs are running, suspended, etc. |
vagrant destroy -f [<vm-name>] | Destroys all VMs launched from the current directory, and deletes them from disk. Optional: Specify a specific VM to destroy. |
vagrant suspend [<vm-name>] | Suspends (saves the state of) all VMs launched from the current directory so that you can resume them later. Optional: Specify a specific VM to suspend. |
vagrant resume [<vm-name>] | Resumes all suspended VMs launched from the current directory. Optional: Specify a specific VM to resume. |
vagrant ssh <vm-name> | Starts an SSH session to the specified VM. Example: vagrant ssh c7001 |
vagrant --help | Lists information about Vagrant commands. |
Taking Snapshots
A Vagrant snapshot saves the current state of a VM so that you can restart the VM from the same point at a future time. Vagrant makes it easy to take snapshots of the entire cluster.
Install the snapshot plugin:
vagrant plugin install vagrant-vbox-snapshot --plugin-version=0.0.2
This enables the “vagrant snapshot” command. Note that the above installs version 0.0.2, which allows you to take snapshots of the whole cluster at the same time. Later versions do not support this feature.
Run vagrant snapshot to see the syntax. Run vagrant snapshot <command> -h for more information about a specific command. Here is a summary of commands:
vagrant snapshot back # restore most recent snapshot
vagrant snapshot delete <SNAPSHOT_NAME> # delete specified snapshot
vagrant snapshot go [vm-name] <SNAPSHOT_NAME> # restore specified snapshot
vagrant snapshot list # list snapshots
vagrant snapshot take [vm-name] <SNAPSHOT_NAME> # take a snapshot, labeled by SNAPSHOT_NAME
The plugin attempts to take a snapshot of all VMs configured in Vagrantfile. To avoid attempts to snapshot nonexistent VMs, comment out the nonexistent VMs in Vagrantfile. For example, if you have three VMs running you can comment out c70[04-10] in Vagrantfile so that the snapshot commands only operate on c70[01-03].
Note: After restoring a snapshot, you may find that time-sensitive services (e.g., HBase RegionServer) are down. If this happens, you will need to restart those services.
Recommendation: After you start the VMs, but before you run anything on them, run vagrant snapshot take init. This way, you can go back to the initial state of the VMs by running "vagrant snapshot go init"; this takes only seconds (much faster than starting the VMs from scratch by using up.sh or "vagrant up"). Another advantage is that you can always go back to the initial state without destroying other named snapshots that you created.
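The command summary above shows an optional [vm-name] argument for vagrant snapshot take. One possible way to snapshot only the VMs you actually started, without editing Vagrantfile, is to issue one command per VM. The sketch below only prints those commands so you can review them first. It assumes the per-VM form behaves as the summary suggests, which has not been verified here against the plugin, and the function name snapshot_cmds is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints one `vagrant snapshot take` command per VM (dry run).
# snapshot_cmds OS_CODE COUNT SNAPSHOT_NAME
snapshot_cmds() {
    os_code=$1
    count=$2
    name=$3
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'vagrant snapshot take %s%02d %s\n' "$os_code" "$i" "$name"
        i=$((i + 1))
    done
}

# Print the commands for three CentOS 7.0 nodes and a snapshot named "init".
snapshot_cmds c70 3 init
```

After reviewing the printed commands, you could pipe them to sh from the OS subdirectory to run them.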