This Quick Start guide is for readers who are new to the use of virtual machines, Apache Ambari, and/or the Apache Hadoop component stack, who would like to install and use a small local Hadoop cluster. The instructions are for a local host machine running OS X.

The following instructions cover four main steps for installing Ambari and HDP using VirtualBox and Vagrant:

...

Note: these steps were most recently tested on OS X 10.11.6 (El Capitan).

Table of Contents

Terminology

...

  1. Download and install VirtualBox from https://www.virtualbox.org/wiki/Downloads. This Quick Start has been tested on version 5.1.6.

  2. Download and install Vagrant from https://www.vagrantup.com/downloads.html.
  3. Clone the ambari-vagrant GitHub repository into a convenient folder on your Mac. Navigate to the folder, and enter the following command from the terminal:

    Code Block
    git clone https://github.com/u39kun/ambari-vagrant.git
    

    The repository contains scripts for setting up Ambari virtual machines on several Linux distributions.

  4. Add virtual machine hostnames and addresses to the /etc/hosts file on your computer. The following command appends the host names and addresses in ambari-vagrant/append-to-etc-hosts.txt to the end of the /etc/hosts file:

    Code Block
    sudo -s 'cat ambari-vagrant/append-to-etc-hosts.txt >> /etc/hosts'
    
  5. Use the vagrant command to create a private key to use with Ambari:

    Code Block
    vagrant
    

    The vagrant command displays Vagrant command information, and then it creates a private key in the file ~/.vagrant.d/insecure_private_key.
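
Because the command in step 4 appends unconditionally, running it a second time duplicates the entries in /etc/hosts. The following is a minimal sketch of a variant that can be rerun safely. The function name append_missing_hosts is hypothetical and is not part of the ambari-vagrant repository.

```shell
# append_missing_hosts SRC DEST
# Append each non-empty line of SRC to DEST unless DEST already
# contains exactly that line. (Hypothetical helper, for illustration.)
append_missing_hosts() {
    src="$1"
    dest="$2"
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines in the source file.
        [ -z "$line" ] && continue
        # -x matches whole lines only, -F treats the line as a fixed string.
        grep -qxF -- "$line" "$dest" || printf '%s\n' "$line" >> "$dest"
    done < "$src"
}
```

Writing to /etc/hosts requires root, so define and call the function from a root shell on your Mac, for example: append_missing_hosts ambari-vagrant/append-to-etc-hosts.txt /etc/hosts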

...

  1. To log on to a virtual machine, use the vagrant ssh command on your host machine, and specify the hostname; for example:

    Code Block
    LMBP:centos7.0 lkg$ vagrant ssh c7001
    
    Last login: Tue Jan 12 11:20:28 2016
    [vagrant@c7001 ~]$  

    From this point onward, this terminal window is connected to the virtual machine until you exit the virtual machine. All commands go to the VM, not to your Mac.

    Recommendation: Open a second terminal window for your Mac. This is useful when accessing the Ambari Web UI. To distinguish between the two, terminal windows typically list the computer name or VM hostname on each command-line prompt and at the top of the terminal window.

  2. When you first access the VM you will be logged in as user vagrant. Switch to the root user; be sure to include the space between "su" and "-":

    Code Block
    [vagrant@c7001 ~]$ sudo su -
    
    Last login: Sun Sep 25 01:34:28 AEST 2016 on pts/0
    root@c7001:~#  

...

  • rpm

  • curl

  • wget

  • pdsh 

On CentOS: to check if a package is installed, run yum info <package-name>. To install a package, run yum install <package-name>.
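
To see at a glance which of the tools listed above are absent, you can look for each one on the PATH. This is only a sketch: the function name missing_tools is hypothetical, and it checks for the command itself rather than querying the package manager as yum info does.

```shell
# missing_tools TOOL...
# Print the name of every TOOL that is not found on the PATH.
# (Hypothetical helper, for illustration.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools rpm curl wget pdsh prints nothing when all four are available, and otherwise prints the names that still need to be installed.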


To install Ambari, you can build it yourself from source (see Ambari Development), or you can use published binaries.

As this is a Quick Start Guide to get you going quickly, ready-made, publicly available binaries are referenced. These binaries were built and made publicly available by Hortonworks, a commercial Hadoop vendor, and are referenced here for your convenience. Using them makes HDP, Hortonworks' distribution, available for installation through Apache Ambari. The instructions should still work with Ambari binaries from any other vendor, organization, or individual, or with binaries you built yourself; only the repo URLs need to change. Contributions that extend these instructions to other ready-made, publicly accessible binaries are welcome.

From the terminal window on the VM where you want to run the main Ambari service, download the Ambari repository. The following commands download Ambari version 2.5.1.0 and install ambari-server. To install a different version of Ambari, specify the appropriate repo URL. Choose the appropriate commands for the operating system on your VMs:

Code Block
# CentOS 6 (for CentOS 7, replace centos6 with centos7 in the repo URL)
#
# to test public release 2.5.1
wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.1.0/ambari.repo
yum install ambari-server -y

# Ubuntu 14 (for Ubuntu 16, replace ubuntu14 with ubuntu16 in the repo URL)
# to test public release 2.5.1
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.1.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y

# SUSE 11 (for SUSE 12, replace suse11 with suse12 in the repo URL)
# to test public release 2.5.1
wget -O /etc/zypp/repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.5.1.0/ambari.repo
zypper install ambari-server -y

On an early-2013 MacBook Pro (2.7 GHz Core i7, 16 GB RAM), this step takes about seven minutes. Timing also depends on internet download speeds.
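
The repo URLs used in this step follow one pattern that differs only in the operating system, the version, and the file name. The sketch below builds such a URL from those parts. The function name ambari_repo_url is hypothetical, the pattern is inferred only from the commands in this guide, and a URL built for another OS or version is not guaranteed to exist on the server.

```shell
# ambari_repo_url OS VERSION
# Print the repo URL for the given OS and Ambari 2.x version, following
# the pattern of the URLs in this guide. (Hypothetical helper.)
ambari_repo_url() {
    os="$1"        # e.g. centos6, centos7, ubuntu14, ubuntu16, suse11, suse12
    version="$2"   # e.g. 2.5.1.0
    case "$os" in
        ubuntu*) file="ambari.list" ;;   # apt source list
        *)       file="ambari.repo" ;;   # yum or zypper repo file
    esac
    echo "http://public-repo-1.hortonworks.com/ambari/$os/2.x/updates/$version/$file"
}
```

On CentOS 7, for example, the download command could then be written as: wget -O /etc/yum.repos.d/ambari.repo "$(ambari_repo_url centos7 2.5.1.0)"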

 

To install Ambari with default settings, set up and start ambari-server:

Code Block
ambari-server setup -s
ambari-server start

 

To check Ambari Server status, issue the following command:

Code Block
ambari-server status

After Ambari Server has started, launch a browser on your host machine (Mac). Access the Ambari Web UI at http://<hostname>.ambari.apache.org:8080. The <hostname> part of the URL specifies the VM where you installed Ambari; for example:

Code Block
http://c7001.ambari.apache.org:8080

 Note: The Ambari Server can take some time to launch and be ready to accept connections. Keep trying the URL until you see the login page.
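
Instead of reloading the page by hand, you can poll the URL from a terminal on your Mac. The following is a sketch under the assumption that the server answers the root URL with HTTP 200 once it is ready. The function name wait_for_ambari and its default retry count and delay are hypothetical.

```shell
# wait_for_ambari URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL with curl until it answers HTTP 200 or ATTEMPTS is exhausted.
# (Hypothetical helper; the defaults are arbitrary.)
wait_for_ambari() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: silent, -o /dev/null: discard the body,
        # -w '%{http_code}': print only the HTTP status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
        if [ "$code" = "200" ]; then
            echo "Ambari Web UI is up at $url"
            return 0
        fi
        echo "Attempt $i/$attempts: not ready yet (HTTP $code)"
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Gave up waiting for $url" >&2
    return 1
}
```

For example: wait_for_ambari http://c7001.ambari.apache.org:8080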

At this point, you can snapshot the VMs to have a cluster with Ambari installed, to rerun later if desired. This is especially helpful when installing Apache Ambari and the HDP stack for the first time; it allows you to back out to fresh VMs running Ambari, and reinstall a fresh HDP stack if you encounter errors. For more information about snapshots, see the vagrant snapshot command in "Basic Vagrant Commands," later in this Quick Start.

Install the HDP Stack

The following instructions describe basic steps for using Ambari to install HDP components.

...

On the Host Checks window, the following warning indicates that you need to start ntpd on each host:


To start the services, navigate to a terminal window for each VM (from your Mac, run vagrant ssh <VM-name>), and issue the following commands:

service ntpd start
service ntpd status
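
If you prefer not to log on to every VM in turn, the same commands can be sent from your Mac with the -c option of vagrant ssh. This is a sketch, not part of the original procedure: the function name start_ntpd_on is hypothetical, and it assumes the vagrant user can run sudo without a password on each VM.

```shell
# start_ntpd_on VM...
# Start ntpd on each named VM and print its status. Run this from the
# directory that contains the Vagrantfile. (Hypothetical helper.)
start_ntpd_on() {
    for vm in "$@"; do
        echo "Starting ntpd on $vm"
        vagrant ssh "$vm" -c 'sudo service ntpd start && sudo service ntpd status' || return 1
    done
}
```

For example, start_ntpd_on c7001 c7002 c7003, where every name after c7001 is an assumption about how your VMs are named.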

...

Run the following commands under the root account on each VM:

yum remove -y snappy-1.1.0-3.el7.x86_64
yum install snappy-devel -y

Stopping and Restarting Virtual Machines

Hadoop is a complex ecosystem with a lot of status checks and cross-component messages. This can make it challenging to halt and restart several VMs and restore them later without warnings or errors.

Recommendations

If you want to save cluster state for a period during which you will not be using your Mac, put the Mac to sleep; the cluster should continue from where it left off after you wake the Mac.

When stopping a set of VMs, if you don't need to save cluster state, it can be helpful to stop all services first, then stop ambari-server (ambari-server stop), and then issue a Vagrant halt or suspend command.

When restarting a cluster after halting or taking a snapshot, check the Ambari server status and restart it if necessary:

...
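
The check-then-restart logic described above can be sketched as a small function to run as root on the VM that hosts Ambari Server. It rests on the assumption that ambari-server status exits with a non-zero code when the server is not running. The function name ensure_ambari_running and the AMBARI_SERVER variable are hypothetical.

```shell
# Command used to control Ambari Server. The variable is a hypothetical
# override hook so the function can be tried without Ambari installed.
AMBARI_SERVER="${AMBARI_SERVER:-ambari-server}"

# ensure_ambari_running
# Start Ambari Server only if the status check reports it is not running.
# Assumption: "ambari-server status" exits non-zero when the server is down.
ensure_ambari_running() {
    if "$AMBARI_SERVER" status >/dev/null 2>&1; then
        echo "Ambari Server is already running"
    else
        echo "Ambari Server is not running; starting it"
        "$AMBARI_SERVER" start
    fi
}
```

If the assumption about the exit code does not hold for your Ambari version, read the output of ambari-server status directly instead.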

After logging into the Ambari Web UI, expect to see alert warnings or errors due to timeout conditions. Check the associated messages to determine whether they might affect your use of the virtual cluster. If so, it can be helpful to stop and restart one or more associated components.

Reference: Basic Vagrant Commands

...

More information: https://www.vagrantup.com/docs/getting-started/teardown.html

 

If you have favorite ways of starting and stopping VMs running a Hadoop cluster, please feel free to share them in the Comments section. Thanks!