
Summary

Ambari 2.0 added support to the Blueprint functionality for deploying certain components with High Availability (HA).

 

Prior to this functionality, configuring HA required manually running the Ambari Web HA Wizards after the cluster was deployed.

 

As of Ambari 2.1, Blueprints are able to deploy the following components with HA:
  • HDFS NameNode HA
  • YARN ResourceManager HA
  • HBase RegionServers HA

Support may be added for other Hadoop technologies in later releases.  

This functionality currently requires providing fine-grained configurations. This document provides examples.

FAQ

Compatibility with Ambari UI 

While the Blueprints HA feature does not require the Ambari UI in order to function, it is completely compatible with the Ambari UI.  An HA cluster created via Blueprints can be monitored and configured via the Ambari UI, just like any other Blueprints-deployed cluster.

...

Expert-Mode Configuration

In Ambari 2.0, the Blueprint support for HA requires the Blueprint to contain exact fine-grained configurations. See the examples below for more detail.

In future releases, we hope to provide a higher-level mode of operation, so that HA can be enabled in a more coarse-grained way.

Supported Stack Versions

This feature is enabled for the HDP 2.1 stack, as well as future versions of the stack.  Previous versions of HDP have not been verified for this feature, and may not function as desired.  In addition, earlier HDP versions may not include the HA support for the required technology.  

Getting Started with Blueprints HA

Start with the tested & working examples below and customize from there.

Blueprint Example: HDFS NameNode HA Cluster

Summary

HDFS NameNode HA allows a cluster to be configured such that a NameNode is not a single point of failure.
For more details on HDFS NameNode HA see the Apache Hadoop documentation.

In an Ambari-deployed HDFS NameNode HA cluster:


  • 2 NameNodes are deployed: an “active” and a “passive” NameNode.
  • If the active NameNode should stop functioning properly, the passive node’s ZooKeeper client will detect this case, and the passive node will become the new active node.
  • HDFS relies on ZooKeeper to manage the details of failover in these cases.

How

The Blueprints HA feature will automatically invoke all required commands and setup steps for an HDFS NameNode HA cluster, provided that the correct configuration is provided in the Blueprint.  The shared edit logs of each NameNode are managed by the Quorum Journal Manager, rather than NFS shared storage.  The use of NFS shared storage in an HDFS HA setup is not supported by Ambari Blueprints, and is generally not encouraged.  

By setting a series of properties in the “hdfs-site” configuration file, a user can configure HDFS NameNode HA to use at most two NameNodes in a cluster.  These NameNodes are typically referenced via a logical name, the “nameservice”.
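
To make this concrete, the fragment below is a minimal sketch of the “hdfs-site” entry a Blueprint’s “configurations” list might carry for a two-NameNode nameservice. The nameservice name (“mycluster”), the NameNode IDs (“nn1”, “nn2”), and the host group names are illustrative assumptions rather than values required by Ambari; the property names themselves are the standard Hadoop NameNode HA settings.

Code Block
{
  "hdfs-site": {
    "properties": {
      "dfs.nameservices": "mycluster",
      "dfs.ha.namenodes.mycluster": "nn1,nn2",
      "dfs.namenode.rpc-address.mycluster.nn1": "%HOSTGROUP::master_1%:8020",
      "dfs.namenode.rpc-address.mycluster.nn2": "%HOSTGROUP::master_2%:8020",
      "dfs.namenode.http-address.mycluster.nn1": "%HOSTGROUP::master_1%:50070",
      "dfs.namenode.http-address.mycluster.nn2": "%HOSTGROUP::master_2%:50070",
      "dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::master_1%:8485;%HOSTGROUP::master_2%:8485;%HOSTGROUP::master_3%:8485/mycluster",
      "dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
      "dfs.ha.fencing.methods": "shell(/bin/true)",
      "dfs.ha.automatic-failover.enabled": "true"
    }
  }
}

The matching “core-site” entry would typically set “fs.defaultFS” to the logical nameservice (for example, hdfs://mycluster) and “ha.zookeeper.quorum” to the ZooKeeper ensemble, so that clients address the nameservice rather than a single NameNode host.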

 


The following HDFS stack components should be included in any host group in a Blueprint that supports an HA HDFS NameNode:

  1. NAMENODE

  2. ZKFC

  3. ZOOKEEPER_SERVER

  4. JOURNALNODE
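
As a sketch only, one host group carrying these components might look like the following fragment; the host group name and cardinality are placeholders, and a real Blueprint would normally include clients, metrics, and other components alongside them.

Code Block
{
  "name": "namenode_group_1",
  "cardinality": "1",
  "components": [
    { "name": "NAMENODE" },
    { "name": "ZKFC" },
    { "name": "ZOOKEEPER_SERVER" },
    { "name": "JOURNALNODE" }
  ]
}

A second host group with the same NAMENODE and ZKFC pairing supplies the other NameNode, and because the Quorum Journal Manager needs a majority of JournalNodes to be reachable, JournalNodes are normally deployed on an odd number of hosts (commonly three).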

Configuring Active and Standby NameNodes

The HDFS “NAMENODE” component must be assigned to two servers, either via two separate host groups, or to a host group that maps to two physical servers in the Cluster Creation Template for this cluster.  
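
For the second case, the Cluster Creation Template maps the host group onto the physical hosts. The sketch below assumes a Blueprint named “hdfs-ha-blueprint” and two placeholder FQDNs; only the host-to-host-group mapping is the point of the example.

Code Block
{
  "blueprint": "hdfs-ha-blueprint",
  "default_password": "changeme",
  "host_groups": [
    {
      "name": "namenode_group",
      "hosts": [
        { "fqdn": "c6401.ambari.apache.org" },
        { "fqdn": "c6402.ambari.apache.org" }
      ]
    }
  ]
}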

By default, the Blueprint processor will assign the “active” NameNode to one host, and the “standby” NameNode to another.  The user of an HA Blueprint does not need to configure the initial status of each NameNode, since this can be assigned automatically.  

If desired, the user can configure the initial state of each NameNode by adding the following configuration properties in the “hadoop-env” namespace:

  1. dfs_ha_initial_namenode_active - This property should contain the hostname for the “active” NameNode in this cluster.

  2. dfs_ha_initial_namenode_standby - This property should contain the hostname for the “standby” NameNode in this cluster.
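
A minimal sketch of such a “hadoop-env” entry is shown below; the two hostnames are placeholders and should be replaced with the fully qualified names of the intended active and standby NameNode hosts.

Code Block
{
  "hadoop-env": {
    "properties": {
      "dfs_ha_initial_namenode_active": "c6401.ambari.apache.org",
      "dfs_ha_initial_namenode_standby": "c6402.ambari.apache.org"
    }
  }
}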

 

Note

These properties should only be used when the initial state of the active or standby NameNodes needs to be configured to a specific node.  This setting is only guaranteed to be accurate in the initial state of the cluster.  Over time, the active/standby state of each NameNode may change as failover events occur in the cluster. 

The active or standby status of a NameNode is not recorded or expressed when an HDFS HA Cluster is being exported to a Blueprint, using the Blueprint REST API endpoint.  Since clusters change over time, this state is only accurate in the initial startup of the cluster.  

Generally, it is assumed that most users will not need to choose the active or standby status of each NameNode, so the default behavior in Blueprints HA is to assign the status of each node automatically.  

...

Blueprint Example: Yarn ResourceManager HA Cluster

Summary

Yarn ResourceManager High Availability (HA) adds support for deploying two Yarn ResourceManagers in a given Yarn cluster.  This support removes the single point of failure that occurs when a single ResourceManager is used.  

...

The following link includes an example Blueprint for a 3-node Yarn ResourceManager HA Cluster:

yarn_ha_blueprint.json

 

Code Block
{
  "Blueprints": {
    "stack_name": "HDP",
    "stack_version": "2.2"
  },
  "host_groups": [
    {
      "name": "gateway",
      "cardinality" : "1",
      "components": [
        { "name": "HDFS_CLIENT" },
        { "name": "MAPREDUCE2_CLIENT" },
        { "name": "METRICS_COLLECTOR" },
        { "name": "METRICS_MONITOR" },
        { "name": "TEZ_CLIENT" },
        { "name": "YARN_CLIENT" },
        { "name": "ZOOKEEPER_CLIENT" }
      ]
    },
    {
      "name": "master_1",
      "cardinality" : "1",
      "components": [
        { "name": "HISTORYSERVER" },
        { "name": "JOURNALNODE" },
        { "name": "METRICS_MONITOR" },
        { "name": "NAMENODE" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "master_2",
      "cardinality" : "1",
      "components": [
        { "name": "APP_TIMELINE_SERVER" },
        { "name": "JOURNALNODE" },
        { "name": "METRICS_MONITOR" },
        { "name": "RESOURCEMANAGER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "master_3",
      "cardinality" : "1",
      "components": [
        { "name": "JOURNALNODE" },
        { "name": "METRICS_MONITOR" },
        { "name": "RESOURCEMANAGER" },
        { "name": "SECONDARY_NAMENODE" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "slave_1",
      "components": [
        { "name": "DATANODE" },
        { "name": "METRICS_MONITOR" },
        { "name": "NODEMANAGER" }
      ]
    }
  ],
  "configurations": [
    {
      "core-site": {
        "properties" : {
          "fs.defaultFS" : "hdfs://%HOSTGROUP::master_1%:8020"
        }
      }
    },
    {
      "yarn-site" : {
        "properties" : {
          "hadoop.registry.rm.enabled" : "false",
          "hadoop.registry.zk.quorum" : "%HOSTGROUP::master_3%:2181,%HOSTGROUP::master_2%:2181,%HOSTGROUP::master_1%:2181",
          "yarn.log.server.url" : "http://%HOSTGROUP::master_2%:19888/jobhistory/logs",
          "yarn.resourcemanager.address" : "%HOSTGROUP::master_2%:8050",
          "yarn.resourcemanager.admin.address" : "%HOSTGROUP::master_2%:8141",
          "yarn.resourcemanager.cluster-id" : "yarn-cluster",
          "yarn.resourcemanager.ha.automatic-failover.zk-base-path" : "/yarn-leader-election",
          "yarn.resourcemanager.ha.enabled" : "true",
          "yarn.resourcemanager.ha.rm-ids" : "rm1,rm2",
          "yarn.resourcemanager.hostname" : "%HOSTGROUP::master_2%",
          "yarn.resourcemanager.recovery.enabled" : "true",
          "yarn.resourcemanager.resource-tracker.address" : "%HOSTGROUP::master_2%:8025",
          "yarn.resourcemanager.scheduler.address" : "%HOSTGROUP::master_2%:8030",
          "yarn.resourcemanager.store.class" : "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
          "yarn.resourcemanager.webapp.address" : "%HOSTGROUP::master_2%:8088",
          "yarn.resourcemanager.webapp.https.address" : "%HOSTGROUP::master_2%:8090",
          "yarn.timeline-service.address" : "%HOSTGROUP::master_2%:10200",
          "yarn.timeline-service.webapp.address" : "%HOSTGROUP::master_2%:8188",
          "yarn.timeline-service.webapp.https.address" : "%HOSTGROUP::master_2%:8190"
        }
      }
    }
  ]
}


Register Blueprint with Ambari Server

...

Blueprint Example: HBase RegionServer HA Cluster

Summary


HBase provides a High Availability feature for reads across HBase Region Servers.  

...