Ambari auto start for services and components


Summary

This document describes the Ambari auto start feature before and after version 2.4.0.

Ambari auto start is a feature that enables certain components to be marked for auto start so that whenever a node restarts, ambari agent automatically restarts the stopped components. Auto start of a component is based on its current state and desired state.

Ambari 2.3.x/2.2.x

Auto start of services and components is configured through several properties in the ambari.properties file. However, this approach is static: whenever auto start for a service component has to be turned on or off, these properties must be modified in ambari.properties and the ambari server must be restarted for the changes to take effect. Moreover, the ambari agent has to be restarted so that it bootstraps with the server again and picks up the new auto start configuration.

Ambari 2.4.0+

Auto start is dynamic: no restart of the ambari server or ambari agent is required for changes to take effect. All auto start properties reside in the database. New APIs configure the auto start setting for services, and the ambari server communicates the changes to the ambari agents during the subsequent registration or heartbeat. Ambari Web (the UI) uses these APIs to control the auto start settings dynamically.

How auto start works in Ambari versions 2.3.x/2.2.x

When an ambari agent starts, it bootstraps with the ambari server via registration. The server sends information to the agent about the components that have been enabled for auto start along with the other auto start properties in ambari.properties. The agent compares the current state of these components against the desired state, to determine if these components are to be installed, started, restarted or stopped.

Ambari.properties

To enable components for auto start, specify them using recovery.enabled_components=A,B,C. For example:


# Enable Metrics Collector auto-restart
recovery.type=AUTO_START
recovery.enabled_components=METRICS_COLLECTOR
recovery.lifetime_max_count=1024
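To make the format concrete, here is a minimal Python sketch that pulls the recovery.* keys out of ambari.properties-style text and splits the component list. It is a simplification, not Ambari's own parser: it assumes one key=value pair per line and ignores other Java properties syntax such as ':' separators and backslash continuations. The function names are hypothetical.

```python
from typing import Dict, List


def read_recovery_properties(text: str) -> Dict[str, str]:
    """Collect the recovery.* keys from ambari.properties-style text.

    Simplified: assumes one key=value pair per line (no ':' separators
    or backslash continuations, which full Java properties syntax allows).
    """
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("recovery."):
            props[key] = value.strip()
    return props


def enabled_components(props: Dict[str, str]) -> List[str]:
    """Split recovery.enabled_components (e.g. "A,B,C") into a list of names."""
    raw = props.get("recovery.enabled_components", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


sample = """
# Enable Metrics Collector auto-restart
recovery.type=AUTO_START
recovery.enabled_components=METRICS_COLLECTOR
recovery.lifetime_max_count=1024
"""
props = read_recovery_properties(sample)
```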



Here’s a sample snippet of the auto start configuration that is sent to the agent by the server during agent registration:

"recoveryConfig": {
  "type" : "AUTO_START",
  "maxCount" : 10,
  "windowInMinutes" : 60,
  "retryGap" : 0,
  "enabledComponents" : "a,b",
  "disabledComponents" : "c,d"
}
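On the agent side, enabledComponents and disabledComponents are plain comma-separated strings. The Python sketch below shows one way to interpret such a payload; the function names are hypothetical, and the rule that an entry in disabledComponents wins over one in enabledComponents is an assumption, not documented Ambari behavior.

```python
import json
from typing import Set


def load_recovery_config(message: str) -> dict:
    """Extract the recoveryConfig object from a registration response."""
    return json.loads(message)["recoveryConfig"]


def component_set(value: str) -> Set[str]:
    """Turn a comma-separated string such as "a,b" into {"a", "b"}."""
    return {name.strip() for name in value.split(",") if name.strip()}


def is_auto_start_enabled(config: dict, component: str) -> bool:
    # DEFAULT (or a missing type) means auto start is disabled.
    if config.get("type", "DEFAULT") == "DEFAULT":
        return False
    enabled = component_set(config.get("enabledComponents", ""))
    disabled = component_set(config.get("disabledComponents", ""))
    # Assumption: an explicit disable overrides an enable.
    return component in enabled and component not in disabled


message = json.dumps({"recoveryConfig": {
    "type": "AUTO_START", "maxCount": 10, "windowInMinutes": 60,
    "retryGap": 0, "enabledComponents": "a,b", "disabledComponents": "c,d",
}})
config = load_recovery_config(message)
```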


For example, suppose METRICS_COLLECTOR is enabled for auto start on a host, its current state is INSTALLED, and its desired state is STARTED. The recovery manager generates a start command for METRICS_COLLECTOR, which is then executed by the controller.

Recovery scenarios

Depending on the value of the recovery.type property (DEFAULT, AUTO_START, or FULL) in the ambari.properties file, the following recovery commands are supported. DEFAULT means auto start is disabled.


Summary of recovery.type values and state transitions

| recovery.type | Commands | State transitions |
|---|---|---|
| AUTO_START | Start | INSTALLED → STARTED |
| FULL | Install, Start, Restart, Stop | INIT → INSTALLED, INIT → STARTED, INSTALLED → STARTED, STARTED → STARTED, STARTED → INSTALLED |
| DEFAULT | None | Auto start feature disabled |


Detailed state transitions for various recovery.type values

| Current state | Desired state | Recovery command | Recovery mode | Remarks |
|---|---|---|---|---|
| INSTALLED | STARTED | Start | AUTO_START | Start a component |
| INSTALLED | STARTED | Start | FULL | Start a component |
| INSTALLED | INSTALLED | Install | FULL | Stale component configurations |
| INIT | STARTED | Install | FULL | Start a component |
| INIT | INSTALLED | Install | FULL | Install a component |
| STARTED | STARTED | Restart | FULL | Stale component configurations |
| STARTED | INSTALLED | Stop | FULL | Stop a component |
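The detailed table above can be read as a lookup from (recovery mode, current state, desired state) to a command. The Python sketch below encodes it for illustration only; it is not the agent's recovery manager, and the stale_configs flag is a hypothetical input standing in for the "stale component configurations" remark.

```python
from typing import Optional


def recovery_command(mode: str, current: str, desired: str,
                     stale_configs: bool = False) -> Optional[str]:
    """Return the recovery command from the table, or None for no action."""
    if mode not in ("AUTO_START", "FULL"):
        return None  # DEFAULT: auto start disabled
    if current == "INSTALLED" and desired == "STARTED":
        return "Start"  # the only transition AUTO_START performs
    if mode != "FULL":
        return None
    if current == "INIT" and desired in ("INSTALLED", "STARTED"):
        return "Install"  # an INIT component is installed first
    if current == desired == "INSTALLED" and stale_configs:
        return "Install"
    if current == desired == "STARTED" and stale_configs:
        return "Restart"
    if current == "STARTED" and desired == "INSTALLED":
        return "Stop"
    return None
```

Keeping AUTO_START as an early exit mirrors the summary table: it handles exactly one transition, while FULL adds the install, restart, and stop cases.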


How auto start works in Ambari version 2.4.0

Recovery scenarios

Only the AUTO_START recovery mode is supported; that is, components in the INSTALLED state can be transitioned to the STARTED state. The ambari server sends AUTO_START as the recovery type to the agent. A sample recovery configuration sent by the server to the agent:

"recoveryConfig": {
  "type" : "AUTO_START",
  "maxCount" : 10,
  "windowInMinutes" : 60,
  "retryGap" : 0,
  "components" : "a,b",
  "recoveryTimestamp" : 1458150424380
}


Enabling or disabling the auto start feature from the UI involves:

  1. New RESTful APIs to capture the service and component names for auto start

  2. Support for multi-instance services and components


Fresh installs and upgrades

In a fresh install, all services are set to auto start by default. After an upgrade, auto start is not enabled by default; the user has to enable it via the UI.

Maintenance mode

Auto start is ignored for host components that are in maintenance mode. A host component can be in maintenance mode for one or more of the following reasons:

  • The host component was placed in maintenance mode

  • The host was placed in maintenance mode

  • The service was placed in maintenance mode

  • The cluster to which the host belongs was placed in maintenance mode.
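The rule in the list above is an OR across four levels. A minimal Python sketch, using a hypothetical MaintenanceFlags type in place of wherever the server actually reads each level from:

```python
from dataclasses import dataclass


@dataclass
class MaintenanceFlags:
    """Hypothetical view of the four levels at which maintenance mode can be set."""
    host_component: bool = False
    host: bool = False
    service: bool = False
    cluster: bool = False


def in_maintenance(flags: MaintenanceFlags) -> bool:
    # Any single level is enough to put the host component in maintenance mode.
    return flags.host_component or flags.host or flags.service or flags.cluster


def should_auto_start(recovery_enabled: bool, flags: MaintenanceFlags) -> bool:
    # Auto start is ignored for host components in maintenance mode.
    return recovery_enabled and not in_maintenance(flags)
```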

The maintenance state of a host component is read from the maintenance_state column of the hostcomponentdesiredstate table:


| cluster_id | host_id | service_name | component_name | maintenance_state |
|---|---|---|---|---|


Auto start properties

The auto start setting is stored per service component, in the recovery_enabled column of the servicecomponentdesiredstate table. All the other properties, such as recovery_type, recovery_lifetime_max_count, recovery_max_count, recovery_window_in_minutes, and recovery_retry_interval, are global: they apply to all service and component instances in the cluster and are stored in the clusterconfig table under the cluster-env config type. Per-service or per-component settings for these properties would be too noisy, with little or no benefit.


Persistence

Auto start properties are stored in the database, distributed across two tables: servicecomponentdesiredstate (the per-component recovery_enabled flag) and clusterconfig (the cluster-wide properties).

Blueprint based deployments

Blueprints are used for headless deployments. Blueprints have no room for specifying settings properties, so the blueprint schema has to be modified to accommodate settings. All components are auto started.


Specify either a set of components or all components to be auto started. For a set, list the components explicitly. For all components, specify recovery_enabled="true" at the cluster level:

{
  "settings" : [
    { "recovery_settings" : [ { "recovery_enabled" : "true" } ] }
  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.5"
  }
}



In the stack definition, specify METRICS_COLLECTOR as the default auto started component for both UI and blueprint deployments, while letting blueprint authors remove METRICS_COLLECTOR from auto start.


Blueprints can override the default list specified in the stack definition. During deployment, the servicecomponentdesiredstate table’s recovery_enabled field is set to true or false for each component.


The cluster-wide auto start attributes are stored in cluster-env.xml, which contains the following non-volatile properties:

  • recovery_type

  • recovery_lifetime_max_count

  • recovery_max_count

  • recovery_window_in_minutes

  • recovery_retry_interval

  • recovery_enabled


/var/lib/ambari-server/resources/stacks/HDP/<version>/configuration/cluster-env.xml


<configuration>
  <property>
    <name>recovery_type</name>
    <value>AUTO_START</value>
    <description>Recovery type</description>
  </property>
  <!-- ... -->
</configuration>
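A short Python sketch that reads the recovery_* properties from a cluster-env.xml file like the one above, using the standard library's xml.etree.ElementTree. The function name is hypothetical; this is not Ambari's stack loader.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def recovery_properties(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every recovery_* <property> in a cluster-env.xml document."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith("recovery_"):
            props[name] = (prop.findtext("value") or "").strip()
    return props


sample = """<configuration>
  <property>
    <name>recovery_type</name>
    <value>AUTO_START</value>
    <description>Recovery type</description>
  </property>
  <property>
    <name>recovery_enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>user_group</name>
    <value>hadoop</value>
  </property>
</configuration>"""
```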


Enabling components for auto start

Components can be enabled for auto start in any of the following ways:

  1. Stack definition:

/var/lib/ambari-server/resources/common-services/<service_name>/<version>/metainfo.xml specifies whether a component is enabled for auto start.


To enable a component for auto start in the stack definition, specify the XML element <recovery_enabled>true</recovery_enabled>. For example, to enable the METRICS_COLLECTOR component of AMBARI_METRICS for auto start, its stack definition file common-services/AMBARI_METRICS/0.1.0/metainfo.xml should contain the <recovery_enabled> line shown below:


<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>AMBARI_METRICS</name>
      <displayName>Ambari Metrics</displayName>
      <version>0.1.0</version>
      <comment>A system for metrics collection that provides storage and retrieval capability for metrics collected from the cluster
      </comment>
      <components>
        <component>
          <name>METRICS_COLLECTOR</name>
          <displayName>Metrics Collector</displayName>
          <category>MASTER</category>
          <recovery_enabled>true</recovery_enabled>
          <!-- ... -->
        </component>
        <!-- ... -->
      </components>
    </service>
  </services>
</metainfo>

  2. Blueprint definition:

When using blueprint deployments, the component settings in the blueprint JSON override those in the stack definition.

  3. UI based deployments:

When a cluster is deployed through the UI, the backend sets the new recovery_enabled column of the servicecomponentdesiredstate table to true or false for each component, based on whether the stack definition enables it for auto start.


Changes to the auto start value of one or more components are made from the UI. The changes are written to the recovery_enabled column of the servicecomponentdesiredstate table, which is the source of truth when the ambari server communicates with the ambari agent.
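For the stack-definition route, each component's default can be read directly from metainfo.xml. The Python sketch below is an illustration with a hypothetical function name; treating a missing <recovery_enabled> element as false is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def auto_start_defaults(metainfo_xml: str) -> Dict[str, bool]:
    """Map each component name in a metainfo.xml to its <recovery_enabled> flag."""
    root = ET.fromstring(metainfo_xml)
    defaults: Dict[str, bool] = {}
    for component in root.iter("component"):
        name = component.findtext("name")
        if not name:
            continue
        # Assumption: a missing <recovery_enabled> element means "false".
        flag = component.findtext("recovery_enabled") or "false"
        defaults[name.strip()] = flag.strip().lower() == "true"
    return defaults


sample = """<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>AMBARI_METRICS</name>
      <components>
        <component>
          <name>METRICS_COLLECTOR</name>
          <category>MASTER</category>
          <recovery_enabled>true</recovery_enabled>
        </component>
        <component>
          <name>METRICS_MONITOR</name>
          <category>SLAVE</category>
        </component>
      </components>
    </service>
  </services>
</metainfo>"""
```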

         

Blueprint schema

  • Use the cluster-env section in the blueprint JSON to specify cluster-specific auto start attributes.

  • JSON for enabling auto start:

"settings" : [ 
  { "recovery_settings" : [
    { "recovery_enabled" : "true" } ]}, 
  { "service_settings" : [ 
    { "name" : "HDFS", "recovery_enabled" : "false" }, 
    { "name" : "TEZ", "recovery_enabled" : "false" } ]}, 
  { "component_settings" : [ 
    { "name" : "DATANODE", "recovery_enabled" : "true" } ] }
 ]

 

  • The blueprint processor hands this list off to the deployment module so that the servicecomponentdesiredstate table can be updated.


Component autostart hierarchy

  • The stack definition contains the default list of components to be enabled or disabled.


  • A blueprint definition can use the cluster-env section to specify a list that overrides the one in the stack definition.


  • The UI gets its list from the stack definition.


  • The backend updates the servicecomponentdesiredstate table with the list coming in from the UI or the blueprint.
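The hierarchy can be sketched as a chain of overrides. The Python below parses a blueprint "settings" array like the one in the Blueprint schema section and resolves one component's flag. The function names are hypothetical, and the precedence order (stack default, then cluster-level recovery_settings, then service_settings, then component_settings) is an assumption inferred from the examples here, not taken from Ambari's blueprint processor.

```python
from typing import Dict, List, Optional, Tuple


def parse_settings(settings: List[dict]) -> Tuple[Optional[str], Dict[str, str], Dict[str, str]]:
    """Split a blueprint "settings" array into cluster, service, and component flags."""
    cluster: Optional[str] = None
    services: Dict[str, str] = {}
    components: Dict[str, str] = {}
    for section in settings:
        for entry in section.get("recovery_settings", []):
            cluster = entry.get("recovery_enabled", cluster)
        for entry in section.get("service_settings", []):
            services[entry["name"]] = entry["recovery_enabled"]
        for entry in section.get("component_settings", []):
            components[entry["name"]] = entry["recovery_enabled"]
    return cluster, services, components


def effective_recovery_enabled(component: str, service: str, stack_default: bool,
                               settings: Optional[List[dict]] = None) -> bool:
    """Resolve one component's flag; more specific levels win (assumed order)."""
    value = "true" if stack_default else "false"
    cluster, services, components = parse_settings(settings or [])
    if cluster is not None:
        value = cluster
    if service in services:
        value = services[service]
    if component in components:
        value = components[component]
    return value.lower() == "true"


blueprint_settings = [
    {"recovery_settings": [{"recovery_enabled": "true"}]},
    {"service_settings": [{"name": "HDFS", "recovery_enabled": "false"},
                          {"name": "TEZ", "recovery_enabled": "false"}]},
    {"component_settings": [{"name": "DATANODE", "recovery_enabled": "true"}]},
]
```

With these settings, DATANODE stays enabled even though HDFS as a whole is disabled, because the component-level entry is the most specific.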


Ambari Metric Service specific changes

In Ambari versions earlier than 2.4.0, the METRICS_COLLECTOR component is set to auto start by default in ambari.properties. In 2.4.0, this setting has been migrated to /var/lib/ambari-server/resources/common-services/AMBARI_METRICS/<version>/metainfo.xml as the <recovery_enabled>true</recovery_enabled> entry.


Backward compatibility

  1. The auto start properties in ambari.properties are ignored. All values come from either the stack definition (for UI based deployments) or the blueprint (for blueprint based deployments). Cluster-env.xml or the cluster-env section of the blueprint supplies the auto start properties listed above.

  2. Pre-populate settings in the DB: during deployment, the backend populates the servicecomponentdesiredstate table with true/false values for the components, taken from the stack definition or the blueprint.

Communication

The ambari agent communicates with the ambari server during registration (at start up) and through periodic heartbeats. These are the points at which the server can send the agent any changes to the auto start properties of services and components, giving the agent an opportunity to apply them.

Registration

The server sends the following JSON to the agent during registration.


{
  "recoveryConfig": {
    "type" : "AUTO_START",
    "maxCount" : "5",
    "windowInMinutes" : 20,
    "retryGap" : 2,
    "maxLifetimeCount" : 5,
    "components" : "METRICS_COLLECTOR, OOZIE_SERVER"
  }
}


The components member contains a comma-separated list of the components that are enabled for auto start and are not in maintenance mode.
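As an illustration of how the registration payload could be assembled, here is a Python sketch. It is not the server's code: the function name is hypothetical, and the mapping from cluster-env property names to recoveryConfig keys (in particular recovery_retry_interval to retryGap) is an assumption based on the names used in this document.

```python
import json
from typing import Dict, Iterable, Tuple

# Assumed mapping from cluster-env property names to recoveryConfig keys.
_KEY_MAP = {
    "recovery_type": "type",
    "recovery_max_count": "maxCount",
    "recovery_window_in_minutes": "windowInMinutes",
    "recovery_retry_interval": "retryGap",
    "recovery_lifetime_max_count": "maxLifetimeCount",
}


def build_recovery_config(cluster_env: Dict[str, str],
                          host_components: Iterable[Tuple[str, bool, bool]]) -> str:
    """Build the registration payload for one host.

    host_components: (component_name, recovery_enabled, in_maintenance) tuples.
    """
    config = {json_key: cluster_env[env_key]
              for env_key, json_key in _KEY_MAP.items() if env_key in cluster_env}
    # Only components enabled for auto start and not in maintenance mode are listed.
    names = [name for name, enabled, in_maintenance in host_components
             if enabled and not in_maintenance]
    config["components"] = ",".join(names)
    return json.dumps({"recoveryConfig": config})
```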

Heartbeat

If the auto start value for one or more components changes and/or the cluster-env level recovery properties change, the above JSON is constructed with the changed components and sent to the agent during the subsequent heartbeat.

Database

Cluster specific properties

The following cluster level properties are stored as JSON under the cluster-env type in the clusterconfig table:


| Property name | Value(s) | Description |
|---|---|---|
| recovery_type | DEFAULT, AUTO_START | DEFAULT: no auto start. AUTO_START: auto start only. |
| recovery_lifetime_max_count | | |
| recovery_max_count | | |
| recovery_window_in_minutes | | |
| recovery_retry_interval | | |
| recovery_enabled | true, false | Cluster level recovery |


Cluster config table:

| cluster_id | type_name | version_tag | version | config_data |
|---|---|---|---|---|
| 2 | cluster-env | version1 | 1 | {...,"recovery_lifetime_max_count":"1024","recovery_max_count":"6","recovery_type":"AUTO_START","recovery_retry_interval":"5"} |


The recovery_enabled value from clusterconfig overrides the value from servicecomponentdesiredstate for that cluster.

Service component specific properties

The servicecomponentdesiredstate table specifies whether a component is enabled for auto start. The recovery_enabled column is new. The existing ambari.properties attributes map to the new column as follows:


recovery.disabled_components/recovery.enabled_components → recovery_enabled (boolean)


| cluster_id | component_name | service_name | recovery_enabled |
|---|---|---|---|
| 2 | YARN_CLIENT | YARN | 0 |
| 2 | METRICS_COLLECTOR | AMBARI_METRICS | 1 |
| 2 | OOZIE_SERVER | OOZIE | 1 |


REST API

Get auto-start flags of a cluster

Type: GET

Request: api/v1/clusters/<cluster_name>?fields=Clusters/desired_configs/cluster-env


Example Response:

{
  "href" : "http://c6404.ambari.apache.org:8080/api/v1/clusters/testcluster?fields=Clusters/desired_configs/cluster-env",
  "Clusters" : {
    "cluster_name" : "testcluster",
    "version" : "HDP-2.2",
    "desired_configs" : {
      "cluster-env" : {
        "tag" : "version1",
        "user" : "admin",
        "version" : 1
      }
    }
  }
}


Then use the tag from that response to fetch the cluster-env properties:

Type: GET

Request: api/v1/clusters/<cluster_name>/configurations?type=cluster-env&tag=version<xxx>

Example Response:

{
  "href": "...",
  "items": [
    {
      "href": "...",
      "tag": "version<xxx>",
      "type": "cluster-env",
      "version": 2,
      "Config": {
        "cluster_name": "c1",
        "stack_id": "HDP-2.3"
      },
      "properties": {
        "fetch_nonlocal_groups": "true",
        "ignore_groupsusers_create": "false",
        "kerberos_domain": "EXAMPLE.COM",
        "override_uid": "true",
        "repo_suse_rhel_template": "...",
        "repo_ubuntu_template": "{{package_type}} {{base_url}} {{components}}",
        "security_enabled": "false",
        "smokeuser": "ambari-qa",
        "smokeuser_keytab": "/etc/security/keytabs/smokeuser.headless.keytab",
        "user_group": "hadoop",
        "recovery_enabled": "false",
        "recovery_type": "AUTO_START",
        "recovery_lifetime_max_count": 10,
        "recovery_max_count": 2,
        "recovery_window_in_minutes": 10,
        "recovery_retry_interval": 5000
      }
    }
  ]
}
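The two GET calls above form a small workflow: read the current cluster-env tag, then fetch that version's properties. The Python sketch below builds the URLs and parses the responses; the function names are hypothetical, and sending the requests (HTTP client, authentication) is deliberately left out.

```python
from typing import Dict
from urllib.parse import urlencode


def desired_tag_url(base_url: str, cluster: str) -> str:
    """Step 1: URL that returns the current desired cluster-env tag."""
    return f"{base_url}/api/v1/clusters/{cluster}?fields=Clusters/desired_configs/cluster-env"


def current_cluster_env_tag(cluster_response: dict) -> str:
    """Read the tag (e.g. "version1") out of the step 1 response."""
    return cluster_response["Clusters"]["desired_configs"]["cluster-env"]["tag"]


def configurations_url(base_url: str, cluster: str, tag: str) -> str:
    """Step 2: URL that returns the cluster-env properties for that tag."""
    query = urlencode({"type": "cluster-env", "tag": tag})
    return f"{base_url}/api/v1/clusters/{cluster}/configurations?{query}"


def recovery_settings(config_response: dict) -> Dict[str, object]:
    """Keep only the recovery_* properties from the step 2 response."""
    items = config_response.get("items", [])
    if not items:
        return {}
    properties = items[0].get("properties", {})
    return {k: v for k, v in properties.items() if k.startswith("recovery_")}
```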



Set auto-start flags of a cluster

Type: PUT

Request: api/v1/clusters/<cluster_name>

{
  "Clusters": {
    "desired_config": {
      "tag": "version<xxx>",
      "type": "cluster-env",
      "properties": {
        "fetch_nonlocal_groups": "true",
        "ignore_groupsusers_create": "false",
        "kerberos_domain": "EXAMPLE.COM",
        "override_uid": "true",
        "repo_suse_rhel_template": "...",
        "repo_ubuntu_template": "...",
        "security_enabled": "false",
        "smokeuser": "ambari-qa",
        "smokeuser_keytab": "...",
        "user_group": "hadoop",
        "recovery_enabled": "true",
        "recovery_type": "AUTO_START",
        "recovery_lifetime_max_count": 10,
        "recovery_max_count": 2,
        "recovery_window_in_minutes": 10,
        "recovery_retry_interval": 5000
      }
    }
  }
}
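A Python sketch that builds (but does not send) this PUT with the standard library's urllib.request. The function name is hypothetical. Because the example body above repeats every existing cluster-env property, the sketch assumes the caller passes the full property set, and it assumes a new tag value for each change; both are assumptions rather than documented requirements. The X-Requested-By header mirrors the curl examples later in this document.

```python
import base64
import json
import urllib.request
from typing import Dict


def build_cluster_env_put(base_url: str, cluster: str, tag: str,
                          properties: Dict[str, object],
                          user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the PUT that sets a new cluster-env desired config."""
    body = {"Clusters": {"desired_config": {
        "type": "cluster-env",
        "tag": tag,                # assumed: a new tag per change
        "properties": properties,  # assumed: the full cluster-env property set
    }}}
    request = urllib.request.Request(
        f"{base_url}/api/v1/clusters/{cluster}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {credentials}")
    request.add_header("X-Requested-By", "ambari")  # as in the curl examples
    return request
```

Sending it is then a matter of passing the request to urllib.request.urlopen.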


Get auto-start flags of all components

Type: GET

Request: api/v1/clusters/<cluster_name>/components?fields=ServiceComponentInfo/component_name,ServiceComponentInfo/service_name,ServiceComponentInfo/category,ServiceComponentInfo/recovery_enabled

Success Response: 200 - application/json

Example Response:

{
  "href": "...",
  "items": [
    {
      "href": "...",
      "ServiceComponentInfo": {
        "category": "SLAVE",
        "cluster_name": "c1",
        "component_name": "DATANODE",
        "service_name": "HDFS",
        "recovery_enabled": "true"
      }
    },
    {
      "href": "...",
      "ServiceComponentInfo": {
        "category": "MASTER",
        "cluster_name": "c1",
        "component_name": "NAMENODE",
        "service_name": "HDFS",
        "recovery_enabled": "true"
      }
    },
    {
      "href": "...",
      "ServiceComponentInfo": {
        "category": "SLAVE",
        "cluster_name": "c1",
        "component_name": "JOURNALNODE",
        "service_name": "HDFS",
        "recovery_enabled": "false"
      }
    }
  ]
}
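A Python sketch for this call: it builds the request URL with the same fields list and turns the response into a component-to-flag map. The function names are hypothetical, and the HTTP call itself is omitted.

```python
from typing import Dict

FIELDS = ",".join([
    "ServiceComponentInfo/component_name",
    "ServiceComponentInfo/service_name",
    "ServiceComponentInfo/category",
    "ServiceComponentInfo/recovery_enabled",
])


def components_url(base_url: str, cluster: str) -> str:
    """URL for the "Get auto-start flags of all components" request."""
    return f"{base_url}/api/v1/clusters/{cluster}/components?fields={FIELDS}"


def recovery_flags(response: dict) -> Dict[str, bool]:
    """Map component_name -> recovery_enabled from the response body."""
    flags: Dict[str, bool] = {}
    for item in response.get("items", []):
        info = item.get("ServiceComponentInfo", {})
        name = info.get("component_name")
        if name:
            # recovery_enabled arrives as the string "true" or "false".
            flags[name] = str(info.get("recovery_enabled", "false")).lower() == "true"
    return flags
```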


Error Response: 400 - Bad Request

{
  "status" : <status>,
  "message" : <error message>
}


Set auto-start flags of all components


Type: PUT

Request 1: api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(<enabled_component_names>)

Request body (application/json):

{
  "ServiceComponentInfo": {
    "recovery_enabled": "true"
  }
}


Request 2:

api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(<disabled_component_names>)

Request body (application/json):

{
  "ServiceComponentInfo": {
    "recovery_enabled": "false"
  }
}


Success Response: 202 Accepted

Error Response: 400 - Bad Request

{
  "status" : <status>,
  "message" : <error message>
}


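Requests 3 and 4 are abbreviated (only the path and the -d payload are shown); Requests 5 and 6 are complete curl commands. All four target a cluster named testcluster.

Request 3: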

api/v1/clusters/testcluster/components/ZOOKEEPER_SERVER -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'


Request 4:

api/v1/clusters/testcluster/components?ServiceComponentInfo/component_name=ZOOKEEPER_SERVER -d '{"ServiceComponentInfo" : {"recovery_enabled":"false"}}'


Request 5:

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT 'http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/components?ServiceComponentInfo/component_name.in(ZOOKEEPER_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"false"}}'


Request 6:

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/components -d '{"RequestInfo": {"query": "ServiceComponentInfo/component_name.in(ZOOKEEPER_CLIENT,ZOOKEEPER_SERVER)"} , "ServiceComponentInfo" : {"recovery_enabled":"true"}}'
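Requests 1 and 2 split the work into one PUT for enabled components and one for disabled components, and Request 6 shows that the component filter can go in RequestInfo.query. The Python sketch below combines the two ideas: given a desired flag per component, it builds at most two PUT requests in the Request 6 style. The function name is hypothetical; authentication is left to the caller, and nothing is sent.

```python
import json
import urllib.request
from typing import Dict, List


def build_recovery_requests(base_url: str, cluster: str,
                            desired: Dict[str, bool]) -> List[urllib.request.Request]:
    """Build one PUT per target value (true/false), in the style of Request 6."""
    result: List[urllib.request.Request] = []
    for value in (True, False):
        names = sorted(name for name, flag in desired.items() if bool(flag) == value)
        if not names:
            continue  # nothing to change for this value
        body = {
            "RequestInfo": {"query": f"ServiceComponentInfo/component_name.in({','.join(names)})"},
            "ServiceComponentInfo": {"recovery_enabled": "true" if value else "false"},
        }
        request = urllib.request.Request(
            f"{base_url}/api/v1/clusters/{cluster}/components",
            data=json.dumps(body).encode("utf-8"),
            method="PUT",
        )
        request.add_header("X-Requested-By", "ambari")  # as in the curl examples
        result.append(request)
    return result
```

Grouping by target value keeps the number of calls at two or fewer regardless of how many components change.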

