...

I have an HDP 2.0 cluster with Ambari Server running on c6401.ambari.apache.org. In that cluster, I have a host (c6404.ambari.apache.org) that is hosting the DataNode, NodeManager, RegionServer and Ganglia Monitor components. All is well: the Agent is alive and heartbeating, the components are running, and no alerts are present.

I lose my host (hardware failure)! Nagios starts alerting because components on that host are no longer running, and we lose the Agent heartbeat. Ambari Web shows the alerts and that the heartbeat is lost.

I repair and rebuild the machine and use the SAME hostname, c6404.ambari.apache.org. I install the Ambari Agent on the machine, configure the Agent to point at the Ambari Server on c6401.ambari.apache.org, and start the Agent. Ambari picks up the heartbeat and starts reporting that the components are “not running”.
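
Pointing the Agent at the Server usually means editing the hostname entry in the [server] section of the Agent configuration file (typically /etc/ambari-agent/conf/ambari-agent.ini). The sketch below shows one way to script that edit. The function name point_agent_at_server and the sample file contents are illustrative assumptions, not part of Ambari; check the file on your own machine before relying on it.

```python
def point_agent_at_server(ini_text: str, server_host: str) -> str:
    """Return ini_text with the hostname key of the [server] section replaced.

    Hypothetical helper: works line by line so comments and all other
    settings are left exactly as they were.
    """
    out = []
    in_server = False
    replaced = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are currently inside the [server] section.
            in_server = stripped == "[server]"
        elif in_server and stripped.split("=", 1)[0].strip() == "hostname":
            line = f"hostname={server_host}"
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no hostname key found in the [server] section")
    return "\n".join(out) + "\n"


# Assumed sample contents, for illustration only.
SAMPLE_INI = """[server]
hostname=localhost
url_port=8440

[agent]
loglevel=INFO
"""

if __name__ == "__main__":
    print(point_agent_at_server(SAMPLE_INI, "c6401.ambari.apache.org"))
```

After writing the result back to the file, the Agent still has to be started (for example with ambari-agent start) before the heartbeat resumes.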

I need to get Ambari to re-install the component packages on the machine so I can restart the components and get back into a good state.

...

  1. From the “actions” menu next to the component, select “Delete”.
  2. Click the “+ Add” button and you’ll see the component is listed. Select the component to install.
  3. The component will be installed. Once complete, select “Start” and you are back in business. 
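
The same delete, add, install and start sequence can also be driven through the Ambari REST API instead of Ambari Web. The following is a minimal sketch, not a definitive script. It assumes the Server listens on the default port 8080, that the cluster is named MyCluster (a placeholder), and that the component uses its API name (for example DATANODE). The helper names plan_reinstall and send are hypothetical. Install and start run asynchronously on the Server, so a real script would wait for the install request to finish before sending the start request.

```python
import base64
import json
import urllib.request


def plan_reinstall(server, cluster, host, component, port=8080):
    """Build the ordered (method, url, body) requests for one host component."""
    url = (
        f"http://{server}:{port}/api/v1/clusters/{cluster}"
        f"/hosts/{host}/host_components/{component}"
    )
    return [
        ("DELETE", url, None),                                # step 1: delete
        ("POST", url, None),                                  # step 2: add back
        ("PUT", url, {"HostRoles": {"state": "INSTALLED"}}),  # install packages
        ("PUT", url, {"HostRoles": {"state": "STARTED"}}),    # step 3: start
    ]


def send(method, url, body, user, password):
    """Send one request with basic auth; returns the HTTP status code."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    # Ambari rejects modifying requests that lack this header.
    req.add_header("X-Requested-By", "ambari")
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the plan; call send() yourself once the values are right.
    for method, url, body in plan_reinstall(
        "c6401.ambari.apache.org", "MyCluster",
        "c6404.ambari.apache.org", "DATANODE",
    ):
        print(method, url, body)
```

Keeping the request plan separate from the sending code makes it possible to review exactly what will be sent before anything touches the cluster. Repeat the plan for each component on the rebuilt host.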

...