...

The AgentManager interface is our abstraction layer for managing agent connections and passing messages to agents. Its responsibilities are as follows:

  • Provide an abstraction layer so that the logic above it does not need to know how messages are serialized and deserialized or how they are routed before reaching the agent.
  • Provide hook-ins to inform other code of agent connections and disconnects.
  • Provide hook-ins to inform outside code of the Commands and Answers that have passed through the system.
  • Determine the nature of an agent disconnect.
  • Work with ResourceManager to determine whether an agent is allowed to connect.
  • Scale well with the number of management servers in the cluster.
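
To make the first three responsibilities concrete, here is a minimal Java sketch of the kind of caller-facing contract AgentManager provides. Everything in it (AgentGateway, InMemoryAgentGateway, addTrafficObserver, and the Command/Answer records) is hypothetical and far simpler than CloudStack's real classes; it only shows callers addressing a host by id without knowing how the message is routed, while observers see every Command/Answer pair that passes through.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical, simplified message types; not CloudStack's real Command/Answer classes.
record Command(String name) {}
record Answer(Command command, boolean success, String details) {}

// Hypothetical caller-facing abstraction: callers name a host and a Command
// and never see how the message is serialized or routed.
interface AgentGateway {
    Answer send(long hostId, Command cmd);

    // Hook for code that wants to observe every Command/Answer pair.
    void addTrafficObserver(BiConsumer<Command, Answer> observer);
}

// Toy in-memory implementation: each host is just a function from Command to Answer.
class InMemoryAgentGateway implements AgentGateway {
    private final Map<Long, Function<Command, Answer>> hosts = new HashMap<>();
    private final List<BiConsumer<Command, Answer>> observers = new ArrayList<>();

    void attach(long hostId, Function<Command, Answer> handler) {
        hosts.put(hostId, handler);
    }

    @Override
    public void addTrafficObserver(BiConsumer<Command, Answer> observer) {
        observers.add(observer);
    }

    @Override
    public Answer send(long hostId, Command cmd) {
        Function<Command, Answer> handler = hosts.get(hostId);
        // A host with no connected agent still yields an Answer, marked as failed.
        Answer answer = (handler == null)
                ? new Answer(cmd, false, "host " + hostId + " is not connected")
                : handler.apply(cmd);
        // Report the Command/Answer pair to every registered observer.
        for (BiConsumer<Command, Answer> o : observers) {
            o.accept(cmd, answer);
        }
        return answer;
    }
}
```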

Let's break down some nomenclature:
A ServerResource in CloudStack is a translation layer between CloudStack operations and the physical resource it interacts with: it knows how to perform each operation on that resource. Examples include the ServerResources for the XenServer, VMware, and KVM hypervisors and for F5, SRX, and NetScaler devices. A ServerResource must map a Command from CloudStack into operations performed on the physical resource without doing any database work; no ServerResource may access the database.
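
A minimal sketch of that contract, assuming a hypothetical HypervisorClient that stands in for the vendor API a real resource would call. SketchServerResource, FakeHypervisor, and the string-based Command record are illustrative names, not CloudStack classes; the point is that executeRequest only translates Commands into calls on the resource and holds no database handle at all.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified message types.
record Command(String action, String vmName) {}
record Answer(Command command, boolean success, String details) {}

// Hypothetical stand-in for a hypervisor's own management API.
interface HypervisorClient {
    boolean startVm(String name);
    boolean stopVm(String name);
}

// Hypothetical simplified ServerResource: translates each Command into calls on
// the physical resource. There is deliberately no database access anywhere here.
class SketchServerResource {
    private final HypervisorClient client;

    SketchServerResource(HypervisorClient client) {
        this.client = client;
    }

    Answer executeRequest(Command cmd) {
        switch (cmd.action()) {
            case "start":
                return new Answer(cmd, client.startVm(cmd.vmName()), "start " + cmd.vmName());
            case "stop":
                return new Answer(cmd, client.stopVm(cmd.vmName()), "stop " + cmd.vmName());
            default:
                // Unknown Commands still get an Answer, reporting failure.
                return new Answer(cmd, false, "unsupported command: " + cmd.action());
        }
    }
}

// Fake hypervisor that tracks running VMs in memory, for illustration only.
class FakeHypervisor implements HypervisorClient {
    final Map<String, Boolean> running = new HashMap<>();

    public boolean startVm(String name) {
        running.put(name, true);
        return true;
    }

    public boolean stopVm(String name) {
        // Stopping a VM that is not running fails.
        return running.remove(name) != null;
    }
}
```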

...

Agents are broken down into different types.

  • Connected Agent is an agent that connects to the management server via port 8250. The KVM ServerResource is often contained within a Connected Agent.
  • Direct Agent is an agent that runs within the management server. The XenServer ServerResource is often contained within a Direct Agent.
  • Forwarding Agent is an agent that routes messages between management servers.
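
One way to read the three agent types is from the point of view of a single management server: the type depends on where the host's ServerResource runs relative to that server. The sketch below encodes that reading; AgentType, HostPlacement, and AgentTypes.classify are hypothetical names, and the rule is a simplification for illustration rather than CloudStack's actual selection logic.

```java
// Hypothetical enum mirroring the three agent types described above.
enum AgentType { CONNECTED, DIRECT, FORWARDING }

// Hypothetical description of where a host's ServerResource lives.
record HostPlacement(long ownerMsId, boolean resourceRunsInsideOwner) {}

final class AgentTypes {
    private AgentTypes() {}

    // From the point of view of management server localMsId, decide which kind
    // of agent represents the host.
    static AgentType classify(long localMsId, HostPlacement p) {
        if (p.ownerMsId() != localMsId) {
            return AgentType.FORWARDING;   // route through the owning management server
        }
        return p.resourceRunsInsideOwner()
                ? AgentType.DIRECT         // e.g. a XenServer resource running in-process
                : AgentType.CONNECTED;     // e.g. a KVM agent connecting over port 8250
    }
}
```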

Command and Answer is our pattern for message requests and responses. Every Command should have a corresponding Answer. I have seen code that skips the Answer, but that is wrong and should be corrected.
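
A minimal sketch of one way to enforce the pairing, using hypothetical classes rather than CloudStack's own: each Command is parameterized by the Answer type it expects and knows how to build a failure Answer, so a dispatcher can always return an Answer, even when the handler throws.

```java
import java.util.function.Function;

// Hypothetical base Answer: every response reports success plus details.
abstract class Answer {
    final boolean success;
    final String details;

    Answer(boolean success, String details) {
        this.success = success;
        this.details = details;
    }
}

// Hypothetical base Command, parameterized by the Answer type it expects back.
abstract class Command<A extends Answer> {
    // Every Command knows how to build the Answer that reports its failure.
    abstract A failure(String reason);
}

final class EchoAnswer extends Answer {
    final String echoed;

    EchoAnswer(boolean success, String details, String echoed) {
        super(success, details);
        this.echoed = echoed;
    }
}

final class EchoCommand extends Command<EchoAnswer> {
    final String message;

    EchoCommand(String message) {
        this.message = message;
    }

    @Override
    EchoAnswer failure(String reason) {
        return new EchoAnswer(false, reason, null);
    }
}

final class Dispatcher {
    private Dispatcher() {}

    // Always produces an Answer of the matching type, even if the handler throws,
    // so no caller is left waiting on a Command that never gets answered.
    static <A extends Answer> A execute(Command<A> cmd, Function<Command<A>, A> handler) {
        try {
            return handler.apply(cmd);
        } catch (RuntimeException e) {
            return cmd.failure("exception: " + e.getMessage());
        }
    }
}
```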

...

The CloudStack management server has two sources of load. One is the number of requests it receives through the web services API. That's outside the scope of this email, but we can talk about how it works in a separate email. The other is the number of resources the management server cluster has to manage. Our objective is to scale with the number of managed resources simply by adding management servers. The following design points ensure that:

  • Agent Load Balancing: as management servers are started and stopped, agent load balancing redistributes agents among the management servers without interrupting message passing.
  • A ServerResource can be connected to only one management server at a time.
  • WebService API requests are always executed by one management server. If the ServerResource needed is connected to a different management server, AgentManager is responsible for routing the message to the right management server.
  • Background tasks and monitoring processes on each management server deal only with the ServerResources connected to that management server. They can be notified of agent connections and disconnects by registering a Listener with the AgentManager.
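
The load-balancing goal can be illustrated with a small planning function: given each agent's current owner and the list of live management servers, it computes which agents should move so that every server ends up with either floor(n/m) or ceil(n/m) of the n agents across m servers, and agents whose owner has left are always reassigned. This is a hypothetical sketch (Rebalancer.plan is not a CloudStack method). It only computes a plan and does not model the hand-over of in-flight messages that real rebalancing must handle.

```java
import java.util.*;

final class Rebalancer {
    private Rebalancer() {}

    // Returns agentId -> new management server id for every agent that must move.
    static Map<Long, Long> plan(Map<Long, Long> owner, List<Long> activeServers) {
        if (activeServers.isEmpty()) {
            throw new IllegalArgumentException("no active management servers");
        }
        int n = owner.size(), m = activeServers.size();

        // Group agents by their current owner; agents of departed servers go to the pool.
        Map<Long, List<Long>> byServer = new LinkedHashMap<>();
        for (long ms : activeServers) byServer.put(ms, new ArrayList<>());
        List<Long> pool = new ArrayList<>();
        for (Map.Entry<Long, Long> e : owner.entrySet()) {
            List<Long> agents = byServer.get(e.getValue());
            if (agents == null) pool.add(e.getKey());   // owner left the cluster
            else agents.add(e.getKey());
        }

        // The most-loaded servers get the larger share, which minimizes moves.
        List<Long> order = new ArrayList<>(activeServers);
        order.sort(Comparator.comparingInt((Long ms) -> byServer.get(ms).size()).reversed());
        Map<Long, Integer> target = new HashMap<>();
        for (int i = 0; i < m; i++) {
            target.put(order.get(i), n / m + (i < n % m ? 1 : 0));
        }

        // Overloaded servers release their excess agents into the pool.
        for (long ms : order) {
            List<Long> agents = byServer.get(ms);
            while (agents.size() > target.get(ms)) pool.add(agents.remove(agents.size() - 1));
        }

        // Underloaded servers take agents from the pool; pool size equals the total deficit.
        Map<Long, Long> moves = new LinkedHashMap<>();
        Iterator<Long> it = pool.iterator();
        for (long ms : order) {
            for (int have = byServer.get(ms).size(); have < target.get(ms); have++) {
                moves.put(it.next(), ms);
            }
        }
        return moves;
    }
}
```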

Breakdown of how it performs each of these tasks.

...

Management Server clustering
  • AgentManagerImpl implements message sending and receiving as well as the handling of Connected Agents and Direct Agents.
  • ClusteredAgentManagerImpl extends AgentManagerImpl by adding features that can only happen in a clustered deployment: agent rebalancing and message routing between management servers.
  • AgentManager registers with the ClusterManager for events on management servers joining and leaving the cluster. While ClusterManager is not technically part of AgentManager, it is important to describe its functionality. ClusterManager is responsible for notifying interested code that a management server node is up or down. It does this by writing a heartbeat into the database; if a node's heartbeat has not been updated within a certain interval, ClusterManager informs all interested parties. AgentManager is one of those parties. Upon notification that a management server node has left the cluster, AgentManager picks up the agents that were connected to that management server.
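
The heartbeat scheme and the takeover it triggers can be sketched in memory as follows. HeartbeatTable stands in for the database rows, and ClusterNodeListener and AgentTakeover are hypothetical names. As a simplification, the single local node adopts every orphaned agent; in a real cluster the surviving nodes must coordinate so that each agent still ends up connected to exactly one management server.

```java
import java.util.*;

// Hypothetical in-memory stand-in for the heartbeat rows written to the database.
final class HeartbeatTable {
    private final Map<Long, Long> lastBeatMillis = new HashMap<>();

    void beat(long msId, long nowMillis) {
        lastBeatMillis.put(msId, nowMillis);
    }

    // Nodes whose last heartbeat is older than the timeout are considered down.
    Set<Long> staleNodes(long nowMillis, long timeoutMillis) {
        Set<Long> down = new TreeSet<>();
        for (Map.Entry<Long, Long> e : lastBeatMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) down.add(e.getKey());
        }
        return down;
    }
}

// Hypothetical callback for parties interested in nodes leaving the cluster.
interface ClusterNodeListener {
    void onNodesLeft(Set<Long> msIds);
}

// AgentManager-like reaction: adopt the agents owned by departed nodes.
final class AgentTakeover implements ClusterNodeListener {
    final long localMsId;
    final Map<Long, Long> agentOwner;   // agentId -> owning management server id

    AgentTakeover(long localMsId, Map<Long, Long> agentOwner) {
        this.localMsId = localMsId;
        this.agentOwner = agentOwner;
    }

    @Override
    public void onNodesLeft(Set<Long> msIds) {
        agentOwner.replaceAll((agent, owner) -> msIds.contains(owner) ? localMsId : owner);
    }
}
```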
Implementation of the different types of agent handling
  • Connected Agents are handled by AgentManagerImpl.java, the AgentHandler class nested within AgentManagerImpl.java, and ConnectedAgentAttache.java. The TCP connection itself is handled by a NioConnection class.
  • Direct Agents are handled by AgentManagerImpl.java and DirectAgentAttache.java.
  • Forwarding Agents are handled by ClusteredAgentManagerImpl.java and ClusteredAgentAttache.java.
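
The three attache kinds differ mainly in where a message goes once AgentManager hands it over. The sketch below shows that difference with hypothetical classes (ConnectedAttacheSketch, DirectAttacheSketch, ForwardingAttacheSketch) and plain strings as messages; it does not reproduce the method signatures of ConnectedAgentAttache, DirectAgentAttache, or ClusteredAgentAttache.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical common interface: AgentManager hands a message to an attache.
interface Attache {
    String send(String command);
}

// Connected agent: the message is written to the agent's TCP link,
// modelled here as an outbox instead of a real NioConnection.
final class ConnectedAttacheSketch implements Attache {
    final List<String> outbox = new ArrayList<>();

    public String send(String command) {
        outbox.add(command);
        return "queued on link: " + command;
    }
}

// Direct agent: the ServerResource lives in this JVM, so the message is executed in-process.
final class DirectAttacheSketch implements Attache {
    private final Function<String, String> resource;

    DirectAttacheSketch(Function<String, String> resource) {
        this.resource = resource;
    }

    public String send(String command) {
        return resource.apply(command);
    }
}

// Forwarding agent: the host belongs to a peer management server, so forward to it.
final class ForwardingAttacheSketch implements Attache {
    private final long peerMsId;
    private final Function<String, String> peerLink;

    ForwardingAttacheSketch(long peerMsId, Function<String, String> peerLink) {
        this.peerMsId = peerMsId;
        this.peerLink = peerLink;
    }

    public String send(String command) {
        return peerLink.apply("to ms " + peerMsId + ": " + command);
    }
}
```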
Notifying interested parties of Agent Connection and Disconnect
  • Code that needs to know whether an agent is connected to the management server it is running on can register a Listener with AgentManager. The events it can listen for are defined in the Listener.java interface.
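
Registration follows the classic observer pattern. A minimal sketch, assuming a trimmed-down, hypothetical HostListener with only connect and disconnect callbacks; Listener.java defines the callbacks that actually exist.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, trimmed-down listener with only two callbacks.
interface HostListener {
    void processConnect(long hostId);
    void processDisconnect(long hostId);
}

final class ListenerRegistry {
    // Copy-on-write, so registering during a notification cannot throw
    // ConcurrentModificationException.
    private final List<HostListener> listeners = new CopyOnWriteArrayList<>();

    void register(HostListener l) {
        listeners.add(l);
    }

    void unregister(HostListener l) {
        listeners.remove(l);
    }

    void notifyConnect(long hostId) {
        for (HostListener l : listeners) l.processConnect(hostId);
    }

    void notifyDisconnect(long hostId) {
        for (HostListener l : listeners) l.processDisconnect(hostId);
    }
}
```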
Managing the life cycle of a ServerResource
  • That is implemented by ResourceManagerImpl. ResourceManager is responsible for the life cycle of a ServerResource and of the groupings of ServerResources (Cluster, Pod, Zone). The life cycle of a ServerResource does affect agent connections. For example, a ServerResource in the removed state should never have agents connected on its behalf. Upon a connection, AgentManager consults ResourceManager to determine whether the connection is allowed.
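
The admission check can be sketched as a single predicate that AgentManager evaluates when an agent connects. ResourceState, ResourceStateLookup, and ConnectionGate are hypothetical; only the rule that a removed ServerResource must not have agents comes from the text above, and rejecting unknown hosts is an added assumption.

```java
// Hypothetical resource states; ResourceManager's real state model may differ.
enum ResourceState { ENABLED, DISABLED, MAINTENANCE, REMOVED }

// Hypothetical view of ResourceManager that AgentManager consults on each connection.
interface ResourceStateLookup {
    ResourceState stateOf(long hostId);   // null if the host is unknown
}

final class ConnectionGate {
    private final ResourceStateLookup resources;

    ConnectionGate(ResourceStateLookup resources) {
        this.resources = resources;
    }

    // Refuse agents for removed ServerResources (from the text) and,
    // as an assumption of this sketch, for hosts the lookup does not know.
    boolean allowConnect(long hostId) {
        ResourceState s = resources.stateOf(hostId);
        return s != null && s != ResourceState.REMOVED;
    }
}
```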
Determining the cause of an agent disconnect

...