I have put together some troubleshooting tips which should hopefully help you get out of troubled waters. This has to be a collective community effort, so if you find more information or something that is not already covered, please add it so that we can learn from each other.

  1. Log in to the SSVM - Log into the hypervisor and then type the following command: "ssh -i /opt/xensource/bin/id_rsa -p 3922 root@privateIP_or_LinkLocalIpofSSVM", or "ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@LinkLocal" on XenServer. Use the private IP in the case of VMware and the link-local IP in the case of XenServer.
  2. SSVM health check - Run the following script inside the SSVM: /usr/local/cloud/systemvm/ssvm-check.sh
    It checks 1) connectivity with the DNS server, 2) resolution of domain names, 3) status of secondary storage, 4) ability to write to secondary storage, 5) connectivity with the management server at port 8250, and 6) status of the Java process.
  3. Template not ready / not available when creating an instance - Often the SSVM is running but the templates still do not show as ready, or are not available when creating an instance. Run the health check script above and diagnose. The most probable reason is that the agent running on the SSVM has not been able to connect to the MS, which can be validated by checking the host table in the DB: select * from host where type like 'SecondaryStorageVM'. If the status shows as Alert, then that is the reason. There can be a number of reasons for the agent not being able to connect to the MS. The three checks below cover some of them.
    1. Check whether port 8250 is open on the MS and that no firewall rule blocks it. This is the port on which the agent and the MS communicate.
    2. Check whether the SSVM is trying to connect to the right IP of the MS. If it is incorrect, it could be due to the wrong IP being set in the global settings (configuration table) for 'host' in the MS. Change that, restart the MS and the SSVM, and see if it solves the issue.
    3. Check the agent status on the SSVM - See if the agent is running by typing "service cloud status" in the SSVM. If it is not running, try to start it and see whether that succeeds or changes the Alert status.
  4. To check the state of templates, i.e. whether a template has downloaded or there is an error - Log into the DB, check the table template_host_ref, and observe download_state and error_string.
  5. Templates stuck in download in progress - Either stop and then start the SSVM, or run "service cloud restart" on the SSVM. You can also restart the MS. This triggers a template sync, which will try to resume such stuck templates or redo the download of errored-out templates.
  6. Connection refused as the status for the template - Check whether the config parameter "secstorage.allowed.internal.sites" has been set to allow the internal network URLs.
  7. Retrying the download of templates - Try restarting the MS / SSVM.
  8. No route to host - This error often implies that a firewall is blocking the traffic. Check the iptables rules in the SSVM, then on the host, then on the physical firewall.
  9. SSVM Logs - /var/log/cloud/cloud.log
  10. SSVM NICs - The SSVM has four NICs:

eth0: link local nic used for ssh login from host
eth1: private nic used as management interface between mgmt server and SSVM
eth2: public nic used as interface that can reach outside internet
eth3: storage nic used as interface to access secondary storage share like NFS
CloudStack sets a route for each NIC; however, the most important route, 'default', is set to the public NIC, which is eth2.
That means a healthy SSVM should have a default route like this (as shown by the command 'ip route'):
default via public_gateway_ip_address dev eth2
This also implies that communication between SSVMs happens through the public NIC, even if both SSVMs are in the same private subnet.
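
The default route check described above can be sketched as a small parser. This is only an illustration: the function name and the sample addresses are made up, and it assumes you have captured the output of 'ip route' from the SSVM as text.

```python
def default_route_device(ip_route_output):
    """Return the device named on the 'default' line of `ip route` output, or None."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # A default route line looks like: default via <gateway> dev <device>
        if fields and fields[0] == "default" and "dev" in fields:
            idx = fields.index("dev")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


# Sample `ip route` output; the addresses are invented for illustration.
sample = """default via 192.0.2.1 dev eth2
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.1.10
10.1.1.0/24 dev eth1 proto kernel scope link src 10.1.1.20
192.0.2.0/24 dev eth2 proto kernel scope link src 192.0.2.30
"""

device = default_route_device(sample)
print("default route device:", device)
print("healthy" if device == "eth2" else "default route is not on eth2")
```

If the reported device is anything other than eth2, the SSVM is probably not routing through its public NIC, which is worth investigating before looking at template errors.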

  1. Physical location of SSVM templates - Find the mount point by typing the command "mount". Go to that directory; all the templates are under template/tmpl.
  2. SSVM Apache server - From 2.2 onwards the system VMs are Debian based. Type "service apache2 status" to find the status. The Apache root is at /var/www/html/
  3. Run script of the Java process - /usr/local/cloud/systemvm/run.sh
  4. Increasing log level - 1) Edit the file /usr/local/cloud/systemvm/conf/log4j-cloud.xml 2) For the log file cloud.log change the threshold to info:  <param name="Threshold" value="WARN"/>  to  <param name="Threshold" value="INFO"/>  3) Change com.cloud to INFO:  <category name="com.cloud"> <priority value="INFO"/> </category>  If you're not getting sufficient logging, you can also try setting it to  DEBUG.
  5. Download complete at 100% but with an error like this: Failed post download script: /usr/sbin/vhd-utilvhd tool check /mnt/SecStorage/33e2e9f5/template/tmpl/345/447/dnld1469110483936142751tmp_ failed - There are many possible reasons for this, among them a wrong OS selection and VHD corruption.
    Test this in the lab by copying the template to one of the hosts, then on that host run:
     vhd-util check -n filename.vhd
     vhd-util scan filename.vhd
  6. SSVM RAM - Set the parameter secstorage.vm.ram.size to change the RAM size of the VM. The default in the code is 256.
  7. Support for multiple secondary storages was added in the 2.2.x series. This helps in scaling secondary storage for snapshots. Private templates are copied to one of the secondary storages and public templates to all of them. The template sync happens only for public templates.
  8. For each secondary storage there is a corresponding row created in the host table. 
  9. HTTP Server returned 403 (expected 200 OK) - This applies to copying templates.
    Find the first log entry for the initiation of this template copy. It should be logged with DownloadCommand and should contain the URL of the template on the source SSVM. Then go to the destination SSVM and try downloading that URL.
    See what issues you get. Also check the iptables rules to see whether the destination SSVM is blocked from accessing the source SSVM, and whether any .htaccess file in the Apache directories forbids the download of the template.
    One reported cause was as follows.
    The problem is that we're using basic networking and have the private network set up with the same gateway and subnet as the public network. When the storage VM comes up, the public network gets set up first, but then when the private network comes up on eth2 it clobbers the gateway and sets it to the eth2 interface. So when the copy is initiated between the storage VMs, it happens across the private network, but the /var/www/html/copy/.htaccess file only allows the public IP of the other SSVM, hence the 403 errors.
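
The 403 case above comes down to the public and private networks sharing a subnet. A quick way to spot that condition is sketched below with Python's standard ipaddress module. The helper name and the CIDR values are hypothetical; substitute the addresses actually assigned to the SSVM's public and private NICs.

```python
import ipaddress


def networks_overlap(cidr_a, cidr_b):
    """Return True if the two networks share any address."""
    # strict=False lets an interface address such as 10.1.1.20/24
    # stand in for the network it belongs to (10.1.1.0/24).
    net_a = ipaddress.ip_network(cidr_a, strict=False)
    net_b = ipaddress.ip_network(cidr_b, strict=False)
    return net_a.overlaps(net_b)


# Hypothetical NIC addresses, for illustration only.
public_cidr = "10.1.1.30/24"
private_cidr = "10.1.1.20/24"

if networks_overlap(public_cidr, private_cidr):
    print("public and private networks overlap; copies between SSVMs may not use the public IP")
else:
    print("public and private networks are distinct")
```

An overlap does not prove that this is the cause of a 403, but it is a cheap check to run before digging into the .htaccess file and the iptables rules.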