CloudStack has not had a strong tradition of enforcing exception and logging behaviour. However, do as we say, not as we do: just because we weren't good at it doesn't mean you shouldn't be. And we are working very hard to be good at it.

Logging

CloudStack uses log4j. Yes, we could have used one of the many logging facades out there. Yes, log4j is somewhat of an oldie, but it is a goodie. Besides, what really matters is not the tool but the content (a recurring theme you'll find in CloudStack). CloudStack should be deployed with logging at INFO level or above, and all log timestamps should be in GMT. Note that CloudStack DOES NOT require a restart to change logging levels. The following is a list of our logging levels and their suggested usage.

FATAL: This ship's sunk, or the JVM has to die because of this.

ERROR: The system has hit a problem that it cannot recover from. The error does not affect the general health of CloudStack, but it does cause a particular request to CloudStack to fail.

WARN: The system has hit a problem that it thinks it can recover from, but the admin should be aware of it so they can take a look.

INFO: The admin is interested in knowing this information (like the pilot announcing "the Grand Canyon is to your right" on a flight).

DEBUG: Information that may help the admin debug a problem. The deciding factor is often this: if an admin can reliably reproduce a FATAL, ERROR, or WARN condition, turning on DEBUG logging should provide enough information to show how the system got to that error.

TRACE: Repetitive and annoying logs that really shouldn't be needed in normal debugging but may be useful as a last resort. Generally, the deciding factor for using TRACE is how quickly this log would fill up the disk if it were turned on.
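As a dependency-free sketch of two points above (production runs at INFO so DEBUG is dropped, and levels can change on a live logger without a restart), the example below uses the JDK's java.util.logging as a stand-in for log4j. The mapping of DEBUG and TRACE onto JUL's FINE and FINEST, the logger name, and the gmtStamp helper are all assumptions for illustration, not how CloudStack configures log4j.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: java.util.logging stands in for log4j so this runs with the plain JDK.
class LogLevelDemo {
    // Assumed mapping: JUL has no DEBUG or TRACE, so the closest JUL levels stand in.
    static final Level DEBUG = Level.FINE;
    static final Level TRACE = Level.FINEST;

    // Hypothetical logger name; held in a static field so JUL cannot discard it.
    static final Logger s_logger = Logger.getLogger("example.cloud.LogLevelDemo");

    // Timestamps rendered at UTC offset zero (GMT) regardless of the server's time zone.
    private static final DateTimeFormatter GMT_STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS").withZone(ZoneOffset.UTC);

    static String gmtStamp(Instant when) {
        return GMT_STAMP.format(when);
    }

    public static void main(String[] args) {
        s_logger.setLevel(Level.INFO);                   // production: INFO or above
        System.out.println(s_logger.isLoggable(DEBUG));  // false, DEBUG is filtered out

        s_logger.setLevel(DEBUG);                        // changed at runtime, no restart
        System.out.println(s_logger.isLoggable(DEBUG));  // true
        System.out.println(s_logger.isLoggable(TRACE));  // false, TRACE is still off

        System.out.println(gmtStamp(Instant.EPOCH));     // 1970-01-01 00:00:00,000
    }
}
```

With JUL, the default ConsoleHandler is itself set to INFO, so its level would also have to be lowered before FINE records actually reach the console.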

Exceptions and Exception Handling

There is plenty of wisdom on the internet regarding exceptions and exception handling. Here are some general anti-patterns, and at the bottom of that page there are links to other guidelines. There are a few that I'd like to single out as important.

  1. If you are writing entry point code, you are responsible for catching all exceptions, both checked and unchecked, and properly logging the error message and the exception stack trace. What is an entry point? It is the point where a thread enters our code base. For example, all API commands are entry points. If you are spinning off threads to do processing, the run() method of each thread is an entry point. If you are scheduling tasks to run in a thread pool, each such task is an entry point. All such code should be wrapped as follows:
    try {
        // code...
    } catch (SpecificException e) { // an exception specific to your code
        // specific exception handling and logging...
    } catch (Exception e) {
        s_logger.warn("Caught unexpected exception", e);
        // exception handling code...
    }
    
  2. If you are not writing entry point code, then it's fine to expect that code above yours will catch and log the exception. However, it is your responsibility to make sure that the stack trace of the exception is captured in the log. Don't ever catch and throw a new exception without either logging the exception or including it as the cause of the new exception.
    try {
        // code that calls the XenServer API...
    } catch (XenAPIException e) {
        // Do either this: s_logger.warn("Caught a xen api exception", e);
        // or this:        throw new CloudRuntimeException("Caught a xen api exception", e);
        // Never do JUST this; the original exception and its stack trace are lost:
        throw new CloudRuntimeException("Got a xen api exception");
    }
    
  3. Don't ever declare a method to throw Exception. It may seem like a quick and easy way to handle exceptions, but it forces callers to catch Exception, which hides every other checked exception thrown in the same block of code. Take, for instance:
    public void irresponsibleMethod() throws Exception;
    public void responsibleMethod() throws XenAPIException;
    public void runtimeExceptionMethod(); // throws CloudRuntimeException, which is not supposed to be logged until the entry point
    public void innocentCaller() {
        try {
            irresponsibleMethod();
            responsibleMethod();
            runtimeExceptionMethod();
        } catch (Exception e) {
            s_logger.warn("Unable to execute", e);
            throw new CloudRuntimeException("Unable to execute", e);
            // What's wrong here?
            // 1. If the error was thrown from responsibleMethod, the compiler no longer reminds the caller to do special handling for XenAPIException.
            // 2. If the error was thrown from runtimeExceptionMethod, it is logged once here and again at the entry point.
        }
    }
    
  4. Don't ever throw Exception itself. If you need a checked exception, either find one that fits your needs or create one yourself. If what you ran into shouldn't be possible or is due to a programmer mistake, throw CloudRuntimeException. To decide whether you need a CloudRuntimeException, ask yourself: is this similar to hitting a null pointer? NullPointerException is a runtime exception because a caller that wanted to handle a null pointer would have checked for null before making the call. Throw a checked exception if and only if the caller has a reasonable chance of handling it in some way other than logging and reporting the error. Prefer CloudRuntimeException unless you have a good reason to throw a checked exception. Note the words "programmer mistake" here: user errors are not programmer mistakes and should be handled properly.
  5. If you have to do some error handling for an exception, don't throw a new exception; rethrow the original one. Rethrowing the original one ensures that the correct stack trace is logged.
    try {
        // some code...
    } catch (XenAPIException e) {
        // Clean up, then let the same exception continue up the stack.
        s_logger.debug("There's an exception. Rolling back: " + e.getMessage());
        // ...roll back some code...
        throw e; // note there's no "new" here
    }
    
  6. If you have a background thread processing a list of similar items, it is important that the processing of each item is wrapped in its own try-catch block. Otherwise, an exception while processing one item stops the processing of every item after it. This can have disastrous consequences: the background thread may keep coming back to the same list, but every item after the one that throws will never get processed.
    for (Task task : taskList) {
        try {
            process(task);
        } catch (Exception e) {
            s_logger.warn("Unable to process task " + task, e);
            // ...handle the exception and continue with the next task
        }
    }
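Rules 1 and 6 combine naturally in a background worker. The sketch below is a hypothetical task: the class name, the string items, and the "bad" failure condition are invented for illustration, and java.util.logging stands in for log4j so the example has no dependencies. Each item gets its own try/catch so one bad item does not starve the rest, and run(), being an entry point, catches anything else that escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical background task; java.util.logging stands in for log4j.
class CleanupTask implements Runnable {
    private static final Logger s_logger = Logger.getLogger(CleanupTask.class.getName());

    private final List<String> items;
    final List<String> processed = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    CleanupTask(List<String> items) {
        this.items = items;
    }

    // Stand-in for real per-item work; items starting with "bad" simulate a failure.
    private void process(String item) {
        if (item.startsWith("bad")) {
            throw new IllegalStateException("Cannot process " + item);
        }
        processed.add(item);
    }

    @Override
    public void run() { // entry point: the thread enters our code here
        try {
            for (String item : items) {
                try {
                    process(item);
                } catch (Exception e) {
                    // Log with the stack trace, then move on to the next item.
                    s_logger.log(Level.WARNING, "Failed to process " + item, e);
                    failed.add(item);
                }
            }
        } catch (Exception e) {
            // Anything escaping the loop (e.g. a null item) is still caught and logged here.
            s_logger.log(Level.WARNING, "Caught unexpected exception", e);
        }
    }
}
```

It would typically be started with something like new Thread(new CleanupTask(items)).start(); with items a, bad-1, b, the task processes a and b and records bad-1 as failed instead of stopping at it.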
    
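The difference between rules 2 and 5 shows up in the exception chain. In this sketch, CloudRuntimeException and XenAPIException are minimal hypothetical stand-ins (the real CloudStack and XenServer classes have more to them), and callXenApi, wrapWithCause, and rollbackAndRethrow are invented names.

```java
import java.util.List;

// Minimal hypothetical stand-ins for the real CloudStack and XenServer classes.
class CloudRuntimeException extends RuntimeException {
    CloudRuntimeException(String message, Throwable cause) {
        super(message, cause);
    }
}

class XenAPIException extends Exception {
    XenAPIException(String message) {
        super(message);
    }
}

class ExceptionChaining {
    // Hypothetical API call that always fails, to exercise the handlers below.
    static void callXenApi() throws XenAPIException {
        throw new XenAPIException("VM.start failed");
    }

    // Rule 2: wrap the checked exception, keeping it as the cause so its stack trace is not lost.
    static void wrapWithCause() {
        try {
            callXenApi();
        } catch (XenAPIException e) {
            throw new CloudRuntimeException("Caught a xen api exception", e);
        }
    }

    // Rule 5: do the cleanup, then rethrow the same exception object.
    static void rollbackAndRethrow(List<String> rollbackLog) throws XenAPIException {
        try {
            callXenApi();
        } catch (XenAPIException e) {
            rollbackLog.add("rolled back"); // stand-in for the real rollback work
            throw e; // no "new": the stack trace still points at callXenApi
        }
    }
}
```

When the exception from wrapWithCause is eventually logged with its stack trace, standard Java stack-trace output lists the XenAPIException under a "Caused by:" line, so the original failure point stays visible.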

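Rules 3 and 4 in practice: a method that declares its specific checked exception lets the caller do something about it, while a programmer mistake is signalled with an unchecked exception that the caller leaves alone so it is logged once, at the entry point. This is a hypothetical sketch: VmStarter, callHost, checkSpec, the simulated failures, and the retry-up-to-three-times policy are invented, the exception classes are minimal stand-ins, and java.util.logging stands in for log4j.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal hypothetical stand-ins for the real exception classes.
class CloudRuntimeException extends RuntimeException {
    CloudRuntimeException(String message) { super(message); }
    CloudRuntimeException(String message, Throwable cause) { super(message, cause); }
}

class XenAPIException extends Exception {
    XenAPIException(String message) { super(message); }
}

class VmStarter {
    private static final Logger s_logger = Logger.getLogger(VmStarter.class.getName());
    static final int MAX_ATTEMPTS = 3;

    private int failuresLeft; // how many host calls fail before one succeeds (simulation only)
    int hostCalls = 0;

    VmStarter(int failuresLeft) { this.failuresLeft = failuresLeft; }

    // Declares the specific checked exception instead of "throws Exception".
    void callHost() throws XenAPIException {
        hostCalls++;
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new XenAPIException("transient host error");
        }
    }

    // A null spec is a programmer mistake, so it is unchecked, like a NullPointerException.
    void checkSpec(Object spec) {
        if (spec == null) {
            throw new CloudRuntimeException("VM spec must not be null");
        }
    }

    void startVm(Object spec) {
        checkSpec(spec); // not caught here: logged once, at the entry point
        for (int attempt = 1; ; attempt++) {
            try {
                callHost();
                return;
            } catch (XenAPIException e) {
                // Knowing the exact type is what makes a retry decision possible.
                if (attempt == MAX_ATTEMPTS) {
                    throw new CloudRuntimeException("Host call failed after " + attempt + " attempts", e);
                }
                s_logger.log(Level.FINE, "Host call failed, retrying", e);
            }
        }
    }
}
```

Had callHost been declared to throw Exception, startVm would have had to catch Exception, and the CloudRuntimeException from checkSpec would have been swept into the retry logic along with everything else.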
CloudStack Exceptions

CloudStack has a set of well-known exceptions, and a few of them are important enough to describe here.

CloudRuntimeException
  Thrown by: everyone
  Purpose: An error has been hit that cannot be handled.
  Usage: When using this exception, it is best to pack as much debugging information into the message as possible.

ResourceUnavailableException
  Thrown by: components that deal with resource allocation
  Purpose: Serves as the parent class for cases where a physical resource is unusable when CloudStack wants to use it.
  Usage: This exception must be thrown with the scope set in the exception. The scope tells the caller whether the exception affects a host, storage pool, cluster, pod, or zone, so the caller can decide whether it can retry.

InsufficientCapacityException
  Thrown by: components that deal with resource allocation
  Purpose: Serves as the parent class for cases where a physical resource is out of capacity when CloudStack wants to use it.
  Usage: This exception must be thrown with the scope set in the exception. The scope tells the caller whether the exception affects a host, storage pool, cluster, pod, or zone, so the caller can decide whether it can retry.
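A scoped exception lets the caller decide how far to back off. The sketch below is hypothetical: the real ResourceUnavailableException in CloudStack may record its scope differently (this version uses an invented Scope enum), Deployer and tryHost are made-up names, and "retry on the next host unless the whole zone is affected" is only an example policy. java.util.logging stands in for log4j.

```java
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

// Invented scope enum; the real exception may express scope differently.
enum Scope { HOST, STORAGE_POOL, CLUSTER, POD, ZONE }

// Simplified, hypothetical stand-in for CloudStack's ResourceUnavailableException.
class ResourceUnavailableException extends Exception {
    private final Scope scope;
    private final long resourceId;

    ResourceUnavailableException(String message, Scope scope, long resourceId) {
        super(message + " (scope=" + scope + ", id=" + resourceId + ")");
        this.scope = scope;
        this.resourceId = resourceId;
    }

    Scope getScope() { return scope; }
    long getResourceId() { return resourceId; }
}

class Deployer {
    private static final Logger s_logger = Logger.getLogger(Deployer.class.getName());

    // Hypothetical placement call: fails with HOST scope for hosts marked as down.
    static long tryHost(long hostId, Set<Long> downHosts) throws ResourceUnavailableException {
        if (downHosts.contains(hostId)) {
            throw new ResourceUnavailableException("Host is down", Scope.HOST, hostId);
        }
        return hostId;
    }

    // Example policy: a failure narrower than the zone means "try the next host";
    // a zone-wide failure is passed straight up because no other host would help.
    static long deploy(long zoneId, List<Long> hostIds, Set<Long> downHosts)
            throws ResourceUnavailableException {
        for (long hostId : hostIds) {
            try {
                return tryHost(hostId, downHosts);
            } catch (ResourceUnavailableException e) {
                if (e.getScope() == Scope.ZONE) {
                    throw e;
                }
                s_logger.log(Level.FINE, "Retrying on another host", e);
            }
        }
        throw new ResourceUnavailableException("No usable host", Scope.ZONE, zoneId);
    }
}
```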

There is also a good reference to CloudStack exceptions and error codes here.
