
Bug Reference

https://issues.apache.org/jira/browse/CLOUDSTACK-3471

Branch

master

Introduction

As of CloudStack 4.3 there is no API that can aggregate log messages by job id. An API to extract logs by job id would make it easier to identify the sequence of steps that were executed to complete a particular job. In case of failures it would help quickly identify the associated commands/steps that resulted in the failure.

Purpose

In terms of the functionality available to end users, this feature will provide a CloudStack API called extractLogsByJobid(), which will be available only as a ROOT admin API.

References

Document History

Glossary

Feature Specifications

Use cases
put the relevant use case/stories to explain how the feature is going to be used/work

Architecture and Design description

The system will comprise a log shipping layer. This layer will be responsible for collecting logs from each of the management servers and shipping them to a centralized place. In the current design we are proposing Logstash as the shipping layer. It will be configured to use RabbitMQ to ship individual log files to a centralized location.

The shipping phase will interact with another layer called the indexer/search layer. This layer will also store the logs in a format that helps in writing search queries. In the current implementation we are proposing the use of Logstash to receive the individual log files and Elasticsearch to search through them. Before Logstash outputs the received messages to Elasticsearch, it will apply a specific grok filter that splits the input messages into key/value pairs. The key/value pairs will allow search queries to be constructed by (key, value). Via the Elasticsearch REST API, search queries can then be constructed for the required job id.
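To illustrate what the grok filter does, the sketch below pulls the job id and job UUID out of a log line as key/value pairs using a plain Python regex. The sample log line and the exact field layout are assumptions for illustration; the real parsing is done by the grok filter described later, not by this code.

```python
import re

# Hypothetical sample line in the management-server.log format that the
# grok filter is meant to parse (timestamp, level, job id, job uuid).
SAMPLE = ("2013-07-15 10:01:02,123 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] "
          "(Job-Executor-1) job-42 = [ 5f3c2a8e-1111-2222-3333-444455556666 ] "
          "Executing AsyncJobVO")

# Rough Python equivalent of the grok pattern: extract the numeric job id
# and the job UUID as named groups.
PATTERN = re.compile(
    r"job-+(?P<jobid>\d+)\s*=\s*\[\s*"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"\s*\]"
)

def extract_job_fields(line):
    """Return {'jobid': ..., 'uuid': ...} if the line references a job, else None."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None
```

Once every stored log message carries a `jobid` field like this, a search by (jobid, value) becomes a simple field query.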

Instances of Logstash:

Logstash can aggregate log messages from multiple nodes and multiple log files. In a typical production environment, CloudStack is configured with multiple management server instances for scalability and redundancy. One instance of Logstash will be configured to run on each management server and will ship the logs to an AMQP broker. The Logstash process is reasonably light in terms of memory consumption and should not impact the management server.

Instances of elasticsearch and AMQP broker:

Elasticsearch runs as a horizontally scaled-out cluster. The cluster's nodes can be created in two different modes.

  • Same as Management Server: Each management server will be configured to run as an Elasticsearch node. This configuration, though simple, can impact the management server, as the Elasticsearch service can be memory and CPU intensive. One of the management servers will be configured as the master node.
  • Separate Elasticsearch nodes: In this configuration the systemvm template can be used to spawn Elasticsearch nodes. The number of such nodes should be configurable via a global parameter. One of the nodes will be designated as the master node.
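For the "same as management server" mode, an elasticsearch.yml fragment along the following lines could designate one management server as the master. This is a minimal sketch: the setting names are standard Elasticsearch (1.x era) settings, but the cluster name and host list are illustrative assumptions.

```
# Hypothetical elasticsearch.yml fragment (illustrative values)
cluster.name: cloudstack-logs
node.master: true          # set to false on the non-master management servers
node.data: true
discovery.zen.ping.unicast.hosts: ["mgmt-server-1", "mgmt-server-2"]
```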

TODO: Details on using systemvm for elasticsearch nodes.

Logstash Configuration on the log shipping layer.

input {
  file {
    type => "apache"
    path => [ "/var/log/cloudstack/management/management-server.log" ]
  }
}
output {
  amqp {
    host => "myamqpserver"
    exchange_type => "fanout"
    name => "rawlogs"
  }
}

Logstash configuration on the index/search layer.

input {
  amqp {
    type => "all"
    host => "<host>"
    exchange => "rawlogs"
    name =>  "<name>"
  }
}

filter {
  grok {
    type => "apache"
    pattern => "%{YEAR}%{MONTHNUM}%{MONTHDAY}[T ]%{HOUR}\:?%{MINUTE}\:?%{SECOND}[T ]INFO%{GREEDYDATA}job[-]+%{INT:jobid}\s*=\s*\[\s*%{UUID:uuid}\s*\]%{GREEDYDATA}"
  }
}

output {
  elasticsearch {
    host => "<elasticsearch_master>"
  }
}

API Command :

A new API command ExtractLogByJobIdCmd will be introduced.

Manager:

The manager class will implement the actual functionality of querying Elasticsearch for log messages that match the specified filters. For this, the Elasticsearch REST API will be used: a POST request with an Elasticsearch DSL body specifies the required query. The DSL is quite flexible, and if support is later required to filter by timestamp and other values, the DSL would make that easy to achieve.
DSL query for searching logs by jobid

{
    "query": {
        "query_string": {
            "query": "<jobid>",
            "fields": ["jobid"]
        }
    }
}
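The manager's REST interaction can be sketched as follows. This is an illustration only: the host name and the standard Elasticsearch port 9200 are assumptions, and the actual manager is implemented in Java inside CloudStack, not in Python.

```python
import json
from urllib import request

ES_HOST = "elasticsearch-master"  # hypothetical Elasticsearch master host


def build_jobid_query(jobid):
    """Build the Elasticsearch DSL body that filters log messages by jobid."""
    return {
        "query": {
            "query_string": {
                "query": str(jobid),
                "fields": ["jobid"],
            }
        }
    }


def search_logs_by_jobid(jobid):
    """POST the DSL query to the Elasticsearch _search endpoint and return the hits."""
    body = json.dumps(build_jobid_query(jobid)).encode("utf-8")
    req = request.Request(
        "http://%s:9200/_search" % ES_HOST,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    return json.load(request.urlopen(req))
```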
Web Services APIs

A new API has been introduced which can be accessed as

http://<host>:8080/client/api?command=extractLogByJobId&jobid=<jobid>



UI flow

None
IP Clearance

  • What dependencies will you be adding to the project?
  • Are you expecting to include any code developed outside the Apache CloudStack project?

Appendix

Appendix A:

Appendix B:
