Apache Airavata

This page describes how to enable the Elasticsearch-based logging feature for Airavata.

Centralized logging for Airavata

This is a new feature we are currently working on for the 0.17 release, so it is not yet released. We have added a custom log appender for Airavata, but it is not registered by default. When enabled, it pushes all logs, formatted as JSON, to a Kafka topic. To enable this feature you first have to set up a Kafka cluster so that the logs can be pushed to the topic created from the configuration you provide. Once the Kafka cluster is up, you can start Airavata after changing a few parameters in airavata-server.properties. For instructions on setting up a Kafka cluster, see the related articles at the end of this page; a minimal local setup is sketched below.
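
For a quick local test, a single Kafka broker is enough. The following is only a minimal sketch, assuming a standard Apache Kafka binary distribution unpacked on the same machine (refer to the Kafka documentation linked at the end of this page for a production, multi-node cluster):

# start ZooKeeper using the convenience script and config shipped with Kafka
bin/zookeeper-server-start.sh config/zookeeper.properties

# start a single Kafka broker listening on localhost:9092
bin/kafka-server-start.sh config/server.properties

# optional: brokers auto-create topics by default, but you can create the log
# topic up front (the name assumes kafka.topic.prefix=staging and a JVM running
# only the gfac role, as explained in the topic naming section below)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic staging_gfac_logs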

How to enable KafkaAppender 

Override the following properties in airavata-server.properties with values that match your Kafka setup.

# Kafka Logging related configuration
# Set to true if you are running Airavata on AWS
isRunningOnAws=false
# One or more Kafka broker addresses with the port; a single address is enough
# because the KafkaProducer discovers the rest of the cluster from it
kafka.broker.list=localhost:9092
# Topic prefix to use; Airavata creates the topic names for you
kafka.topic.prefix=staging
# Register the Kafka appender as a log appender
enable.kafka.logging=true
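
After restarting Airavata with these properties you can verify that log events are reaching Kafka with the console consumer that ships with Kafka. The topic name below assumes kafka.topic.prefix=staging and a JVM running only the gfac role; on newer Kafka releases replace --zookeeper localhost:2181 with --bootstrap-server localhost:9092.

bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic staging_gfac_logs --from-beginning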

What does a sample JSON log message look like?

The example below is the rubydebug output printed by the Logstash pipeline described later in this page, so in addition to the fields emitted by the Airavata appender it contains the fields Logstash adds (@version, @timestamp, type, tags, timestamp_usec).

{
        "serverId" => {
        "serverId" => "192.168.59.3",
        "hostName" => "192.168.59.3",
         "version" => "airavata-0.16-135-gac0cae6",
           "roles" => [
            [0] "gfac"
        ]
    },
           "message" => "Skipping Zookeeper embedded startup ...",
         "timestamp" => "2016-09-09T20:57:08.329Z",
             "level" => "INFO",
        "loggerName" => "org.apache.airavata.common.utils.AiravataZKUtils",
              "mdc"  => {
      				"gateway_id": "21845d02-7d2c-11e6-ae22-562311499611",
      				"experiment_id": "21845d02-7d2c-11e6-ae22-34b6b6499611",
     				"process_id": "21845d02-7d2c-11e6-ae22-56b6b6499611",
      				"token_id": "21845d02-7d2c-11e6-ae22-56b6b6499611"
    			},
        "threadName" => "main",
          "@version" => "1",
        "@timestamp" => "2016-09-09T20:57:11.678Z",
              "type" => "gfac_logs",
              "tags" => [
        [0] "local",
        [1] "CoreOS-899.13.0"
    ],
    "timestamp_usec" => 0
}

How Airavata creates Kafka topic names from the given topic prefix

Airavata has a few services and it is completely flexible in how you deploy them: you can run all of the services (Apache Thrift services) in one JVM, run one JVM per component, or merge just a few of them into a single JVM. In the log above, the roles section contains only gfac, which means the log was taken from a GFac server node and no other component was running in that JVM. The topic creation logic is based on the roles of the JVM. To keep the deployment clean we recommend deploying one component per JVM, as that makes the system easier to scale and diagnose. During topic creation we check the number of roles configured in the JVM:

If the number of roles is greater than 4  =>  <kafka_topic_prefix>_all_logs, e.g. staging_all_logs (kafka.topic.prefix = staging)

Otherwise we pick the first role          =>  <kafka_topic_prefix>_<first role>_logs, e.g. staging_gfac_logs (kafka.topic.prefix = staging)
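
You can list the topics that were actually created with the kafka-topics tool (the names in the comment are illustrative and depend on your kafka.topic.prefix and the roles deployed in each JVM):

# with kafka.topic.prefix=staging, expect names such as staging_gfac_logs,
# staging_apiserver_logs, staging_orchestrator_logs
bin/kafka-topics.sh --list --zookeeper localhost:2181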

How to use the log messages from Kafka topics

We have tried consuming the log messages, pushing them to an Elasticsearch cluster, and viewing them in Kibana. To learn how to set up Elasticsearch and Logstash, please refer to the related articles at the end of this page. We used the Logstash configuration below to read Airavata log messages and push them to Elasticsearch.

input {
  kafka {
    topic_id => "local_all_logs"
    zk_connect => "127.0.0.1:2181"
    auto_offset_reset => "smallest"
    type => "all_logs"
  }
  kafka {
    topic_id => "local_apiserver_logs"
    zk_connect => "127.0.0.1:2181"
    auto_offset_reset => "smallest"
    type => "apiserver_logs"
  }
  kafka {
    topic_id => "local_gfac_logs"
    zk_connect => "127.0.0.1:2181"
    auto_offset_reset => "smallest"
    type => "gfac_logs"
  }
  kafka {
    topic_id => "local_orchestrator_logs"
    zk_connect => "127.0.0.1:2181"
    auto_offset_reset => "smallest"
    type => "orchestrator_logs"
  }
  kafka {
    topic_id => "local_credentialstore_logs"
    zk_connect => "127.0.0.1:2181"
    auto_offset_reset => "smallest"
    type => "credentialstore_logs"
  }
}

filter {
  mutate { add_field => { "[@metadata][level]" => "%{[level]}" } }
  mutate { lowercase => ["[@metadata][level]"] }
  mutate { gsub => ["level", "LOG_", ""] }
  mutate {
    add_tag => ["local", "CoreOS-899.13.0"]
  }
  ruby {
    code => "
      begin
        t = Time.iso8601(event['timestamp'])
      rescue ArgumentError => e
        # drop the event if the timestamp format is invalid
        event.cancel
        return
      end
      event['timestamp_usec'] = t.usec % 1000
      event['timestamp'] = t.utc.strftime('%FT%T.%LZ')
    "
  }
}

output {
  stdout { codec => rubydebug }
  if [type] == "apiserver_logs" {
    elasticsearch {
      hosts => ["elasticsearch.us-east-1.aws.found.io:9200"]
      user => "admin"
      password => "adminpassword"
      index => "local-apiserver-logs-logstash-%{+YYYY.MM.dd}"
    }
  } else if [type] == "gfac_logs" {
    elasticsearch {
      hosts => ["elasticsearch.us-east-1.aws.found.io:9200"]
      user => "admin"
      password => "adminpassword"
      index => "local-gfac-logs-logstash-%{+YYYY.MM.dd}"
    }
  } else if [type] == "orchestrator_logs" {
    elasticsearch {
      hosts => ["elasticsearch.us-east-1.aws.found.io:9200"]
      user => "admin"
      password => "adminpassword"
      index => "local-orchestrator-logs-logstash-%{+YYYY.MM.dd}"
    }
  } else if [type] == "credentialstore_logs" {
    elasticsearch {
      hosts => ["elasticsearch.us-east-1.aws.found.io:9200"]
      user => "admin"
      password => "adminpassword"
      index => "local-credentialstore-logs-logstash-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["elasticsearch.us-east-1.aws.found.io:9200"]
      user => "admin"
      password => "adminpassword"
      index => "local-airavata-logs-logstash-%{+YYYY.MM.dd}"
    }
  }
}
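
To run this pipeline, save the configuration to a file (the file name below is only an example) and point Logstash at it. Depending on your Logstash version the configuration-test flag is -t, --configtest, or --config.test_and_exit.

# check that the configuration parses before starting the pipeline
bin/logstash -t -f airavata-logstash.conf

# run the pipeline
bin/logstash -f airavata-logstash.conf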

What are the options for setting up Elasticsearch and Kibana?

The easiest and fastest way to use Elasticsearch is the hosted version from a cloud provider; a number of companies provide Elasticsearch as a service, so you can set up an Elasticsearch cluster with a few clicks. Most of these services, however, charge based on your load. If you have a very low load and require a relatively low TTL for your logs, it can be efficient and financially sensible to use an ES cluster from one of these providers. If you need a relatively high TTL for your logs, setting up your own cluster is also an option. To set up your own Elasticsearch cluster and Kibana, follow the last few links below. If you want to secure Kibana, you can use another Elastic product called Shield to add security to your ES cluster and Kibana.
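
Whichever option you choose, you can confirm that the cluster is reachable and that the daily indices from the Logstash output section are being created with a couple of curl calls (the host, port, and credentials below are placeholders for your own cluster):

# overall cluster health
curl -u admin:adminpassword 'http://localhost:9200/_cluster/health?pretty'

# list the indices written by the output section above
curl -u admin:adminpassword 'http://localhost:9200/_cat/indices/local-*-logs-logstash-*?v'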

http://kafka.apache.org/documentation.html

https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html

https://www.elastic.co/cloud/as-a-service/signup

https://www.digitalocean.com/community/tutorials/how-to-set-up-a-production-elasticsearch-cluster-on-ubuntu-14-04

https://www.elastic.co/guide/en/kibana/current/production.html

https://www.elastic.co/guide/en/shield/shield-1.0/marvel.html