In this blog post we will walk through what it takes to set up a new telemetry source in Metron. For this example we will set up a new sensor, capture the sensor logs, pipe the logs to Kafka, pick them up with a Metron parsing topology, parse them, and run them through the Metron stream processing pipeline.
Our example sensor will be a Squid proxy. Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. Squid logs are simple to explain and easy to parse, and the velocity of traffic coming from Squid is representative of a typical network-based sensor. Hence, we feel it is a good telemetry source to use for this tutorial.
Step 1: Acquire Metron Code and Development Environment
There are two ways to acquire the Metron code for this code lab. The first is to download it from the USB stick provided for this exercise. The second is to let it be imported automatically by running the code lab platform Vagrant scripts:
cd /metron-deployment/vagrant/codelab-platform
./run.sh
If you have a local copy of the code lab image from the USB stick, the script will use that version; otherwise it will pull the image from Vagrant Atlas. Be aware that the image is large, so the download will take a while.
Step 2: Build the Metron code (Optional)
If you are not running Metron from the USB stick you need to download and build the code. Please see here for full Metron installation and validation instructions. Verify that the project has been built before creating the VM. First, let's get Metron from Apache:
git clone https://git-wip-us.apache.org/repos/asf/incubator-metron.git
git tag -l
Now you will see a list of Metron releases: major releases, minor releases, and release candidates. Refer to the Metron website for the current stable release recommended for download. Once you have selected a Metron release, run the following commands to check it out:
cd incubator-metron
git checkout tags/[MetronReleaseVersion]
Now that we have downloaded Metron we need to build it. For the purposes of this exercise we will build without running Metron's unit and integration test suites. To do so, run the following command:
mvn clean package -DskipTests
Now that we have downloaded and built Metron, it's on to the next step. Next we need to make a decision about the Metron environment and which parts of Metron we would like to build. If you are running from the USB stick, the code is already pre-built.
Step 3: Installing a sample sensor
Log into the sensor node and install the Squid sensor. If you are on the QuickDev platform your VM will be called node1. If you are in an AWS environment your sensor node will be tagged with the [sensors] tag. You can look through the AWS console to find which node in your cluster has this tag.
cd metron-deployment/vagrant/codelab-platform/
vagrant ssh
Once you log into the sensor node you can install the Squid sensor.
sudo yum install squid
sudo service squid start
This will run through the install, and the Squid sensor will be installed and started. Now let's look at the Squid logs.
sudo su -
cd /var/log/squid
ls
You see that there are three types of logs available:
- access.log
- cache.log
- squid.out
We are interested in access.log as that is the log that records the proxy usage. We see that initially the log is empty. Let's generate a few entries for the log.
squidclient "http://www.aliexpress.com/af/shoes.html?ltype=wholesale&d=y&origin=n&isViewCP=y&catId=0&initiative_id=SB_20160622082445&SearchText=shoes"
squidclient "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"
squidclient "http://www.pravda.ru/science/"
squidclient "https://www.google.com/maps/place/Waterford,+WI/@42.7639877,-88.2867248,12z/data=!4m5!3m4!1s0x88059e67de9a3861:0x2d24f51aad34c80b!8m2!3d42.7630722!4d-88.2142563"
squidclient "http://www.brightsideofthesun.com/2016/6/25/12027078/anatomy-of-a-deal-phoenix-suns-pick-bender-chriss"
squidclient "https://www.microsoftstore.com/store/msusa/en_US/pdp/Microsoft-Band-2-Charging-Stand/productID.329506400"
squidclient "http://www.autonews.com/article/20151115/RETAIL04/311169971/toyota-fj-cruiser-is-scarce-hot-and-high-priced"
squidclient "https://tfl.gov.uk/plan-a-journey/"
squidclient "https://www.facebook.com/Africa-Bike-Week-1550200608567001/"
squidclient "http://www.ebay.com/itm/02-Infiniti-QX4-Rear-spoiler-Air-deflector-Nissan-Pathfinder-/172240020293?fits=Make%3AInfiniti%7CModel%3AQX4&hash=item281a4e2345:g:iMkAAOSwoBtW4Iwx&vxp=mtr"
squidclient "http://www.recruit.jp/corporate/english/company/index.html"
squidclient "http://www.lada.ru/en/cars/4x4/3dv/about.html"
squidclient "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"
squidclient "http://www.aliexpress.com/af/shoes.html?ltype=wholesale&d=y&origin=n&isViewCP=y&catId=0&initiative_id=SB_20160622082445&SearchText=shoes"
In production environments you would configure your users' web browsers to point to the proxy server, but for the sake of simplicity in this tutorial we will use the client that is packaged with the Squid installation. After we use the client to simulate proxy requests, the Squid log entries will look as follows:
1467011157.401 415 127.0.0.1 TCP_MISS/200 337891 GET http://www.aliexpress.com/af/shoes.html? - DIRECT/207.109.73.154 text/html
1467011158.083 671 127.0.0.1 TCP_MISS/200 41846 GET http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878 - DIRECT/212.227.34.3 text/html
1467011159.978 1893 127.0.0.1 TCP_MISS/200 153925 GET http://www.pravda.ru/science/ - DIRECT/185.103.135.90 text/html
1467011160.044 58 127.0.0.1 TCP_MISS/302 1471 GET https://www.google.com/maps/place/Waterford,+WI/@42.7639877,-88.2867248,12z/data=!4m5!3m4!1s0x88059e67de9a3861:0x2d24f51aad34c80b!8m2!3d42.7630722!4d-88.2142563 - DIRECT/172.217.3.164 text/html
1467011160.145 155 127.0.0.1 TCP_MISS/200 133234 GET http://www.brightsideofthesun.com/2016/6/25/12027078/anatomy-of-a-deal-phoenix-suns-pick-bender-chriss - DIRECT/151.101.41.52 text/html
1467011161.224 1073 127.0.0.1 TCP_MISS/200 141323 GET https://www.microsoftstore.com/store/msusa/en_US/pdp/Microsoft-Band-2-Charging-Stand/productID.329506400 - DIRECT/2.19.142.162 text/html
1467011161.491 262 127.0.0.1 TCP_MISS/302 1955 GET http://www.autonews.com/article/20151115/RETAIL04/311169971/toyota-fj-cruiser-is-scarce-hot-and-high-priced - DIRECT/54.88.37.253 text/html
1467011162.627 1133 127.0.0.1 TCP_MISS/200 88544 GET https://tfl.gov.uk/plan-a-journey/ - DIRECT/54.171.145.187 text/html
1467011163.515 879 127.0.0.1 TCP_MISS/200 461930 GET https://www.facebook.com/Africa-Bike-Week-1550200608567001/ - DIRECT/69.171.230.68 text/html
1467011164.286 749 127.0.0.1 TCP_MISS/200 190407 GET http://www.ebay.com/itm/02-Infiniti-QX4-Rear-spoiler-Air-deflector-Nissan-Pathfinder-/172240020293? - DIRECT/23.74.62.44 text/html
1467011164.447 128 127.0.0.1 TCP_MISS/404 12920 GET http://www.recruit.jp/corporate/english/company/index.html - DIRECT/23.74.66.205 text/html
1467011166.125 1659 127.0.0.1 TCP_MISS/200 69469 GET http://www.lada.ru/en/cars/4x4/3dv/about.html - DIRECT/195.144.198.77 text/html
1467011166.543 401 127.0.0.1 TCP_MISS/200 41846 GET http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878 - DIRECT/212.227.34.3 text/html
1467011168.519 445 127.0.0.1 TCP_MISS/200 336155 GET http://www.aliexpress.com/af/shoes.html? - DIRECT/207.109.73.154 text/html
The format of the log is:
timestamp | time elapsed | remotehost | code/status | bytes | method | URL | rfc931 | peerstatus/peerhost | type
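To see how a single entry lines up with those fields, here is a minimal awk sketch that labels the columns of the most recent access.log entry (purely illustrative, and it assumes the default space-delimited native log format shown above):
tail -1 /var/log/squid/access.log | awk '{
  print "timestamp:           " $1
  print "time elapsed:        " $2
  print "remotehost:          " $3
  print "code/status:         " $4
  print "bytes:               " $5
  print "method:              " $6
  print "URL:                 " $7
  print "rfc931:              " $8
  print "peerstatus/peerhost: " $9
  print "type:                " $10
}'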
Now that we have the sensor set up and generating logs, we need to figure out how to pipe these logs to a Kafka topic. The first thing we need to do is set up a new Kafka topic for Squid.
Step 4: Define Environment Variables
If you are using the quick-dev image your links are:
Ambari: http://node1:8080/
Storm: http://node1:8744/index.html
Now let's set up the following environment variables on node1 to make it easier to navigate and to carry the commands over from quick-dev to an AWS or bare-metal deployment.
export ZOOKEEPER=node1:2181
export BROKERLIST=node1:6667
export HDP_HOME="/usr/hdp/current"
export METRON_VERSION="0.4.0"
export METRON_HOME="/usr/metron/${METRON_VERSION}"
Note: You should supply a comma-delimited list of host:port items for the ZOOKEEPER and BROKERLIST variables if you are running in an environment with multiple hosts for Zookeeper and the Kafka brokers.
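For example, in a hypothetical three-node cluster (the host names below are placeholders) the variables might look like this:
export ZOOKEEPER=zknode1:2181,zknode2:2181,zknode3:2181
export BROKERLIST=kafkanode1:6667,kafkanode2:6667,kafkanode3:6667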
Step 5: Create Kafka topics and ingest sample data
${HDP_HOME}/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --create --topic squid --partitions 1 --replication-factor 1
${HDP_HOME}/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --list
These commands create a new Kafka topic for Squid and then list the available topics so you can confirm it exists. Now let's test how we can pipe the Squid logs to Kafka:
cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid
${HDP_HOME}/kafka-broker/bin/kafka-console-consumer.sh --zookeeper $ZOOKEEPER --topic squid --from-beginning
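You can also ask Kafka to describe the topic itself and confirm that the partition count and replication factor match what we requested above:
${HDP_HOME}/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --describe --topic squid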
Note: The following steps for manually creating the Grok expression, copying the pattern to HDFS, and creating the parser and indexing JSON configs for the sensor are no longer necessary in full dev. The files are installed by default, and you can simply start the squid topology as described below to achieve the end result of these steps.
This should ingest our Squid logs into Kafka. Now we are ready to tackle the Metron parsing topology setup. The first thing we need to do is decide whether we will be using a Java-based parser or a Grok-based parser for the new telemetry. In this example we will be using the Grok parser. The Grok parser is well suited for structured or semi-structured logs that are well understood (check) and for telemetries with lower volumes of traffic (check). Next we need to define the Grok expression for our log. Refer to the Grok documentation for additional details. In our case the pattern is:
SQUID_DELIMITED %{NUMBER:timestamp}[^0-9]*%{INT:elapsed} %{IP:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url}[^0-9]*(%{IP:ip_dst_addr})?
Notice that any part of the message that we don't want included in the resulting JSON structure is simply left without a named capture. Also notice that we applied the naming convention to the IP fields by referencing the list of field conventions. The last thing we need to do is validate the Grok pattern. For our test we will use a free Grok validator called Grok Constructor. A validated Grok expression should look like this:
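Before moving on, you can also do a rough sanity check on the sensor node itself. The expression below is only a loose plain-regex approximation of the Grok pattern (it checks the overall line shape, not the named captures), so treat it as a convenience rather than a substitute for Grok Constructor:
grep -cE '^[0-9]+\.[0-9]+ +[0-9]+ [0-9.]+ [A-Z_]+/[0-9]+ [0-9]+ [A-Z]+ [^ ]+' /var/log/squid/access.log
# Compare the count above with the total number of entries:
wc -l < /var/log/squid/access.log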
Now that the Grok pattern has been defined, we need to save it and move it to HDFS. Existing Grok patterns that ship with Metron are staged under /apps/metron/patterns/.
First we do a directory listing to see which patterns are already available on the platform:
[root@node1 bin]# hdfs dfs -ls /apps/metron/patterns/
Found 5 items
-rw-r--r-- 3 hdfs hadoop 13427 2016-04-25 07:07 /apps/metron/patterns/asa
-rw-r--r-- 3 hdfs hadoop 5203 2016-04-25 07:07 /apps/metron/patterns/common
-rw-r--r-- 3 hdfs hadoop 524 2016-04-25 07:07 /apps/metron/patterns/fireeye
-rw-r--r-- 3 hdfs hadoop 2552 2016-04-25 07:07 /apps/metron/patterns/sourcefire
-rw-r--r-- 3 hdfs hadoop 879 2016-04-25 07:07 /apps/metron/patterns/yaf
Now, to add a new pattern, we need to move our new Squid pattern into the same directory. Create a file from the Grok pattern above:
touch /tmp/squid
vi /tmp/squid
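If you prefer not to paste the pattern into vi by hand, a heredoc accomplishes the same thing (the contents must be exactly the SQUID_DELIMITED line defined earlier):
cat > /tmp/squid <<'EOF'
SQUID_DELIMITED %{NUMBER:timestamp}[^0-9]*%{INT:elapsed} %{IP:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url}[^0-9]*(%{IP:ip_dst_addr})?
EOF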
Then move it to HDFS:
su - hdfs
# If the pattern already exists and you need to replace it, first run: hdfs dfs -rm /apps/metron/patterns/squid
hdfs dfs -put /tmp/squid /apps/metron/patterns/
exit
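A quick listing and cat confirm that the pattern landed where the parser expects it:
hdfs dfs -ls /apps/metron/patterns/
hdfs dfs -cat /apps/metron/patterns/squid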
Now that the Grok pattern is staged in HDFS we need to define a parser configuration for the Metron Parsing Topology. The configurations are kept in Zookeeper so the sensor configuration must be uploaded there after it has been created. A Grok parser configuration follows this format:
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "sensor name",
  "parserConfig": {
    "grokPath": "grok pattern",
    "patternLabel": "grok label",
    ... other optional fields
  }
}
Create a Squid Grok parser configuration file at ${METRON_HOME}/config/zookeeper/parsers/squid.json with the following contents:
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "full_hostname", "domain_without_subdomains" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
      }
    }
  ]
}
Notice the use of fieldTransformations in the parser configuration. Our Grok parser is set up to extract the URL, but really we want just the domain, or even the domain without subdomains. To do this, we can use the Metron Transformation Language field transformation. The Metron Transformation Language is a domain-specific language that allows users to define extra transformations to be performed on the messages flowing through the topology. It supports a wide range of common network- and string-related functions as well as function composition and list operations. In our case, we extract the hostname from the URL via the URL_TO_HOST function and remove the subdomains with DOMAIN_REMOVE_SUBDOMAINS, thereby adding two new fields, "full_hostname" and "domain_without_subdomains", to each message.
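If your Metron installation includes the Stellar REPL (recent releases ship it under ${METRON_HOME}/bin/stellar), you can experiment with the two functions before pushing the configuration; the session below is illustrative and the exact output may differ on your build:
${METRON_HOME}/bin/stellar -z $ZOOKEEPER
# [Stellar]>>> URL_TO_HOST('http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878')
# www.help.1and1.co.uk
# [Stellar]>>> DOMAIN_REMOVE_SUBDOMAINS('www.help.1and1.co.uk')
# 1and1.co.uk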
We can also set up index types and batch sizing. Add the following lines to a file named ${METRON_HOME}/config/zookeeper/indexing/squid.json:
{
  "hdfs": {
    "index": "squid",
    "batchSize": 5,
    "enabled": true
  },
  "elasticsearch": {
    "index": "squid",
    "batchSize": 5,
    "enabled": true
  },
  "solr": {
    "index": "squid",
    "batchSize": 5,
    "enabled": true
  }
}
Another thing we can do is validate our messages. Let's say we want to make sure that source IPs and destination IPs are valid. The validators are global, so we set them up in the global JSON and push them into Zookeeper. To do so, perform the following commands:
vi ${METRON_HOME}/config/zookeeper/global.json
and set the json to look as follows:
{
  "es.clustername": "metron",
  "es.ip": "node1:9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing",
  "fieldValidations": [
    {
      "input": [ "ip_src_addr", "ip_dst_addr" ],
      "validation": "IP",
      "config": {
        "type": "IPV4"
      }
    }
  ]
}
A script is provided to upload configurations to Zookeeper. Upload the configs with the PUSH option.
${METRON_HOME}/bin/zk_load_configs.sh -i ${METRON_HOME}/config/zookeeper -m PUSH -z $ZOOKEEPER
We can verify that our configurations have been uploaded by using the DUMP option:
${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER
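Since the DUMP output covers every sensor, it helps to filter it down to the Squid entries; the exact whitespace in the dumped JSON can vary, so a simple grep on the sensor name is usually enough:
${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER | grep -C 5 squid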
Now, install an Elasticsearch template for your new sensor so that we can effectively query results in the Metron Alerts UI.
Note: This is a new step that is necessary as of the meta alerts feature and Elasticsearch 5.6.2 upgrade.
Run the following commands from the CLI.
curl -XPUT 'http://node1:9200/_template/squid_index' -d '{"template": "squid_index*","mappings": {"squid_doc": {"dynamic_templates": [{"geo_location_point": {"match": "enrichments:geo:*:location_point","match_mapping_type": "*","mapping": {"type": "geo_point"}}},{"geo_country": {"match": "enrichments:geo:*:country","match_mapping_type": "*","mapping": {"type": "keyword"}}},{"geo_city": {"match": "enrichments:geo:*:city","match_mapping_type": "*","mapping": {"type": "keyword"}}},{"geo_location_id": {"match": "enrichments:geo:*:locID","match_mapping_type": "*","mapping": {"type": "keyword"}}},{"geo_dma_code": {"match": "enrichments:geo:*:dmaCode","match_mapping_type": "*","mapping": {"type": "keyword"}}},{"geo_postal_code": {"match": "enrichments:geo:*:postalCode","match_mapping_type": "*","mapping": {"type": "keyword"}}},{"geo_latitude": {"match": "enrichments:geo:*:latitude","match_mapping_type": "*","mapping": {"type": "float"}}},{"geo_longitude": {"match": "enrichments:geo:*:longitude","match_mapping_type": "*","mapping": {"type": "float"}}},{"timestamps": {"match": "*:ts","match_mapping_type": "*","mapping": {"type": "date","format": "epoch_millis"}}},{"threat_triage_score": {"mapping": {"type": "float"},"match": "threat:triage:*score","match_mapping_type": "*"}},{"threat_triage_reason": {"mapping": {"type": "text","fielddata": "true"},"match": "threat:triage:rules:*:reason","match_mapping_type": "*"}},{"threat_triage_name": {"mapping": {"type": "text","fielddata": "true"},"match": "threat:triage:rules:*:name","match_mapping_type": "*"}}],"properties": {"timestamp": {"type": "date","format": "epoch_millis"},"source:type": {"type": "keyword"},"ip_dst_addr": {"type": "ip"},"ip_dst_port": {"type": "integer"},"ip_src_addr": {"type": "ip"},"ip_src_port": {"type": "integer"},"alert": {"type": "nested"},"guid": {"type": "keyword"}}}}}'
# Verify the template installs as expected
curl -XGET 'http://node1:9200/_template/squid_index?pretty'
This template accomplishes two things:
- Sets up default mappings for metron-specific types, e.g. timestamps.
- Sets up types for properties that will come from the parsed data, e.g. ip_src_addr.
Now start the new squid parser topology:
${METRON_HOME}/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s squid
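You can also confirm the topology was submitted from the command line. The path below assumes the standard HDP layout for the Storm client; adjust it if your install differs:
${HDP_HOME}/storm-client/bin/storm list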
Navigate to the squid parser topology in the Storm UI at http://node1:8744/index.html and verify the topology is up with no errors:
Now that we have a new running squid parser topology, generate some data to parse by running this command several times:
sudo tail /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid
Refresh the Storm UI and it should report data being parsed:
Then navigate to Elasticsearch at http://node1:9200/_cat/indices?v and verify that a squid index has been created:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open yaf_index_2016.04.25.15 5 1 5485 0 4mb 4mb
yellow open snort_index_2016.04.26.12 5 1 24452 0 14.4mb 14.4mb
yellow open bro_index_2016.04.25.16 5 1 1295 0 1.9mb 1.9mb
yellow open squid_index_2016.04.26.13 5 1 1 0 7.3kb 7.3kb
yellow open yaf_index_2016.04.25.17 5 1 30750 0 17.4mb 17.4mb
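To spot-check the documents themselves, you can pull a single record back from the new index using the standard Elasticsearch search API (the index name carries the current date, hence the wildcard):
curl -XGET 'http://node1:9200/squid_index*/_search?pretty&size=1'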
In order to verify that the messages were indexed correctly, first install the Elasticsearch Head plugin:
cd /usr/share/elasticsearch/bin
sudo ./plugin install mobz/elasticsearch-head
At times the Elasticsearch Head plugin install fails with the following error:
Failed: SSLException[java.security.ProviderException: java.security.KeyException]; nested: ProviderException[java.security.KeyException]; nested: KeyException;
This can be fixed by upgrading the nss package using the following command:
sudo yum -y upgrade nss
Then navigate to http://node1:9200/_plugin/head/
There you will see the parsed messages along with performance timestamps. We will discuss the performance timestamps in another blog entry.
Now let's see how to create a Kibana dashboard to visualize the data in Metron. First click on Visualize, select a squid index, and add the fields you want to display:
Then click on Save to save the query and import it into the main Metron dashboard: