...
Prior to going through this tutorial, make sure you have Metron properly installed. Please see here for Metron installation and validation instructions. We will be using a single-VM setup for this exercise. To set up the VM, do the following steps:
cd metron-deployment/vagrant/singlenode-vagrant
vagrant plugin install vagrant-hostmanager
vagrant up
vagrant ssh
...
Now that we have the sensor set up and generating logs, we need to figure out how to pipe these logs to a Kafka topic. To do so, the first thing we need to do is set up a new Kafka topic for Squid.
cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --zookeeper localhost:2181 --create --topic squid --partitions 1 --replication-factor 1
./kafka-topics.sh --zookeeper localhost:2181 --list
...
tail /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid
./kafka-console-consumer.sh --zookeeper node1:2181 --topic squid --from-beginning
...
WEBURL (?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b/?(?!@)))
SQUID_DELIMITED %{NUMBER:start_time} %{SPACE:UNWANTED} %{INT:elapsed} %{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{WEBURL:url}
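To see what SQUID_DELIMITED pulls out of a log line, here is a rough Python approximation of the pattern applied to a made-up access.log entry. This is illustrative only: Metron evaluates the real Grok pattern on the JVM, the regex below simplifies WEBURL to "non-whitespace", and the sample values are invented.

```python
import re

# Minimal Python approximation of the SQUID_DELIMITED Grok pattern above.
SQUID_RE = re.compile(
    r"(?P<start_time>\d+(?:\.\d+)?)"                 # NUMBER:start_time (epoch seconds)
    r"\s+"                                           # SPACE:UNWANTED
    r"(?P<elapsed>\d+)\s+"                           # INT:elapsed (ms)
    r"(?P<ip_src_addr>(?:\d{1,3}\.){3}\d{1,3})\s+"   # IPV4:ip_src_addr
    r"(?P<action>\w+)/(?P<code>\d+)\s+"              # WORD:action / NUMBER:code
    r"(?P<bytes>\d+)\s+"                             # NUMBER:bytes
    r"(?P<method>\w+)\s+"                            # WORD:method
    r"(?P<url>\S+)"                                  # WEBURL:url (simplified here)
)

# Hypothetical line in the default Squid access.log format.
line = "1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET http://www.cnn.com/ -"
m = SQUID_RE.match(line)
print(m.group("ip_src_addr"), m.group("action"), m.group("url"))
# -> 127.0.0.1 TCP_MISS http://www.cnn.com/
```

If a match comes back empty against your own log lines, the usual culprit is a non-default Squid logformat; the Grok pattern assumes the stock format.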
...
We need to move our new Squid pattern into the same directory. Create a file from the grok pattern above:
touch /tmp/squid
vi /tmp/squid
Then move it to HDFS:
su - hdfs
hdfs dfs -put /tmp/squid /apps/metron/patterns/
exit
Now that the Grok pattern is staged in HDFS, we need to define a Storm Flux configuration for the Metron parsing topology. The configs are staged under /usr/metron/0.1BETA/flux/ and each parsing topology has its own set of configs. Each directory for a topology has a remote.yaml, which is designed to be run on AWS, and a test.yaml, designed to run locally on a single-node VM. At the moment of publishing this blog entry the following configs are available:
/usr/metron/0.1BETA/flux/yaf/test.yaml
/usr/metron/0.1BETA/flux/yaf/remote.yaml
/usr/metron/0.1BETA/flux/sourcefire/test.yaml
/usr/metron/0.1BETA/flux/sourcefire/remote.yaml
/usr/metron/0.1BETA/flux/asa/test.yaml
/usr/metron/0.1BETA/flux/asa/remote.yaml
/usr/metron/0.1BETA/flux/fireeye/test.yaml
/usr/metron/0.1BETA/flux/fireeye/remote.yaml
/usr/metron/0.1BETA/flux/bro/test.yaml
/usr/metron/0.1BETA/flux/bro/remote.yaml
/usr/metron/0.1BETA/flux/ise/test.yaml
/usr/metron/0.1BETA/flux/ise/remote.yaml
/usr/metron/0.1BETA/flux/paloalto/test.yaml
/usr/metron/0.1BETA/flux/paloalto/remote.yaml
/usr/metron/0.1BETA/flux/lancope/test.yaml
/usr/metron/0.1BETA/flux/lancope/remote.yaml
/usr/metron/0.1BETA/flux/pcap/test.yaml
/usr/metron/0.1BETA/flux/pcap/remote.yaml
/usr/metron/0.1BETA/flux/enrichment/test.yaml
/usr/metron/0.1BETA/flux/enrichment/remote.yaml
/usr/metron/0.1BETA/flux/snort/test.yaml
/usr/metron/0.1BETA/flux/snort/remote.yaml
We now need to define a remote.yaml for Squid, since that is the config the topology launch command uses. The easiest way to do this is to copy one of the existing Grok-based configs (YAF) and tailor it for Squid.
mkdir /usr/metron/0.1BETA/flux/squid
cp /usr/metron/0.1BETA/flux/yaf/remote.yaml /usr/metron/0.1BETA/flux/squid/remote.yaml
vi /usr/metron/0.1BETA/flux/squid/remote.yaml
And edit your config to look like this:
name: "squid-test"
config:
  topology.workers: 1
components:
  - id: "parser"
    className: "org.apache.metron.parsing.parsers.GrokParser"
    constructorArgs:
      - "/apps/metron/patterns/squid"
      - "SQUID_DELIMITED"
    configMethods:
      - name: "withTimestampField"
        args:
          - "start_time"
      - name: "withMetronHDFSHome"
        args:
          - ""
  - id: "writer"
    className: "org.apache.metron.writer.KafkaWriter"
    constructorArgs:
      - "${kafka.broker}"
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "${kafka.zk}"
  - id: "kafkaConfig"
    className: "storm.kafka.SpoutConfig"
    constructorArgs:
      # zookeeper hosts
      - ref: "zkHosts"
      # topic name
      - "${spout.kafka.topic.squid}"
      # zk root
      - ""
      # id
      - "${spout.kafka.topic.squid}"
    properties:
      - name: "ignoreZkOffsets"
        value: false
      - name: "startOffsetTime"
        value: -1
      - name: "socketTimeoutMs"
        value: 1000000
spouts:
  - id: "kafkaSpout"
    className: "storm.kafka.KafkaSpout"
    constructorArgs:
      - ref: "kafkaConfig"
bolts:
  - id: "parserBolt"
    className: "org.apache.metron.bolt.ParserBolt"
    constructorArgs:
      - "${kafka.zk}"
      - "${spout.kafka.topic.squid}"
      - ref: "parser"
      - ref: "writer"
streams:
  - name: "spout -> bolt"
    from: "kafkaSpout"
    to: "parserBolt"
    grouping:
      type: SHUFFLE
...
Start the new squid parser topology:
storm jar /usr/metron/0.1BETA/lib/metron-parsers-0.1BETA.jar org.apache.storm.flux.Flux --filter /usr/metron/0.1BETA/config/elasticsearch.properties --remote /usr/metron/0.1BETA/flux/squid/remote.yaml
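Note the --filter flag: the ${...} placeholders in the YAML are not resolved by Storm itself; Flux substitutes them from the given properties file before building the topology. A rough Python sketch of that substitution is below. The property names come from the YAML above and the values (node1:6667, node1:2181, squid) match this tutorial's commands, but this is only an illustration — Flux does the real substitution in Java, and the elasticsearch.properties file on your VM is authoritative.

```python
import re

# Illustrative properties, mirroring what the --filter file would supply.
props_text = """\
kafka.broker=node1:6667
kafka.zk=node1:2181
spout.kafka.topic.squid=squid
"""
props = dict(line.split("=", 1) for line in props_text.strip().splitlines())

# Substitute ${key} placeholders in a fragment of the Flux YAML.
template = 'constructorArgs: ["${kafka.zk}", "${spout.kafka.topic.squid}"]'
resolved = re.sub(r"\$\{([^}]+)\}", lambda m: props[m.group(1)], template)
print(resolved)  # -> constructorArgs: ["node1:2181", "squid"]
```

This is also why a typo in a placeholder name fails at launch time rather than at runtime: the key simply isn't found in the properties file.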
Navigate to the squid parser topology in the Storm UI at http://node1:8744/index.html and verify that the topology is up with no errors.
Now that we have a new running squid parser topology, generate some data to parse by running this command several times:
tail /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid
Refresh the Storm UI and it should report data being parsed.
Then navigate to Elasticsearch at http://node1:9200/_cat/indices?v and verify that a squid index has been created:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open yaf_index_2016.04.25.15 5 1 5485 0 4mb 4mb
yellow open snort_index_2016.04.26.12 5 1 24452 0 14.4mb 14.4mb
yellow open bro_index_2016.04.25.16 5 1 1295 0 1.9mb 1.9mb
yellow open squid_index_2016.04.26.13 5 1 1 0 7.3kb 7.3kb
yellow open yaf_index_2016.04.25.17 5 1 30750 0 17.4mb 17.4mb
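Rather than eyeballing that listing, you can machine-check it. The hypothetical helper below parses the _cat/indices text above (the sample is copied from this post; in practice you would fetch http://node1:9200/_cat/indices?v yourself, e.g. with curl) and confirms a squid index exists:

```python
# Sample _cat/indices output, taken verbatim from the listing above.
cat_output = """\
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open yaf_index_2016.04.25.15 5 1 5485 0 4mb 4mb
yellow open snort_index_2016.04.26.12 5 1 24452 0 14.4mb 14.4mb
yellow open bro_index_2016.04.25.16 5 1 1295 0 1.9mb 1.9mb
yellow open squid_index_2016.04.26.13 5 1 1 0 7.3kb 7.3kb
yellow open yaf_index_2016.04.25.17 5 1 30750 0 17.4mb 17.4mb
"""

# Split the header row and data rows, then key every index by name.
header, *rows = [line.split() for line in cat_output.strip().splitlines()]
indices = {row[2]: dict(zip(header, row)) for row in rows}

squid_indices = sorted(n for n in indices if n.startswith("squid_index"))
print(squid_indices)  # -> ['squid_index_2016.04.26.13']
```

A docs.count of 1 for the squid index matches the single log line we pushed through the producer; push more lines and the count should grow on refresh.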