In this blog post we will walk through what it takes to set up a new telemetry source in Metron. For this example we will set up a new sensor, capture the sensor logs, pipe the logs to Kafka, pick up the logs with a Metron parsing topology, parse them, and run them through the Metron stream processing pipeline.
...
After executing the above commands a Metron VM will be built (called node1) and you will be logged in as user vagrant. There will be 4 topologies running but one must be stopped because the VM only has 4 Storm worker slots available.
Leave the enrichment topology running and kill one of the parser topologies (bro, snort, or yaf), either with the "storm kill" command or from the Storm UI at http://node1:8744/index.html. Now let's install the Squid sensor.
...
You see that there are three types of logs available: access.log, cache.log, and squid.out. We are interested in access.log, as that is the log that records the proxy usage. We see that initially the log is empty. Let's generate a few entries for it.
squidclient -g 20 "http://www.aliexpress.com/af/shoes.html?ltype=wholesale&d=y&origin=n&isViewCP=y&catId=0&initiative_id=SB_20160622082445&SearchText=shoes"
squidclient -g 20 "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"
squidclient http://www.cnn.com
squidclient http://www.nba.com
vi /var/log/squid/access.log
...
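As an aside, the first field of each access.log entry is a Unix epoch timestamp with millisecond precision, which is not very readable on its own. A quick way to sanity-check one (the sample value below is made up for illustration):

```python
from datetime import datetime, timezone

# The first column of a Squid access.log entry is a Unix epoch
# timestamp with millisecond precision. Sample value for illustration:
raw = "1467011157.401"

# Convert to a human-readable UTC datetime.
ts = datetime.fromtimestamp(float(raw), tz=timezone.utc)
print(ts.isoformat())
```

This same field is what the Metron parser will later pick up via the "timestampField" setting in the parser configuration.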
Now that we have the sensor set up and generating logs, we need to figure out how to pipe these logs to a Kafka topic. To do so, the first thing we need to do is set up a new Kafka topic for Squid.
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic squid --partitions 1 --replication-factor 1
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper localhost:2181 --list
...
SQUID_DELIMITED %{NUMBER:timestamp} %{SPACE:UNWANTED} %{INT:elapsed} %{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url} - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}
Note: this pattern is already pre-loaded under /apps/metron/patterns/squid.
Notice that I apply the UNWANTED tag to any part of the message that I don't want included in my resulting JSON structure. Also notice that I named the IPV4 fields according to the following list of field conventions. The last thing I need to do is validate my Grok pattern. For our test we will use a free online validator called Grok Constructor. A validated Grok expression should look like this:
[screenshot: the SQUID_DELIMITED pattern validated in Grok Constructor]
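If you prefer an offline check, a rough Python regex equivalent of the pattern can approximate what the validator does. This is illustrative only: the real Grok definitions (NUMBER, IPV4, etc.) are slightly broader than these simplified character classes, and the log line below is a made-up but representative access.log entry.

```python
import re

# Simplified regex approximation of the SQUID_DELIMITED Grok pattern.
# Unnamed pieces correspond to the fields tagged UNWANTED above.
SQUID_RE = re.compile(
    r"(?P<timestamp>\d+\.\d+)\s+"
    r"(?P<elapsed>\d+)\s"
    r"(?P<ip_src_addr>(?:\d{1,3}\.){3}\d{1,3})\s"
    r"(?P<action>\w+)/(?P<code>\d+)\s"
    r"(?P<bytes>\d+)\s"
    r"(?P<method>\w+)\s"
    r"(?P<url>\S+)\s-\s"
    r"\w+/(?P<ip_dst_addr>(?:\d{1,3}\.){3}\d{1,3})\s"
    r"\w+/\w+"
)

# A made-up but representative access.log entry.
line = ("1467011157.401    415 127.0.0.1 TCP_MISS/200 337891 "
        "GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")

m = SQUID_RE.match(line)
print(m.groupdict())
```

If the match succeeds and each named group holds the value you expect, the field layout of your Grok pattern lines up with the actual log format.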
Now that the Grok pattern has been defined, we need to save it and move it to HDFS. Existing Grok parsers that ship with Metron are staged under /apps/metron/patterns/. (You can skip this step if the squid pattern is already pre-loaded.)
[root@node1 bin]# hdfs dfs -ls /apps/metron/patterns/
Found 5 items
-rw-r--r-- 3 hdfs hadoop 13427 2016-04-25 07:07 /apps/metron/patterns/asa
-rw-r--r-- 3 hdfs hadoop 5203 2016-04-25 07:07 /apps/metron/patterns/common
-rw-r--r-- 3 hdfs hadoop 524 2016-04-25 07:07 /apps/metron/patterns/fireeye
-rw-r--r-- 3 hdfs hadoop 2552 2016-04-25 07:07 /apps/metron/patterns/sourcefire
-rw-r--r-- 3 hdfs hadoop 879 2016-04-25 07:07 /apps/metron/patterns/yaf
...
Create a Squid Grok parser configuration file at /usr/metron/0.1BETA/config/zookeeper/parsers/squid.json with the following contents:
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/apps/metron/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "MTL",
      "output": ["full_hostname", "domain_without_subdomains"],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
      }
    }
  ]
}
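To build intuition for what the two transformation functions in this config compute, here is a naive Python approximation. The real URL_TO_HOST and DOMAIN_REMOVE_SUBDOMAINS implementations are more robust (for example, they handle multi-part public suffixes like .co.uk); this sketch simply keeps the last two labels of the hostname.

```python
from urllib.parse import urlparse

# Naive stand-in for URL_TO_HOST: extract the hostname from a URL.
def url_to_host(url):
    return urlparse(url).hostname

# Naive stand-in for DOMAIN_REMOVE_SUBDOMAINS: keep the last two
# labels of the hostname. The real function consults a public-suffix
# list, so this over-trims domains like example.co.uk.
def domain_remove_subdomains(hostname):
    return ".".join(hostname.split(".")[-2:])

full_hostname = url_to_host("http://www.cnn.com/world")
domain_without_subdomains = domain_remove_subdomains(full_hostname)
print(full_hostname, domain_without_subdomains)
```

In the running topology these two derived fields are appended to every parsed squid message alongside the fields extracted by the Grok pattern.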
...
Navigate to the squid parser topology in the Storm UI at http://node1:8744/index.html and verify the topology is up with no errors:
[screenshot: Storm UI showing the squid parser topology running with no errors]
Now that we have a new running squid parser topology, generate some data to parse by running this command several times:
...