
In this blog post we will walk through what it takes to set up a new telemetry source in Metron.  For this example we will set up a new sensor, capture the sensor logs, pipe the logs to Kafka, pick up the logs with a Metron parsing topology, parse them, and run them through the Metron stream processing pipeline.

...

After executing the above commands a Metron VM will be built (called node1) and you will be logged in as the vagrant user.  There will be 4 topologies running, but one must be stopped to free a worker slot for the new Squid parser topology, because the VM only has 4 Storm worker slots available.

 

Leave the enrichment topology running and kill one of the other parser topologies (bro, snort, or yaf) with either the "storm kill" command or the Storm UI at http://node1:8744/index.html.  Now let's install the Squid sensor.
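For example, from the command line (a sketch; which parser topology you stop is your choice, here we assume yaf):

# confirm the names of the topologies currently running
storm list

# free a worker slot by killing one parser topology (yaf assumed here)
storm kill yaf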

...

You can see that there are three types of logs available: access.log, cache.log, and squid.out.  We are interested in access.log, as that is the log that records proxy usage.  Initially the log is empty, so let's generate a few entries for it.

Run squidclient a few times to create some entries in the log.  Note that URLs containing shell metacharacters such as & must be quoted, or the shell will mangle them:

squidclient -g 20 "http://www.aliexpress.com/af/shoes.html?ltype=wholesale&d=y&origin=n&isViewCP=y&catId=0&initiative_id=SB_20160622082445&SearchText=shoes"

squidclient -g 20 "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"

 

 

squidclient http://www.cnn.com

squidclient http://www.nba.com

vi /var/log/squid/access.log
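Each request appends one line to access.log.  The exact values will differ in your environment, but a Squid native-format entry looks roughly like this (timestamp, elapsed ms, client IP, cache action/HTTP code, bytes, HTTP method, URL, hierarchy/destination IP, content type; the values below are illustrative):

1467011157.401    415 127.0.0.1 TCP_MISS/200 337891 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html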

...

Now that we have the sensor set up and generating logs, we need to pipe those logs to a Kafka topic.  The first step is to create a new Kafka topic for Squid.


/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic squid --partitions 1 --replication-factor 1

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper localhost:2181 --list
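With the topic created, one simple way to pipe the access log into it is Kafka's console producer.  This is only a sketch: the broker address node1:6667 is an assumption based on a typical single-node HDP install, so substitute your own broker list:

tail -f /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid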

...

SQUID_DELIMITED %{NUMBER:timestamp} %{SPACE:UNWANTED}  %{INT:elapsed} %{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url} - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}

  • This pattern is already pre-loaded under /apps/metron/patterns/squid.

 

Notice that I apply the UNWANTED tag to any part of the message that I don't want included in the resulting JSON structure, and that the IPV4 fields are named according to Metron's field naming conventions.  The last thing to do is validate the Grok pattern.  For this test we will use a free online validator called Grok Constructor.  A validated Grok expression should look like this:

[Screenshot: the validated Grok expression in Grok Constructor]
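For the sample access.log entry shown earlier, the matched fields (dropping the UNWANTED captures) would come out roughly like this; the values are illustrative:

{
  "timestamp": "1467011157.401",
  "elapsed": "415",
  "ip_src_addr": "127.0.0.1",
  "action": "TCP_MISS",
  "code": "200",
  "bytes": "337891",
  "method": "GET",
  "url": "http://www.cnn.com/",
  "ip_dst_addr": "199.27.79.73"
}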

 

Now that the Grok pattern has been defined, we need to save it and move it to HDFS.  Existing Grok patterns that ship with Metron are staged under /apps/metron/patterns/.

(You can skip this step if the patterns are already pre-loaded, as noted above.)
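If you do need to stage the pattern yourself, a minimal sketch (assuming you saved it to a local file named squid) is:

hdfs dfs -put squid /apps/metron/patterns/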

[root@node1 bin]# hdfs dfs -ls /apps/metron/patterns/

Found 5 items

-rw-r--r--   3 hdfs hadoop      13427 2016-04-25 07:07 /apps/metron/patterns/asa

-rw-r--r--   3 hdfs hadoop       5203 2016-04-25 07:07 /apps/metron/patterns/common

-rw-r--r--   3 hdfs hadoop        524 2016-04-25 07:07 /apps/metron/patterns/fireeye

-rw-r--r--   3 hdfs hadoop       2552 2016-04-25 07:07 /apps/metron/patterns/sourcefire

-rw-r--r--   3 hdfs hadoop        879 2016-04-25 07:07 /apps/metron/patterns/yaf

...

Create a Squid Grok parser configuration file at /usr/metron/0.1BETA/config/zookeeper/parsers/squid.json with the following contents:

The fieldTransformations section below uses Metron's transformation language (MTL, later renamed Stellar) to derive two additional fields, full_hostname and domain_without_subdomains, from the parsed url field.

{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/apps/metron/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "MTL",
      "output": [ "full_hostname", "domain_without_subdomains" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
      }
    }
  ]
}
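A quick sanity check before deploying the file is to run it through Python's built-in JSON validator, which prints an error with a line number if the file is malformed:

python -m json.tool /usr/metron/0.1BETA/config/zookeeper/parsers/squid.json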

...

Navigate to the squid parser topology in the Storm UI at http://node1:8744/index.html and verify the topology is up with no errors:


[Screenshot: the squid parser topology running in the Storm UI]
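If you prefer the command line, the same summary is exposed by the Storm UI REST API (node1:8744 matches the UI address above):

curl http://node1:8744/api/v1/topology/summary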


Now that we have a new running squid parser topology, generate some data to parse by running this command several times:

...