
...

This should ingest our Squid logs into Kafka.  Now we are ready to tackle the Metron parsing topology setup.  The first thing we need to do is decide whether we will use a Java-based parser or a Grok-based parser for the new telemetry.  In this example we will use the Grok parser.  A Grok parser is a good fit for structured or semi-structured logs that are well understood (check) and for telemetries with lower volumes of traffic (check).  Next we need to define the Grok expression for our log.  Refer to the Grok documentation for additional details.  In our case the pattern is:

WDOM [^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)

SQUID_DELIMITED %{NUMBER:timestamp} %{SPACE:UNWANTED} %{INT:elapsed} %{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} http:\/\/\www.%{WDOM:url}\/ - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}
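To get a feel for which fields the two patterns pull out of a log line, here is a minimal Python sketch that expands them into an ordinary regular expression.  It is only an approximation of what a real Grok library does.  The entries in BASE are simplified stand-ins I wrote for the stock Grok patterns (not their exact definitions), grok_to_regex is a hypothetical helper, the sample log line is an illustrative one, and the expression is my reading of the SQUID_DELIMITED pattern above with %{WDOM:url} as the URL capture.

```python
import re

# Simplified stand-ins for the stock Grok patterns (assumptions, not the
# exact definitions that ship with Grok). WDOM is the custom pattern above.
BASE = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "INT": r"[+-]?\d+",
    "SPACE": r"\s*",
    "WORD": r"\b\w+\b",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WDOM": r"[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)",
}

# Assumed reading of the SQUID_DELIMITED expression from the text.
SQUID_DELIMITED = (
    r"%{NUMBER:timestamp} %{SPACE:UNWANTED} %{INT:elapsed} %{IPV4:ip_src_addr} "
    r"%{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} "
    r"http:\/\/\www.%{WDOM:url}\/ - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} "
    r"%{WORD:UNWANTED}\/%{WORD:UNWANTED}"
)


def grok_to_regex(expr):
    """Expand %{PATTERN:field} references into named regex groups."""
    def expand(match):
        pattern_name, field = match.group(1), match.group(2)
        body = BASE[pattern_name]
        if field == "UNWANTED":
            # UNWANTED parts are matched but not kept in the result.
            return f"(?:{body})"
        return f"(?P<{field}>{body})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expr))


# Illustrative Squid access log line (made up for this sketch).
sample = ("1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET "
          "http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")

match = grok_to_regex(SQUID_DELIMITED).match(sample)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value}")
```

Only the named fields survive into the result; everything tagged UNWANTED is matched and then dropped, which mirrors how those parts stay out of the resulting JSON structure.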

Notice that I define a WDOM pattern (more tailored to Squid than the generic Grok URL pattern) before defining the Squid log pattern.  This is optional and is done for ease of use.  Also, notice that I apply the UNWANTED tag to any part of the message that I don't want included in my resulting JSON structure.  Finally, notice that I applied the naming convention to the IPV4 fields by referencing the following list of field conventions.  The last thing I need to do is validate my Grok pattern.  For our test we will be using a free Grok validator called Grok Constructor.  A validated Grok expression should look like this:

...

{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/apps/metron/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "MTL",
      "output": [ "full_hostname", "domain_without_subdomains" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
      }
    }
  ]
}

 

Notice the use of fieldTransformations in the parser configuration.  Our Grok parser is set up to extract the URL, but what we really want is just the domain, or even the domain without subdomains.  To do this, we can use the Metron Transformation Language field transformation.  The Metron Transformation Language is a Domain Specific Language that lets users define extra transformations to be applied to the messages flowing through the topology.  It supports a wide range of common network and string related functions as well as function composition and list operations.  In our case, we extract the hostname from the URL via the URL_TO_HOST function and strip the subdomains with DOMAIN_REMOVE_SUBDOMAINS, thereby adding two new fields, "full_hostname" and "domain_without_subdomains", to each message.
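To make the effect of the two transformations concrete, here is a rough Python sketch of what they do to a message.  This is not how Metron implements them.  The functions url_to_host, domain_remove_subdomains and apply_transformations are hypothetical stand-ins, the input message is made up, and the subdomain removal simply keeps the last two labels of the hostname, which is a simplification that gets suffixes such as "co.uk" wrong.

```python
from urllib.parse import urlparse


def url_to_host(url):
    """Stand-in for URL_TO_HOST: return the hostname part of a URL."""
    return urlparse(url).hostname


def domain_remove_subdomains(host):
    """Stand-in for DOMAIN_REMOVE_SUBDOMAINS.

    Naive assumption: the registered domain is the last two labels.
    This is wrong for multi-part public suffixes such as "co.uk".
    """
    labels = host.split(".")
    return ".".join(labels[-2:])


def apply_transformations(message):
    """Add the two derived fields, in the order the config lists them."""
    out = dict(message)
    # The second field is computed from the first, so order matters.
    out["full_hostname"] = url_to_host(out["url"])
    out["domain_without_subdomains"] = domain_remove_subdomains(out["full_hostname"])
    return out


# Hypothetical parsed message carrying a full URL.
enriched = apply_transformations({"url": "http://www.cnn.com/"})
print(enriched)
```

The point to take away is the chaining: "domain_without_subdomains" is derived from "full_hostname", which is itself derived from "url", just as the config entries reference one another.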

A script is provided to upload configurations to Zookeeper.  Upload the new parser config to Zookeeper:

...

There you will see the parsed messages along with performance timestamps.  We will discuss the performance timestamps in another blog entry.  


By convention, new messages are written to an index called squid_index_[timestamp], and the document type is squid_doc.
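If you need to work out which index to query, the naming convention can be sketched in a few lines of Python.  The helpers index_name and doc_type are hypothetical, and the date format used for the [timestamp] part is an assumption on my side (the text above does not specify it), so check it against the index names you actually see in your cluster.

```python
from datetime import datetime, timezone


def index_name(sensor, when, date_format="%Y.%m.%d.%H"):
    """Build <sensor>_index_<timestamp>.

    The default date_format is an assumption; adjust it to match the
    index names in your own deployment.
    """
    return f"{sensor}_index_{when.strftime(date_format)}"


def doc_type(sensor):
    """Build the document type name, <sensor>_doc."""
    return f"{sensor}_doc"


# Arbitrary example time, chosen only for illustration.
when = datetime(2016, 4, 25, 9, 26, tzinfo=timezone.utc)
squid_index = index_name("squid", when)
squid_doc_type = doc_type("squid")
print(squid_index, squid_doc_type)
```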

...