...

  1. Log into KAFKA_HOST as root
  2. Create Kafka topic called squid:
    1. /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --create --topic squid --partitions 1 --replication-factor 1
  3. List all of the Kafka topics to ensure that the new topic exists:
    1. /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --list
  4. You should see the following list of Kafka topics:
    • bro
    • enrichment
    • pcap
    • snort
    • squid
    • yaf
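
Optionally, you can sanity-check the new topic before moving on. The first command below describes the topic using the same kafka-topics.sh script; the second is a sketch of pushing live squid events into the topic, and it assumes squid writes its access log to /var/log/squid/access.log and that the Kafka broker listens on the HDP default port 6667:

    /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --describe --topic squid
    tail /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $KAFKA_HOST:6667 --topic squid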

Step 3: Create a Grok Statement to Parse the Squid Telemetry Event

...

  1. The first thing we need to do is decide whether we will use the Java-based parser or the Grok-based parser for the new telemetry. In this example we will use the Grok parser. The Grok parser is well suited to structured or semi-structured logs that are well understood (check) and to telemetry sources with lower volumes of traffic (check).
  2. Next we need to define the Grok expression for our log. Refer to the Grok documentation for additional details. In our case the pattern is:

     

    SQUID_DELIMITED %{NUMBER:timestamp}%{SPACE:UNWANTED}  %{INT:elapsed}%{SPACE:UNWANTED}%{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url} - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}

     

  3. Notice that we apply the UNWANTED tag to any part of the message that we don't want included in the resulting JSON structure. Finally, notice that we apply the standard Metron field naming convention to the IPV4 fields (ip_src_addr and ip_dst_addr).

  4. The last thing we need to do is validate the Grok pattern. For our test we will use a free online tool called Grok Constructor. A valid expression should parse a sample log line into the expected fields; a worked example is sketched at the end of this step.

  5. Now that the Grok pattern has been defined, we need to save it and move it to HDFS. 
    1. ssh into host $HOST_WITH_ENRICHMENT_TAG as root
    2. Create a file called "squid" in the tmp directory and copy the Grok pattern into the file.
      1. touch /tmp/squid
      2. Open the squid file and add the Grok pattern defined above
    3. Put the squid file into the directory where Metron stores its Grok parsers. Existing Grok parsers that ship with Metron are staged under /apps/metron/patterns
      1. su - hdfs
      2. hadoop fs -rmr /apps/metron/patterns/squid
      3. hdfs dfs -put /tmp/squid /apps/metron/patterns/
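
To confirm the pattern landed where the parser expects it, you can read it back out of HDFS (still as the hdfs user). This is purely a verification sketch using the same paths as above:

    hdfs dfs -ls /apps/metron/patterns
    hdfs dfs -cat /apps/metron/patterns/squid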

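To make the field mapping concrete, here is a sketch of how the SQUID_DELIMITED pattern defined above breaks down one illustrative squid access.log entry (the log line and its values are made up for illustration; the UNWANTED captures are dropped from the resulting JSON):

    1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET http://www.example.com/ - DIRECT/93.184.216.34 text/html

    timestamp   -> 1461576382.642
    elapsed     -> 161
    ip_src_addr -> 127.0.0.1
    action      -> TCP_MISS
    code        -> 200
    bytes       -> 103701
    method      -> GET
    url         -> http://www.example.com/
    ip_dst_addr -> 93.184.216.34
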
Step 4: Parse and Transform the Squid Message

...

Now that the Grok pattern is staged in HDFS, we need to define a parser configuration for the Metron Parsing Topology. The configurations are kept in Zookeeper, so the sensor configuration must be uploaded there after it has been created.

  1. ssh into Host $HOST_WITH_ENRICHMENT_TAG as root
  2. Create a Squid Grok parser configuration file at /usr/metron/$METRON_RELEASE/config/zookeeper/parsers/squid.json with the following contents: 

    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "parserConfig": {
        "grokPath": "/apps/metron/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "MTL",
          "output" : [ "full_hostname", "domain_without_subdomains" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
          }
        }
      ]
    }

  3. Notice the use of fieldTransformations in the parser configuration. Our Grok parser is set up to extract the URL, but really we want just the domain, or even the domain without subdomains. To do this, we can use the Metron Transformation Language field transformation. The Metron Transformation Language is a domain-specific language that allows users to define additional transformations to be performed on the messages flowing through the topology. It supports a wide range of common network- and string-related functions as well as function composition and list operations. In our case, we extract the hostname from the URL via the URL_TO_HOST function and remove the subdomains with DOMAIN_REMOVE_SUBDOMAINS, thereby adding two new fields, "full_hostname" and "domain_without_subdomains", to each message (see the illustrative example at the end of this step).
  4. All parser configurations are stored in Zookeeper. A script is provided to upload configurations to Zookeeper. 
    1. /usr/metron/$METRON_RELEASE/bin/zk_load_configs.sh --mode PUSH -i /usr/metron/$METRON_RELEASE/config/zookeeper -z $ZOOKEEPER_HOST:2181
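
To verify the upload, the same script can dump the configurations currently held in Zookeeper back to the console. This is a sketch and assumes the DUMP mode of zk_load_configs.sh is available in your Metron release:

    /usr/metron/$METRON_RELEASE/bin/zk_load_configs.sh --mode DUMP -z $ZOOKEEPER_HOST:2181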

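As an illustration of the field transformation described in item 3, given a parsed message whose url field happens to be http://www.example.com/about (a made-up value), the two derived fields would look roughly like this:

    "url" : "http://www.example.com/about",
    "full_hostname" : "www.example.com",
    "domain_without_subdomains" : "example.com"
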
Step 6: Deploy the new Parser Topology

...