...

  1. ssh into host $HOST_WITH_ENRICHMENT_TAG as root.
  2. Create a Squid Grok parser configuration file at /usr/metron/$METRON_VERSION/config/zookeeper/parsers/squid.json:
     
    touch /usr/metron/$METRON_VERSION/config/zookeeper/parsers/squid.json

  3. Add the following contents:
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "parserConfig": {
        "grokPath": "/apps/metron/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations": [
        {
          "transformation": "STELLAR",
          "output": [ "full_hostname", "domain_without_subdomains" ],
          "config": {
            "full_hostname": "URL_TO_HOST(url)",
            "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
          }
        }
      ]
    }


    Notice the use of fieldTransformations in the parser configuration. Our Grok parser is set up to extract the URL, but what we really want is just the domain, or even the domain without subdomains. To do this, we can use a Stellar field transformation (Stellar was formerly called the Metron Transformation Language). Stellar is a domain-specific language that lets users define extra transformations to apply to the messages flowing through the topology. It supports a wide range of common network and string-related functions, as well as function composition and list operations. In our case, we extract the hostname from the URL with the URL_TO_HOST function and strip the subdomains with DOMAIN_REMOVE_SUBDOMAINS, adding two new fields, "full_hostname" and "domain_without_subdomains", to each message.
  4. All parser configurations are stored in ZooKeeper. Use the following script to upload configurations to ZooKeeper:

       /usr/metron/$METRON_VERSION/bin/zk_load_configs.sh --mode PUSH -i /usr/metron/$METRON_VERSION/config/zookeeper -z $ZOOKEEPER_HOST:2181 

    Note: You might receive the following warning messages when you execute the previous command. You can safely ignore these warning messages.

    log4j:WARN No appenders could be found for logger (org.apache.curator.framework.imps.CuratorFrameworkImpl).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
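To make the URL_TO_HOST and DOMAIN_REMOVE_SUBDOMAINS calls in the parser configuration above more concrete, here is a rough shell approximation of the two fields they add to one message. It is a sketch for intuition only, not Metron's implementation: the helper names url_to_host and domain_remove_subdomains are hypothetical, IPv6 literals are not handled, and the naive "keep the last two labels" rule is wrong for multi-label suffixes such as co.uk.

```shell
# Hypothetical helpers that roughly mimic URL_TO_HOST and
# DOMAIN_REMOVE_SUBDOMAINS, for intuition only. This is NOT Metron's code.

# url_to_host URL: print the host part of URL.
url_to_host() {
  local rest="${1#*://}"   # drop the scheme ("http://"), if any
  rest="${rest%%[/?#]*}"   # drop the path, query string, and fragment
  rest="${rest##*@}"       # drop a "user:password@" prefix, if any
  rest="${rest%%:*}"       # drop the port (this breaks IPv6 literals)
  printf '%s\n' "$rest"
}

# domain_remove_subdomains HOST: keep only the last two dot-separated labels.
# Naive assumption: wrong for multi-label suffixes such as "co.uk", which a
# real implementation would need a public-suffix list to handle.
domain_remove_subdomains() {
  printf '%s\n' "$1" | awk -F. 'NF <= 2 { print; next } { print $(NF-1) "." $NF }'
}

# Example: the two fields the parser would add for one Squid record.
full_hostname="$(url_to_host 'http://www.cnn.com/world')"
domain_without_subdomains="$(domain_remove_subdomains "$full_hostname")"
echo "$full_hostname $domain_without_subdomains"   # prints: www.cnn.com cnn.com
```

The order of the expansions in url_to_host matters: the user-info prefix is removed before the port so that a colon in "user:password@" is not mistaken for a port separator.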

...


Step 5: Configure Indexing

Next you might want to configure your sensor's indexing. The indexing topology takes messages that have already been enriched and stores them in one or more supported indices.

You can choose not to configure a sensor's indexing and use the default values. If you leave the writer configuration unspecified, you will see a warning similar to the following in the Storm console: WARNING: Default and (likely) unoptimized writer config used for hdfs writer and sensor squid. You can ignore this warning message if you intend to use the default configuration.

To configure a sensor's indexing:

     1. Create a file called squid.json at /usr/metron/$METRON_VERSION/config/zookeeper/indexing/:

          touch /usr/metron/$METRON_VERSION/config/zookeeper/indexing/squid.json

     2. Populate it with the following:

          {
            "elasticsearch": {
              "index": "squid",
              "batchSize": 5,
              "enabled": true
            },
            "hdfs": {
              "index": "squid",
              "batchSize": 5,
              "enabled": true
            }
          }

        This file sets the index name to squid and the batch size to 5, and enables both the Elasticsearch and HDFS writers.

     3. Push the configuration to ZooKeeper:

         /usr/metron/$METRON_VERSION/bin/zk_load_configs.sh --mode PUSH -i /usr/metron/$METRON_VERSION/config/zookeeper -z $ZOOKEEPER_HOST:2181
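Steps 1 through 3 can also be scripted. The sketch below writes the same indexing configuration with a here-document, which removes the separate touch-and-edit step, pushes it with the zk_load_configs.sh command used above, and then uses the script's DUMP mode to print what ZooKeeper now holds. The function names are hypothetical, and the sketch assumes METRON_VERSION and ZOOKEEPER_HOST are set as elsewhere in this guide. The target directory is a parameter, so you can try the writer against a scratch directory first.

```shell
# Hypothetical helpers; assumes METRON_VERSION and ZOOKEEPER_HOST are set.

# write_squid_indexing_config DIR: write DIR/squid.json (creating DIR if needed).
write_squid_indexing_config() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF' keeps the shell from expanding anything inside the JSON.
  cat > "$dir/squid.json" <<'EOF'
{
  "elasticsearch": {
    "index": "squid",
    "batchSize": 5,
    "enabled": true
  },
  "hdfs": {
    "index": "squid",
    "batchSize": 5,
    "enabled": true
  }
}
EOF
}

# push_metron_configs: the same PUSH command used in step 3.
push_metron_configs() {
  "/usr/metron/$METRON_VERSION/bin/zk_load_configs.sh" --mode PUSH \
    -i "/usr/metron/$METRON_VERSION/config/zookeeper" \
    -z "$ZOOKEEPER_HOST:2181"
}

# dump_metron_configs: print the configurations stored in ZooKeeper, to confirm the push.
dump_metron_configs() {
  "/usr/metron/$METRON_VERSION/bin/zk_load_configs.sh" --mode DUMP \
    -z "$ZOOKEEPER_HOST:2181"
}

# On the Metron host, for example:
#   write_squid_indexing_config "/usr/metron/$METRON_VERSION/config/zookeeper/indexing"
#   push_metron_configs
#   dump_metron_configs
```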

Step 6: Validate the Squid Message

 

Another thing we can do is validate our messages. Say we want to make sure that source and destination IPs are valid. Validators are global, so we define them in the global JSON configuration and push it to ZooKeeper. The list of available validators can be found here:

...

More details on the validation framework can be found in the Validation Framework section: https://github.com/apache/incubator-metron/tree/master/metron-platform/metron-common#transformation-language
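The validator configuration itself is not reproduced in this section. Purely to illustrate the kind of check an IP validation performs, here is a hypothetical shell function that accepts a dotted-quad IPv4 address and rejects anything else. It is not Metron's validator, and it does not handle IPv6 at all.

```shell
# is_valid_ipv4 ADDR: succeed (status 0) only for a dotted-quad IPv4 address.
# Hypothetical illustration; not Metron's validator, and IPv6 is not handled.
is_valid_ipv4() {
  local octet
  # Reject empty input and any character other than a digit or a dot.
  case $1 in
    ''|*[!0-9.]*) return 1 ;;
  esac
  # Require exactly four groups of 1 to 3 digits separated by dots.
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
  # Each octet must fit in a byte (0-255).
  for octet in $(printf '%s\n' "$1" | tr '.' ' '); do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Example: check a source and a destination IP from one made-up message.
for ip in 192.168.66.121 10.0.0.300; do
  if is_valid_ipv4 "$ip"; then echo "$ip valid"; else echo "$ip invalid"; fi
done
```

The structural check (grep) runs before the range check so that the octet loop only ever sees four short runs of digits.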

...

Step 7: Deploy the new Parser Topology

 

Now that we have the Squid parser topology defined, let's deploy it to our cluster.

...

The following steps show how to install NiFi. Perform the following as root:

  1. ssh into host $NIFI_HOST as root.
  2. Download NiFi.
    cd /usr/lib
    wget  http://public-repo-1.hortonworks.com/HDF/centos6/1.x/updates/1.2.0.0/HDF-1.2.0.0-91.tar.gz
    tar -zxvf HDF-1.2.0.0-91.tar.gz 
  3. Edit the NiFi configuration to update the port of the NiFi web app: nifi.web.http.port=8089
    cd HDF-1.2.0.0/nifi
    vi  conf/nifi.properties
    # update nifi.web.http.port to 8089
  4. Install NiFi as a service.
    bin/nifi.sh install nifi
  5. Start the NiFi service.
    service nifi start
  6. Open the NiFi web UI at http://$NIFI_HOST:8089/nifi/.
    Note: Be sure to substitute your NiFi host name for $NIFI_HOST in the URL above. If you simply click on the host link, the URL will specify Node1, which will not work.
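If you prefer not to edit nifi.properties interactively in step 3, a sed substitution can make the same change. This is a sketch that assumes GNU sed (for -i) and assumes the file already contains a nifi.web.http.port= line; the function name set_nifi_http_port is hypothetical.

```shell
# set_nifi_http_port FILE PORT: set nifi.web.http.port=PORT in FILE.
# Hypothetical helper; assumes GNU sed (for -i) and an existing property line.
set_nifi_http_port() {
  local file="$1" port="$2"
  # Fail instead of silently doing nothing if the property is missing.
  grep -q '^nifi\.web\.http\.port=' "$file" || return 1
  # Anchored pattern: leaves nifi.web.https.port untouched.
  sed -i "s/^nifi\.web\.http\.port=.*/nifi.web.http.port=$port/" "$file"
}

# From the HDF-1.2.0.0/nifi directory, for example:
#   set_nifi_http_port conf/nifi.properties 8089
#   grep '^nifi.web.http.port=' conf/nifi.properties
```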

...

  1. Drag a processor to the canvas (do this by dragging the processor icon, which is the first icon on the toolbar).
  2. Select the TailFile type of processor, then select Add.
  3. Right-click the processor and select Configure to display the Configure Processor dialog box. In the Settings tab, change the name to "Ingest Squid Events".
    1. In the Properties tab, configure the following:
  4. Drag another processor to the canvas.
  5. Select the PutKafka type of processor, then select Add.
  6. Right-click the processor and select Configure.
  7. In the Settings tab, change the name to "Stream to Metron", then click the relationship checkboxes for failure and success.
  8. In the Properties tab, set the following three properties:
    1. Known Brokers: $KAFKA_HOST:6667
    2. Topic Name: squid
    3. Client Name: nifi-squid
  9. Create a connection by dragging the arrow from the Ingest Squid Events processor to the Stream to Metron processor.
  10. Hold down the Shift key and select the entire flow, then click the play button (green arrow). All of the processor icons should turn into green arrows.
  11. Generate some data using squidclient (do this for 20 or more sites).
    squidclient -h 127.0.0.1 "http://www.cnn.com"
  12. The processor should show metrics for the data being pushed into Metron.
  13. In the Storm UI, the parser topology should show tuples coming in.
  14. After about 5 minutes, you should see a new Elasticsearch index called squid_index* in the Elastic Admin UI.
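Steps 11 and 14 can be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the site list in the example is arbitrary, the SQUIDCLIENT override exists only so the loop can be dry-run, and ES_HOST (an Elasticsearch node on the default HTTP port 9200) is an assumption about your cluster. The query uses Elasticsearch's standard _cat/indices API.

```shell
# generate_squid_traffic URL...: send one request per URL through the local
# Squid proxy so that access-log lines appear for the TailFile processor.
# Hypothetical helper; set SQUIDCLIENT to another command to dry-run it.
generate_squid_traffic() {
  local client="${SQUIDCLIENT:-squidclient}" url count=0
  for url in "$@"; do
    # Only the access-log entry matters, so discard the response body.
    "$client" -h 127.0.0.1 "$url" > /dev/null || echo "request failed: $url" >&2
    count=$((count + 1))
  done
  echo "attempted $count requests"
}

# check_squid_index: list Elasticsearch indices whose names start with "squid".
# Assumes ES_HOST names an Elasticsearch node listening on port 9200.
check_squid_index() {
  [ -n "$ES_HOST" ] || { echo "ES_HOST is not set" >&2; return 1; }
  curl -s "http://$ES_HOST:9200/_cat/indices/squid*?v"
}

# Example (use 20 or more sites in practice):
#   generate_squid_traffic http://www.cnn.com http://www.apache.org http://www.wikipedia.org
#   ES_HOST=<your Elasticsearch host> check_squid_index
```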

...