In the previous section, we walked through how to add a new data source, Squid, to Apache Metron. The inevitable next question is how to enrich the telemetry events in real time as they flow through the platform. Enrichment is critical when identifying threats or, as we like to call it, "finding the needle in the haystack". The customer's requirements are the following:

  1. The proxy events from Squid logs need to be ingested in real time.
  2. The proxy logs have to be parsed into a standardized JSON structure that Metron can understand.
  3. In real time, the Squid proxy event needs to be enriched so that the domain names are enriched with the IP information.
  4. In real time, the IP within the proxy event must be checked against threat intel feeds.
  5. If there is a threat intel hit, an alert needs to be raised.
  6. The end user must be able to see the new telemetry events and the alerts from the new data source.
  7. All of these requirements need to be implemented easily without writing any new Java code.

In this section, we will walk you through how to implement requirement 3.

Metron Enrichment Framework Explained

Setup and Pre-requisites

  1. You should have completed the instructions in Adding a new Telemetry Data Source
  2. Make sure the following environment variables are configured based on your environment: 


    KAFKA_HOST = host where a Kafka broker is installed
    ZOOKEEPER_HOST = host where a Zookeeper server is installed
    PROBE_HOST = Host where your sensors and probes are installed. If you don't have any sensors installed, pick the host where a Storm supervisor is running
    SQUID_HOST = Host where you want to install SQUID. If you don't care, just install on the PROBE_HOST
    NIFI_HOST = The host where you will install NIFI. You want this to be the same host where you installed Squid.
    HOST_WITH_ENRICHMENT_TAG = This is the host in your inventory hosts file that you put under the group "enrichment"
    SEARCH_HOST = This is the host where you have elastic or solr running. This is the host in your inventory hosts file that you put under the group "search". Pick one of the search hosts
    SEARCH_HOST_PORT = The port of the search host where indexing is configured. (e.g: 9300)
    METRON_UI_HOST = This is the host where your metron ui web application is running. This is the host in your inventory hosts file that you put under the group "web".
    METRON_VERSION = The release of the metron binaries you are working with (e.g: 0.2.0BETA-RC2)
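
Before moving on, it can help to confirm that the variables above are actually set in your environment. The following is a minimal Python sketch, not part of Metron: the helper name `missing_variables` is made up for illustration, and the list of names is simply copied from the variables above.

```python
import os

# Names copied from the list above; adjust if your environment differs.
REQUIRED_VARIABLES = [
    "KAFKA_HOST", "ZOOKEEPER_HOST", "PROBE_HOST", "SQUID_HOST", "NIFI_HOST",
    "HOST_WITH_ENRICHMENT_TAG", "SEARCH_HOST", "SEARCH_HOST_PORT",
    "METRON_UI_HOST", "METRON_VERSION",
]


def missing_variables(environ=None):
    """Return the required variable names that are unset or blank."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES
            if not environ.get(name, "").strip()]


missing = missing_variables()
if missing:
    print("Not set: " + ", ".join(missing))
else:
    print("All required variables are set.")
```

Run it from the same shell session in which you exported the variables, since it only sees the environment it inherits.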

     



 

Step 1: Create a Mock Enrichment Source

Whois data is expensive, so we will not be providing it. Instead, we wrote a basic whois scraper (out of scope for this exercise) that produces whois data in CSV format. Set up the enrichment source as follows:

  1. Log into $HOST_WITH_ENRICHMENT_TAG as root user
  2. Cut and paste the below data into a file called "whois_ref.csv" on your virtual machine. This csv file represents our enrichment source.  


    google.com, "Google Inc.", "US", "Dns Admin",874306800000
    work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
    capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
    cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
    cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
    news.com, "CBS Interactive Inc.", "US", "Domain Admin",833353200000
    nba.com, "NBA Media Ventures, LLC", "US", "C/O Domain Administrator",786027600000
    espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000
    pravda.com, "Internet Invest, Ltd. dba Imena.ua", "UA", "Whois privacy protection service",806583600000
    hortonworks.com, "Hortonworks, Inc.", "US", "Domain Administrator",1303427404000
    microsoft.com, "Microsoft Corporation", "US", "Domain Administrator",673156800000
    yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000
    rackspace.com, "Rackspace US, Inc.", "US", "Domain Admin",903092400000
    1and1.co.uk, "1 & 1 Internet Ltd","UK", "Domain Admin",943315200000

     

  3. The schema of this enrichment source is domain|owner|registeredCountry|registrar|registeredTimestamp. Make sure the CSV file does not end with an empty line, as that will result in a null pointer exception.

  4. Configure an extractor config file that describes the enrichment source. Cut and paste the following into a file called "extractor_config_temp.json":

    {
    "config" : {
        "columns" : {
            "domain" : 0
            ,"owner" : 1
            ,"home_country" : 2
            ,"registrar": 3
            ,"domain_created_timestamp": 4
        }
        ,"indicator_column" : "domain"
        ,"type" : "whois"
        ,"separator" : ","
      }
      ,"extractor" : "CSV"
    }

  5. Because copying and pasting from this blog can include invisible non-ASCII characters, strip them out by running:

    1. iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
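
To see what the extractor config means before loading anything, the sketch below applies it to CSV text in plain Python. It only illustrates the column mapping and is not Metron's actual CSV extractor: the function name `dry_run` is hypothetical, leaving the indicator column out of the stored values is an assumption, and treating a single trailing newline as harmless (while flagging a genuinely empty line) is also an assumption about what the loader tolerates.

```python
import csv
import io
import json

# Same content as the extractor config shown above.
EXTRACTOR_CONFIG = json.loads("""
{
  "config": {
    "columns": {
      "domain": 0, "owner": 1, "home_country": 2,
      "registrar": 3, "domain_created_timestamp": 4
    },
    "indicator_column": "domain",
    "type": "whois",
    "separator": ","
  },
  "extractor": "CSV"
}
""")


def dry_run(csv_text, extractor_config):
    """Map each CSV line to ((type, indicator), values); collect problems."""
    cfg = extractor_config["config"]
    columns = cfg["columns"]
    records, problems = {}, []
    lines = csv_text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # assumption: one trailing newline is harmless
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: empty line")
            continue
        reader = csv.reader(io.StringIO(line), delimiter=cfg["separator"],
                            skipinitialspace=True)
        row = next(reader)
        if len(row) <= max(columns.values()):
            problems.append(f"line {number}: only {len(row)} columns")
            continue
        values = {name: row[index] for name, index in columns.items()}
        # Assumption: the indicator is used as the lookup key only.
        indicator = values.pop(cfg["indicator_column"])
        records[(cfg["type"], indicator)] = values
    return records, problems


# Two rows from whois_ref.csv, followed by a stray empty line.
sample = (
    'cnn.com, "Turner Broadcasting System, Inc.", "US", '
    '"Domain Name Manager",748695600000\n'
    'espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000\n'
    "\n"
)
records, problems = dry_run(sample, EXTRACTOR_CONFIG)
print(records[("whois", "cnn.com")]["owner"])
print(problems)
```

The stray empty line at the end of the sample is reported as a problem, which is the situation step 3 warns about.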
 

Step 2: Configure Element to Enrichment Mapping

We now have to configure which element of a tuple should be enriched with which enrichment type. This configuration will be stored in Zookeeper.

The config looks like the following:

{
  "zkQuorum" : "node1:2181"
 ,"sensorToFieldList" : {
    "squid" : {
           "type" : "ENRICHMENT"
          ,"fieldToEnrichmentTypes" : {
             "url" : [ "whois" ]
                                      }
           }
                        }
}

Cut and paste this config into a file called "enrichment_config_temp.json" on the virtual machine. Because copying and pasting from this blog can include invisible non-ASCII characters, strip them out by running:

iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json
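
After running iconv on either config file, a quick sanity check is to confirm that the result still parses as JSON. The sketch below does a comparable stripping step in Python and then parses the result. `strip_and_parse` is a hypothetical helper name, and dropping every non-ASCII character is assumed to be the desired behavior, as with `iconv -c`.

```python
import json


def strip_and_parse(raw_text):
    """Drop non-ASCII characters, then parse what remains as JSON."""
    ascii_text = raw_text.encode("ascii", errors="ignore").decode("ascii")
    return ascii_text, json.loads(ascii_text)


# A pasted fragment containing a non-breaking space and a zero-width space.
pasted = '{\u00a0"extractor"\u200b : "CSV" }'
cleaned, parsed = strip_and_parse(pasted)
print(cleaned)
print(parsed["extractor"])
```

To check a real file, read it as UTF-8 and pass its contents to the helper. A `ValueError` from `json.loads` means the file needs fixing before it is handed to the loader.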

Step 3: Run the Enrichment Loader

Now that we have the enrichment source and enrichment config defined, we can run the loader to move the data from the enrichment source to the Metron enrichment store and to store the enrichment config in Zookeeper.

/usr/metron/0.1BETA/bin/flatfile_loader.sh -n enrichment_config.json -i whois_ref.csv -t enrichment -c t -e extractor_config.json

After this, your enrichment data will be loaded into HBase and a Zookeeper mapping will be established. The data will be populated into an HBase table called enrichment. To verify that the enrichment data was properly loaded into HBase, run the following commands:

hbase shell
scan 'enrichment'

You should see the table bulk loaded with data from the CSV file. Now check that the enrichment config was properly populated in Zookeeper:

/usr/metron/0.1BETA/bin/zk_load_configs.sh -z localhost:2181

Generate some data by using the Squid client to execute HTTP requests (do this about 20 times):

squidclient http://www.cnn.com
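
Repeating the request by hand gets tedious, so the loop can be scripted. The sketch below invokes the same `squidclient` command a fixed number of times from Python. The helper name `generate_traffic` and its `run` parameter (present only so the loop can be exercised without Squid installed) are illustrative assumptions.

```python
import subprocess


def generate_traffic(url="http://www.cnn.com", count=20, run=subprocess.run):
    """Call `squidclient <url>` `count` times and return the exit codes."""
    exit_codes = []
    for _ in range(count):
        # Discard the fetched page; only the proxy log entry matters here.
        result = run(["squidclient", url],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        exit_codes.append(result.returncode)
    return exit_codes
```

On the Squid host, calling `generate_traffic()` with no arguments issues the 20 requests suggested above; a non-zero entry in the returned list indicates a request that `squidclient` reported as failed.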

View the Enrichment Telemetry Events in Metron UI

In order to demonstrate the enrichment capabilities of Metron, you need to drop all existing indexes for Squid where the data was ingested prior to enrichments being enabled. To do so, go back to the head plugin and delete the indexes.

Make sure you delete all Squid indexes. Re-ingest the data (see previous blog post) and the messages should be automatically enriched.

In the Metron-UI, refresh the dashboard and view the data in the Squid Panel in the dashboard:

Notice the enrichments here (whois.owner, whois.domain_created_timestamp, whois.registrar, whois.home_country)
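
To make the origin of these fields concrete, the sketch below mimics the lookup in plain Python: for each field listed under fieldToEnrichmentTypes, it looks up the field's value in a stand-in store and copies any hit into the message under a `whois.` prefix. This is an illustration only, not Metron's code. The in-memory `STORE` stands in for the HBase enrichment table, the `enrich` helper and the `code` field of the sample event are hypothetical, and the sample event assumes the `url` value already matches the stored domain exactly; how a full URL is matched against a domain is not covered here.

```python
# Same mapping as the enrichment config from Step 2.
ENRICHMENT_CONFIG = {
    "zkQuorum": "node1:2181",
    "sensorToFieldList": {
        "squid": {
            "type": "ENRICHMENT",
            "fieldToEnrichmentTypes": {"url": ["whois"]},
        }
    },
}

# Hypothetical stand-in for the HBase table: (type, indicator) -> values.
STORE = {
    ("whois", "cnn.com"): {
        "owner": "Turner Broadcasting System, Inc.",
        "home_country": "US",
        "registrar": "Domain Name Manager",
        "domain_created_timestamp": "748695600000",
    }
}


def enrich(sensor, message, config, store):
    """Return a copy of message with '<type>.<column>' fields added on a hit."""
    enriched = dict(message)
    sensor_cfg = config["sensorToFieldList"].get(sensor, {})
    field_map = sensor_cfg.get("fieldToEnrichmentTypes", {})
    for field, enrichment_types in field_map.items():
        if field not in message:
            continue
        for enrichment_type in enrichment_types:
            hit = store.get((enrichment_type, message[field]))
            if hit is None:
                continue  # no enrichment data for this value
            for column, value in hit.items():
                enriched[f"{enrichment_type}.{column}"] = value
    return enriched


# Assumption: the url value equals the stored domain; 'code' is made up.
event = {"url": "cnn.com", "code": 200}
enriched = enrich("squid", event, ENRICHMENT_CONFIG, STORE)
for name in sorted(enriched):
    print(name, "=", enriched[name])
```

An event whose `url` has no entry in the store comes back unchanged, which is why only domains present in whois_ref.csv show the extra fields.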
