Now that we have created a new telemetry, we can see how to add new enrichments to it.  In this exercise we will add a whois enrichment to the Squid telemetry we set up in the previous entry.  Whois data is expensive, so we will not be providing it.  Instead I wrote a basic whois scraper (out of scope for this exercise) that produces whois data in CSV format as follows:

google.com, "Google Inc.", "US", "Dns Admin",874306800000
work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
news.com, "CBS Interactive Inc.", "US", "Domain Admin",833353200000
nba.com, "NBA Media Ventures, LLC", "US", "C/O Domain Administrator",786027600000
espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000
pravda.com, "Internet Invest, Ltd. dba Imena.ua", "UA", "Whois privacy protection service",806583600000
hortonworks.com, "Hortonworks, Inc.", "US", "Domain Administrator",1303427404000
microsoft.com, "Microsoft Corporation", "US", "Domain Administrator",673156800000
yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000
rackspace.com, "Rackspace US, Inc.", "US", "Domain Admin",903092400000
1and1.co.uk, "1 & 1 Internet Ltd","UK", "Domain Admin",943315200000

Please cut and paste this data into a file called "whois_ref.csv" on your virtual machine.
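Blank lines in this file can trip up the loader, so it may be worth cleaning the file before going further. The following is a minimal sketch, not part of Metron: the helper names are made up for illustration, the five-field count is inferred from the sample rows above, and Python's csv module is used as a stand-in parser that may not behave exactly like the loader's own.

```python
import csv
import io

EXPECTED_FIELDS = 5  # assumption: inferred from the sample rows above


def clean_whois_csv(text):
    """Drop blank lines and return the rows joined without a final newline."""
    rows = [line for line in text.splitlines() if line.strip()]
    return "\n".join(rows)


def malformed_rows(text):
    """Return the 1-based numbers of non-blank rows whose field count is off."""
    bad = []
    for number, line in enumerate(clean_whois_csv(text).splitlines(), start=1):
        # skipinitialspace lets Python's parser honour quotes that follow ", "
        fields = next(csv.reader(io.StringIO(line), skipinitialspace=True))
        if len(fields) != EXPECTED_FIELDS:
            bad.append(number)
    return bad


# Possible usage on the virtual machine:
#   with open("whois_ref.csv") as f:
#       text = f.read()
#   print(malformed_rows(text))
#   with open("whois_ref.csv", "w") as f:
#       f.write(clean_whois_csv(text))
```

Writing the cleaned text back leaves the file without any empty line at the end.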

The schema of this enrichment is domain|owner|registeredCountry|registrar|registeredTimestamp.  Make sure the CSV file does not end with an empty line, as that will result in a null pointer exception. The first thing we need to do is set up the enrichment source.  To do this we first need to set up the extractor config as follows:

{
  "config" : {
    "columns" : {
        "domain" : 0
        ,"owner" : 1
        ,"home_country" : 2
        ,"registrar": 3
        ,"domain_created_timestamp": 4
    }
    ,"indicator_column" : "domain"
    ,"type" : "whois"
    ,"separator" : ","
  }
  ,"extractor" : "CSV"
}

Please cut and paste this config into a file called "extractor_config_temp.json" on the virtual machine.  Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:

iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
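To make the extractor config more concrete, here is a rough sketch of how its column mapping could be applied to one line of the CSV. This is an illustration only and not Metron's extractor code: the function names are hypothetical, Python's csv module stands in for the real parser, and reading domain_created_timestamp as epoch milliseconds is an assumption based on the 13-digit values.

```python
import csv
import io
import json
from datetime import datetime, timezone

# The extractor config from above, reformatted as standard JSON.
EXTRACTOR_CONFIG = json.loads("""
{
  "config": {
    "columns": {"domain": 0, "owner": 1, "home_country": 2,
                "registrar": 3, "domain_created_timestamp": 4},
    "indicator_column": "domain",
    "type": "whois",
    "separator": ","
  },
  "extractor": "CSV"
}
""")


def extract(line, extractor_config=EXTRACTOR_CONFIG):
    """Turn one CSV line into (indicator, enrichment_type, named_values)."""
    cfg = extractor_config["config"]
    reader = csv.reader(io.StringIO(line), delimiter=cfg["separator"],
                        skipinitialspace=True)
    fields = next(reader)
    # Look up each named column by its configured position.
    values = {name: fields[index] for name, index in cfg["columns"].items()}
    indicator = values[cfg["indicator_column"]]
    return indicator, cfg["type"], values


def created_at(values):
    """Interpret domain_created_timestamp as epoch milliseconds (an assumption)."""
    millis = int(values["domain_created_timestamp"])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

For the google.com row, extract returns "google.com" as the indicator, "whois" as the type, and a dictionary holding the five named columns.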

 

We also need a second config, which is used to load the enrichment configuration into ZooKeeper:

{
  "zkQuorum" : "$ZOOKEEPER_HOME:2181"
  ,"sensorToFieldList" : {
    "squid" : {
      "type" : "ENRICHMENT"
      ,"fieldToEnrichmentTypes" : {
        "domain_without_subdomains" : [ "whois" ]
      }
    }
  }
}

Please cut and paste this config into a file called "enrichment_config_temp.json" on the virtual machine.  Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:

iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json
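If iconv is not available, the same cleanup can be approximated in Python, with the added benefit of checking that the result still parses as JSON. This is a sketch under the assumption that dropping every non-ASCII character is acceptable for these config files; the function names are illustrative.

```python
import json


def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def load_pasted_json(text):
    """Strip non-ASCII characters, then parse; raises ValueError if still invalid."""
    return json.loads(strip_non_ascii(text))


# Possible usage:
#   with open("enrichment_config_temp.json", encoding="utf-8") as f:
#       config = load_pasted_json(f.read())
#   with open("enrichment_config.json", "w") as f:
#       json.dump(config, f, indent=2)
```

A parse error at this point usually means the paste lost or gained something other than invisible characters, such as a brace or a comma.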

This config tells the system to map the whois enrichment to the domain_without_subdomains field of the squid sensor.  Then execute the following command:

/usr/metron/0.1BETA/bin/flatfile_loader.sh -n enrichment_config.json -i whois_ref.csv -t enrichment -c t -e extractor_config.json

After this your enrichment data will be loaded into HBase and a ZooKeeper mapping will be established.  The data will be populated into an HBase table called enrichment.  To verify that the data was properly loaded into HBase, run the following commands:

hbase shell

scan 'enrichment'

You should see the table bulk loaded with data from the CSV file.  Now check that the ZooKeeper enrichment config was properly populated:

/usr/metron/0.1BETA/bin/zk_load_configs.sh -m DUMP -z localhost:2181

This dumps all of the configs to standard out; you should find one named "squid".
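As a reminder of what the squid entry encodes, the sketch below flattens the enrichment config from earlier in this post into (sensor, field, enrichment type) triples. It works on the JSON as pasted into enrichment_config.json, not on the dump output, whose exact layout is not reproduced here; the function name is made up for illustration.

```python
import json

# The enrichment config from earlier in this post, as standard JSON.
ENRICHMENT_CONFIG = json.loads("""
{
  "zkQuorum": "$ZOOKEEPER_HOME:2181",
  "sensorToFieldList": {
    "squid": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "domain_without_subdomains": ["whois"]
      }
    }
  }
}
""")


def enrichment_mappings(config):
    """Yield (sensor, field, enrichment_type) for every mapping in the config."""
    for sensor, spec in config["sensorToFieldList"].items():
        for field, types in spec["fieldToEnrichmentTypes"].items():
            for enrichment_type in types:
                yield sensor, field, enrichment_type


for sensor, field, enrichment_type in enrichment_mappings(ENRICHMENT_CONFIG):
    print(f"{sensor}: {field} -> {enrichment_type}")
```

For this config the only mapping is from the domain_without_subdomains field of the squid sensor to the whois enrichment.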

In order to demonstrate the enrichment capabilities of Metron you need to drop all existing indexes for Squid where the data was ingested prior to enrichments being enabled.  To do so, go back to the head plugin and delete the indexes like so:

TODO

No need to drop index 


Make sure you delete all Squid indexes.  Re-ingest the data (see previous blog post) and the messages should be automatically enriched.  The new message should look as follows:



Notice the enrichments here (whois.owner, whois.domain_created_timestamp, whois.registrar, whois.home_country) 
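Conceptually, the enrichment step looks up the value of the mapped field and, on a hit, adds the stored columns to the message. The toy sketch below imitates that with an in-memory dictionary in place of the HBase table. The (type, indicator) key layout and the "whois.<column>" naming are simplifying assumptions for illustration; the field names in a real enriched message may carry a longer prefix.

```python
# Toy stand-in for the HBase 'enrichment' table, keyed by (type, indicator).
# The row is taken from the cnn.com line of whois_ref.csv.
WHOIS_TABLE = {
    ("whois", "cnn.com"): {
        "owner": "Turner Broadcasting System, Inc.",
        "home_country": "US",
        "registrar": "Domain Name Manager",
        "domain_created_timestamp": "748695600000",
    },
}


def enrich(message, field, enrichment_type, table=WHOIS_TABLE):
    """Return a copy of message with '<type>.<column>' keys added on a hit."""
    enriched = dict(message)
    hit = table.get((enrichment_type, message.get(field)))
    if hit:
        for column, value in hit.items():
            enriched[f"{enrichment_type}.{column}"] = value
    return enriched


message = {"domain_without_subdomains": "cnn.com"}
print(enrich(message, "domain_without_subdomains", "whois"))
```

A message whose domain is not in the table comes back unchanged, which matches the behaviour you would expect for domains missing from whois_ref.csv.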


