
  1. Log in to $HOST_WITH_ENRICHMENT_TAG as the root user.
  2. Copy and paste the data below into a file called "whois_ref.csv" on your virtual machine. This CSV file represents our enrichment source.


    google.com, "Google Inc.", "US", "Dns Admin",874306800000
    work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
    capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
    cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
    cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
    news.com, "CBS Interactive Inc.", "US", "Domain Admin",833353200000
    nba.com, "NBA Media Ventures, LLC", "US", "C/O Domain Administrator",786027600000
    espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000
    pravda.com, "Internet Invest, Ltd. dba Imena.ua", "UA", "Whois privacy protection service",806583600000
    hortonworks.com, "Hortonworks, Inc.", "US", "Domain Administrator",1303427404000
    microsoft.com, "Microsoft Corporation", "US", "Domain Administrator",673156800000
    yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000
    rackspace.com, "Rackspace US, Inc.", "US", "Domain Admin",903092400000
    1and1.co.uk, "1 & 1 Internet Ltd","UK", "Domain Admin",943315200000

     

  3. The schema of this enrichment source is domain|owner|registeredCountry|registrar|registeredTimestamp. Make sure the CSV file does not end with an empty line, as that will result in a null pointer exception.

  4. We will use the whois_ref.csv file in step 5.
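
Before moving on, it can help to check the file for the blank-line problem described above. The following is a minimal sketch, not part of Metron: the function name check_enrichment_csv is hypothetical, the five-field count is taken from the sample rows, and a single final line terminator is assumed to be harmless (this has not been verified against the loader).

```python
import csv
import io

# Assumption: every row has five fields, as in the sample data above
# (domain, owner, country, registrar, creation timestamp).
EXPECTED_COLUMNS = 5


def check_enrichment_csv(text):
    """Return a list of problems found in the CSV text (empty list means OK)."""
    problems = []
    lines = text.split("\n")
    # A single trailing "\n" produces one empty final element. This sketch
    # treats that as a normal line terminator and drops it.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
            continue
        # skipinitialspace lets the parser honour quotes that follow ", ",
        # so commas inside quoted owner names are not treated as separators.
        row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                f"line {number}: expected {EXPECTED_COLUMNS} fields, got {len(row)}"
            )
    return problems
```

To run it against the real file, pass the file contents, for example check_enrichment_csv(open("whois_ref.csv").read()), and fix any reported lines before running the loader.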

 

Step 3: Configure an Extractor Config file

  1. Configure an extractor config file that describes the enrichment source. Copy and paste the following into a file called "extractor_config_temp.json":

    {
        "config" : {
            "columns" : {
                "domain" : 0
                ,"owner" : 1
                ,"home_country" : 2
                ,"registrar" : 3
                ,"domain_created_timestamp" : 4
            }
            ,"indicator_column" : "domain"
            ,"type" : "whois"
            ,"separator" : ","
        }
        ,"extractor" : "CSV"
    }

  2. Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:

    1. iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json

       

  3. We will use the extractor_config.json file in step 4.
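
To make the role of the extractor config more concrete, here is a rough sketch of how a config like this can be read against one line of whois_ref.csv: the "columns" map names each position, "indicator_column" picks the lookup key, and "type" labels the enrichment. This is an illustration only, not Metron's actual extractor code. The function name extract is hypothetical, and keeping the indicator inside the value map is an assumption.

```python
import csv
import io
import json

# The extractor config from this step, embedded so the sketch is self-contained.
EXTRACTOR_CONFIG = json.loads("""
{
  "config": {
    "columns": {"domain": 0, "owner": 1, "home_country": 2,
                "registrar": 3, "domain_created_timestamp": 4},
    "indicator_column": "domain",
    "type": "whois",
    "separator": ","
  },
  "extractor": "CSV"
}
""")


def extract(line, extractor_config):
    """Map one CSV line to (enrichment type, indicator, named fields)."""
    config = extractor_config["config"]
    reader = csv.reader(
        io.StringIO(line),
        delimiter=config["separator"],
        skipinitialspace=True,  # tolerate the space after each separator
    )
    row = next(reader)
    # Name each field using the column positions from the config.
    fields = {name: row[index] for name, index in config["columns"].items()}
    indicator = fields[config["indicator_column"]]
    return config["type"], indicator, fields
```

For the cisco.com row, this returns the type "whois", the indicator "cisco.com", and a map whose "owner" entry is "Cisco Technology Inc.".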
 

Step 4: Configure Element to Enrichment Mapping

We now have to configure which element of a tuple should be enriched with which enrichment type. This configuration will be stored in ZooKeeper.

  1. Log in to $HOST_WITH_ENRICHMENT_TAG as the root user.
  2. Copy and paste the following into a file called "enrichment_config_temp.json" (make sure to replace $ZOOKEEPER_HOST with your specific value):

    {
         "zkQuorum" : "$ZOOKEEPER_HOST:2181"
        ,"sensorToFieldList" : {
              "squid" : {
                 "type" : "ENRICHMENT"
                ,"fieldToEnrichmentTypes" : {
                     "domain_without_subdomains" : [ "whois" ]
                  }
              }
        }
    }

  3. Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running the following:

    1. iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json

  4. We will use the enrichment_config.json file in step 5.
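
The steps above (strip invisible characters, fill in the ZooKeeper host, and map a sensor field to enrichment types) can be sketched in Python as a quick sanity check. This is only an illustration under stated assumptions: the function names are hypothetical, the host zk1.example.com used in the usage note is a placeholder, and strip_non_ascii only approximates what "iconv -c -f utf-8 -t ascii" does by dropping every character that is not ASCII.

```python
import json

# The enrichment config from this step, with the host left as a placeholder.
TEMPLATE = """
{
     "zkQuorum" : "$ZOOKEEPER_HOST:2181"
    ,"sensorToFieldList" : {
          "squid" : {
             "type" : "ENRICHMENT"
            ,"fieldToEnrichmentTypes" : {
                 "domain_without_subdomains" : [ "whois" ]
              }
          }
    }
}
"""


def strip_non_ascii(text):
    """Drop every non-ASCII character, similar in effect to iconv -c."""
    return text.encode("ascii", "ignore").decode("ascii")


def load_enrichment_config(pasted_text, zookeeper_host):
    """Clean pasted text, substitute the host, and parse it as JSON."""
    cleaned = strip_non_ascii(pasted_text)
    cleaned = cleaned.replace("$ZOOKEEPER_HOST", zookeeper_host)
    return json.loads(cleaned)  # raises ValueError if the JSON is malformed


def enrichment_types(config, sensor, field):
    """Return the enrichment types configured for a sensor field, if any."""
    sensor_entry = config["sensorToFieldList"].get(sensor, {})
    return sensor_entry.get("fieldToEnrichmentTypes", {}).get(field, [])
```

For example, load_enrichment_config(TEMPLATE, "zk1.example.com") yields a config in which the squid field "domain_without_subdomains" maps to the "whois" enrichment type, while any field not listed maps to no enrichment.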

Step 5: Run the Enrichment Loader

...