The Setup
...
When you add a net new data source to Metron, the first step is to decide how to push the events from the new telemetry data source into Metron. You can use a number of data collection tools, and that decision is decoupled from Metron. An excellent tool for pushing data into Metron is Apache NiFi, which this section will describe how to use. The second step is to configure Metron to parse the telemetry data source so that downstream processing can be done on it. In this article we will walk you through how to perform both of these steps.
In the previous section, we described the following set of requirements for Customer Foo, who wanted to add the Squid telemetry data source into Metron.
- The proxy events from the Squid logs need to be ingested in real-time.
- The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
- In real time, the domain names within the Squid proxy events must be enriched with IP information.
- In real time, the IP addresses within the proxy events must be checked against threat intel feeds.
- If there is a threat intel hit, an alert must be raised.
- The end user must be able to see the new telemetry events and the alerts from the new data source.
- All of these requirements must be implemented easily, without writing any new Java code.
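To make the second requirement concrete, here is a hypothetical sketch of what a single Squid proxy event could look like once parsed into Metron's flat JSON structure. The field values are invented for illustration; the field names follow the Grok pattern defined for Squid later in this article (e.g. ip_src_addr, ip_dst_addr):

```shell
# Hypothetical parsed Squid event (values invented for illustration);
# field names match the SQUID_DELIMITED Grok pattern used in this article
parsed_event='{
  "timestamp": "1461576382.642",
  "elapsed": "161",
  "ip_src_addr": "127.0.0.1",
  "action": "TCP_MISS",
  "code": "200",
  "bytes": "103701",
  "method": "GET",
  "url": "http://www.cnn.com/",
  "ip_dst_addr": "199.27.79.73"
}'
echo "$parsed_event"
```

Downstream enrichment and threat intel lookups then key off the standardized field names (ip_src_addr, ip_dst_addr, url) rather than the raw log layout.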
...
- KAFKA_HOST = The host where a Kafka broker is installed.
- ZOOKEEPER_HOST = The host where a ZooKeeper server is installed.
- PROBE_HOST = The host where your sensors/probes are installed. If you don't have any sensors installed, pick the host where a Storm supervisor is running.
- SQUID_HOST = The host where you want to install Squid. If you don't care, just install it on the PROBE_HOST.
- NIFI_HOST = The host where you will install NiFi. This should be the same host on which you installed Squid.
- HOST_WITH_ENRICHMENT_TAG = The host in your inventory hosts file that you put under the group "enrichment".
- SEARCH_HOST = The host where you have Elasticsearch or Solr running. This is a host in your inventory hosts file that you put under the group "search". Pick one of the search hosts.
- SEARCH_HOST_PORT = The port of the search host where indexing is configured (e.g., 9300).
- METRON_UI_HOST = The host where your Metron UI web application is running. This is the host in your inventory hosts file that you put under the group "web".
- METRON_VERSION = The release of the Metron binaries you are working with (e.g., 0.2.0BETA-RC2).
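The rest of this article references these values as shell variables (e.g., $NIFI_HOST). One convenient approach, shown here with hypothetical host names that you must replace with the hosts from your own inventory, is to export them once before you start:

```shell
# Hypothetical host values -- substitute the hosts from your own
# Ansible inventory before running any of the commands in this article
export KAFKA_HOST=node1
export ZOOKEEPER_HOST=node1
export PROBE_HOST=node1
export SQUID_HOST=node1
export NIFI_HOST=node1
export HOST_WITH_ENRICHMENT_TAG=node1
export SEARCH_HOST=node1
export SEARCH_HOST_PORT=9300
export METRON_UI_HOST=node1
export METRON_VERSION=0.2.0BETA-RC2
echo "Using Kafka broker at $KAFKA_HOST"
```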
...
...
- The first thing we need to do is decide whether we will use the Java-based parser or the Grok-based parser for the new telemetry. In this example we will use the Grok parser. The Grok parser is a good fit for structured or semi-structured logs that are well understood (check) and for telemetries with lower volumes of traffic (check).
- Next we need to define the Grok expression for our log. Refer to the Grok documentation for additional details. In our case the pattern is:
SQUID_DELIMITED %{NUMBER:timestamp}%{SPACE:UNWANTED} %{INT:elapsed}%{SPACE:UNWANTED}%{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url} - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}
Notice that we apply the UNWANTED tag to any part of the message that we don't want included in our resulting JSON structure. Finally, notice that we applied the naming convention to the IPV4 fields by referencing the following list of field conventions.
- The last thing we need to do is validate the Grok pattern to make sure it compiles and matches our logs. For our test we will use a free Grok validator called Grok Constructor. A validated Grok expression should look like this:
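If you want a quick offline sanity check of the field layout before (not instead of) using Grok Constructor, you can approximate the pattern with an ordinary extended regex against a sample log line. The sample line and the regex below are illustrative assumptions that mirror the SQUID_DELIMITED pattern field for field; they are not the actual Grok validation:

```shell
# Sample Squid access-log line (hypothetical values, standard Squid log layout)
line='1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html'

# POSIX ERE mirroring the SQUID_DELIMITED Grok pattern field for field:
# timestamp, elapsed, ip_src_addr, action/code, bytes, method, url, -,
# UNWANTED/ip_dst_addr, UNWANTED/UNWANTED
regex='^[0-9]+\.[0-9]+ +[0-9]+ +([0-9]{1,3}\.){3}[0-9]{1,3} +[A-Z_]+/[0-9]+ +[0-9]+ +[A-Z]+ +[^ ]+ +- +[A-Z_]+/([0-9]{1,3}\.){3}[0-9]{1,3} +[^ ]+/[^ ]+$'

if echo "$line" | grep -Eq "$regex"; then
  echo "pattern matches"
else
  echo "pattern does NOT match"
fi
```

If the regex fails against one of your real log lines, the Grok pattern will almost certainly fail on it too, which makes this a cheap first check before uploading anything to the cluster.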
- Now that the Grok pattern has been defined, we need to save it and move it to HDFS.
- ssh into $HOST_WITH_ENRICHMENT_TAG as root
- Create a file called "squid" in the tmp directory and copy the Grok pattern into the file.
- touch /tmp/squid
- Open up the squid file and add the Grok pattern defined above
- Put the squid file into the directory where Metron stores its Grok parsers. Existing Grok parsers that ship with Metron are staged under /apps/metron/patterns
- su - hdfs
- hadoop fs -rmr /apps/metron/patterns/squid
- hdfs dfs -put /tmp/squid /apps/metron/patterns/
...
Using Apache NiFi to Stream Data into Metron
Put simply, NiFi was built to automate the flow of data between systems. Hence it is a fantastic tool for collecting, ingesting, and pushing data to Metron. The instructions below show how to install, configure, and create the NiFi flow to push Squid events into Metron.
Install, Configure, and Start Apache NiFi
The following shows how to install NiFi on the VM. Do the following as root:
- ssh into $NIFI_HOST
- Download NiFi
cd /usr/lib
wget http://public-repo-1.hortonworks.com/HDF/centos6/1.x/updates/1.2.0.0/HDF-1.2.0.0-91.tar.gz
tar -zxvf HDF-1.2.0.0-91.tar.gz
- Edit the NiFi configuration to update the port of the NiFi web app: nifi.web.http.port=8089
cd HDF-1.2.0.0/nifi
vi conf/nifi.properties  # update nifi.web.http.port to 8089
- Install NiFi as a service
bin/nifi.sh install nifi
- Start the NiFi service
service nifi start
- Go to the NiFi web UI: http://$NIFI_HOST:8089/nifi/
Create a NiFi Flow to Stream Events to Metron
Now we will create a flow to capture events from Squid and push them into Metron.
...