
Apache NiFi lets users build very large and complex DataFlows from a small set of basic components: Processor, Funnel, Input/Output Port, Process Group, and Remote Process Group. These are the most basic building blocks for constructing a DataFlow. Using such small building blocks can become tedious, though, when the same logic needs to be repeated several times. To solve this, NiFi provides the concept of a Template: a way of combining these basic building blocks into larger ones. Once a DataFlow has been created, parts of it can be formed into a Template. This Template can then be dragged onto the canvas, or exported as an XML file and shared with others. Templates received from others can be imported into an instance of NiFi and dragged onto the canvas.

For more information on Templates, including how to import, export, and work with them, please see the Template Section of the User Guide.

Here is a collection of useful templates for learning how to build DataFlows with the existing Processors. Please feel free to add any useful templates below.

Template | Description | Minimum NiFi Version | Processors Used

Pull_from_Twitter_Garden_Hose.xml

This flow pulls from Twitter using the garden hose setting; it pulls out some basic attributes from the JSON and then routes only those items that are actually tweets.

Retry_Count_Loop.xml

This process group maintains a count of how many times a FlowFile passes through it. If the count reaches a configured threshold, the FlowFile is routed to a 'Limit Exceeded' relationship; otherwise it is routed to 'retry'. Useful for processes that should run only X times before giving up.

simple-httpget-route.template.xml

Pulls from a web service (the example is NiFi itself), extracts text from a specific section, makes a routing decision on the extracted value, and prepares to write to disk using PutFile.

InvokeHttp_And_Route_Original_On_Status.xml

This flow demonstrates how to call an HTTP service based on an incoming FlowFile, and route the original FlowFile based on the status code returned from the invocation. In this example, a FlowFile is produced every 30 seconds, an attribute is added to it that sets q=nifi, google.com is invoked for that FlowFile, and any response with a 200 status is routed to a relationship called 200.

Decompression_Circular_Flow.xml

This flow takes an archive that was created with several levels of compression and decompresses it in a loop until the archived file is fully extracted.
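
The looping idea behind Decompression_Circular_Flow.xml can be sketched outside NiFi. The Python below is only an illustration under stated assumptions: it uses gzip as the compression format, detects another layer by its magic bytes, and the name `decompress_fully` and the `max_rounds` guard are hypothetical. It does not reproduce the processors the template actually uses.

```python
import gzip
from typing import Tuple

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def decompress_fully(data: bytes, max_rounds: int = 10) -> Tuple[bytes, int]:
    """Decompress repeatedly until the data no longer looks like gzip.

    Returns the final payload and the number of layers removed.
    max_rounds is a hypothetical safety limit so the loop cannot run forever.
    """
    rounds = 0
    while data[:2] == GZIP_MAGIC:
        if rounds >= max_rounds:
            raise ValueError("too many compression layers")
        data = gzip.decompress(data)
        rounds += 1
    return data, rounds


# Build a payload wrapped in three layers of compression, then unwrap it.
payload = b"hello nifi"
wrapped = gzip.compress(gzip.compress(gzip.compress(payload)))
result, rounds = decompress_fully(wrapped)
```

The loop condition plays the role of the routing decision in the flow: data that still looks compressed goes around again, and anything else leaves the loop.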

SplitRouteMerge.xml

sample-input.txt

This flow demonstrates splitting a file on line boundaries, routing the splits based on a regex in the content, merging the less important files together for storage somewhere, and sending the higher priority files down another path to take immediate action.  
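
As a rough illustration of the split, route, and merge steps described for SplitRouteMerge.xml, the sketch below does the same thing on an in-memory string. The function name, the sample text, and the `^ERROR` pattern are all assumptions made for the example; the template itself may use a different regex and different merge settings.

```python
import re


def split_route_merge(text, pattern):
    """Split text on line boundaries, route each line by regex, merge the rest.

    Lines matching the pattern are returned individually (the high priority
    path); all other lines are merged back into one string for storage.
    """
    regex = re.compile(pattern)
    high_priority = []
    low_priority = []
    for line in text.splitlines():
        if regex.search(line):
            high_priority.append(line)
        else:
            low_priority.append(line)
    merged_low = "\n".join(low_priority)
    return high_priority, merged_low


# Hypothetical input: log-like lines where errors need immediate action.
sample = "INFO started\nERROR disk full\nINFO heartbeat\nERROR timeout"
high, merged = split_route_merge(sample, r"^ERROR")
```

Here `high` holds the lines that would go down the immediate-action path, and `merged` is the single bundle of less important lines.
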
TwitterSolr.xml

This flow shows how to index tweets with Solr using NiFi. Prerequisites for this flow are NiFi 0.3.0 or later, a Twitter application, and a running instance of Solr 5.1 or later with a tweets collection:

./bin/solr start -c
./bin/solr create_collection -c tweets -d data_driven_schema_configs -shards 1 -replicationFactor 1
  
CsvToJSON.xml

This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.

NetworkActvityExample.xml

This flow grabs network activity using tcpdump, then performs geo-enrichment if possible, before delivering the tcpdump entries to Kafka and HDFS.

SyslogExample.xml

This flow shows how to send and receive messages from Syslog. It requires a Syslog server to be accepting incoming connections using the protocol and port specified in PutSyslog, and forwarding connections using the protocol and port specified in ListenSyslog. NOTE: This template can be used with the latest code from master, or once 0.4.0 is released.

Minimum NiFi Version: 0.4.0
Processors Used: PutSyslog, ListenSyslog

Working_With_CSV.xml

This flow uses http://randomuser.me to generate random data about people in CSV format. It then manipulates the data and writes it to a directory.

A second flow then uses the ListFile / FetchFile processors to pull that data into the flow, strips off the CSV header line, groups the data into separate FlowFiles based on the first column of each row in the CSV file (the "gender" column), and finally puts all of the data to Apache Kafka, using the gender as part of the topic name.

Minimum NiFi Version: 0.4.0
Processors Used: ListFile, FetchFile, PutKafka, RouteText, PutFile, ReplaceText, InvokeHTTP
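
The header-stripping and grouping steps of the second Working_With_CSV flow can be pictured with a small Python sketch. It is an illustration only: the `people-` topic prefix, the function name, and the sample rows are assumptions, and nothing is sent to Kafka here; the sketch just works out which rows would go to which topic.

```python
import csv
import io
from collections import defaultdict


def group_by_first_column(csv_text, topic_prefix="people-"):
    """Drop the header line and group rows by the value in the first column.

    Returns a dict mapping a topic name (prefix plus first-column value)
    to the list of rows for that value. topic_prefix is hypothetical.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # strip off the CSV header line
    groups = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        groups[topic_prefix + row[0]].append(row)
    return dict(groups)


# Hypothetical data shaped like the flow's input: gender is the first column.
sample = (
    "gender,first,last\n"
    "female,Ada,Lovelace\n"
    "male,Alan,Turing\n"
    "female,Grace,Hopper\n"
)
topics = group_by_first_column(sample)
```

Each key in `topics` corresponds to one group of rows, with the gender value forming part of the topic name as the description above says.
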
Working_with_Logs.xml

Tails the nifi-app and nifi-user log files, and then uses Site-to-Site to push any changes to those logs to a remote instance of NiFi (this template pushes them to localhost so that it is reusable).

A second flow then exposes Input Ports to receive the log data via Site-to-Site. The data is then aggregated until the data for a single log is in the range of 64-128 MB or 5 minutes pass, whichever occurs first. The aggregated log data is then pushed to a directory in HDFS, based on the current timestamp and the type of log file (e.g., pushed to /data/logs/nifi-app-logs/2015/12/03 or /data/logs/nifi-user-logs/2015/12/03, depending on the type of data).

NOTE: In order to use this template Site-to-Site must be enabled on the node. To do this, open the $NIFI_HOME/conf/nifi.properties file and set the "nifi.remote.input.socket.port" property to some open port number and set "nifi.remote.input.secure" to "false" (unless, of course, you are running in a secure environment). For more information on Site-to-Site, see the Site-to-Site Section of the User Guide.

Minimum NiFi Version: 0.4.0
Processors Used: TailFile, MergeContent, PutHDFS, UpdateAttribute, Site-to-Site, Remote Process Group, Input Ports
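
The aggregation rule and directory layout described for Working_with_Logs.xml can be written down as two small functions. This is a sketch under assumptions, not the template's configuration: the function and constant names are hypothetical, the 128 MB upper bound is not modeled, and the real flow relies on NiFi processors rather than custom code.

```python
from datetime import datetime

# Thresholds taken from the description above: flush at 64 MB or after 5 minutes.
MIN_BIN_BYTES = 64 * 1024 * 1024
MAX_BIN_AGE_SECONDS = 5 * 60


def should_flush(bin_bytes, bin_age_seconds):
    """Return True once a bin is big enough or old enough, whichever is first."""
    return bin_bytes >= MIN_BIN_BYTES or bin_age_seconds >= MAX_BIN_AGE_SECONDS


def target_directory(log_type, when):
    """Build the HDFS directory from the log type and the current date."""
    return "/data/logs/{}-logs/{}".format(log_type, when.strftime("%Y/%m/%d"))


# Example: a small bin that has waited five minutes is flushed anyway.
flush_now = should_flush(1024, 300)
app_dir = target_directory("nifi-app", datetime(2015, 12, 3))
```

With the date used in the description, `target_directory` reproduces the /data/logs/nifi-app-logs/2015/12/03 path.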

 

http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site