Q: Where can I find information about upgrading to a new NiFi version? - A: The Upgrading NiFi guide provides steps for upgrading NiFi, along with suggested NiFi installation configurations that make upgrading even easier. We also offer Migration Guidance. Additionally, you can find details on upgrading in the System Properties section of the Administrator Guide.
Q: Where can I find information about the REST API? - A: The REST API documentation is included in the "help" documentation within the application and also on our web site here. To get to the documentation within the application, click on the "help" link in the upper-right corner of the NiFi user interface. Then, in the pane on the left-hand side, scroll down to the very bottom, where you will see a Developer section, with links to the Developer Guide and the REST API documentation.
Q: What is the base endpoint for the NiFi REST API? - A: The base endpoint is http://[server]:8080/nifi-api for default settings. If you adjust the conf/nifi.properties, then these values may differ.
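As a small illustration of the default endpoint described above, the following sketch builds the base URL from its parts. The function name `nifi_api_base` and its parameters are hypothetical helpers for this example, not part of NiFi; the port and scheme defaults simply mirror the default settings mentioned in the answer.

```python
def nifi_api_base(server: str, port: int = 8080, scheme: str = "http") -> str:
    """Return the REST API base URL for a NiFi instance.

    The defaults mirror the default settings described above; pass other
    values if conf/nifi.properties has been changed.
    """
    return f"{scheme}://{server}:{port}/nifi-api"


if __name__ == "__main__":
    # "localhost" is an assumed host name used only for illustration.
    print(nifi_api_base("localhost"))
```

If the web properties in conf/nifi.properties differ from the defaults, the same helper can be called with the adjusted port or scheme.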
Q: How do I select multiple items on the graph at the same time, such as if I want to select and move a group of processors? - A: You can either select one item, then hold down the Shift key and select other items, or click a blank area of the graph outside what you want to select, then hold down the Shift key and drag a selection box around the items.
Q: How do I set up NiFi to run behind a proxy? - A: There are a couple of key items to know when doing this.
1) NiFi is composed of a number of web applications (web UI, web API, documentation, custom UIs, etc.). So, you'll need to set up your mapping to the root path. That way, all context paths are passed through accordingly. For instance, if you only mapped the /nifi context path, the custom UI for the UpdateAttribute processor will not work, since it's available at /update-attribute-ui-<version>.
2) NiFi's REST API will generate URIs for each component on the graph. Since you are coming through a proxy, you'll need to override certain elements of the URIs being generated. You can override the elements of the URI by adding the following HTTP headers when your proxy generates the HTTP request to the NiFi instance:
- X-ProxyScheme: the scheme to use to connect to your proxy (https in this case)
- X-ProxyHost: the host of your proxy
- X-ProxyPort: the port your proxy is listening on
- X-ProxyContextPath: the path you've configured to map to the NiFi instance
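The four headers above can be sketched as a simple mapping that a proxy would attach to each request it forwards to NiFi. This is only an illustration: the helper name `proxy_headers` and the example values are assumptions, and how the headers are actually injected depends on the proxy software you use.

```python
def proxy_headers(scheme: str, host: str, port: int, context_path: str) -> dict:
    """Build the override headers a proxy would add when forwarding to NiFi."""
    return {
        "X-ProxyScheme": scheme,              # scheme clients use to reach the proxy
        "X-ProxyHost": host,                  # host name of the proxy
        "X-ProxyPort": str(port),             # port the proxy listens on
        "X-ProxyContextPath": context_path,   # path mapped to the NiFi instance
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for name, value in proxy_headers("https", "proxy.example.com", 443, "/my-nifi").items():
        print(f"{name}: {value}")
```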
Q: Am I correct in assuming that I can transit large volumes of data through NiFi flows in and out of Hadoop? - A: Yes, you are correct that large payloads can be moved through NiFi. As data moves through NiFi, a pointer to the data is being passed around, referred to as a FlowFile. The content of the FlowFile is only accessed as needed. The key for large payloads would be to operate on the payload in a streaming fashion so that you don't read too many large payloads into memory and exceed your JVM memory. As an example, a typical pattern for bringing data into HDFS from NiFi is to use a MergeContent processor right before a PutHDFS processor. MergeContent can take many small/medium size files and merge them together to form an appropriate size file for HDFS. It does this by copying all of the input streams from the original files to a new output stream, and can, therefore, merge a large number of files without exceeding the memory of the JVM.
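The streaming idea described above can be sketched in a few lines. This is not MergeContent's actual implementation (NiFi is written in Java); it is a minimal Python illustration of copying several input streams into one output stream in fixed-size chunks, so that memory use stays bounded by the chunk size rather than by the size of the inputs. The name `merge_streams` and the chunk size are assumptions.

```python
import io


def merge_streams(sources, destination, chunk_size=64 * 1024):
    """Copy each source stream into destination, one chunk at a time.

    Only one chunk is held in memory at any moment. Returns the total
    number of bytes written.
    """
    total = 0
    for source in sources:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means this source is exhausted
                break
            destination.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    merged = io.BytesIO()
    written = merge_streams([io.BytesIO(b"abc"), io.BytesIO(b"defg")], merged)
    print(written, merged.getvalue())
```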
Q: What happens to my data if there is a power loss and the system goes down? - A: NiFi stores the data in its repositories as it traverses the system. There are three key repositories: the FlowFile Repository, the Content Repository, and the Provenance Repository. As a Processor writes data to a FlowFile, that data is streamed directly to the Content Repository. When the Processor finishes, it commits the session (essentially marks a transaction as complete). This triggers the Provenance Repository to be updated to include the events that occurred for that Processor, and then the FlowFile Repository is updated to keep track of where in the flow the FlowFile is. Finally, the FlowFile can be moved to the next queue in the flow. This way, if power is lost at any point, NiFi is able to resume where it left off.
This, however, glosses over one detail: by default, when we update the repositories, we write the information to disk, but the write is often cached by the operating system. If you truly have a complete loss of power, it is possible to lose those updates to the repository. This can be avoided by configuring the repositories in the nifi.properties file to always sync to disk. This, however, can be a significant hindrance to performance. Simply killing NiFi, though, will not be problematic, as the operating system will still be responsible for flushing that data to the disk.
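To make the difference between a cached write and a synced write concrete, here is a generic sketch using the Python standard library. It does not show NiFi's own code or configuration; the function name `write_durably` is hypothetical. The point is only that a plain write may sit in the operating system's cache, while an explicit sync asks the OS to push it to the storage device, which is slower.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the OS to flush it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to disk


if __name__ == "__main__":
    import tempfile

    # Write to a throwaway file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "update.bin")
        write_durably(target, b"repository update")
        print(os.path.getsize(target))
```

Without the `os.fsync` call, the data would still survive the process being killed, since the operating system already holds it, but it could be lost in a sudden power failure.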
Q: How do I enable debug logging for a specific processor, rather than system-wide?
Q: I want to know how I can package and deploy the same dataflow from a development environment to a testing environment. Do I need to recreate the entire dataflow again in the different environment? - A: The primary mechanism is flow templates. They do have some important limitations that you'll want to understand (at least as of version 0.4.0). First, some component properties are sensitive, like passwords, and thus are not included in the templates. So you'll have to reenter them when you apply the template in the new environment. Second, there are at times properties that you'd want to have different values for in different environments. We need to provide an easy property/environment variable mapping mechanism. Although we intend to address this, it is not actively being worked on yet.
- It is also worth noting that the NiFi Expression Language (EL) can use system environment variables as well as NiFi JVM properties as input. So you could create templates that have components configured with EL statements that reference these environment variables. That would allow each system running the same dataflow to pull in different values. You can add NiFi JVM properties in the bootstrap.conf file in your NiFi installation. System environment variables are set at the OS level. There is an order of preference to be aware of, however. NiFi will first search for FlowFile Attributes matching the defined subject/key name in the EL statement, then system environment variables, and then JVM properties. See the Expression Language Guide for more information.
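The order of preference described above can be sketched as a simple lookup. This is not how NiFi implements the Expression Language; it is a hedged Python illustration of the precedence only, and the function name `resolve_subject` and the sample keys are assumptions.

```python
def resolve_subject(name, flowfile_attributes, environment, jvm_properties):
    """Return the first value found for name, searching in precedence order:
    FlowFile attributes, then system environment variables, then JVM properties.
    Returns None if no source defines the name.
    """
    for source in (flowfile_attributes, environment, jvm_properties):
        if name in source:
            return source[name]
    return None


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    attributes = {"filename": "data.csv"}
    environment = {"target.dir": "/data/test", "filename": "from-env"}
    jvm_properties = {"target.dir": "/data/dev", "region": "us-east"}

    for key in ("filename", "target.dir", "region", "missing"):
        print(key, "->", resolve_subject(key, attributes, environment, jvm_properties))
```

A practical consequence is that a FlowFile attribute with the same name as an environment variable hides the environment value, so distinct names are the safer choice when a template relies on per-environment settings.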
Q: At what point is a piece of data considered under NiFi's control? - A: Data is said to be under NiFi's control once a FlowFile with its content has been generated in a ProcessSession and that session has been committed. See this section of the Developer Guide for a more detailed overview of this part of the process.
Q: How do I bend connections so that I can create nicer-looking dataflows?
- A: You can add a bend-point (elbow) at any place on a connection by double-clicking the connection at the desired point. Then, simply use the mouse to grab that point on the connection and drag it so that the connection is bent as you desire. You can remove a bend-point by double-clicking it again. You can also move the label on the connection to any bend-point on the connection.