...

# | Title | Notes
1. UI: Provide ability to configure how many retries per Relationship
  • User may want to retry 5 times for a 'failure' relationship. Or 10. Or 0. It should be up to the user to decide how many times a Relationship should be retried.
2. UI: Provide ability to configure backoff mechanism
  • When a FlowFile is to be retried, the user needs to be able to dictate the backoff policy. User should be given the option of penalizing the FlowFile that is to be retried (this should be the default) or Yielding the entire Processor.
    • There are cases where we want to maintain the ordering of the FlowFiles, even during failures, and penalizing a FlowFile can result in the ordering being changed. As a result, we need the ability to configure the Processor to Yield when retrying. In this case, no FlowFiles will be processed until the first FlowFile is ready to be retried.
3. UI: Provide ability to configure Max Backoff Period

When a FlowFile is penalized or a Processor is yielded, we should wait double the last retry period (i.e., use an exponential backoff) up to some configurable max amount of time. The user must be able to specify the max amount of time.
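
As a rough illustration of the doubling behavior only (the class name, field names, and starting delay below are hypothetical, not part of this design):

    // Illustrative only: doubles the previous retry delay, capped at the
    // user-configured Max Backoff Period. All names here are hypothetical.
    public class ExponentialBackoffSketch {
        private final long maxBackoffMillis;   // the "Max Backoff Period" from the UI
        private long lastDelayMillis;          // delay applied to the previous retry

        public ExponentialBackoffSketch(final long initialDelayMillis, final long maxBackoffMillis) {
            this.lastDelayMillis = initialDelayMillis;
            this.maxBackoffMillis = maxBackoffMillis;
        }

        // Returns the penalty (or yield) duration for the next retry, then doubles
        // the tracked delay for the retry after that, up to the configured max.
        public long nextDelayMillis() {
            final long delay = lastDelayMillis;
            lastDelayMillis = Math.min(lastDelayMillis * 2, maxBackoffMillis);
            return delay;
        }
    }
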
4. UI: Introduce new tab in Processor config dialog for Relationships

Currently, a user configures which Relationships should be auto-terminated in the Settings tab of the configuration dialog. We will now have more complex configuration for each Relationship and as such need a new tab in this dialog. The ability to configure which relationships are auto-terminated should be removed from the Settings tab and added to this new Relationships tab.

This tab should list all Relationships for the Processor and allow the user to configure whether or not to auto-terminate the Relationship, as it does now. It should also allow configuring whether or not FlowFiles routed to a given Relationship should be retried.

It will be valid to configure both a specific number of retries AND auto-terminating the Relationship, or just one or the other.

This tab should also contain the Max Backoff Period and Backoff Mechanism outlined above.

5. Update DTO Data Model

The ProcessorConfigDTO data model must be updated to include the new logic. The DTO will need the following members added:

Set<String> retriedRelationships; // The name of any Relationship that is to be retried.

String backoffMechanism; // Should be an enum, but currently the DTOs do not make use of enums and instead use String objects with the allowable values specified in the @ApiModelProperty annotation. Best to stay consistent.

String maxBackoffPeriod;


We will also need to update the DtoFactory class to properly populate these values when creating the DTO.
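
A rough sketch of what the added ProcessorConfigDTO members might look like; the getter names, annotation text, and allowable values below are assumptions based on the existing DTO conventions, not the final API:

    import java.util.Set;
    import io.swagger.annotations.ApiModelProperty;

    // Sketch of the new members only; the real ProcessorConfigDTO has many other fields.
    public class ProcessorConfigDTO {
        private Set<String> retriedRelationships;  // names of Relationships whose FlowFiles are retried
        private String backoffMechanism;           // String rather than enum, per DTO convention
        private String maxBackoffPeriod;           // time period, e.g. "10 mins"

        @ApiModelProperty("The names of all Relationships whose FlowFiles should be retried")
        public Set<String> getRetriedRelationships() { return retriedRelationships; }
        public void setRetriedRelationships(Set<String> retriedRelationships) { this.retriedRelationships = retriedRelationships; }

        @ApiModelProperty(value = "Whether to penalize the FlowFile or yield the Processor between retries",
            allowableValues = "PENALIZE_FLOWFILE, YIELD_PROCESSOR")
        public String getBackoffMechanism() { return backoffMechanism; }
        public void setBackoffMechanism(String backoffMechanism) { this.backoffMechanism = backoffMechanism; }

        @ApiModelProperty("Maximum amount of time to wait before retrying, e.g. '10 mins'")
        public String getMaxBackoffPeriod() { return maxBackoffPeriod; }
        public void setMaxBackoffPeriod(String maxBackoffPeriod) { this.maxBackoffPeriod = maxBackoffPeriod; }
    }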

6. Update ProcessorResource to validate arguments

The new retriedRelationships, backoffMechanism, and maxBackoffPeriod elements should have their values validated when attempting to perform a POST or a PUT to a Processor. The request should throw an IllegalArgumentException if one of the values is invalid; for example, if the max backoff period were set to "5" instead of "5 mins".
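
A minimal sketch of the intended validation, assuming a hypothetical helper and a simplified time-period pattern; a real implementation would presumably reuse NiFi's existing time period parsing:

    import java.util.regex.Pattern;

    // Illustrative only: class, method, and pattern are hypothetical.
    public class RetryConfigValidator {
        // Very rough approximation of a time period such as "5 mins" or "30 secs".
        private static final Pattern TIME_PERIOD = Pattern.compile("\\d+\\s*(nanos|millis|secs?|mins?|hrs?|days?)");

        public static void validateMaxBackoffPeriod(final String maxBackoffPeriod) {
            if (maxBackoffPeriod != null && !TIME_PERIOD.matcher(maxBackoffPeriod.trim().toLowerCase()).matches()) {
                // e.g. "5" is rejected, "5 mins" is accepted
                throw new IllegalArgumentException("Max Backoff Period must be a valid time period, such as '5 mins', but was: " + maxBackoffPeriod);
            }
        }
    }
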
7. Update Flow Data Model

Update the serialization logic to persist these new configurations on the Processor and to handle the deserialization logic. Must also update the flow-configuration.xsd schema.
8. Update Versioned Processor Config Data Model

Update the VersionedProcessor in much the same way that we update the DTO so that changes to the processor are tracked in registry, exported versions, etc.
9. Update Flow Mapper

Update the FlowMapper to ensure that the new fields in the VersionedProcessor are populated when mapping a ProcessorNode to a VersionedProcessor.
10. Update Versioned Flow Comparator

Update VersionedFlowComparator so that if the configuration for retry logic is updated, it is recognized as a local change.
11. Update Flow Fingerprint

Update the Flow Fingerprint to ensure that nodes within the cluster have the same configuration values.
12. Update ProcessorNode

Update ProcessorNode to hold the new retry logic configuration elements in much the same way as the DTO data model. However, for the ProcessorNode, it will be important that an enum be used for the backoff mechanism.
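
As a rough illustration, the enum might look something like the following (the enum and constant names are assumptions, not the final API):

    // Hypothetical enum held by ProcessorNode in place of the DTO's String value.
    public enum BackoffMechanism {
        PENALIZE_FLOWFILE,   // default: penalize only the FlowFile being retried
        YIELD_PROCESSOR      // yield the whole Processor to preserve FlowFile ordering
    }
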
13. Update ProcessContext

Update ProcessContext so that processor developers have the ability to determine whether or not a given Relationship is configured for retrying and how many times it is to be retried.
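
A sketch of what the ProcessContext additions might look like to a processor developer; the method names below are assumptions, not the final API:

    import org.apache.nifi.processor.Relationship;

    // Hypothetical new methods on ProcessContext; only the additions are shown,
    // not the full interface, and the names are illustrative.
    public interface ProcessContext {
        // Whether FlowFiles routed to the given Relationship are configured to be retried.
        boolean isRelationshipRetried(Relationship relationship);

        // How many times a FlowFile routed to a retried Relationship will be retried.
        int getRetryCount();
    }
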
14. Update User Guide

A new section must be added to the User Guide explaining the new tab and how to configure the different options. Additionally, any screenshots that show the Processor configuration dialog are now outdated and should be updated.
15. Implement the Retry Logic

A description of the Implementation Logic is given below.
16. Update FileSystemSwapManager

Update the FileSystemSwapManager to ensure that the data that is written out indicates the number of retries for a given FlowFile. Ensure that swap files written in the old format, which does not include this information, are still readable.
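
A rough sketch of backwards-compatible deserialization, assuming a hypothetical swap file version check; the class, version constant, and method are illustrative only:

    import java.io.DataInputStream;
    import java.io.IOException;

    // Illustrative only: read the retry count only when the swap file was written
    // with a format version that includes it; older swap files default to 0.
    public class SwapDeserializationSketch {
        private static final int VERSION_WITH_RETRY_COUNT = 2;  // hypothetical version number

        static int readNumRetries(final DataInputStream in, final int swapFileVersion) throws IOException {
            if (swapFileVersion >= VERSION_WITH_RETRY_COUNT) {
                return in.readInt();
            }
            return 0;  // old format: no retry information, so start counting from zero
        }
    }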

Testing

This is a major feature, which touches critical parts of the framework. The following testing needs to be done, at a minimum, in order to consider the feature a success.

...

  • If a FlowFile is routed to a given Relationship, before considering if it is auto-terminated or if the relationship is cloned, etc., we must check if the Relationship is configured for Retry Logic.
    • If so, we must check how many times the FlowFile has been retried. This means that we will have to add a new field to the FlowFileRecord object (and therefore the StandardFlowFileRecord object). This field can simply be an `int numRetries`. There is no need to add a mapping of Relationship name to number of retries. This should instead be kept simple: after the FlowFile has been retried N times, it's done retrying (based on the relationship it was routed to). For example, if relationships 'ABC' and 'XYZ' are to be retried 7 times, consider that the Processor routes FlowFile 1 to relationship 'ABC' 7 times and then to relationship 'XYZ'. At this point, it is the 8th processing attempt, which is greater than 7, so it should be routed to relationship 'XYZ', not retried again. Note that the number of retries is 'transient' - it should not be serialized to the FlowFile Repository. On restart of NiFi, the number of retries can reset to 0. However, it MUST be persisted to/restored from Swap Files. Otherwise, once a queue reaches a certain threshold, the retries may no longer work.
    • If the FlowFile has not yet reached the threshold for retries (i.e., it must be retried again), the Process Session must:
      • Transfer the FlowFile back to its original queue (if there is one, else remove the FlowFile). The FlowFile should be penalized if the Retry Logic is configured to do so; otherwise, the Processor must be yielded.
      • Any FlowFile that was created as a Child of this FlowFile must be removed. It will be very important here that we ensure that the logic is correct for cleaning up the Content Repository!!
      • Any Provenance Event for this FlowFile (or any FlowFile that is a Child of this FlowFile) must be removed.
    • We should still update the stats for the number of bytes Read/Written and the number of Tasks/Time. We should not update the number of FlowFiles/Bytes In/Out.
  • If the Relationship that the FlowFile is routed to is not configured for retry logic, OR if the configured number of retries has been reached, we should process the FlowFile as we normally would (a rough sketch of this decision follows this list).
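
As a rough illustration of the decision described in this list; all names here are hypothetical, and the associated bookkeeping (requeueing, removing children, dropping provenance events, updating stats) is only noted in comments:

    // Illustrative sketch of the retry decision; not the framework's actual API.
    public class RetryDecisionSketch {

        public enum BackoffMechanism { PENALIZE_FLOWFILE, YIELD_PROCESSOR }

        // Outcome of transferring a FlowFile to a Relationship.
        public enum Outcome { RETRY_WITH_PENALTY, RETRY_WITH_YIELD, PROCESS_NORMALLY }

        /**
         * @param relationshipRetried whether the target Relationship is configured for retry
         * @param numRetriesSoFar     transient retry counter carried on the FlowFileRecord
         * @param configuredRetries   how many times FlowFiles routed to retried Relationships are retried
         */
        public static Outcome decide(final boolean relationshipRetried,
                                     final int numRetriesSoFar,
                                     final int configuredRetries,
                                     final BackoffMechanism backoff) {
            if (relationshipRetried && numRetriesSoFar < configuredRetries) {
                // Retry: the FlowFile goes back to its original queue, its children are removed,
                // and its provenance events are dropped (not shown here).
                return backoff == BackoffMechanism.PENALIZE_FLOWFILE
                        ? Outcome.RETRY_WITH_PENALTY
                        : Outcome.RETRY_WITH_YIELD;
            }
            // Not configured for retry, or retries exhausted: auto-terminate, clone,
            // and enqueue the FlowFile as usual.
            return Outcome.PROCESS_NORMALLY;
        }
    }

With the example above, a FlowFile whose retried Relationships are configured for 7 retries gets RETRY_WITH_PENALTY or RETRY_WITH_YIELD for its first 7 attempts and falls through to PROCESS_NORMALLY on the 8th attempt (numRetriesSoFar of 7).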

...