...

  1. Loading the prior result in a stateless operation would incur a significant performance impact. NOTE: This is subject to change. If we can generate some integer id that reflects changes in the processed result (implementing such id generation would likely fall to the user), then we might be able to extend dropping no-ops to stateless operations as well. This is still an alternative worth considering.
  2. If we load a prior result in its entirety for a stateless operation, we are essentially replicating some functions of a stateful operator in a stateless one. A stateless operator was never intended to load a prior result (only a stateful operator should do such a thing), so there would be some redundancy between stateful and stateless operators. However, the resulting discrepancy (stateful operations drop no-ops while stateless operations don't) can cause much confusion for users.

Alternative

...

Approaches

It may be possible to support emit-on-change for all operations.

There is more than one way to obtain the prior result. For instance, we can obtain it from an upstream processor: for most operations, we can forward both the old and new results of the upstream processor downstream. In this case, the same operation is performed twice, once on the old result and once on the new one. However, each operation can be very expensive, so performing it twice has the potential to incur severe performance hits. This may turn out not to be a serious issue, but it is a significant concern.
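The forwarding approach described above can be sketched roughly as follows. This is only an illustration in plain Java, not Kafka Streams code: the `OldAndNew` holder, the `ForwardBothSketch` class, and the `applyAndDropNoOp` method are hypothetical names introduced here, and the sketch assumes the operation returns non-null results that implement `equals` meaningfully.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical holder for the old and new results forwarded by the upstream processor.
final class OldAndNew<V> {
    final V oldValue; // null when there is no prior result
    final V newValue;

    OldAndNew(V oldValue, V newValue) {
        this.oldValue = oldValue;
        this.newValue = newValue;
    }
}

final class ForwardBothSketch {
    // Applies the stateless operation to both values and returns an empty Optional
    // when the result did not change (a no-op). Note that the operation runs twice,
    // which is the performance concern raised above.
    static <V, R> Optional<R> applyAndDropNoOp(OldAndNew<V> change, Function<V, R> operation) {
        R newResult = operation.apply(change.newValue);
        if (change.oldValue == null) {
            // No prior result to compare against, so always emit.
            return Optional.ofNullable(newResult);
        }
        R oldResult = operation.apply(change.oldValue);
        return Objects.equals(oldResult, newResult)
                ? Optional.empty()
                : Optional.ofNullable(newResult);
    }
}
```

Under these assumptions, a downstream processor would forward the record only when the returned `Optional` is present.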

The main bottleneck for emit-on-change in stateless operations is how to load the prior result. However, loading it is not strictly necessary. Earlier in our discussion, we talked about using a hash code to uniquely identify results and then comparing those hash codes. But as noted, hash codes can vary across JVMs, and programs are not required to return the same hash code for the exact same object on different runs. Instead, we can consider using some method distinct from Object#hashCode(). One possibility is to add a configuration that allows emit-on-change for stateless operations. If emit-on-change is enabled, then we can use some method defined by the user, e.g. generateUniqueObjectId(V result), returning a 32-bit or 64-bit integer as an id. This method would have stricter constraints than a normal hash code function, and it would be used in the same way as the hash codes described in the Implementation section below. We store these ids instead of the results, and compare them for equality.

This can potentially work, but only if the user implements the provided method correctly. This must be stressed in further documentation.
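As a rough illustration of the id-based comparison, the sketch below keeps the last id seen per key and reports whether a new result should be emitted. Everything other than the method name `generateUniqueObjectId` is an assumption made for this example: the `UniqueObjectIdGenerator` interface, the `IdBasedNoOpFilter` class, the choice of a 64-bit `long`, and the in-memory `HashMap` (a real implementation would need some durable store) are hypothetical and not part of any existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical user-implemented contract: the id must be stable across JVMs and runs,
// and must change whenever the result changes.
interface UniqueObjectIdGenerator<V> {
    long generateUniqueObjectId(V result);
}

// Hypothetical helper that stores only the ids, never the prior results themselves.
final class IdBasedNoOpFilter<K, V> {
    private final UniqueObjectIdGenerator<V> idGenerator;
    private final Map<K, Long> lastIdByKey = new HashMap<>(); // stand-in for a real store

    IdBasedNoOpFilter(UniqueObjectIdGenerator<V> idGenerator) {
        this.idGenerator = idGenerator;
    }

    // Returns true if the result should be emitted, i.e. its id differs from the
    // id previously stored for this key, or no id has been stored yet.
    boolean shouldEmit(K key, V result) {
        long id = idGenerator.generateUniqueObjectId(result);
        Long previous = lastIdByKey.put(key, id);
        return previous == null || previous != id;
    }
}
```

If the user-supplied method returns the same id for two different results, the second result is silently dropped, which is why the correctness of that method matters so much.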

Implementation [DISCARDED]

...