Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

2). Network failures between TM and JM that can potentially cause orphan tasks[ (details discussed in Section 2, Point1).

...

  1. ResultSubPartition is kept when the consumer exits (task fails), the consumer only releases the view.
  2. ResultSubPartition is released through ResultPartitionManager on
    1. consumption finished; 
    2. producer failure/cancellation; 
    3. Job exits
  3. Multiple views can be created on the same subpartition, but only one view exists at a time (No two views exist at the same time).

...

Point1: Orphan/Zombie Tasks

It is possible that TM disconnects TM may disconnect to JM, but both TM and JM are alive. In the sink failure case, TM fails the sink nodetask, and JM redeploys the sink to a different TM slot requested from RM.

However, it is possible that JM deploys JM may deploy a new sink before TM successfully fails the old one. We have to make sure only one view exists at any time, and the view is connected with the new attemptId/attemptNumber. In theory, the old execution has the possibility to can connect after the new execution. This could In reality, it can happen, but rarely. If only one view is allowed at a time, creating ResultSubpartitionView for the newer execution attempt should prevent the creation of ResultSubpartitionView from the older execution attempt to succeed.

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Test Plan

...

Rejected Alternatives

...