Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The Path class is currently mutable to support IOReadableWritable serialization. However, many parts of the code assume that a Path is immutable. By making the Path class immutable, we can ensure that a stored path cannot change after it is created and eliminate the subtle errors that such mutation can cause.
Public Interfaces
Modify the Path class to no longer implement the IOReadableWritable interface.
Add two static methods to the Path class to support deserializing from a DataInputView and serializing to a DataOutputView:
public static Path deserializeFromDataInputView(DataInputView in) throws IOException
public static void serializeToDataOutputView(Path path, DataOutputView out) throws IOException
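The following is a minimal sketch of what the end state could look like, not the actual Flink implementation. To keep it self-contained, it assumes a hypothetical ImmutablePath class in place of Path, uses java.io.DataInput and java.io.DataOutput in place of DataInputView and DataOutputView, and writes the URI as a single UTF string, which is an assumed wire format rather than the one Path uses today. The roundTrip helper and main method are also only illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Objects;

/** Hypothetical stand-in for an immutable Path; not Flink's actual class. */
final class ImmutablePath {
    // The only state is final, so an instance cannot change after construction.
    private final URI uri;

    ImmutablePath(URI uri) {
        this.uri = Objects.requireNonNull(uri, "uri");
    }

    URI toUri() {
        return uri;
    }

    // Writes the path without relying on an instance-level write() method.
    // Assumed wire format: the URI as one UTF string (not Flink's real format).
    static void serializeToDataOutputView(ImmutablePath path, DataOutput out) throws IOException {
        out.writeUTF(path.uri.toString());
    }

    // Builds a fresh instance instead of filling in an existing, empty one.
    static ImmutablePath deserializeFromDataInputView(DataInput in) throws IOException {
        return new ImmutablePath(URI.create(in.readUTF()));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ImmutablePath && uri.equals(((ImmutablePath) o).uri);
    }

    @Override
    public int hashCode() {
        return uri.hashCode();
    }

    // Illustrative helper: serializes to an in-memory buffer and reads the bytes back.
    static ImmutablePath roundTrip(ImmutablePath path) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            serializeToDataOutputView(path, new DataOutputStream(buffer));
            return deserializeFromDataInputView(
                    new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ImmutablePath original = new ImmutablePath(URI.create("hdfs://namenode:9000/data/part-0"));
        ImmutablePath copy = roundTrip(original);
        System.out.println(copy.toUri() + " equal=" + copy.equals(original));
    }
}
```

Because deserialization returns a new object rather than populating an existing one, the class needs neither a no-argument constructor nor a non-final field, which is what allows it to be immutable.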
Proposed Changes
There are two steps to modify the Path class to no longer implement the IOReadableWritable interface.
First, add the two static methods to the Path class to support serialization and deserialization. Currently, three classes serialize/deserialize the Path through the IOReadableWritable interface: FileSourceSplitSerializer, TestManagedSinkCommittableSerializer, and TestManagedFileSourceSplitSerializer. Modify these classes to use the two static methods instead of IOReadableWritable.
Second, mark the implementation methods of IOReadableWritable in the Path class as deprecated and remove them in the next major release.
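The transitional state during the deprecation period could look roughly like the sketch below. It is an illustration under stated assumptions, not the real Path code: TransitionalPath is a hypothetical class, java.io.DataInput and java.io.DataOutput stand in for DataInputView and DataOutputView, the single-UTF-string wire format is assumed, and writeOldReadNew is a helper invented for the example. The point it shows is that the deprecated instance methods and the new static methods can share one format, so bytes written by the old code path remain readable by the new one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;

/** Hypothetical transitional stand-in for Path; not Flink's actual class. */
final class TransitionalPath {
    // Must stay non-final until the deprecated read() method is removed.
    private URI uri;

    /** @deprecated Only needed for the "create empty, then read()" pattern. */
    @Deprecated
    TransitionalPath() {}

    TransitionalPath(URI uri) {
        this.uri = uri;
    }

    URI toUri() {
        return uri;
    }

    /** @deprecated Use {@link #serializeToDataOutputView} instead. */
    @Deprecated
    void write(DataOutput out) throws IOException {
        // Delegates so that old and new code paths emit identical bytes.
        serializeToDataOutputView(this, out);
    }

    /** @deprecated Mutates this instance; use {@link #deserializeFromDataInputView} instead. */
    @Deprecated
    void read(DataInput in) throws IOException {
        this.uri = URI.create(in.readUTF());
    }

    // New static entry points (assumed wire format: the URI as one UTF string).
    static void serializeToDataOutputView(TransitionalPath path, DataOutput out) throws IOException {
        out.writeUTF(path.uri.toString());
    }

    static TransitionalPath deserializeFromDataInputView(DataInput in) throws IOException {
        return new TransitionalPath(URI.create(in.readUTF()));
    }

    // Illustrative helper: writes with the deprecated method, reads with the new static one.
    static URI writeOldReadNew(URI uri) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new TransitionalPath(uri).write(new DataOutputStream(buffer));
            DataInput in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            return deserializeFromDataInputView(in).toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("file:///tmp/input.csv");
        System.out.println(writeOldReadNew(uri).equals(uri));
    }
}
```

In a serializer, the migration then amounts to replacing a call such as path.write(out) with the static serialize method, and replacing the "construct an empty Path, then call read(in)" pattern with a single call to the static deserialize method.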
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
Users who rely on IOReadableWritable to serialize/deserialize the Path will need to migrate to the serializeToDataOutputView/deserializeFromDataInputView methods in the Path class. This migration should be straightforward, since each instance call maps directly to one of the static methods.
- As the IOReadableWritable implementation methods in the Path class are part of the public API, we will deprecate them two releases before removing them.