DSF ID Loader

To be Reviewed By: October 8, 2021

Authors: Jens Deppe

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

To facilitate backwards compatibility, Geode has a concept known as Data Serializable Fixed ID (DSFID) which allows serializable objects to evolve over time using an API that allows newer versions of the classes to still be deserialized by older versions of Geode. This is a core capability of Geode that is primarily intended for internal classes but is also utilized by various modules, notably WAN, Lucene and Redis.

To use DSFIDs, a class needs to register itself so that it is known by the serialization framework. Many core classes register in DSFIDFactory . The modules, mentioned before, register during their service loader discovery and initialization phase. Under some situations, however, Geode will require knowledge of the DSFID very early on during member startup. If a member receives a message with an unknown DSFID it will throw a DSFIDNotFoundException. Without introducing module circular dependencies, there is no way for a module to register its required DSFIDs early enough to avoid this issue.

Anti-Goals

None identified.

Solution

The proposed solution has 2 parts:

Currently, the DSFIDSerializer  interface exposes a registerDSFID(int dsfid, Class<?> dsfidClass) method. This method will be moved into a new interface ( DataSerializerFixedIdRegistrant ) and DSFIDSerialzer  will extend this interface:

DataSerializableFixedIdRegistry
interface DataSerializeableFixedIdRegistry {
  register(int id, Class<? extends DataSerializableFixedId> clazz);
}

This will reduce the API surface area required by the following new service provider interface which would perform the actual Fixed Id registration:


DataSerializableRegistrant
interface DataSerializeableFixedIdRegistrant {
  register(DataSerializableFixedIdRegistry registry);
}


This SPI will be initialized as part of the static class initialization already existing in InternalDataSerializer .

In order to use this, a module will need to:

  • Include a provider configuration resource file: resources/META-INF/services/org.apache.geode.internal.serialization.DataSerializabledFixedIdRegistrant
  • This file will contain one or more fully qualified class names which implement the above interface

The result will be that DSFIDs will be registered and available when the core serialization framework is instantiated.

Error Conditions

Currently, an IllegalArgumentException  is thrown, during registration, if the given class does not have a zero-argument constructor. This should be expanded to also produce an IllegalArgumentException  if a registration is attempted for a previously registered ID.

Changes and Additions to Public Interfaces

No changes to public interfaces.

Performance Impact

No anticipated performance impacts.

Backwards Compatibility and Upgrade Path

No backwards compatibility impact.

Prior Art

N/A

FAQ

None yet.

Errata

None yet.

  • No labels

15 Comments

  1. Nice!  This has always been a sticky point for extending the known set of core data types.

    Would it make sense to include a section on discarded alternatives?  In particular I'm wondering about providing a stronger module lifecycle interface where these kinds of registrations could be handled.  I know we will get there eventually but it could be too big of a step until we have completed the modules support that Udo/Patrick are tackling.

  2. I think DSFIDSerializer is too wide an interface for what DSFIDLoader needs. Could you create a new interface that only has " void registerDSFID(int dsfid, Class<?> dsfidClass);" on it? I think that is all DSFIDLoader needs.

    1. I agree. SPIs should try to limit the surface area exposed outward into the implementation. 

  3. I really want to see us steer away from using acronyms and abbreviations. Let's call this DataSerializableFixedIdLoader. Our coding standard has always supposed to be "Id" instead of "ID", but examples of "ID" slipped into the code over the years. Let's stick to "Id" everywhere we can. We also don't really need to have "DSFID" on everything, so I'd recommend simplifying the callback to "register". Some day (not necessarily part of this change), I really want to rename DSFIDSerializer as well to get rid of "DSFID".

    public interface DataSerializableFixedIdLoader {
      void register(DSFIDSerializer serializer);
    }

  4. Given that this interface isn't actually "loading" anything I feel like the "loader" part of the name is misleading. Something like DfsidRegistrant might be more applicable. 

    1. Or maybe "DataSerializableFixedIdRegistration"?

  5. Perhaps we should avoid the abbreviation of data serializable fixed ID entirely. What about DataSerializeableFixedIdResgistrant

    interface DataSerializeableFixedIdResgistrant {
    register(DataSerializableFixedIdRegistry registry);
    }
    interface DataSerializeableFixedIdRegistry {
    register(Class<? extends DataSerializableFixedId> clazz);
    }

    We can read the fixed ID from a DataSerializableFixedId  implementation.


    For that matter we could do away with DataSerializeableFixedIdResgistrant  completely and use DataSerializableFixedId as the loader interface and just invoke the getDsfid() method.

    1. I don't think that will be possible since we're registering classes and not actual instances of those classes (which would allow for calling getDsfid() ).

      1. You are correct.

        I will revise back to my original then:

        interface DataSerializeableFixedIdResgistrant {
        register(DataSerializableFixedIdRegistry registry);
        }

        interface DataSerializeableFixedIdRegistry {
        register(int id, Class<? extends DataSerializableFixedId> clazz);
        }


        1. Spitballing... We could eliminate the the potential disconnect between the DataSerializableFixedId  classes ID and the ID the DataSerializableFixedIdRegistrant  knows about by using annotations. An annotation on the DataSerialiazableFixedId implementation could provide the ID without the need to instantiate it. 

  6. We should also define rules around how duplicate registration attempts should behave. I think they should throw an appropriate exception unique to duplication. Should such an event be fatal?

    We should also define what should happen in the classes is not DataSerializableFixedId .

  7. This idea seems pretty reasonable, and shouldn't be an issue with the work Udo and I are doing around modularity. We'll just have to treat geode-serialization similarly to how we treat modules that contain implementations of CacheService.


  8. I've updated the RFC with the list of suggestions made up to this point.

  9. On the high-level the RFC looks ok. But I have some questions about the implementation:

    1.  Is the registration of DSFID's going to be part of the CacheService  initialization step or is this another step that needs to happen in addition to the initialization of the CacheService ?
      1. if the registration is NOT part of the CacheService initialization, what happens if either of these fail?
      2. if the registration IS part of the CacheService  initialization, would it make sense to have the DataSerializable?
    2. Can we remove the existing DSFIDs for non-core Geode product extensions (redis,lucene,...) from the DataSerializableFixedID interface? The core system should not know about non-core extensions (like Lucene, Redis, Memcache) fixed IDs. These should ONLY be specified within the extension itself and provided to the registry service.
    3. Given the, hopefully soon, enabling of runtime loading/unloading of plugins/extensions (like the redis work, memcache or any other non-core product capability), how can we make the registration of DSFIDs less hardcoded and more cluster dynamic? (most likely not part of this scope, but it does not hurt to think about it). It would be great if you have a single API of register(Class<? extends DataSerializableFixedId> clazz) where the cluster assigns and allocates the fixed IDs. One could still have "core" operations have fixed IDs, but for non-core extensions, the allocation should be dynamic. For example, how to handle a situation where the Memcache and Redis integrations happen to choose to use the same ID ranges.

    I know that #3 is a stretch goal, but we HAVE to resolve #1 and #2 for this RFC to be viable at all.

    1. Thanks for your comments Udo. In response:

      1. Registration is not part of CacheService  initialization but will happen statically as part of the JVM startup - tied into the static initialization of {{InternalDataSerializer}}. Currently, any registration failures will cause the cache not to start up.
      2. The registration is already separated out and contained within each module. (I'm not sure if this is what you're referring to). Also, see the in-flight PR to implement this PR (https://github.com/apache/geode/pull/6891). Notice that WAN, Lucene and Redis all have separate registration of DSFIDs.
      3. I don't think that dynamic Ids will work - the whole point of fixed Ids is to allow for serialization evolution of objects between versions of Geode. For rolling upgrades I don't think these could be dynamic. It would probably be possible but would require much more work and some kind of registry; a bit like PDX types.