Usability issues

Exceptions are handed to ResultCollector.addResult, causing ClassCastExceptions

On the ResultSender, there is a method to return exceptions using sendException.

However, the ResultCollector only has addResult(). Further, ResultCollector has generic types

public interface ResultCollector<T,S> {
  public void addResult(DistributedMember memberID, T resultOfSingleExecution);

Since the type of T is the user result type, trying to call add result will almost invariably result in the ClassCastException, like below

[fatal 2015/12/03 10:58:20.809 PST <Function Execution Processor1> tid=Function Execution Processor1id] Unexpected exception during function execution on local node Partitioned Region
java.lang.ClassCastException: java.lang.Exception cannot be cast to com.gemstone.gemfire.cache.lucene.internal.distributed.TopEntriesCollector
    at com.gemstone.gemfire.cache.lucene.internal.distributed.TopEntriesFunctionCollector.addResult(TopEntriesFunctionCollector.java:1)
    at com.gemstone.gemfire.internal.cache.execute.LocalResultCollectorImpl.addResult(LocalResultCollectorImpl.java:85)
    at com.gemstone.gemfire.internal.cache.execute.PartitionedRegionFunctionResultSender.lastResult(PartitionedRegionFunctionResultSender.java:216)
    at com.gemstone.gemfire.internal.cache.execute.PartitionedRegionFunctionResultSender.lastResult(PartitionedRegionFunctionResultSender.java:174)
    at com.gemstone.gemfire.internal.cache.execute.PartitionedRegionFunctionResultSender.sendException(PartitionedRegionFunctionResultSender.java:309)
    at com.gemstone.gemfire.cache.lucene.internal.distributed.LuceneFunction.execute(LuceneFunction.java:60)
    at com.gemstone.gemfire.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:367)
    at com.gemstone.gemfire.internal.cache.

Must use ResultCollector returned from FunctionService.execute

Below, we have to use the collector returned by FunctionService.execute, rather than just calling getResult on our own result collector. We run into problems otherwise, maybe because exceptions are not handed to the user supplied result collector? In any case, its unintuitive that this returns a different ResultCollector than the one supplied by the user.

Example:

  ResultCollector<TopEntriesCollector, TopEntries> rc = (ResultCollector<TopEntriesCollector, TopEntries>) FunctionService.onRegion(region)
        .withArgs(context)
        .withCollector(collector)
        .execute(LuceneFunction.ID);
    
//This doesn't work
TopEntries entries = collector.getResult()

//This is what you have to do
TopEntries entries = rc.getResult();

Function invoked with onMember cannot be unit tested, because our API requires using a singleton cache

FunctionContext does not contain a cache. That means any function that you execute with onMember must contain code like this:

public void execute(FunctionContext context) {
    Cache cache = CacheFactory.getAnyInstance();
//...do work with the cache
}

This makes it impossible to write pure unit tests for the function code that mock the cache.

Functions invoked with onRegion always have to cast the FunctionContext to RegionFunctionContext

Any function that is executed on a region has to do a cast. This is not an intuitive API; the user has to know that there is such a thing as RegionFunctionContext. This can also lead to ClassCastExceptions if the user tries to cast the FunctionContext and it is not actually a REgionFunctionContext

All onRegion function code basically has to start this way:

public void execute(FunctionContext context) {
    RegionFunctionContext ctx = (RegionFunctionContext) context;

Getting the local data set of a function is not intuitive, uses static functions that expect concrete objects

If you want to operate on the local data set for a function, you have to do this

   public void execute(FunctionContext context) {
      RegionFunctionContext ctx = (RegionFunctionContext) context;
      //This actually doesn't give you the local data set
      Region wholePartitionedRegion = ctx.getDataSet();

      //This does, using a static function call 
      Region localData = PartitionRegionHelper.getLocalDataForContext(ctx);

      //This is a bad idea, because it may include some buckets that are also being processed on other nodes
      Region localDataDontDoThis = PartitionRegionHelper.getLocalData(ctx.getDataSet());

There are several problems with this approach.

It's not clear that ctx.getDataSet does not return the local data.
The local data set is not mockable for unit tests, because this static function call extracts it using internal APIs on concrete classes. This also prevents this sort of function from being tested in a pure unit test.
The user has to know that PartitionRegionHelper exists; the API is not obvious
The various methods on PartitionRegionHelper are confusing, specifically it's unclear to a user why they should use getLocalDataForContext instead of getLocalData

No way to get a spring data repository of the local data set of a function

This is not specifically an issue in the geode codebase, but it affects a lot of geode users.

Spring Gemfire provides an simplified function API. Spring Data Gemfire allows the user to generate a repository object to access a region. However, there is no way to generate a repository that accesses the local data for the function.

Here's an example

public interface PostRepository extends GemfireRepository<Post, PostId> {
  @Query("select * from /posts where id.person=$1")
  public Collection<Post> findPosts(String personName);
}  

public class Functions {
  
  //This repository accesses the entire data for the region on all members.
  @Autowired
  PostRepository postRepository;
  
  @GemfireFunction(HA=true)
  public SentimentResult getSentimentDoesntWork(Region<PostId, Post> localPosts, 
      @Filter Set<String> personNames) throws Exception {
    String personName = personNames.iterator().next();
    
    //This works, because localPosts is bound to the local data for the region    
    Collection<Post> posts = localPosts.query("id.person='" + personName + "'");

    //But I really want to use spring data repositories. Unfortunately, 
    //I can't get a repository here
    PostRepository localDataRepository == ??
    Collection<Post> posts  = localDataRepository.findPosts(personName);
    
  }

FunctionService.onMembers relies on a singleton DistributedSystem

FunctionService.onMembers doesn't take a Cache or DistributedSystem as a parameter. That means we can't get rid of the singleton DistributedSystem, because FunctionService will always look up the singleton.

Having a static FunctionService.onXXX means users can't mock the functionService

To invoke a function, the user code needs to refer to the static FunctionService.onXXX methods ( see below). That means that code that contains these static calls can't be tested with a test that mocks the function service.

FunctionService.onRegion(region)
        .withArgs(context)
        .withCollector(collector)
        .execute(LuceneFunction.ID);

It might be better to have a cache.getFunctionService

Functions are not easy to implement for the common case of having one result

The function context has a ResultSender with methods like sendResult, lastResult, sendException, etc. Unfortunately, that makes the common case of a function that returns a single result unnecessarily verbose and error prone. For example, if no result is sent that can cause issues:

public void execute(FunctionContext context) {
  if(someTest) {
	  context.getResultSender().lastResult("Done");
  }
  //Whoops I forgot to send a last result
}

Function code must exist on the client

The function service API provides a way to invoke functions by id, for example

FunctionService.onServer().execute("MY_FUNCTION_ID")

However, internally, we look up the function object itself from the function service in order to read the isHA, optimizeForWrite, and hasResult properties of the function. That means the code must exist on the client and must be registered with the FunctionService.

This is also an opportunity for the user to see an exception if the class on the client does not match the class on the server for these properties.

Lambdas and functions

Lambdas cannot be used for Geode functions, because there are multiple non abstract methods on function like hasResult, isHA, etc.

It would be nice to able to write short functions as lambda expressions, For example

FunctionService.onMembers().execute( () -> System.out.println() );

However, that is not possible currently because Function has several non default methods.

We can also get rid of FunctionAdapter in favor of using default methods on Function.

Suggested API changes

Instead of a single Function interface, there should be multiple interfaces for different types of functions. TODO - Do we really need VoidFunction (equivalent to hasResult() == false)

interface RegionFunction<T> {
  public T execute(RegionFunctionContext context)
}

interface Function<T> {
  public T execute(FunctionContext context)
}

RegionFunctionContext should have a method to the local data for the context

RegionFunctionContext {
  public Region getLocalData();
}

FunctionContext should have a method to return the cache

interface FunctionContext {
  getCache();
}

isHA, optimizeForWrite, and hasResult should be removed from Function and should be added to Execution instead

ResultCollector should have an addException method

public interface ResultCollector<T,S> {
  public void addException(DistributedMember member, Throwable t);
...
}

Example of the improved API

FunctionService functionService = cache.getFunctionService();

//RegionFunction with a single result
ResultCollector<String, Collection<String>> rc = functionService.onRegion(region)
  .execute(ctx -> ctx.getLocalData().get("key"))
Collection<String> values = rc.getResult()

//RegionFunction without a result, that is optimized for write, but is not HA
functionService.onRegion(region)
  .optimizeForWrite(true)
  .isHA(true)
  .hasResult(false)
  .execute(ctx -> ctx.getLocalData().put("key", "value))

//On member function that gets the cache from the context
functionService.onMembers()
   .hasResult(false)
   .execute(ctx -> ctx.getCache().close());

//Function that streams results back using the result sender
functionService.onRegion(region).execute( ctx -> 
  {
    ResultSender<String> s = ctx.getResultSender();
    for(String value : ctx.getRegion().values()) {
      s.addResult(value)
    }
    return null;
  }

Space shortcuts

Page tree

Usability issues

Exceptions are handed to ResultCollector.addResult, causing ClassCastExceptions

Must use ResultCollector returned from FunctionService.execute

Function invoked with onMember cannot be unit tested, because our API requires using a singleton cache

Functions invoked with onRegion always have to cast the FunctionContext to RegionFunctionContext

Getting the local data set of a function is not intuitive, uses static functions that expect concrete objects

No way to get a spring data repository of the local data set of a function

FunctionService.onMembers relies on a singleton DistributedSystem

Having a static FunctionService.onXXX means users can't mock the functionService

Functions are not easy to implement for the common case of having one result

Function code must exist on the client

Lambdas and functions

Suggested API changes

Space shortcuts

Page tree

Function Service Usability Improvments

Usability issues

Exceptions are handed to ResultCollector.addResult, causing ClassCastExceptions

Must use ResultCollector returned from FunctionService.execute

Function invoked with onMember cannot be unit tested, because our API requires using a singleton cache

Functions invoked with onRegion always have to cast the FunctionContext to RegionFunctionContext

Getting the local data set of a function is not intuitive, uses static functions that expect concrete objects

No way to get a spring data repository of the local data set of a function

FunctionService.onMembers relies on a singleton DistributedSystem

Having a static FunctionService.onXXX means users can't mock the functionService

Functions are not easy to implement for the common case of having one result

Function code must exist on the client

Lambdas and functions

Suggested API changes