You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 12 Next »

This proposal contains a number of improvements to the function service to improve usability issues.

 

Improvements

Minor changes

There are some minor improvements that already have JIRAs. We should just apply the changes described in these jiras.

Unable to render Jira issues macro, execution error.

Unable to render Jira issues macro, execution error.

Unable to render Jira issues macro, execution error.

Unable to render Jira issues macro, execution error.

 

Functions executed on a region should implement a different interface

Any function that is executed on a region has to do a cast:

public void execute(FunctionContext context) {
    RegionFunctionContext ctx = (RegionFunctionContext) context;
 

This is not an intuitive API; the user has to know that there is such a thing as RegionFunctionContext. This can also lead to ClassCastExceptions if the user tries to cast the FunctionContext and it is not actually a RegionFunctionContext

 

Instead, we should have a new RegionFunction interface that provides a RegionFunctionContext. FunctionService.onRegion should return a RegionExecution interface who's execute method should expect a RegionFunction.

See Unable to render Jira issues macro, execution error. for more details

 

RegionFunctionContext should provide the local data set

If you want to operate on the local data set for a function, you have to do this

   public void execute(FunctionContext context) {
      RegionFunctionContext ctx = (RegionFunctionContext) context;
      //This actually doesn't give you the local data set
      Region wholePartitionedRegion = ctx.getDataSet();

      //This does, using a static function call 
      Region localData = PartitionRegionHelper.getLocalDataForContext(ctx);

      //This is a bad idea, because it may include some buckets that are also being processed on other nodes
      Region localDataDontDoThis = PartitionRegionHelper.getLocalData(ctx.getDataSet());

 

There are several problems with this approach.

  • It's not clear that ctx.getDataSet does not return the local data.
  • The local data set is not mockable for unit tests, because this static function call extracts it using internal APIs on concrete classes. This also prevents this sort of function from being tested in a pure unit test.
  • The user has to know that PartitionRegionHelper exists; the API is not obvious
  • The various methods on PartitionRegionHelper are confusing, specifically it's unclear to a user why they should use getLocalDataForContext instead of getLocalData

 

Instead, RegionFunctionContext should just provide a getLocalDataSet method.

The methods isHA, hasResult, optimizeForWrite should move from the Function interface to the Execution interface

isHA, hasResult, and optimizeForWrite are methods on Function that the user can override. However, that makes it harder to write lambda expressions for functions, because if you want to set one of these parameters you can now longer use a lambda.

Also the function service API provides a way to invoke functions using a string id. For example FunctionService.onServer().execute("MY_FUNCTION_ID"). If we remove these extra methods from the Function interface, this code will no longer need to look up the Function on the client and the Function class will not have to exist on the client anymore.

 

The Function interface should return a value

The function context has a ResultSender with methods like sendResult, lastResult, sendException, etc. Unfortunately, that makes the common case of a function that returns a single result unnecessarily verbose and error prone. For example, if no result is sent that can cause issues:

public void execute(FunctionContext context) {
  if(someTest) {
	  context.getResultSender().lastResult("Done");
  }
  //Whoops I forgot to send a last result
}

 

Instead, the function interfaces should return a value. For the simple case where there is only one result, the user can just return the result. For cases with multiple streaming results, the user can look up the result sender. For cases with no results the user can just return null. The return value will be ignored anyway if the user invokes the function with hasResult==false.

interface RegionFunction<T> {
  public T execute(RegionFunctionContext context)
}

interface Function<T> {
  public T execute(FunctionContext context)
}

Add Cache.getFunctionService and deprecate the static methods on FunctionService

This proposal might be a bit more controversial. The issue is that the current FunctionService leads people to write code that is harder to test, because they cannot mock the FunctionService if they are making static calls to FunctionService.onRegion. If FunctionService is something that could be provided by the cache, a user can mock FunctionService and assert that functions are invoked.

The other issue this fixes is that FunctionService has methods like onMembers(String ... groups) that internally look up a static cache. Besides making it hard for users to use Mocks with FunctionService code, this also ties us to having a singleton cache and distributed system. Since we're trying to get rid of singletons we should fix this API to not rely on an underlying singleton.

Stop Requiring the user to use the ResultCollector returned by the execute method.

The javadocs for execute say this

/** @return ResultCollector to retrieve the results received. This is different
   *         object than the ResultCollector provided in
   *         {@link Execution#withCollector(ResultCollector)}. User has to use
   *         this reference to retrieve results.
*/

It's confusing to users that geode wraps their result collector in another collector that has different behavior, and it's unclear why the user has to call the returned result collector. We should stop wrapping the user's result collector, and instead provide helper classes like a BlockingResultCollector that waits for all results to arrive in getResults.

Below, we have to use the collector returned by FunctionService.execute, rather than just calling getResult on our own result collector. We run into problems otherwise, maybe because exceptions are not handed to the user supplied result collector? In any case, its unintuitive that this returns a different ResultCollector than the one supplied by the user.

 

Example:

  ResultCollector<TopEntriesCollector, TopEntries> rc = (ResultCollector<TopEntriesCollector, TopEntries>) FunctionService.onRegion(region)
        .withArgs(context)
        .withCollector(collector)
        .execute(LuceneFunction.ID);
    
//You're not supposed to do this
TopEntries entries = collector.getResult()

//This is what you are supposed to do. rc != collector
TopEntries entries = rc.getResult();
  

Examples of API changes

 

FunctionService functionService = cache.getFunctionService();

//RegionFunction with a single result
ResultCollector<String, Collection<String>> rc = functionService.onRegion(region)
  .execute(ctx -> ctx.getLocalData().get("key"))
Collection<String> values = rc.getResult()

//RegionFunction without a result, that is optimized for write, but is not HA
functionService.onRegion(region)
  .optimizeForWrite(true)
  .isHA(true)
  .hasResult(false)
  .execute(ctx -> ctx.getLocalData().put("key", "value))

//On member function that gets the cache from the context
functionService.onMembers()
   .hasResult(false)
   .execute(ctx -> ctx.getCache().close());

//Function that streams results back using the result sender
functionService.onRegion(region).execute( ctx -> 
  {
    ResultSender<String> s = ctx.getResultSender();
    for(String value : ctx.getRegion().values()) {
      s.addResult(value)
    }
    return null;
  }
  • No labels