Status

...

Page properties

Discussion thread

...

	https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq

JIRA: TBD

...

Vote thread

JIRA

FLINK-28415

Release

1.16

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

This FLIP was initially proposed by Yuan Zhu (zstraw@163.com) and finished by Qingsheng Ren (partial caching part) and Alexander Smirnov (full caching part).

Motivation

As a widely used feature in Flink SQL jobs, the performance of lookup table sources is essential not only for users but also for source developers tuning their implementations. Most lookup table sources use a cache to achieve better performance, but the current cache design is missing some features:

...

In order to address the issues above, we propose here to define a unified abstraction for lookup source cache and its related metrics.

Proposed Changes

...

Top-level APIs

In order to clarify the semantics of lookup, we'd like to introduce some top-level APIs for general lookup operations without caching:

LookupFunction / AsyncLookupFunction, an extended version of TableFunction that makes the API more straightforward.
LookupFunctionProvider / AsyncLookupFunctionProvider, which serve as the creators of LookupFunction / AsyncLookupFunction in the table source.

And APIs related to the cache:

LookupCache, defining the cache used in the lookup table.
DefaultLookupCache, a default implementation of the cache that is suitable for most use cases.
CacheMetricGroup, defining the metrics that should be reported by the cache.

...

Partial and Full Caching

More specifically, we'd like to provide public interfaces to lookup source developers for the two most common cases, which are named partial caching and full caching.

Partial caching

Partial caching loads data into the cache as the external system is accessed. If the lookup key does not exist in the cache, a lookup against the external system is triggered and the result is stored in the cache for subsequent lookups. Users and lookup table developers can configure the eviction policy and the maximum size of the cache.

In order to support partial caching, we propose to introduce 2 new interfaces:

PartialCachingLookupProvider / PartialCachingAsyncLookupProvider, as the API interacting with the table source to get the LookupFunction and the LookupCache.

The cache serves as a component in LookupJoinRunner and is pluggable by specifying it when constructing the provider. The planner takes the lookup function and the cache created from the provider and passes them to the LookupJoinRunner. The cache is instantiated during runtime execution, and entries are loaded via the lookup function whenever there is a cache miss.
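The partial caching flow described above can be sketched in plain Java. This is only an illustration under stated assumptions: `PartialCachingSketch` is a hypothetical stand-in and not a Flink class, the `loader` function stands in for `LookupFunction#lookup`, the size limit counts cached keys rather than rows, and LRU eviction is just one possible eviction policy.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a cache-then-lookup flow; not part of the Flink API. */
class PartialCachingSketch<K, V> {
    private final Function<K, Collection<V>> loader; // stands in for LookupFunction#lookup
    private final boolean cacheMissingKey;
    private final Map<K, Collection<V>> cache;
    long hitCount;
    long missCount;
    long loadCount;

    PartialCachingSketch(Function<K, Collection<V>> loader, int maxEntries, boolean cacheMissingKey) {
        this.loader = loader;
        this.cacheMissingKey = cacheMissingKey;
        // An access-ordered LinkedHashMap drops the least recently used key
        // once the number of cached keys exceeds maxEntries.
        this.cache = new LinkedHashMap<K, Collection<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Collection<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    Collection<V> lookup(K key) {
        Collection<V> cached = cache.get(key);
        if (cached != null) {
            hitCount++;
            return cached;
        }
        // Cache miss: query the external system through the loader.
        missCount++;
        loadCount++;
        Collection<V> loaded = loader.apply(key);
        if (loaded == null) {
            loaded = Collections.emptyList();
        }
        // Empty results are only stored when missing keys should be cached.
        if (!loaded.isEmpty() || cacheMissingKey) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    int size() {
        return cache.size();
    }
}
```

In this sketch every miss triggers exactly one load, so the miss count and the load count stay equal.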

Full Caching

If the lookup table is small enough to fit into memory and doesn't change frequently, it is more efficient to load all entries of the lookup table into the cache to reduce network I/O, and to refresh the table periodically. We'd like to name this use case "full cache". Logically the reload operation is a kind of scan, so we'd like to reuse the ScanRuntimeProvider so that developers can reuse the scanning logic implemented in Source / SourceFunction / InputFormat. Considering the complexity of the Source API, we'd like to support the SourceFunction and InputFormat APIs first. Supporting the Source API might require a new topology and will be discussed later in another FLIP.

We propose to introduce several new interfaces:

FullCachingLookupProvider, for reusing the ability of scanning.
CacheReloadTrigger, for customizing the reloading strategy of all entries in the full cache.

We'd also like to provide two default implementations of CacheReloadTrigger:

PeriodicCacheReloadTrigger, for triggering reloads periodically with a specific interval.
TimedCacheReloadTrigger, for triggering a reload at a specific time and repeating it with an interval in days.
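As an illustration of the timed strategy, the sketch below computes when reloads would happen for a given reload time and interval in days. It is an assumption about how such a trigger might schedule its work, not the FLIP's implementation: `TimedReloadSketch` and its methods are hypothetical names, and time zones and daylight saving changes are ignored.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical helper for timed reload scheduling; not part of the Flink API. */
class TimedReloadSketch {

    /** Delay from now until the next occurrence of reloadTime (today if still ahead, else tomorrow). */
    static Duration initialDelay(LocalDateTime now, LocalTime reloadTime) {
        LocalDateTime next = now.toLocalDate().atTime(reloadTime);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    /** Time of the n-th reload (n = 0 is the first one), repeating every intervalInDays days. */
    static LocalDateTime nthReload(
            LocalDateTime now, LocalTime reloadTime, int intervalInDays, int n) {
        return now.plus(initialDelay(now, reloadTime)).plusDays((long) n * intervalInDays);
    }
}
```

For example, with a reload time of 02:00 and the current time at 10:00, the first reload in this sketch falls on the next day at 02:00, and later reloads follow at multiples of the interval.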

Public Interfaces

Lookup Functions

As the usage of the TableFunction interface is not quite straightforward for lookup table developers, we'd like to introduce new interfaces for sync and async lookup tables. Caching will only be supported on LookupFunction / AsyncLookupFunction.

Code Block

language	java
title	LookupFunction

/**
 * A wrapper class of {@link TableFunction} for synchronously looking up rows matching the lookup
 * keys from an external system.
 *
 * <p>The output type of this table function is fixed as {@link RowData}.
 */
@PublicEvolving
public abstract class LookupFunction extends TableFunction<RowData> {

    /**
     * Synchronously lookup rows matching the lookup keys.
     *
     * @param keyRow - A {@link RowData} that wraps keys to lookup.
     * @return A collection of all matching rows in the lookup table.
     */
    public abstract Collection<RowData> lookup(RowData keyRow) throws IOException;

    /** Invoke {@link #lookup} and handle exceptions. */
    public final void eval(Object... keys) {
        try {
            lookup(GenericRowData.of(keys)).forEach(this::collect);
        } catch (IOException e) {
            throw new RuntimeException("Failed to lookup values with given key", e);
        }
    }
}

...

Code Block

language	java
title	AsyncLookupFunction

/**
 * A wrapper class of {@link AsyncTableFunction} for asynchronously looking up rows matching the
 * lookup keys from an external system.
 *
 * <p>The output type of this table function is fixed as {@link RowData}.
 */
@PublicEvolving
public abstract class AsyncLookupFunction extends AsyncTableFunction<RowData> {

    /**
     * Asynchronously lookup rows matching the lookup keys.
     *
     * @param keyRow - A {@link RowData} that wraps keys to lookup.
     * @return A collection of all matching rows in the lookup table.
     */
    public abstract CompletableFuture<Collection<RowData>> asyncLookup(RowData keyRow);

    /** Invokes {@link #asyncLookup} and chains futures. */
    public final void eval(CompletableFuture<Collection<RowData>> future, Object... keys) {
        asyncLookup(GenericRowData.of(keys))
                .whenCompleteAsync(
                        (result, exception) -> {
                            if (exception != null) {
                                future.completeExceptionally(exception);
                                return;
                            }
                            future.complete(result);
                        });
    }
}

LookupFunctionProvider

Code Block

language	java
title	LookupFunctionProvider

/** A provider for creating {@link LookupFunction}. */
@PublicEvolving
public interface LookupFunctionProvider extends LookupTableSource.LookupRuntimeProvider {

    static LookupFunctionProvider of(LookupFunction lookupFunction) {
        return () -> lookupFunction;
    }

    LookupFunction createLookupFunction();
}

AsyncLookupFunctionProvider

Code Block

language	java
title	AsyncLookupFunctionProvider

/** A provider for creating {@link AsyncLookupFunction}. */
@PublicEvolving
public interface AsyncLookupFunctionProvider extends LookupTableSource.LookupRuntimeProvider {

    static AsyncLookupFunctionProvider of(AsyncLookupFunction asyncLookupFunction) {
        return () -> asyncLookupFunction;
    }

    AsyncLookupFunction createAsyncLookupFunction();
}

LookupCache

Considering there might be custom caching strategies and optimizations, we'd like to expose the cache interface as public API for developers to make the cache pluggable.

Code Block

language	java
title	LookupCache

/**
 * A semi-persistent mapping from keys to values for storing entries of lookup table.
 *
 * <p>The type of the caching key is a {@link RowData} with lookup key fields packed inside. The
 * type of value is a {@link Collection} of {@link RowData}, which are rows matching lookup key
 * fields.
 *
 * <p>Cache entries are manually added using {@link #put}, and are stored in the cache until either
 * evicted or manually invalidated.
 *
 * <p>Implementations of this interface are expected to be thread-safe, and can be safely accessed
 * by multiple concurrent threads.
 */
@PublicEvolving
public interface LookupCache extends AutoCloseable, Serializable {

    /**
     * Initialize the cache.
     *
     * @param metricGroup the metric group to register cache related metrics.
     */
    void open(CacheMetricGroup metricGroup);

    /**
     * Returns the value associated with key in this cache, or null if there is no cached value for
     * key.
     */
    @Nullable
    Collection<RowData> getIfPresent(RowData key);

    /**
     * Associates the specified value rows with the specified key row in the cache. If the cache
     * previously contained value associated with the key, the old value is replaced by the
     * specified value.
     *
     * @param key - key row with which the specified value is to be associated
     * @param value - value rows to be associated with the specified key
     * @return the previous value rows associated with key, or null if there was no mapping for key.
     */
    Collection<RowData> put(RowData key, Collection<RowData> value);

    /** Discards any cached value for the specified key. */
    void invalidate(RowData key);

    /** Returns the number of key-value mappings in the cache. */
    long size();
}

...

DefaultLookupCache

In order to simplify usage for developers, we provide a default cache implementation that is created through a builder.

Code Block

language	java
title	DefaultLookupCache

/** Default implementation of {@link LookupCache}. */
@PublicEvolving
public class DefaultLookupCache implements LookupCache {

    private final Duration expireAfterAccessDuration;
    private final Duration expireAfterWriteDuration;
    private final Long maximumSize;
    private final boolean cacheMissingKey;

    private DefaultLookupCache(
            Duration expireAfterAccessDuration,
            Duration expireAfterWriteDuration,
            Long maximumSize,
            boolean cacheMissingKey) {
        this.expireAfterAccessDuration = expireAfterAccessDuration;
        this.expireAfterWriteDuration = expireAfterWriteDuration;
        this.maximumSize = maximumSize;
        this.cacheMissingKey = cacheMissingKey;
    }

    public static Builder newBuilder() {
        return new Builder();
    }

    // Implementations of the LookupCache methods are omitted here.

    /** Builder of {@link DefaultLookupCache}. */
    public static class Builder {
        private Duration expireAfterAccessDuration;
        private Duration expireAfterWriteDuration;
        private Long maximumSize;
        private Boolean cacheMissingKey;

        public Builder expireAfterAccess(Duration duration) {
            expireAfterAccessDuration = duration;
            return this;
        }

        public Builder expireAfterWrite(Duration duration) {
            expireAfterWriteDuration = duration;
            return this;
        }

        public Builder maximumSize(long maximumSize) {
            this.maximumSize = maximumSize;
            return this;
        }

        public Builder cacheMissingKey(boolean cacheMissingKey) {
            this.cacheMissingKey = cacheMissingKey;
            return this;
        }

        public DefaultLookupCache build() {
            return new DefaultLookupCache(
                    expireAfterAccessDuration,
                    expireAfterWriteDuration,
                    maximumSize,
                    cacheMissingKey);
        }
    }
}

CacheMetricGroup

An interface defining all cache-related metrics:

Code Block

language	java
title	CacheMetricGroup

/**
 * Pre-defined metrics for cache.
 *
 * <p>Please note that these methods should only be invoked once. Registering a metric with same
 * name for multiple times would lead to an undefined behavior.
 */
@PublicEvolving
public interface CacheMetricGroup extends MetricGroup {
    /** The number of cache hits. */
    void hitCounter(Counter hitCounter);

    /** The number of cache misses. */
    void missCounter(Counter missCounter);

    /** The number of times to load data into cache from external system. */
    void loadCounter(Counter loadCounter);

    /** The number of load failures. */
    void numLoadFailuresCounter(Counter numLoadFailuresCounter);

    /** The time spent for the latest load operation. */
    void latestLoadTimeGauge(Gauge<Long> latestLoadTimeGauge);

    /** The number of records in cache. */
    void numCachedRecordsGauge(Gauge<Long> numCachedRecordsGauge);

    /** The number of bytes used by cache. */
    void numCachedBytesGauge(Gauge<Long> numCachedBytesGauge);
}

PartialCachingLookupProvider

This is the API between the table framework and the user's table source. Implementations define how to create the lookup function and which cache to use.

Code Block

language	java
title	PartialCachingLookupProvider

/**
 * Provider for creating {@link LookupFunction} and {@link LookupCache} for storing lookup entries.
 */
@PublicEvolving
public interface PartialCachingLookupProvider extends LookupFunctionProvider {

    /**
     * Build a {@link PartialCachingLookupProvider} from the specified {@link LookupFunction} and
     * {@link LookupCache}.
     */
    static PartialCachingLookupProvider of(LookupFunction lookupFunction, LookupCache cache) {
        return new PartialCachingLookupProvider() {

            @Override
            public LookupCache getCache() {
                return cache;
            }

            @Override
            public LookupFunction createLookupFunction() {
                return lookupFunction;
            }
        };
    }

    /** Get a new instance of {@link LookupCache}. */
    LookupCache getCache();
}

PartialCachingAsyncLookupProvider

Code Block

language	java
title	PartialCachingAsyncLookupProvider

/**
 * Provider for creating {@link AsyncLookupFunction} and {@link LookupCache} for storing lookup
 * entries.
 */
@PublicEvolving
public interface PartialCachingAsyncLookupProvider extends AsyncLookupFunctionProvider {

    /**
     * Build a {@link PartialCachingAsyncLookupProvider} from the specified {@link
     * AsyncLookupFunction} and {@link LookupCache}.
     */
    static PartialCachingAsyncLookupProvider of(
            AsyncLookupFunction asyncLookupFunction, LookupCache cache) {
        return new PartialCachingAsyncLookupProvider() {

            @Override
            public LookupCache getCache() {
                return cache;
            }

            @Override
            public AsyncLookupFunction createAsyncLookupFunction() {
                return asyncLookupFunction;
            }
        };
    }

    /** Get a new instance of {@link LookupCache}. */
    LookupCache getCache();
}

FullCachingLookupProvider

This interface supports the full caching strategy. It reuses ScanRuntimeProvider and defines when the cache is reloaded.

Code Block

language	java
title	FullCachingLookupProvider

/**
 * A {@link LookupFunctionProvider} that never looks up in the external system on cache miss and
 * provides a cache for holding all entries in the external system. The cache will be fully
 * reloaded from the external system by the {@link ScanTableSource.ScanRuntimeProvider} and reload
 * operations will be triggered by the {@link CacheReloadTrigger}.
 */
@PublicEvolving
public interface FullCachingLookupProvider extends LookupFunctionProvider {

    static FullCachingLookupProvider of(
            ScanTableSource.ScanRuntimeProvider scanRuntimeProvider,
            CacheReloadTrigger cacheReloadTrigger) {
        return new FullCachingLookupProvider() {
            @Override
            public ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider() {
                return scanRuntimeProvider;
            }

            @Override
            public CacheReloadTrigger getCacheReloadTrigger() {
                return cacheReloadTrigger;
            }

            @Override
            public LookupFunction createLookupFunction() {
                // LookupFunction is an abstract class, so a lambda cannot be used here.
                return new LookupFunction() {
                    @Override
                    public Collection<RowData> lookup(RowData keyRow) {
                        return null;
                    }
                };
            }
        };
    }

    /**
     * Get a {@link ScanTableSource.ScanRuntimeProvider} for scanning all entries from the external
     * lookup table and loading them into the cache.
     */
    ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider();

    /** Get a {@link CacheReloadTrigger} for triggering the reload operation. */
    CacheReloadTrigger getCacheReloadTrigger();
}

CacheReloadTrigger

A trigger defining custom logic for triggering full cache reloading.

Code Block

language	java
title	CacheReloadTrigger

/** Customized trigger for reloading all lookup table entries in full caching mode. */
@PublicEvolving
public interface CacheReloadTrigger extends AutoCloseable, Serializable {

    /** Open the trigger. */
    void open(Context context) throws Exception;

    /**
     * Context of {@link CacheReloadTrigger} for getting information about times and
     * triggering reload.
     */
    interface Context {

        /** Get current processing time. */
        long currentProcessingTime();

        /** Get current watermark on the main stream. */
        long currentWatermark();

        /** Trigger a reload operation on the full cache. */
        CompletableFuture<Void> triggerReload();
    }
}

PeriodicCacheReloadTrigger

An implementation of CacheReloadTrigger that triggers reload with a specified interval.

Code Block

language	java
title	PeriodicCacheReloadTrigger

/** A trigger that reloads all entries periodically with specified interval or delay. */
public class PeriodicCacheReloadTrigger implements CacheReloadTrigger {

    private final Duration reloadInterval;
    private final ScheduleMode scheduleMode;

    public PeriodicCacheReloadTrigger(Duration reloadInterval, ScheduleMode scheduleMode) {
        this.reloadInterval = reloadInterval;
        this.scheduleMode = scheduleMode;
    }

    @Override
    public void open(CacheReloadTrigger.Context context) {
        // Register periodic reload task
    }

    @Override
    public void close() throws Exception {
        // Dispose resources
    }

    public enum ScheduleMode {
        FIXED_DELAY,
        FIXED_RATE
    }
}

TimedCacheReloadTrigger

Code Block

language	java
title	TimedCacheReloadTrigger

/** A trigger that reloads at a specific local time and repeats with the given interval in days. */
public class TimedCacheReloadTrigger implements CacheReloadTrigger {

    private final LocalTime reloadTime;
    private final int reloadIntervalInDays;

    public TimedCacheReloadTrigger(LocalTime reloadTime, int reloadIntervalInDays) {
        this.reloadTime = reloadTime;
        this.reloadIntervalInDays = reloadIntervalInDays;
    }

    @Override
    public void open(Context context) {
        // Register periodic reload task
    }

    @Override
    public void close() throws Exception {
        // Dispose resources
    }
}

TableFunctionProvider / AsyncTableFunctionProvider

We'd like to deprecate these two interfaces and let developers switch to the new LookupFunctionProvider / AsyncLookupFunctionProvider / FullCachingLookupProvider instead.

Table Options for Lookup Cache

In order to unify the usage of caching across all connectors, we'd like to introduce some common table options, which are defined in the class LookupOptions. Note that connectors are not required to implement these options.

Option	Type	Description
lookup.cache	Enum of NONE, PARTIAL and FULL	The caching strategy for this lookup table. NONE: do not use a cache. PARTIAL: use partial caching mode. FULL: use full caching mode.
lookup.max-retries	Integer	The maximum allowed retries if a lookup operation fails.
lookup.partial-cache.expire-after-access	Duration	Duration to expire an entry in the cache after accessing.
lookup.partial-cache.expire-after-write	Duration	Duration to expire an entry in the cache after writing.
lookup.partial-cache.cache-missing-key	Boolean	Whether to store an empty value into the cache if the lookup key doesn't match any rows in the table.
lookup.partial-cache.max-rows	Long	The maximum number of rows to store in the cache.
lookup.full-cache.reload-strategy	Enum of PERIODIC and TIMED	The reload strategy for the full cache scenario. PERIODIC: use PeriodicCacheReloadTrigger. TIMED: use TimedCacheReloadTrigger.
lookup.full-cache.periodic-reload.interval	Duration	Duration to trigger reload in the PeriodicCacheReloadTrigger.
lookup.full-cache.periodic-reload.schedule-mode	Enum of FIXED_DELAY and FIXED_RATE	The schedule mode of periodic reloading in the PeriodicCacheReloadTrigger.
lookup.full-cache.timed-reload.iso-time	String	Time in ISO-8601 format when the cache needs to be reloaded. Time can be specified either with timezone or without timezone (target JVM local timezone will be used). See formatter ISO_TIME.
lookup.full-cache.timed-reload.interval-in-days	Integer	The interval in days to trigger the reload at the specified time.
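To illustrate the format accepted by lookup.full-cache.timed-reload.iso-time, the sketch below parses values with the JDK's `DateTimeFormatter.ISO_TIME`, with and without an offset. `IsoTimeOptionSketch` and its method names are hypothetical and only show how such a value could be interpreted; how a connector actually reads the option is not specified here.

```java
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;

/** Hypothetical helper showing how an ISO-8601 time option value could be parsed. */
class IsoTimeOptionSketch {

    /** Extracts the local time part, e.g. from "10:15:30" or "10:15:30+01:00". */
    static LocalTime parseLocalTime(String value) {
        TemporalAccessor parsed = DateTimeFormatter.ISO_TIME.parse(value);
        return LocalTime.from(parsed);
    }

    /** Returns the offset if the value specifies one, or null if it does not. */
    static ZoneOffset parseOffsetOrNull(String value) {
        TemporalAccessor parsed = DateTimeFormatter.ISO_TIME.parse(value);
        return parsed.query(TemporalQueries.offset());
    }
}
```

When no offset is present, a caller following the option description would fall back to the local timezone of the target JVM.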

Cache Metrics

It is important to mention that a cache implementation does not have to report all the defined metrics. But if a cache reports a metric of the same semantic defined below, the implementation should follow the convention.

Name	Type	Unit	Description
numCachedRecord	Gauge	Records	The number of records in cache.
numCachedBytes	Gauge	Bytes	The number of bytes used by cache.
hitCount	Counter		The number of cache hits.
missCount	Counter		The number of cache misses, which might lead to loading operations.
loadCount	Counter		The number of times to load data into cache from external system. For partial cache the load count should be equal to miss count, but for full cache this would be different.
numLoadFailure	Counter		The number of load failures.
latestLoadTime	Gauge	ms	The time spent for the latest load operation.

Here we just define fundamental metrics and let the external metric system make the aggregation to get more descriptive values such as hitRate = hitCount / (hitCount + missCount).
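The hit rate aggregation mentioned above can be written as a one-line helper. This is only a sketch: `CacheMetricsSketch` is a hypothetical name, and returning 0.0 when no lookups have happened yet is an assumption, since the formula itself is undefined in that case.

```java
/** Hypothetical helper deriving the hit rate from the two fundamental counters. */
class CacheMetricsSketch {

    /** hitRate = hitCount / (hitCount + missCount); returns 0.0 before any lookup (assumption). */
    static double hitRate(long hitCount, long missCount) {
        long total = hitCount + missCount;
        return total == 0 ? 0.0 : (double) hitCount / total;
    }
}
```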

Scope

The metric group for the cache would be a sub-group of the OperatorMetricGroup to which the table function belongs.

Future Works

In order to further reduce network I/O with external systems and the usage of the cache, some optimizations implemented on scan sources could also be applied to lookup tables, such as projection and filter pushdown. These features will be introduced separately in another FLIP.

Compatibility, Deprecation, and Migration Plan

Currently the JDBC, Hive and HBase connectors implement lookup table sources. All existing implementations will be migrated to the current design, and the migration will be transparent to end users. Table options related to caching defined by these connectors will be migrated to the new table options defined in this FLIP.

Test Plan

We will use unit and integration tests to validate the functionality of cache implementations.

Rejected Alternatives

Add cache in TableFunction implementations

Compared with this design, adding the cache in TableFunction implementations might lead to inconsistency between sync and async table functions, and is not suitable for applying optimizations.
