
Status

Current state: Under discussion

Discussion thread: https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq

JIRA: TBD

Released: Not released yet

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

This FLIP is a joint work of Yuan Zhu (zstraw@163.com) and Qingsheng Ren.


Motivation

Lookup joins are a widely used feature in Flink SQL jobs, so the performance of the lookup table source is essential not only to users but also to source developers tuning their implementations. Most lookup table sources use a cache to achieve better performance, but several features are missing from the current cache design:

  • Missing cache-related metrics, which are key to debugging and optimizing SQL tasks
  • Duplicated implementations: currently every lookup source needs to implement or choose its own cache
  • Inconsistent configuration: table options related to caching are defined differently across sources

In order to address the issues above, we propose here to define a unified abstraction for lookup source cache and its related metrics.

Proposed Changes

We'd like to split the proposal into two caching modes: partial caching and full caching.

Partial caching

Partial caching loads data into the cache lazily, along with accesses to the external system. If the lookup key does not exist in the cache, a lookup against the external system is triggered and the result is stored in the cache for later lookups. Users and lookup table developers can configure the eviction policy and the maximum size of the cache.

In order to support partial caching, we propose to introduce several new interfaces to simplify the work for developers to implement lookup table functions and enable cache:

  • LookupFunction / AsyncLookupFunction, extended versions of TableFunction / AsyncTableFunction that clarify the semantics of lookup.
  • LookupCache, defining the cache used in the lookup table.
  • DefaultLookupCache, a default implementation of the cache that is suitable for most use cases.
  • CacheMetricGroup, defining the metrics that should be reported by the cache.
  • PartialCachingLookupProvider / AsyncPartialCachingLookupProvider, the API through which the table source provides the LookupFunction and the LookupCache.

The cache serves as a component in the LookupJoinRunner and is pluggable by specifying a LookupCache in the PartialCachingLookupProvider. The developer of a lookup table defines a PartialCachingLookupProvider / AsyncPartialCachingLookupProvider in the implementation of LookupTableSource to specify the LookupFunction and the cache; the planner then takes over the cache, passes it to the LookupJoinRunner, and the cache is instantiated and opened during runtime execution. A minimal sketch of this wiring is shown below.
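For illustration, a table source enabling partial caching might build the provider as follows. This is only a sketch based on the interfaces proposed below; MyLookupFunction and the concrete option values are hypothetical.

// Sketch of wiring partial caching in a LookupTableSource; MyLookupFunction is hypothetical.
@Override
public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext context) {
    // Build the default cache with an eviction policy derived from table options.
    LookupCache cache =
            DefaultLookupCache.newBuilder()
                    .maximumSize(10_000)
                    .expireAfterWrite(Duration.ofMinutes(10))
                    .build();
    // Hand the lookup function and the cache over to the planner; the cache is
    // opened inside the LookupJoinRunner during runtime execution.
    return PartialCachingLookupProvider.newBuilder()
            .withLookupFunction(new MyLookupFunction())
            .withCache(cache, /* cacheMissingKey */ true)
            .build();
}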

Full Caching

If the lookup table is small enough to fit into memory and doesn't change frequently, it is more efficient to load all entries of the lookup table into the cache to reduce network I/O, and to refresh the table periodically. We'd like to name this use case "full caching". Logically the reload operation is a kind of scan, so we'd like to reuse the ScanRuntimeProvider so that developers can reuse the scanning logic implemented in Source / SourceFunction / InputFormat. Considering the complexity of the Source API, we'd like to support the SourceFunction and InputFormat APIs first. Supporting the Source API might require a new topology and will be discussed later in another FLIP.

We propose to introduce several new interfaces:

  • FullCachingLookupProvider, for reusing the scanning ability to load all entries of the lookup table.
  • FullCachingReloadTrigger, for customizing the strategy that triggers reloading all entries of the full cache.

Public Interfaces

Lookup Functions

As the usage of the TableFunction interface is not straightforward for lookup table developers, we'd like to introduce new interfaces for sync and async lookup tables. Caching will only be supported on LookupFunction / AsyncLookupFunction.

LookupFunction
/**
 * A wrapper class of {@link TableFunction} for synchronously looking up rows matching the lookup
 * keys in an external system.
 *
 * <p>The output type of this table function is fixed as {@link RowData}.
 */
@PublicEvolving
public abstract class LookupFunction extends TableFunction<RowData> {

    /**
     * Synchronously look up rows matching the lookup keys.
     *
     * @param keyRow a {@link RowData} that wraps the keys to look up.
     * @return a collection of all matching rows in the lookup table.
     */
    public abstract Collection<RowData> lookup(RowData keyRow) throws IOException;

    /** Invokes {@link #lookup} and handles exceptions. */
    public final void eval(Object... keys) {
        try {
            lookup(GenericRowData.of(keys)).forEach(this::collect);
        } catch (IOException e) {
            throw new RuntimeException("Failed to lookup values with given key", e);
        }
    }
}
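As an example, a synchronous lookup function for a hypothetical external system could look like the sketch below. ExternalClient and its query method are illustrative assumptions, not part of this proposal.

// Sketch only: ExternalClient is a hypothetical client of the external system.
public class MyLookupFunction extends LookupFunction {

    private transient ExternalClient client;

    @Override
    public void open(FunctionContext context) throws Exception {
        // Establish the connection to the external system once per task.
        client = new ExternalClient();
    }

    @Override
    public Collection<RowData> lookup(RowData keyRow) throws IOException {
        // Query the external system with the packed key row and return all matching rows.
        return client.query(keyRow);
    }

    @Override
    public void close() throws Exception {
        client.close();
    }
}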
AsyncLookupFunction
/**
 * A wrapper class of {@link AsyncTableFunction} for asynchronously looking up rows matching the
 * lookup keys in an external system.
 *
 * <p>The output type of this table function is fixed as {@link RowData}.
 */
@PublicEvolving
public abstract class AsyncLookupFunction extends AsyncTableFunction<RowData> {

    /**
     * Asynchronously look up rows matching the lookup keys.
     *
     * @param keyRow a {@link RowData} that wraps the keys to look up.
     * @return a {@link CompletableFuture} that completes with all matching rows in the lookup table.
     */
    public abstract CompletableFuture<Collection<RowData>> asyncLookup(RowData keyRow);

    /** Invokes {@link #asyncLookup} and chains futures. */
    public final void eval(CompletableFuture<Collection<RowData>> future, Object... keys) {
         asyncLookup(GenericRowData.of(keys))
                .whenCompleteAsync(
                        (result, exception) -> {
                            if (exception != null) {
                                future.completeExceptionally(exception);
                                return;
                            }
                            future.complete(result);
                        });
    }
}
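Similarly, an asynchronous lookup function could simply delegate to a non-blocking client. AsyncClient and its queryAsync method are illustrative assumptions.

// Sketch only: AsyncClient is a hypothetical non-blocking client of the external system.
public class MyAsyncLookupFunction extends AsyncLookupFunction {

    private transient AsyncClient client;

    @Override
    public void open(FunctionContext context) throws Exception {
        client = new AsyncClient();
    }

    @Override
    public CompletableFuture<Collection<RowData>> asyncLookup(RowData keyRow) {
        // Delegate to the client's non-blocking API and return its future directly.
        return client.queryAsync(keyRow);
    }

    @Override
    public void close() throws Exception {
        client.close();
    }
}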

LookupCache

Considering that there might be custom caching strategies and optimizations, we'd like to expose the cache interface as a public API so that developers can plug in their own cache implementations.

LookupCache
/**
 * A semi-persistent mapping from keys to values for storing entries of lookup table.
 *
 * <p>The type of the caching key is a {@link RowData} with lookup key fields packed inside. The
 * type of value is a {@link Collection} of {@link RowData}, which are rows matching lookup key
 * fields.
 *
 * <p>Cache entries are manually added using {@link #put}, and are stored in the cache until either
 * evicted or manually invalidated.
 *
 * <p>Implementations of this interface are expected to be thread-safe, and can be safely accessed
 * by multiple concurrent threads.
 */
@PublicEvolving
public interface LookupCache extends AutoCloseable {

    /**
     * Initialize the cache.
     *
     * @param metricGroup the metric group to register cache related metrics.
     */
    void open(CacheMetricGroup metricGroup);

    /**
     * Returns the value associated with key in this cache, or null if there is no cached value for
     * key.
     */
    @Nullable
    Collection<RowData> getIfPresent(RowData key);

    /**
     * Associates the specified value rows with the specified key row in the cache. If the cache
     * previously contained a value associated with the key, the old value is replaced by the
     * specified value.
     *
     * @param key key row with which the specified value is to be associated
     * @param value value rows to be associated with the specified key
     * @return the previous value rows associated with the key, or null if there was no mapping for
     *     the key
     */
    Collection<RowData> put(RowData key, Collection<RowData> value);

    /** Discards any cached value for the specified key. */
    void invalidate(RowData key);

    /** Returns the number of key-value mappings in the cache. */
    long size();
}
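To illustrate the contract, a trivial thread-safe implementation backed by a ConcurrentHashMap could look like the sketch below (no eviction policy, only a subset of the metrics registered; it is not meant as a production cache).

// Sketch of a custom LookupCache: unbounded and thread-safe via ConcurrentHashMap.
public class MapLookupCache implements LookupCache {

    private transient ConcurrentHashMap<RowData, Collection<RowData>> cache;
    private transient Counter hitCounter;
    private transient Counter missCounter;

    @Override
    public void open(CacheMetricGroup metricGroup) {
        cache = new ConcurrentHashMap<>();
        hitCounter = new SimpleCounter();
        missCounter = new SimpleCounter();
        // Register each metric exactly once, as required by CacheMetricGroup.
        metricGroup.hitCounter(hitCounter);
        metricGroup.missCounter(missCounter);
        metricGroup.numCachedRecordsGauge(() -> (long) cache.size());
    }

    @Nullable
    @Override
    public Collection<RowData> getIfPresent(RowData key) {
        Collection<RowData> value = cache.get(key);
        if (value != null) {
            hitCounter.inc();
        } else {
            missCounter.inc();
        }
        return value;
    }

    @Override
    public Collection<RowData> put(RowData key, Collection<RowData> value) {
        return cache.put(key, value);
    }

    @Override
    public void invalidate(RowData key) {
        cache.remove(key);
    }

    @Override
    public long size() {
        return cache.size();
    }

    @Override
    public void close() {
        cache.clear();
    }
}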

DefaultLookupCache

DefaultLookupCache is provided as the default implementation of LookupCache, which should cover most use cases. It is configured via a builder, which exposes the eviction-related settings (expire-after-access, expire-after-write, maximum size) and whether to cache missing keys. An example of using the builder follows the class sketch below.

DefaultLookupCache
/** Default implementation of {@link LookupCache}. */
@PublicEvolving
public class DefaultLookupCache implements LookupCache {
    private final Duration expireAfterAccessDuration;
    private final Duration expireAfterWriteDuration;
    private final Long maximumSize;
    private final boolean cacheMissingKey;
    
    private DefaultLookupCache(
            Duration expireAfterAccessDuration,
            Duration expireAfterWriteDuration,
            Long maximumSize,
            boolean cacheMissingKey) {
        this.expireAfterAccessDuration = expireAfterAccessDuration;
        this.expireAfterWriteDuration = expireAfterWriteDuration;
        this.maximumSize = maximumSize;
        this.cacheMissingKey = cacheMissingKey;
    }
    
    public static Builder newBuilder() {
        return new Builder();
    } 

    public static class Builder {
        private Duration expireAfterAccessDuration;
        private Duration expireAfterWriteDuration;
        private Long maximumSize;
        private Boolean cacheMissingKey;

        public Builder expireAfterAccess(Duration duration) {
            expireAfterAccessDuration = duration;
            return this;
        }

        public Builder expireAfterWrite(Duration duration) {
            expireAfterWriteDuration = duration;
            return this;
        }

        public Builder maximumSize(long maximumSize) {
            this.maximumSize = maximumSize;
            return this;
        }

        public Builder cacheMissingKey(boolean cacheMissingKey) {
            this.cacheMissingKey = cacheMissingKey;
            return this;
        }          

        public DefaultLookupCache build() {
            return new DefaultLookupCache(
                    expireAfterAccessDuration,
                    expireAfterWriteDuration,
                    maximumSize,
                    cacheMissingKey);
        }
    }     
}
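For example, a connector could configure the default cache via the builder as follows (the concrete values are placeholders):

// Sketch: configuring the default cache via its builder.
LookupCache cache =
        DefaultLookupCache.newBuilder()
                .expireAfterAccess(Duration.ofMinutes(5))
                .expireAfterWrite(Duration.ofMinutes(30))
                .maximumSize(10_000)
                .cacheMissingKey(true)
                .build();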

CacheMetricGroup

An interface defining all cache-related metrics:

CacheMetricGroup
/**
 * Pre-defined metrics for cache.
 *
 * <p>Please note that these methods should only be invoked once. Registering a metric with the
 * same name multiple times leads to undefined behavior.
 */
@PublicEvolving
public interface CacheMetricGroup extends MetricGroup {
    /** The number of cache hits. */
    void hitCounter(Counter hitCounter);

    /** The number of cache misses. */
    void missCounter(Counter missCounter);

    /** The number of times to load data into cache from external system. */
    void loadCounter(Counter loadCounter);

    /** The number of load failures. */
    void numLoadFailuresCounter(Counter numLoadFailuresCounter);

    /** The time spent for the latest load operation. */
    void latestLoadTimeGauge(Gauge<Long> latestLoadTimeGauge);

    /** The number of records in cache. */
    void numCachedRecordsGauge(Gauge<Long> numCachedRecordsGauge);

    /** The number of bytes used by cache. */
    void numCachedBytesGauge(Gauge<Long> numCachedBytesGauge);
}

PartialCachingLookupProvider

This is the API between the table framework and the user's table source. The implementation defines how to create a lookup function and whether to use a cache.

PartialCachingLookupProvider
/**
 * Provider for creating {@link LookupFunction} and {@link LookupCache} if caching should be enabled
 * for the lookup table.
 */
@PublicEvolving
public interface PartialCachingLookupProvider extends LookupTableSource.LookupRuntimeProvider {

    /** Creates a builder of {@link PartialCachingLookupProvider}. */
    static Builder newBuilder() {
        return new Builder();
    }

    /** Creates a {@link LookupFunction} instance. */
    LookupFunction createLookupFunction();

    /**
     * Gets the instance of {@link LookupCache}.
     *
     * <p>This cache will be initialized by {@link LookupCache#open} during runtime execution and
     * used for optimizing the access to external lookup table.
     *
     * @return an {@link Optional} of {@link LookupCache}, or an empty {@link Optional} if caching
     *     shouldn't be applied to the lookup table.
     */
    Optional<LookupCache> getCache();

    /** Builder class for {@link PartialCachingLookupProvider}. */
    class Builder {

        private LookupFunction lookupFunction;
        private LookupCache cache;
        private Boolean cacheMissingKey;

        /** Sets lookup function. */
        public Builder withLookupFunction(LookupFunction lookupFunction) {
            this.lookupFunction = lookupFunction;
            return this;
        }

        /** Enables caching and sets the cache. */
        public Builder withCache(LookupCache cache, boolean cacheMissingKey) {
            this.cache = cache;
            this.cacheMissingKey = cacheMissingKey;
            return this;
        }

        public PartialCachingLookupProvider build() {
           ...
        }
    }
}

AsyncPartialCachingLookupProvider

AsyncPartialCachingLookupProvider
/**
 * Provider for creating {@link AsyncLookupFunction} and {@link LookupCache} if caching should be
 * enabled for the lookup table.
 */
@PublicEvolving
public interface AsyncPartialCachingLookupProvider extends LookupTableSource.LookupRuntimeProvider {

    /** Creates a builder of {@link AsyncPartialCachingLookupProvider}. */
    static AsyncPartialCachingLookupProvider.Builder newBuilder() {
        return new AsyncPartialCachingLookupProvider.Builder();
    }

    /** Creates an {@link AsyncLookupFunction} instance. */
    AsyncLookupFunction createAsyncLookupFunction();

    /**
     * Gets the instance of {@link LookupCache}.
     *
     * <p>This cache will be initialized by {@link LookupCache#open} during runtime execution and
     * used for optimizing the access to external lookup table.
     *
     * @return an {@link Optional} of {@link LookupCache}, or an empty {@link Optional} if caching
     *     shouldn't be applied to the lookup table.
     */
    Optional<LookupCache> getCache();

    /** Builder class for {@link AsyncPartialCachingLookupProvider}. */
    class Builder {

        private AsyncLookupFunction asyncLookupFunction;
        private LookupCache cache;
        private Boolean cacheMissingKey;

        /** Sets lookup function. */
        public AsyncPartialCachingLookupProvider.Builder withLookupFunction(
                AsyncLookupFunction asyncLookupFunction) {
            this.asyncLookupFunction = asyncLookupFunction;
            return this;
        }

        /** Enables caching and sets the cache. */
        public AsyncPartialCachingLookupProvider.Builder withCache(
                LookupCache cache, boolean cacheMissingKey) {
            this.cache = cache;
            this.cacheMissingKey = cacheMissingKey;
            return this;
        }

        public AsyncPartialCachingLookupProvider build() {
           ...
        }
    }
}

FullCachingLookupProvider

This interface supports the full caching strategy. It reuses ScanRuntimeProvider and defines how reloads of the cache are triggered.

FullCachingLookupProvider
/**
 * Runtime provider for fully loading and periodically reloading all entries of the lookup table and
 * storing the table locally for lookup.
 *
 * <p>Implementations should provide a {@link ScanTableSource.ScanRuntimeProvider} in order to reuse
 * the ability of scanning for loading all entries from the lookup table.
 */
@PublicEvolving
public interface FullCachingLookupProvider extends LookupTableSource.LookupRuntimeProvider {

    static FullCachingLookupProvider of(
            ScanTableSource.ScanRuntimeProvider scanRuntimeProvider,
            FullCachingReloadTrigger fullCachingReloadTrigger) {
        return new FullCachingLookupProvider() {
            @Override
            public ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider() {
                return scanRuntimeProvider;
            }

            @Override
            public FullCachingReloadTrigger getReloadTrigger() {
                return fullCachingReloadTrigger;
            }
        };
    }

    /**
     * Gets the {@link ScanTableSource.ScanRuntimeProvider} for executing the periodic reload.
     */
    ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider();

    /** Gets the {@link FullCachingReloadTrigger} for triggering a full cache reload operation. */
    FullCachingReloadTrigger getReloadTrigger();
}

FullCachingReloadTrigger

A trigger defining custom logic for triggering full cache reloading.

FullCachingReloadTrigger
/** Customized trigger for reloading all lookup table entries in full caching mode. */
public interface FullCachingReloadTrigger extends AutoCloseable, Serializable {

    /** Open the trigger. */
    void open(Context context) throws Exception;

    /**
     * Context of {@link FullCachingReloadTrigger} for getting information about times and
     * triggering reload.
     */
    interface Context {

        /** Get current processing time. */
        long currentProcessingTime();

        /** Get current watermark on the main stream. */
        long currentWatermark();

        /** Trigger a reload operation on the full cache. */
        CompletableFuture<Void> triggerReload();
    }
}
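As an illustration of a custom strategy, the sketch below triggers a reload once per day at a fixed UTC time. The class and its scheduling details are hypothetical and only meant to show how the Context can be used.

// Sketch: a custom trigger that reloads the full cache once per day at a fixed UTC time.
public class DailyReloadTrigger implements FullCachingReloadTrigger {

    private final LocalTime reloadTime;
    private transient ScheduledExecutorService executor;

    public DailyReloadTrigger(LocalTime reloadTime) {
        this.reloadTime = reloadTime;
    }

    @Override
    public void open(Context context) {
        executor = Executors.newSingleThreadScheduledExecutor();
        long dayMillis = Duration.ofDays(1).toMillis();
        // Delay until the next occurrence of the configured time of day.
        long initialDelayMillis =
                (Duration.between(LocalTime.now(ZoneOffset.UTC), reloadTime).toMillis() + dayMillis)
                        % dayMillis;
        executor.scheduleAtFixedRate(
                context::triggerReload, initialDelayMillis, dayMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        if (executor != null) {
            executor.shutdownNow();
        }
    }
}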

PeriodicFullCachingReloadTrigger

An implementation of FullCachingReloadTrigger that triggers a reload at a specified interval.

/** A trigger that reloads all entries periodically with a specified interval or delay. */
public class PeriodicFullCachingReloadTrigger implements FullCachingReloadTrigger {

    private final Duration reloadInterval;
    private final ScheduleMode scheduleMode;

    private ScheduledExecutorService scheduledExecutor;

    public PeriodicFullCachingReloadTrigger(Duration reloadInterval, ScheduleMode scheduleMode) {
        this.reloadInterval = reloadInterval;
        this.scheduleMode = scheduleMode;
    }

    @Override
    public void open(FullCachingReloadTrigger.Context context) {
        scheduledExecutor = Executors.newSingleThreadScheduledExecutor();
        switch (scheduleMode) {
            case FIXED_RATE:
                scheduledExecutor.scheduleAtFixedRate(
                        context::triggerReload,
                        0,
                        reloadInterval.toMillis(),
                        TimeUnit.MILLISECONDS);
                break;
            case FIXED_DELAY:
                scheduledExecutor.scheduleWithFixedDelay(
                        () -> {
                            try {
                                context.triggerReload().get();
                            } catch (Exception e) {
                                throw new RuntimeException(
                                        "Uncaught exception during the reload", e);
                            }
                        },
                        0,
                        reloadInterval.toMillis(),
                        TimeUnit.MILLISECONDS);
                break;
            default:
                throw new IllegalArgumentException(
                        String.format("Unrecognized schedule mode \"%s\"", scheduleMode));
        }
    }

    @Override
    public void close() throws Exception {
        if (scheduledExecutor != null) {
            scheduledExecutor.shutdownNow();
        }
    }

    public enum ScheduleMode {
        FIXED_DELAY,
        FIXED_RATE
    }
}
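Putting the pieces together, a table source could enable full caching roughly as follows. MyLookupInputFormat is a hypothetical InputFormat producing the lookup table's rows, and the one-hour interval is only an example.

// Sketch: enabling full caching by reusing an InputFormat-based scan for periodic reloads.
@Override
public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext context) {
    return FullCachingLookupProvider.of(
            InputFormatProvider.of(new MyLookupInputFormat()),
            new PeriodicFullCachingReloadTrigger(
                    Duration.ofHours(1),
                    PeriodicFullCachingReloadTrigger.ScheduleMode.FIXED_RATE));
}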

TableFunctionProvider / AsyncTableFunctionProvider

We'd like to deprecate these two interfaces and let developers switch to the new PartialCachingLookupProvider / AsyncPartialCachingLookupProvider / FullCachingLookupProvider instead.

Table Options for Lookup Cache

In order to unify the usage of caching across all connectors, we'd like to introduce some common table options, which are defined under the class LookupOptions. Note that connectors are not required to implement all of these options. A sketch of how they could be declared follows the table.

Option | Type | Description
lookup.cache | Enum of NONE, PARTIAL and FULL | The caching strategy for this lookup table. NONE: do not use cache. PARTIAL: use partial caching mode. FULL: use full caching mode.
lookup.max-retries | Integer | The maximum allowed retries if a lookup operation fails.
lookup.partial-cache.expire-after-access | Duration | Duration to expire an entry in the cache after accessing.
lookup.partial-cache.expire-after-write | Duration | Duration to expire an entry in the cache after writing.
lookup.partial-cache.cache-missing-key | Boolean | Whether to store an empty value into the cache if the lookup key doesn't match any rows in the table.
lookup.partial-cache.max-rows | Long | The maximum number of rows to store in the cache.
lookup.full-cache.reload-interval | Duration | Interval of reloading all entries from the lookup table into the cache.
lookup.full-cache.reload-schedule-mode | Enum of FIXED_DELAY and FIXED_RATE | The schedule mode of the periodic reload.
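For reference, these options could be declared with Flink's ConfigOptions API roughly as shown below. This is only a sketch: the field names, default values and the LookupCacheType enum are illustrative assumptions; only the option keys above are part of this proposal.

// Sketch: possible declarations under LookupOptions (names and defaults are illustrative).
public class LookupOptions {

    public static final ConfigOption<LookupCacheType> CACHE_TYPE =
            ConfigOptions.key("lookup.cache")
                    .enumType(LookupCacheType.class)
                    .defaultValue(LookupCacheType.NONE)
                    .withDescription("The caching strategy for this lookup table.");

    public static final ConfigOption<Duration> PARTIAL_CACHE_EXPIRE_AFTER_WRITE =
            ConfigOptions.key("lookup.partial-cache.expire-after-write")
                    .durationType()
                    .noDefaultValue()
                    .withDescription("Duration to expire an entry in the cache after writing.");

    public static final ConfigOption<Long> PARTIAL_CACHE_MAX_ROWS =
            ConfigOptions.key("lookup.partial-cache.max-rows")
                    .longType()
                    .noDefaultValue()
                    .withDescription("The maximum number of rows to store in the cache.");

    public static final ConfigOption<Duration> FULL_CACHE_RELOAD_INTERVAL =
            ConfigOptions.key("lookup.full-cache.reload-interval")
                    .durationType()
                    .noDefaultValue()
                    .withDescription(
                            "Interval of reloading all entries from the lookup table into the cache.");

    public enum LookupCacheType {
        NONE,
        PARTIAL,
        FULL
    }
}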

Cache Metrics

It is important to mention that a cache implementation does not have to report all of the metrics defined below. But if a cache reports a metric with the same semantics as one defined below, the implementation should follow the convention.

Name | Type | Unit | Description
numCachedRecord | Gauge | Records | The number of records in the cache.
numCachedBytes | Gauge | Bytes | The number of bytes used by the cache.
hitCount | Counter | | The number of cache hits.
missCount | Counter | | The number of cache misses, which might lead to load operations.
loadCount | Counter | | The number of times data is loaded into the cache from the external system. For the partial cache the load count should equal the miss count, but for the full cache the two can differ.
numLoadFailure | Counter | | The number of load failures.
latestLoadTime | Gauge | ms | The time spent on the latest load operation.

Here we only define fundamental metrics and let the external metric system do the aggregation to derive more descriptive values, such as hitRate = hitCount / (hitCount + missCount).

Scope

The metric group for the cache would be a sub-group of the OperatorMetricGroup to which the table function belongs.

Future Works

In order to further reduce network I/O with external systems and cache usage, some optimizations implemented for scan sources, such as projection and filter pushdown, could also be applied to the lookup table. These features will be introduced separately in another FLIP.

Compatibility, Deprecation, and Migration Plan

Currently the JDBC, Hive and HBase connectors implement a lookup table source. All existing implementations will be migrated to this design, and the migration will be transparent to end users. Table options related to caching defined by these connectors will be migrated to the new table options defined in this FLIP.

Test Plan

We will use unit and integration tests to validate the functionality of the cache implementations.

Rejected Alternatives

Add cache in TableFunction implementations

Compared with this design, adding a cache inside TableFunction implementations might lead to inconsistency between the sync and async table functions, and is not suitable for applying optimizations.

