
JIRA:

Discussion: 

Motivation:

Currently, when a caller asks a Kafka Streams instance for a particular key, it gets back a queryable store that wraps the stores of all running tasks on that instance, plus the restoring and standby replicas when stale reads are allowed (see KIP-535: Allow state stores to serve stale reads during rebalance). These stores are collected via StreamThreadStateStoreProvider#stores() and handed to CompositeReadOnlyKeyValueStore#get(), which probes each store one by one. Since KIP-535 already exposes which partition a key belongs to, we should be able to fetch the store for that partition alone and look for the key only there. This should reduce lookup latency for applications that host multiple partitions on a single instance and don't have bloom filters enabled internally for RocksDB.
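
The difference between the two lookup paths can be sketched with plain Java collections. This is a minimal illustration and not Kafka Streams code: each local store is modelled as a Map, and the names PartitionLookupSketch, getByScanning, getFromPartition and exampleStores are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: each local partition's store is modelled as a plain Map.
class PartitionLookupSketch {

    // Number of stores probed by the most recent lookup (for illustration only).
    static int probes = 0;

    // Current behaviour: probe the store of every local partition until the key is found.
    static String getByScanning(final Map<Integer, Map<String, String>> storesByPartition,
                                final String key) {
        probes = 0;
        for (final Map<String, String> store : storesByPartition.values()) {
            probes++;
            final String value = store.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Proposed behaviour: go straight to the store of the partition the key belongs to.
    static String getFromPartition(final Map<Integer, Map<String, String>> storesByPartition,
                                   final int partition,
                                   final String key) {
        probes = 0;
        final Map<String, String> store = storesByPartition.get(partition);
        if (store == null) {
            return null; // the partition is not hosted on this instance
        }
        probes++;
        return store.get(key);
    }

    // Builds three single-entry stores for partitions 0, 1 and 2.
    static Map<Integer, Map<String, String>> exampleStores() {
        final Map<Integer, Map<String, String>> stores = new LinkedHashMap<>();
        for (int p = 0; p < 3; p++) {
            final Map<String, String> store = new LinkedHashMap<>();
            store.put("key-" + p, "value-" + p);
            stores.put(p, store);
        }
        return stores;
    }
}
```

With the three example partitions, looking up key-2 by scanning probes three stores, while the partition-aware path probes one.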


Public Interfaces:

Add a new class, StoreQueryParams.java, that lets the user tell the QueryableStoreProvider layer what kind of stores they want. For now it captures whether the user is willing to be served stale data and, if the user already knows it, the partition of the store they are interested in. Since a store name and a partition form a unique combination, a taskId can be generated from this information to return the store for that particular task.

StoreQueryParams.java
package org.apache.kafka.streams;

import java.util.Objects;

/**
 * Represents the query options a user can provide to state what kind of stores they expect: whether stale stores
 * should be included, and whether a specific partition should be fetched. If no partition is provided, the default
 * behavior is to fetch the stores for all the partitions available on that instance for that particular store name.
 * For point queries, the partition can be populated from the KeyQueryMetadata.
 */
public class StoreQueryParams {

    private final Integer partition;
    private final boolean includeStaleStores;

    public static final StoreQueryParams withPartitionAndStaleStoresDisabled(final Integer partition) {
        return new StoreQueryParams(partition, false);
    }

    public static final StoreQueryParams withPartitionAndStaleStoresEnabled(final Integer partition) {
        return new StoreQueryParams(partition, true);
    }

    public static final StoreQueryParams withAllPartitionAndStaleStoresDisabled() {
        return new StoreQueryParams(null, false);
    }

    public static final StoreQueryParams withAllPartitionAndStaleStoresEnabled() {
        return new StoreQueryParams(null, true);
    }

    private StoreQueryParams(final Integer partition, final boolean includeStaleStores) {
        this.partition = partition;
        this.includeStaleStores = includeStaleStores;
    }


    /**
     * Get the partition to be used to fetch list of Queryable store from QueryableStoreProvider.
     *
     * @return an Integer partition
     */
    public Integer getPartition() {
        return partition;
    }

    /**
     * Get the flag includeStaleStores. If true, include standbys and recovering stores along with running stores
     *
     * @return boolean includeStaleStores
     */
    public boolean includeStaleStores() {
        return includeStaleStores;
    }

    /**
     * Get whether the store query params are fetching all partitions or a single partition.
     *
     * @return true if all partitions are requested, false if a specific partition is requested
     */
    public boolean getAllLocalPartitions() {
        return partition == null;
    }

    @Override
    public boolean equals(final Object obj) {
        if (!(obj instanceof StoreQueryParams)) {
            return false;
        }
        final StoreQueryParams storeQueryParams = (StoreQueryParams) obj;
        return Objects.equals(storeQueryParams.partition, partition)
                && Objects.equals(storeQueryParams.includeStaleStores, includeStaleStores);
    }


    @Override
    public String toString() {
        return "StoreQueryParams {" +
                "partition=" + partition +
                ", includeStaleStores=" + includeStaleStores +
                '}';
    }


    @Override
    public int hashCode() {
        return Objects.hash(partition, includeStaleStores);
    }
}
KafkaStreams.java
     /**
     * Get a facade wrapping the local {@link StateStore} instances with the provided {@code storeName} if the Store's
     * type is accepted by the provided {@link QueryableStoreType#accepts(StateStore) queryableStoreType}.
     * The returned object can be used to query the {@link StateStore} instances.
     *
     * Only permits queries on active replicas of the store (no standbys or restoring replicas).
     * See {@link KafkaStreams#store(String, QueryableStoreType, StoreQueryParams)} for the option to pass
     * {@code StoreQueryParams.withAllPartitionAndStaleStoresEnabled()} or
     * {@code StoreQueryParams.withPartitionAndStaleStoresEnabled(partition)} and trade off consistency in favor of availability.
     *
     * @param storeName           name of the store to find
     * @param queryableStoreType  accept only stores that are accepted by {@link QueryableStoreType#accepts(StateStore)}
     * @param <T>                 return type
     * @return A facade wrapping the local {@link StateStore} instances
     * @throws InvalidStateStoreException if Kafka Streams is (re-)initializing or a store with {@code storeName} and
     * {@code queryableStoreType} doesn't exist
     */
    public <T> T store(final String storeName, final QueryableStoreType<T> queryableStoreType) {
        return store(storeName, queryableStoreType, StoreQueryParams.withAllPartitionAndStaleStoresDisabled());
    }

    /**
     * Get a facade wrapping the local {@link StateStore} instances with the provided {@code storeName} if the Store's
     * type is accepted by the provided {@link QueryableStoreType#accepts(StateStore) queryableStoreType}.
     * The returned object can be used to query the {@link StateStore} instances.
     *
     * @param storeName           name of the store to find
     * @param queryableStoreType  accept only stores that are accepted by {@link QueryableStoreType#accepts(StateStore)}
     * @param storeQueryParams    If {@code StoreQueryParams.withAllPartitionAndStaleStoresDisabled()} is used, only permit queries
     *                            on the active replicas for all the partitions available on the local instance, and only
     *                            if the task for that partition is running, i.e., the state store is not a standby
     *                            replica and is not restoring from the changelog.
     *                            If {@code StoreQueryParams.withPartitionAndStaleStoresDisabled(partition)} is used, only permit
     *                            queries on the active replica for the given partition, and only if the task for that
     *                            partition is running.
     *                            If {@code StoreQueryParams.withAllPartitionAndStaleStoresEnabled()} is used, allow queries on
     *                            standbys and restoring replicas in addition to active ones.
     *                            If {@code StoreQueryParams.withPartitionAndStaleStoresEnabled(partition)} is used, allow queries
     *                            on the given partition regardless of whether its replica is active, standby, or restoring.
     * @param <T>                 return type
     * @return A facade wrapping the local {@link StateStore} instances
     * @throws InvalidStateStoreException if Kafka Streams is (re-)initializing or a store with {@code storeName} and
     * {@code queryableStoreType} doesn't exist
     */
    public <T> T store(final String storeName,
                       final QueryableStoreType<T> queryableStoreType,
                       final StoreQueryParams storeQueryParams) {
        validateIsRunningOrRebalancing();
        return queryableStoreProvider.getStore(storeName, queryableStoreType, storeQueryParams);
    }


QueryableStoreProvider.java
  /**
     * Get a composite object wrapping the instances of the {@link StateStore} with the provided
     * storeName and {@link QueryableStoreType}
     *
     * @param storeName          name of the store
     * @param queryableStoreType accept stores passing {@link QueryableStoreType#accepts(StateStore)}
     * @param storeQueryParams   if includeStaleStores is true, include standbys and recovering stores;
     *                           if includeStaleStores is false, only include running actives;
     *                           if partition is null, fetch all local partitions on the instance;
     *                           if partition is set, fetch only that partition.
     * @param <T>                The expected type of the returned store
     * @return A composite object that wraps the store instances.
     */
    public <T> T getStore(final String storeName,
                          final QueryableStoreType<T> queryableStoreType,
                          final StoreQueryParams storeQueryParams) {
        final List<T> globalStore = globalStoreProvider.stores(storeName, queryableStoreType);
        if (!globalStore.isEmpty()) {
            return queryableStoreType.create(globalStoreProvider, storeName);
        }
        final List<T> allStores = new ArrayList<>();
        for (final StreamThreadStateStoreProvider storeProvider : storeProviders) {
            allStores.addAll(storeProvider.stores(storeName, queryableStoreType, storeQueryParams));
        }
        if (allStores.isEmpty()) {
            throw new InvalidStateStoreException("The state store, " + storeName + ", may have migrated to another instance.");
        }
        return queryableStoreType.create(
            new WrappingStoreProvider(storeProviders, storeQueryParams),
            storeName
        );
    }
StreamThreadStateStoreProvider.java
package org.apache.kafka.streams.state.internals;

public class StreamThreadStateStoreProvider {

    private final StreamThread streamThread;
    private final InternalTopologyBuilder internalTopologyBuilder;

    public StreamThreadStateStoreProvider(final StreamThread streamThread, final InternalTopologyBuilder internalTopologyBuilder) {
        this.streamThread = streamThread;
        this.internalTopologyBuilder = internalTopologyBuilder;
    }

    @SuppressWarnings("unchecked")
    public <T> List<T> stores(final String storeName,
                              final QueryableStoreType<T> queryableStoreType,
                              final StoreQueryParams storeQueryParams) {

        final TaskId keyTaskId = createKeyTaskId(storeName, storeQueryParams.getPartition());
        if (streamThread.state() == StreamThread.State.DEAD) {
            return Collections.emptyList();
        }
        final StreamThread.State state = streamThread.state();
        if (storeQueryParams.includeStaleStores() ? state.isAlive() : state == StreamThread.State.RUNNING) {
            final Map<TaskId, ? extends Task> tasks = storeQueryParams.includeStaleStores() ? streamThread.allTasks() : streamThread.activeTasks();
            final List<T> stores = new ArrayList<>();
            for (final Task streamTask : tasks.values()) {
                if (keyTaskId != null && !keyTaskId.equals(streamTask.id())) {
                    continue;
                }
                final StateStore store = streamTask.getStore(storeName);
                if (store != null && queryableStoreType.accepts(store)) {
                    if (!store.isOpen()) {
                        throw new InvalidStateStoreException(
                            "Cannot get state store " + storeName + " for task " + streamTask +
                                " because the store is not open. " +
                                "The state store may have migrated to another instance.");
                    }
                    if (store instanceof TimestampedKeyValueStore && queryableStoreType instanceof QueryableStoreTypes.KeyValueStoreType) {
                        stores.add((T) new ReadOnlyKeyValueStoreFacade<>((TimestampedKeyValueStore<Object, Object>) store));
                    } else if (store instanceof TimestampedWindowStore && queryableStoreType instanceof QueryableStoreTypes.WindowStoreType) {
                        stores.add((T) new ReadOnlyWindowStoreFacade<>((TimestampedWindowStore<Object, Object>) store));
                    } else {
                        stores.add((T) store);
                    }
                }
            }
            return stores;
        } else {
            throw new InvalidStateStoreException("Cannot get state store " + storeName + " because the stream thread is " +
                                                     state + ", not RUNNING" +
                                                     (storeQueryParams.includeStaleStores() ? " or REBALANCING" : ""));
        }
    }

    private TaskId createKeyTaskId(final String storeName, final Integer partition) {
        if (partition == null) {
            return null;
        }
        final List<String> sourceTopics = internalTopologyBuilder.stateStoreNameToSourceTopics().get(storeName);
        final Set<String> sourceTopicsSet = sourceTopics.stream().collect(Collectors.toSet());
        final Map<Integer, InternalTopologyBuilder.TopicsInfo> topicGroups = internalTopologyBuilder.topicGroups();
        for (final Map.Entry<Integer, InternalTopologyBuilder.TopicsInfo> topicGroup : topicGroups.entrySet()) {
            if (topicGroup.getValue().sourceTopics.containsAll(sourceTopicsSet)) {
                return new TaskId(topicGroup.getKey(), partition.intValue());
            }
        }
        throw new InvalidStateStoreException("Cannot get state store " + storeName + " because the requested partition " + partition +
                                                " is not available on this instance");
    }
}
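
The mapping performed by createKeyTaskId() can be illustrated in isolation. The following is a minimal sketch using plain collections instead of InternalTopologyBuilder; the class TaskIdSketch, its methods, and the example store and topic names are hypothetical, and the explicit check for an unknown store name is an addition that the listing above does not contain. The task id is rendered as "<topicGroupId>_<partition>".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of deriving a task id from a store name and a partition.
class TaskIdSketch {

    // Returns "<topicGroupId>_<partition>", or null when no partition was requested.
    static String taskIdFor(final Map<String, List<String>> storeToSourceTopics,
                            final Map<Integer, Set<String>> topicGroups,
                            final String storeName,
                            final Integer partition) {
        if (partition == null) {
            return null; // the caller wants all local partitions
        }
        final List<String> sourceTopics = storeToSourceTopics.get(storeName);
        if (sourceTopics == null) {
            // Assumption: reject unknown store names explicitly instead of failing later.
            throw new IllegalArgumentException("Unknown store " + storeName);
        }
        // The topic group whose source topics cover the store's source topics owns the store.
        for (final Map.Entry<Integer, Set<String>> group : topicGroups.entrySet()) {
            if (group.getValue().containsAll(sourceTopics)) {
                return group.getKey() + "_" + partition;
            }
        }
        throw new IllegalStateException("No topic group reads all source topics of " + storeName);
    }

    // Example data: the store "counts" is fed by the topic "clicks".
    static Map<String, List<String>> exampleStoreTopics() {
        final Map<String, List<String>> storeTopics = new LinkedHashMap<>();
        storeTopics.put("counts", Arrays.asList("clicks"));
        return storeTopics;
    }

    // Example data: topic group 0 reads "orders", topic group 1 reads "clicks".
    static Map<Integer, Set<String>> exampleTopicGroups() {
        final Map<Integer, Set<String>> groups = new LinkedHashMap<>();
        groups.put(0, new HashSet<>(Arrays.asList("orders")));
        groups.put(1, new HashSet<>(Arrays.asList("clicks")));
        return groups;
    }
}
```

With the example data, requesting partition 3 of the store "counts" yields the task id "1_3", and passing a null partition yields null, meaning no filtering by task.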


Proposed Changes:

  • Add a new public class StoreQueryParams.java to set options for what kind of stores a user wants.
  • Create a taskId from the combination of store name and partition provided by the user as shown in StreamThreadStateStoreProvider#createKeyTaskId() above.
  • In StreamThreadStateStoreProvider.java, return only the stores for the task requested by the user, and apply the includeStaleStores condition to return either running stores only or standby/recovering stores as well.
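
The selection rule described in the last bullet can be sketched as follows. This is a simplified model and not the actual provider code: the real implementation checks the state of the stream thread and chooses between its active and all tasks, whereas this sketch tags each task with a hypothetical Kind. All names in it (StoreSelectionSketch, LocalTask, Kind, select) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the selection rule; none of these types are Kafka Streams classes.
class StoreSelectionSketch {

    enum Kind { ACTIVE_RUNNING, ACTIVE_RESTORING, STANDBY }

    static final class LocalTask {
        final String taskId;
        final Kind kind;

        LocalTask(final String taskId, final Kind kind) {
            this.taskId = taskId;
            this.kind = kind;
        }
    }

    // Keeps the tasks that match the requested task id (null means all local tasks)
    // and that are allowed by the includeStaleStores flag.
    static List<String> select(final List<LocalTask> tasks,
                               final String requestedTaskId,
                               final boolean includeStaleStores) {
        final List<String> selected = new ArrayList<>();
        for (final LocalTask task : tasks) {
            if (requestedTaskId != null && !requestedTaskId.equals(task.taskId)) {
                continue; // a specific partition was requested and this is not it
            }
            if (!includeStaleStores && task.kind != Kind.ACTIVE_RUNNING) {
                continue; // stale reads are disabled: skip standbys and restoring replicas
            }
            selected.add(task.taskId + ":" + task.kind);
        }
        return selected;
    }

    // Example instance hosting one running active, one standby and one restoring active.
    static List<LocalTask> exampleTasks() {
        return Arrays.asList(
            new LocalTask("0_0", Kind.ACTIVE_RUNNING),
            new LocalTask("0_1", Kind.STANDBY),
            new LocalTask("0_2", Kind.ACTIVE_RESTORING));
    }
}
```

In this model, requesting task "0_1" with stale stores enabled returns the standby, while the same request with stale stores disabled returns nothing.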


Compatibility, Deprecation, and Migration Plan:

  • As part of KIP-535, the functions QueryableStoreProvider#getStore() and StreamThreadStateStoreProvider#stores() have already been changed, so we will overwrite those functions once more before merging, shipping both features together in 2.5.0.


Rejected Alternatives:

  • Overload the QueryableStoreProvider#getStore() and StreamThreadStateStoreProvider#stores() with new parameters to pass a list of partitions along with the currently passed flag includeStaleStores.