Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Status

Current stateUnder DiscussionAccepted

Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread] 

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

So allowing users to be able to set more meaningful names can help to resolve complexity of some developed topologies. For example, a processor name could describe the business rule performed by a map() operation.

Public Interfaces

In addition, the generated names have a few disadvantages to guarantee topology compatibilities. In fact, adding a new operator, using a third-library doing some optimization to remove some operators or upgrading to a new KafkaStreams version with internal API changes may changed suffix indexing for a large amount of the processor names. This will in turn change the internal state store names, as well as internal topic names as well.

Public Interfaces

First, we propose to add one new class Described that can be used to provide the optional parameters when processing each record (i.e to set the processor name). The objective to create a new class is to keep consistent with the overall API design.

First, we propose to add one new interface NamedOperation that can be used to customize stateless operations as well as stateful ones. The objective to create a new class is to keep consistent with the overall API design.

Code Block
languagejava
package org.apache.kafka.streams.kstream;

/**
 * Default interface which can be used to personalized the named of operations, internal topics or store.
 */
public interface NamedOperation<T extends NamedOperation<T>> {

    /**
     * Sets the name to be used for operation.
     *
     * @param name  the name to use.
     * @return an instance of {@link NamedOperation}
     */
    T withName(final String name);

}



The NamedOperation interface will be implemented/extended by following classes that already exist to configure operations : 

  • Produced 
  • Consumed 
  • Printed 
  • Joined 
  • Grouped 
  • Suppressed 

In addition, we propose to add a new static method with the following signature to each of those class as(final String processorName).

Deprecration

In order to fix some inconsistencies in API, we also propose to deprecate the method named(String) from the class Joined in favor of new method as().

In addition, as no configuration classes expose the processor name, we will also deprecate the method name() from the class Joined . This method should be removed in a future release.

Overloaded methods


Then, we propose to overload all stateless methods that do not have one of the existing control classes listed above to accept a NamedOperation implementation.

For that we will added a new default class Named implementing NamedOperation :


Code Block
languagejava

public class Named implements NamedOperation<Named> {

    private static final int MAX_NAME_LENGTH = 249;

    protected String name;

    protected Named(final String name
Code Block
languagejava
package org.apache.kafka.streams.kstream;

import java.util.Objects;

/**
 * This class is used to provide the optional parameters when processing each record.
 */
public class Described {

    protected final String processorName;

    protected Described(final String processorName) {
        this.processorName = processorName;
    }

    protected Described(final Described described) {
        this(described.getProcessorName());
    }

    /**
     * Create a Described instance with provided processor name.
     * @param processorName      name to use to describe the {@link org.apache.kafka.streams.processor.Processor}.
     * @return  A new {@link Described} instance configured withName processorName
     */
    public static Described withName(final String processorName) {
        returnthis.name new Described(processorName)= name;
    }

    @Override
if (name != null)
 public boolean equals(Object o) {
        if (this == o) return truevalidate(name);
    }

    if (o == null || getClass() != o.getClass()) return false;
   /**
     * Create a Named instance with provided name.
     Described*
 described = (Described) o;
 * @param name  the processor name return Objects.equals(processorName, described.processorName);
    }

    @Override
    public int hashCode() {
  to be used. If {@code null} a default processor name will be generated.
     * @return      return Objects.hash(processorName);
    }
A new {@link Named} instance configured with name
     */
    public static Named as(final String getProcessorName(name) {
        return processorName;
Objects.requireNonNull(name, "name can't be null");
         }
}

This class will be then used by a new method as(Processed)  added to KStream and KTable classes : 

Code Block
languagejava
/**
 * Describe the record-by-record operation previously made to obtain this stream.
 * This is not an operation. This method can be used to customized a processor name return
 *
 * <p>
 * For example you can used this method to customize the name of the last generated processors applied to this stream.
 * The given processor name will display in the description of the {@code Topology} built.
 * <pre>{@code
 * KStream<String, String> inputStream = builder.stream("topic");
 * KStream<String, String> outputStream = inputStream.map(new KeyValueMapper<String, String, KeyValue<String, String>> {
 *     KeyValue<String, String> apply(String key, String value) {
 *         return new KeyValue<>(key, value.toUpperCase());
 *     }
 * });
 * outputStream.as(Described.withName("MAP-VALUE-TO-UPPERCASE"))
 * }</pre>
 * </p>
 * @param described the options to describe the processor operation.
 * @return a {@code KStream}
 */
KStream<K, V> as(Described described);

Stateless operations

Stateless operations like stream(), map(), filter(), are translated into a single processor. The method depicted above will directly change the processor name.

Statefull operations

Stateful operations like table(), join() are translated to multiple processors. The given name should be used to describe the first processor and as a prefix for all subsequent processors.

The method branch() results in multiple streams that can be described with the used of the new added method.

However to be able to custome the name of the first translated processor we propose to overload this method with a one accepting a Described instance in argument.

Sink operations

The Sink operations returning Void can't use the method depicted above.

Currently, KStream<K, V> interface defines the following methods : 

  • void foreach(final ForeachAction<? super K, ? super V> action); 
  • void process(final ProcessorSupplier<? super K, ? super V> processorSupplier, final String... stateStoreNames);

We propose to overload each of this method to accept a Described argument.

Then, for those  three methods we propose to add a new method withProcessorName() to classes Produced and Printed in order to minimize the number of overloaded methods.

  • void print(final Printed<K, V> printed)
  • void to(final String topic, final Produced<K, V> produced)
  • void to(final TopicNameExtractor<K, V> topicExtractor, final Produced<K, V> produced)

Example for the Produced class : 

Code Block
languagejava
/**
 * Create an instance of {@link Produced} with provided processor name.
 *
 * @param processorName the processor name to be used. If {@code null} a default processor name will be generated
 * @param <K>         key type
 * @param <V>         value type
 * @return a new instance of {@link Produced}
 */
public static <K, V> Produced<K, V> with(final String processorName) {
    return new Produced<>(null, null, null, null, processorName);
}

/**
 * Configure the instance of {@link Produced} with a key {@link Serde}.
 *
 * @param processorName the processor name to be used. If {@code null} a default processor name will be generated
 * @return this
 */
public Produced<K, V> withProcessorName(final String processorName) {
    this.processorName = processorName;
    return this;
}

Then we propose to add new methods to the existing classes : Consumed, Produced, Joined and Printed in order to be able to define custom processor names.

Proposed Changes

return new Named(name);
    }

    @Override
    public Named withName(final String name) {
        Objects.requireNonNull(name, "name can't be null");
        return new Named(name);
    }
...
}


The below tables will resume all new and existing methods :

KStream (16 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
filter(Predicate, Named)YESstatic Named#as(String)N/AN/A
filterNot(Predicate, Named)YESstatic Named#as(String)N/AN/A
selectKey(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A
map(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A
mapValues(ValueMapper, Named)YESstatic Named#as(String)N/AN/A
mapValues(ValueMapperWithKey, Named)YESstatic Named#as(String)N/AN/A
flatMap(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A
flatMapValues(ValueMapper, Named)YESstatic Named#as(String)N/AN/A
flatMapValues(ValueMapperWithKey, Named)YESstatic Named#as(String)N/AN/A
print(Printed)NOstatic Printed#as(String)N/AN/A
foreach(ForeachAction, Named)YESstatic Named#as(String)N/AN/A
peek(ForeachAction, Named)YESstatic Named#as(String)N/AN/A
branch(Named, Predicate...)YESstatic Named#as(String)N/AN/A
through(String, Produced)NOstatic Produced#as(String)N/AN/A
to(String, Produced)NOstatic Produced#as(String)N/AN/A
to(TopicNameExtractor, Produced)NOstatic Produced#as(String)N/AN/A
transform(TransformerSupplier, Named, String... )YESstatic Named#as(String)N/AN/A
transformValues(ValueTransformerSupplier, Named, String...)YESstatic Named#as(String)N/AN/A
transformValues( ValueTransformerWithKeySupplier, Named, String...)YESstatic Named#as(String)N/AN/A
process(ProcessorSupplier, Named, String...)YESstatic Named#as(String)N/AN/A
join( KStream, ValueJoiner, JoinWindows windows, Joined)NOstatic Joined#as(final String name)static Joined#as(final String name)static Joined#as(final String name)
leftJoin(KStream, ValueJoiner, JoinWindows, Joined)NOstatic Joined#as(final String name)static Joined#as(final String name)static Joined#as(final String name)
outerJoin(KStream, ValueJoiner, JoinWindows, Joined)NOstatic Joined#as(final String name)static Joined#as(final String name)static Joined#as(final String name)
join(KTable, ValueJoiner, Joined)NOstatic Joined#as(final String name)static Joined#as(final String name)N/A
leftJoin(KTable, ValueJoiner, Joined)NOstatic Joined#as(final String name)static Joined#as(final String name)N/A
join(GlobalKTbale, KeyValueMapper, ValueJoiner, Named)YESstatic Named#as(String)
N/AN/A
leftJoin(GlobalKTable, KeyValueMapper, ValueJoiner, Named)YESstatic Named#as(String)
N/AN/A
flatTransform(TransformerSupplier, Named named, String... stateStoreNames)YESstatic Named#as(String)N/AN/A
flatTransformValues(ValueTransformerWithKeySupplier, Named,  String... )YESstatic Named#as(String)N/AN/A
flatTransformValues(ValueTransformerSupplier, Named, String...)YESstatic Named#as(String)N/AN/A


KTable (16 new methods)


methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
filter(Predicate, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
filter(Predicate, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
filterNot(Predicate, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
filterNot(Predicate, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
mapValues(ValueMapper, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
mapValues(ValueMapperWithKey, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
mapValues(ValueMapper, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
mapValues(ValueMapperWithKey, Named, Materialized);YESstatic Named#as(String)N/Astatic Materialized#as(String)
suppress(Suppressed)NOSuppressed#withName(String)N/AN/A
transformValues(ValueTransformerWithKeySupplier, Named, String...)YESstatic Named#as(String)N/A(PREFIX + COUNT)
transformValues(ValueTransformerWithKeySupplier, Materialized, Named, String...)YESstatic Named#as(String)N/Astatic Materialized#as(String)
groupBy(KeyValueMapper, KeyValue, Grouped)NOstatic Grouped#as(String)static Grouped#as(String)N/A
join(KTable, ValueJoiner, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
join(KTable, ValueJoiner, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
leftJoin(KTable, ValueJoiner, Named);YESstatic Named#as(String)N/A(PREFIX + COUNT)
leftJoin(KTable, ValueJoiner, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
outerJoin(KTable, ValueJoiner, Named);YESstatic Named#as(String)N/A(PREFIX + COUNT)
outerJoin(KTable, ValueJoiner, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
toStream(Named)YESstatic Named#as(String)N/AN/A
toStream(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A


KGroupedStream (6 new methods)


methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
reduce(Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
aggregate(Initializer, Aggregator, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


KGroupedTable (6 new methods)


methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Reducer, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
reduce(Reducer, Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Aggregator, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
aggregate(Initializer, Aggregator, Aggregator, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


TimeWindowedKStream (6 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
aggregate(Initializer, Aggregator, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
reduce(Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


SessionWindowedKStream (6 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Merger, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
aggregate(Initializer, Aggregator, Merger, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
reduce(Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


At the end, we can summarize the scope of each configuration class as follow : 



GeneratedNamedJoined / Grouped / Produced / ConsumedMaterialized
Node NameXXX
Repartition TopicX
X
Queryable Store


X
State storeX
XX
Changelog TopicX
XX


Materialized


The main reason why we propose to overload each method accepting a Materialized argument is to not introduce ambitguity by conflating config objects that configure an operation (like Grouped, Joined) with config objects that configure an aspect of the operation (like Materialized).

Name Validation

User provided node name should follow the same restrictions that ones currently apply to state stores during the create of Materialized instance.

Currently, the Materialized class relies on the static method Topic#validate. This method ensure that a provided name only contains legal characters [a-zA-Z0-9._-] and have a maximum length of 249.


We propose to copy methods from Topic#validate into Named. This new method will be used validate both store names and node names. The benefit is to remove a dependency with the core module.

In addition, the Materialized class will throw a TopologyException while building the topology in case of a unvalid name instead of InvalidTopicException .


Proposed Changes

  • Implement the new interface NamedOperation and default class Named 
  • Update all parameter class to implement NamedOperation : Produced Consumed Printed Joined Grouped Suppressed
  • Overload methods stateless for classes KStreams, KTables, KGroupedStream, KGroupedTable, TimeWindowedKStream, TimeWindowedKTable
  • Implement the new class Described and update the KStream/KStreamImpl, KTable/KTableImpl methods that add stateless processor (i.e map(), mapValue(), filter()...).
  • Update the Consumed, Produced and Joined classes to be able to set a processor name.
  • The processor names specified by developer will be used in place of the static processor prefix. Statics prefixes will still be used if no custom processor name are specifiedno custom processor name are specified.
  • Processor names should follow the same restrictions as the topic names. So legal characters are [a-zA-Z0-9._-] and the maximum length of 249.


Below is an application example : 

Code Block
languagejava
final StreamsBuilder builder = new StreamsBuilder();


KStream<String, String> stream = builder.stream("topic-input");
        stream.as(Described.withName, Consumed.as("STREAM-FROM-TOPIC-INPUT"))
        .filter( (k, v) -> true ), Named.as(Described.withName("FILTER-NULL-VALUE"))
        .map( (k, v) -> KeyValue.pair(k, v.toUpperCase())).as(Described.withName, Named.as("MAP-TO-UPPERCASE"))
        .to("topic-output", Produced.as("TO-OUTPUT-TOPIC"));

System.out.println(builder.build().describe());
---- (output)----
Topologies:
   Sub-topology: 0
    Source: STREAM-FROM-TOPIC-INPUT (topics: [topic-input])
      --> FILTER-NULL-VALUE
    Processor: FILTER-NULL-VALUE (stores: [])
      --> MAP-TO-UPPERCASE
      <-- STREAM-FROM-TOPIC-INPUT
    Processor: MAP-TO-UPPERCASE (stores: [])
      --> KSTREAM-SINK-0000000003 TO-OUTPUT-TOPIC
      <-- FILTER-NULL-VALUE
    Sink: TO-OUTPUT-TOPIC (topic: topic-output)
      <-- FILTERMAP-NULL-VALUE
    Sink: KSTREAM-SINK-0000000003 (topic: topic-output)
      <-- MAP-TO-UPPERCASE

Compatibility, Deprecation, and Migration Plan

No compatibility issues foreseen.

Rejected Alternatives

TO-UPPERCASE


A straightforward first pass is GitHub PR 6958

Compatibility, Deprecation, and Migration Plan

No compatibility issues foreseen.

Rejected Alternatives

  1. The first proposition was to add new methods KStreams#as(Described) and KTable#as(Described) while Described class would be used to customized the named of operation defined previously in the stream. However not only this new method was not conservative with the existing APIs but it also introduce some complexities for methods returning Void.
  2. The second proposition was to enrich all actions classes (Reducer, Predicate, etc) with a new default method "as(String)" in order to name the operation. But this leads to mix different classes with different semantics (Predicate vs Consumed/Produced) creating a couple of unfortunate side-effects:The first proposition was to overload all stateless methods to accept an instance of Described class. However this solution was resulting in modiying a large percentage of the existing KStream and KTable methods.