You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 14 Next »

Status

Current stateUnder Discussion

Discussion thread: here 

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, while building a new Topology through the KStreams DSL the processors are automatically named.

The genarated names are prefixed depending of the operation (i.e KSTREAM-SOURCE, KSTREAM-FILTER, KSTREAM-MAP, etc).  Moreover, they are added with a suffix like "-0000000000" to guarantee their uniqueness.

To debug/understand a topology it is possible to display the processor lineage with the method Topology#describe(). However, a complex topology with dozens of operations can be hard to understand if the processor names are not relevant.

So allowing users to be able to set more meaningful names can help to resolve complexity of some developed topologies. For example, a processor name could describe the business rule performed by a map() operation.

In addition, the generated names have a few disadvantages to guarantee topology compatibilities. In fact, adding a new operator, using a third-library doing some optimization to remove some operators or upgrading to a new KafkaStreams version with internal API changes may changed suffix indexing for a large amount of the processor names. This will in turn change the internal state store names, as well as internal topic names as well.

Public Interfaces

First, we propose to add one new interface NamedOperation that can be used to customize stateless operations as well as stateful ones. The objective to create a new class is to keep consistent with the overall API design.

package org.apache.kafka.streams.kstream;

/**
 * Default interface which can be used to personalized the named of operations, internal topics or store.
 */
public interface NamedOperation<T extends NamedOperation<T>> {

    /**
     * Return the name to be used for operation.
     *
     * @return a string name
     */
    String name();

    /**
     * Sets the name to be used for operation.
     *
     * @param name  the name to use.
     * @return an instance of {@link NamedOperation}
     */
    T withName(final String name);

}



The NamedOperation interface will be implemented/extended by following classes that already exist to configure operations : 

  • Produced 
  • Consumed 
  • Printed 
  • Joined 
  • Grouped 
  • Suppressed 

Then, we propose to overload all stateless methods that do not have one of the existing control classes listed above to accept a NamedOperation implementation.

For that we will added a new default class Named implementing NamedOperation :

public class Named implements NamedOperation<Named> {

    private static final int MAX_NAME_LENGTH = 249;

    protected String name;

    protected Named(final String name) {
        this.name = name;
        if (name != null)
            validate(name);
    }

    /**
     * Create a Named instance with provided name.
     *
     * @param name  the processor name to be used. If {@code null} a default processor name will be generated.
     * @return      A new {@link Named} instance configured with name
     */
    public static Named as(final String name) {
        Objects.requireNonNull(name, "name can't be null");
        return new Named(name);
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Named withName(final String name) {
        Objects.requireNonNull(name, "name can't be null");
        return new Named(name);
    }
...
}



The below tables will resume all new and existing methods :

KStream (16 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
filter(Predicate, Named)YESstatic Named#as(String)N/AN/A
filterNot(Predicate, Named)YESstatic Named#as(String)N/AN/A
selectKey(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A
map(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A
mapValues(ValueMapper, Named)YESstatic Named#as(String)N/AN/A
mapValues(ValueMapperWithKey, Named)YESstatic Named#as(String)N/AN/A
flatMap(KeyValueMapper, Named)YESstatic Named#as(String)N/AN/A
flatMapValues(ValueMapper, Named)YESstatic Named#as(String)N/AN/A
flatMapValues(ValueMapperWithKey, Named)YESstatic Named#as(String)N/AN/A
print(Printed)NOstatic Printed#as(String)N/AN/A
foreach(ForeachAction, Named)YESstatic Named#as(String)N/AN/A
peek(ForeachAction, Named)YESstatic Named#as(String)N/AN/A
branch(Named, Predicate...)YESstatic Named#as(String)N/AN/A
through(String, Produced)NOstatic Produced#as(String)N/AN/A
to(String, Produced)NOstatic Produced#as(String)N/AN/A
to(TopicNameExtractor, Produced)NOstatic Produced#as(String)N/AN/A
transform(TransformerSupplier, Named, String... )YESstatic Named#as(String)N/AN/A
transformValues(ValueTransformerSupplier, Named, String...)YESstatic Named#as(String)N/AN/A
transformValues( ValueTransformerWithKeySupplier, Named, String...)YESstatic Named#as(String)N/AN/A
process(ProcessorSupplier, Named, String...)YESstatic Named#as(String)N/AN/A
join( KStream, ValueJoiner, JoinWindows windows, Joined)NOstatic Joined#named(final String name)static Joined#named(final String name)static Joined#named(final String name)
leftJoin(KStream, ValueJoiner, JoinWindows, Joined)NOstatic Joined#named(final String name)static Joined#named(final String name)static Joined#named(final String name)
outerJoin(KStream, ValueJoiner, JoinWindows, Joined)NOstatic Joined#named(final String name)static Joined#named(final String name)static Joined#named(final String name)
join(KTable, ValueJoiner, Joined)NOstatic Joined#named(final String name)static Joined#named(final String name)N/A
leftJoin(KTable, ValueJoiner, Joined)NOstatic Joined#named(final String name)static Joined#named(final String name)N/A
join(GlobalKTable, KeyValueMapper, ValueJoiner)NO?????????
leftJoin(GlobalKTable, KeyValueMapper, ValueJoiner)NO?????????


KTable (16 new methods)


methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
filter(Predicate, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
filter(Predicate, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
filterNot(Predicate, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
filterNot(Predicate, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
mapValues(ValueMapper, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
mapValues(ValueMapperWithKey, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
mapValues(ValueMapper, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
mapValues(ValueMapperWithKey, Named, Materialized);YESstatic Named#as(String)N/Astatic Materialized#as(String)
suppress(Suppressed)NOSuppressed#withName(String)N/AN/A
transformValues(ValueTransformerWithKeySupplier, Named, String...)YESstatic Named#as(String)N/Astatic Named#as(String)
transformValues(ValueTransformerWithKeySupplier, Materialized, Named, String...)YESstatic Named#as(String)N/Astatic Materialized#as(String)
groupBy(KeyValueMapper, KeyValue, Grouped)NOstatic Grouped#as(String)static Grouped#as(String)N/A
join(KTable, ValueJoiner, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
join(KTable, ValueJoiner, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
leftJoin(KTable, ValueJoiner, Named);YESstatic Named#as(String)N/Astatic Named#as(String)
leftJoin(KTable, ValueJoiner, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)
outerJoin(KTable, ValueJoiner, Named);YESstatic Named#as(String)N/Astatic Named#as(String)
outerJoin(KTable, ValueJoiner, Named, Materialized)YESstatic Named#as(String)N/Astatic Materialized#as(String)


KGroupedStream (6 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
reduce(Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
aggregate(Initializer, Aggregator, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)

KGroupedTable (6 new methods)


methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Reducer, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
reduce(Reducer, Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Aggregator, Named)YESstatic Named#as(String)N/A(PREFIX + COUNT)
aggregate(Initializer, Aggregator, Aggregator, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


TimeWindowedKStream (6 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/Astatic Named#as(String)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
aggregate(Initializer, Aggregator, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
reduce(Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


SessionWindowedKStream (6 new methods)

methodAdded for this KIP ?Object/method used for node nameUsed for repartition topic nameUsed for state store name ?
count(Named)YESstatic Named#as(String)N/Astatic Named#as(String)
count(Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
aggregate(Initializer, Aggregator, Merger, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
aggregate(Initializer, Aggregator, Merger, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)
reduce(Reducer, Named)YESstatic Named#as(String)N/Astatic Named#as(String)
reduce(Reducer, Named, Materialized)YESstatic Named#as(String)N/AMaterialized#as(String)


Note that for all methods accepting a Materialized argument, if no state store named is provided then the node named will be used to generate a one. The state store name will be the node name suffixed with "-table".

Proposed Changes

  • Implement the new interface NamedOperation and default class Named 
  • Update all parameter class to implement NamedOperation : Produced Consumed Printed Joined Grouped Suppressed
  • Overload methods stateless for classes KStreams, KTables, KGroupedStream, KGroupedTable, TimeWindowedKStream, TimeWindowedKTable
  • The processor names specified by developer will be used in place of the static processor prefix. Statics prefixes will still be used if no custom processor name are specified.
  • Processor names should follow the same restrictions as the topic names. So legal characters are [a-zA-Z0-9._-] and the maximum length us 249.


Below is an application example : 

final StreamsBuilder builder = new StreamsBuilder();


builder.stream("topic-input", Consumed.as("STREAM-FROM-TOPIC-INPUT")
        .filter((k, v) -> true ), Named.as("FILTER-NULL-VALUE")
        .map((k, v) -> KeyValue.pair(k, v.toUpperCase()), Named.as("MAP-TO-UPPERCASE")
        .to("topic-output", Produced.as("TO-OUTPUT-TOPIC"));

System.out.println(builder.build().describe());
---- (output)----
Topologies:
   Sub-topology: 0
    Source: STREAM-FROM-TOPIC-INPUT (topics: [topic-input])
      --> FILTER-NULL-VALUE
    Processor: FILTER-NULL-VALUE (stores: [])
      --> MAP-TO-UPPERCASE
      <-- STREAM-FROM-TOPIC-INPUT
    Processor: MAP-TO-UPPERCASE (stores: [])
      --> TO-OUTPUT-TOPIC
      <-- FILTER-NULL-VALUE
    Sink: TO-OUTPUT-TOPIC (topic: topic-output)
      <-- MAP-TO-UPPERCASE

Compatibility, Deprecation, and Migration Plan

No compatibility issues foreseen.

Rejected Alternatives

  1. The first proposition was to overload all stateless methods to accept an instance of Described class. However this solution was resulting in modiying a large percentage of the existing KStream and KTable methods.
  2. The second proposition was to add new methods KStreams#as(Described) and KTable#as(Described) while Described class would be used to customized the named of operation defined previously in the stream. However not only this new method was not conservative with the existing APIs but it also introduce some complexities for methods returning Void.
  • No labels