ID: IEP-[NUMBER]
Author: Alexander
Sponsor:
Created:
Status: DRAFT


Motivation

Tracing provides debugging information that helps both with regular, day-to-day system monitoring and with incident analysis. Within the scope of Apache Ignite, almost every process or sub-system could be traced, including:

  • Communication;
  • Discovery;
  • Exchange;
  • Transactions;
  • and many others.

Each sub-system has its own motivation for tracing. For example, transaction tracing can highlight contention on a resource while a lock is being taken and thereby explain a slow transaction.

Description

It makes sense to support extensible and configurable trace/span handlers through the common Ignite Service Provider Interfaces.
Because OpenCensus is a de facto standard (or at least one of the most popular solutions right now), it makes sense to use it as the built-in tracing handler implementation.

Key Interfaces

As an initial step, the following interfaces are proposed:

SpanManager

Manager for Span instances.

SpanManager
/**
 * Manager for {@link Span} instances.
 */
public interface SpanManager {
    /**
     * Creates Span with given type.
     *
     * @param spanType Type of span to create.
     */
    default Span create(@NotNull SpanType spanType) {
        return create(spanType, (Span)null);
    }

    /**
     * Creates Span with given type and explicit parent.
     *
     * @param spanType Type of span to create.
     * @param parentSpan Parent span.
     * @return Created span.
     */
    Span create(@NotNull SpanType spanType, @Nullable Span parentSpan);

    /**
     * Creates Span with given type and explicit parent passed as serialized bytes.
     *
     * @param spanType Type of span to create.
     * @param serializedParentSpan Parent span as serialized bytes.
     * @return Created span.
     */
    Span create(@NotNull SpanType spanType, @Nullable byte[] serializedParentSpan);

    /**
     * Creates Span with given type, explicit parent and label.
     *
     * @param spanType Type of span to create.
     * @param parentSpan Parent span.
     * @param lb Label.
     * @return Created span.
     */
    @NotNull Span create(
        @NotNull SpanType spanType,
        @Nullable Span parentSpan,
        @Nullable String lb);

    /**
     * Serializes span to byte array to send context over network.
     *
     * @param span Span.
     */
    byte[] serialize(@NotNull Span span);
}

Span

Logical piece of a trace that represents a single operation. Each unit of work is called a Span in a trace. Spans include metadata about the work, including the time spent in the step (latency), status, time events, attributes and links.

Span
/**
 * Logical piece of a trace that represents a single operation.
 * Each unit of work is called a Span in a trace.
 * Spans include metadata about the work, including the time spent in the step (latency),
 * status, time events, attributes, links.
 * You can use tracing to debug errors and latency issues in your applications.
 */
public interface Span {

    /**
     * Adds tag to span with {@code String} value.
     *
     * @param tagName Tag name.
     * @param tagVal Tag value.
     */
    Span addTag(String tagName, String tagVal);

    /**
     * Adds tag to span with {@code long} value.
     *
     * @param tagName Tag name.
     * @param tagVal Tag value.
     */
    Span addTag(String tagName, long tagVal);

    /**
     * Logs work to span.
     *
     * @param logDesc Log description.
     */
    Span addLog(String logDesc);

    /**
     * Adds log to span with additional attributes.
     *
     * @param logDesc Log description.
     * @param attrs Attributes.
     */
    Span addLog(String logDesc, Map<String, String> attrs);

    /**
     * Explicitly set status for span.
     *
     * @param spanStatus Status.
     */
    Span setStatus(SpanStatus spanStatus);

    /**
     * Ends span. This action sets default status if not set and marks the span as ready to be exported.
     */
    Span end();

    /**
     * @return {@code true} if span has already ended.
     */
    boolean isEnded();

    /**
     * @return Type of given span.
     */
    ...
}

TracingConfigurationManager

Allows configuring tracing, reading the configuration and restoring it to the defaults.

TracingConfigurationManager
/**
 * Allows configuring tracing, reading the configuration and restoring it to the defaults.
 */
public interface TracingConfigurationManager {
    /**
     * Set new tracing configuration for the specific tracing coordinates (scope, label, etc.).
     * If tracing configuration with specified coordinates already exists it'll be overridden,
     * otherwise new one will be created.
     *
     * @param coordinates {@link TracingConfigurationCoordinates} Specific set of locators like {@link Scope} and label,
     *  that defines subset of traces and/or spans that'll use given configuration.
     * @param parameters {@link TracingConfigurationParameters} e.g. sampling rate, set of included scopes etc.
     * @throws IgniteException If failed to set tracing configuration.
     */
    void set(@NotNull TracingConfigurationCoordinates coordinates,
        @NotNull TracingConfigurationParameters parameters) throws IgniteException;

    /**
     * Get the most specific tracing parameters for the specified tracing coordinates (scope, label, etc.).
     * The most specific means:
     * <ul>
     *     <li>
     *         If there's tracing configuration that matches all tracing configuration attributes (scope and label) —
     *         it'll be returned.
     *     </li>
     *     <li>
     *         If there's no tracing configuration with specified label, or label wasn't specified —
     *         scope specific tracing configuration will be returned.
     *     </li>
     *     <li>
     *         If there's no tracing configuration with specified scope —
     *         default scope specific configuration will be returned.
     *     </li>
     * </ul>
     *
     * @param coordinates {@link TracingConfigurationCoordinates} Specific set of locators like {@link Scope} and label
     *  that defines a subset of traces and/or spans that'll use given configuration.
     * @return {@link TracingConfigurationParameters} instance.
     * @throws IgniteException If failed to get tracing configuration.
     */
    default @NotNull TracingConfigurationParameters get(
        @NotNull TracingConfigurationCoordinates coordinates) throws IgniteException
    {
        switch (coordinates.scope()) {
            case TX: {
                return DEFAULT_TX_CONFIGURATION;
            }

            case EXCHANGE: {
                return DEFAULT_EXCHANGE_CONFIGURATION;
            }


            default: {
                return NOOP_CONFIGURATION;
            }
        }
    }

    /**
     * List all pairs of tracing configuration coordinates and tracing configuration parameters
     * or list all pairs of tracing configuration and parameters for the specific scope.
     *
     * @param scope Nullable scope of tracing configuration to be retrieved.
     *  If null - all configuration will be returned.
     * @return The whole set of tracing configuration.
     * @throws IgniteException If failed to get tracing configuration.
     */
    @NotNull Map<TracingConfigurationCoordinates, TracingConfigurationParameters> getAll(
        @Nullable Scope scope) throws IgniteException;

    /**
     * Reset tracing configuration for the specific tracing coordinates (scope, label, etc.) to default values.
     * Note that there are no default values for label specific coordinates,
     * so such configurations will be removed.
     *
     * @param coordinates {@link TracingConfigurationCoordinates} specific set of locators like {@link Scope} and label
     *  that defines a subset of traces and/or spans that will be reset.
     *  @throws IgniteException If failed to reset tracing configuration.
     */
    void reset(@NotNull TracingConfigurationCoordinates coordinates) throws IgniteException;

    /**
     * Reset tracing configuration for the specific scope, or all tracing configurations if scope not specified.
     *
     * @param scope {@link Scope} that defines a set of applicable tracing configurations.
     * @throws IgniteException If failed to reset tracing configuration.
     */
    void resetAll(@Nullable Scope scope) throws IgniteException;
}
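
The "most specific" lookup described in the Javadoc of get() can be illustrated with a small in-memory sketch. This is not the proposed implementation: Scope, Coordinates and ConfigurationLookup below are simplified, hypothetical stand-ins, and the parameters are reduced to a single sampling rate for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the proposal's Scope enum. */
enum Scope { DISCOVERY, EXCHANGE, COMMUNICATION, TX }

/** Hypothetical, simplified coordinates: a scope plus an optional label. */
final class Coordinates {
    final Scope scope;
    final String label; // null means "no label"

    Coordinates(Scope scope, String label) {
        this.scope = Objects.requireNonNull(scope);
        this.label = label;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Coordinates))
            return false;

        Coordinates other = (Coordinates)o;

        return scope == other.scope && Objects.equals(label, other.label);
    }

    @Override public int hashCode() {
        return Objects.hash(scope, label);
    }
}

/** Falls back from label specific to scope specific to default parameters. */
final class ConfigurationLookup {
    /** Assumed default: sampling rate 0 (never). */
    static final double DEFAULT_SAMPLING_RATE = 0.0;

    private final Map<Coordinates, Double> samplingRates = new HashMap<>();

    void set(Coordinates coordinates, double samplingRate) {
        samplingRates.put(coordinates, samplingRate);
    }

    double get(Coordinates coordinates) {
        // 1. Exact match on scope and label.
        Double rate = samplingRates.get(coordinates);

        // 2. No label specific entry: fall back to the scope specific one.
        if (rate == null && coordinates.label != null)
            rate = samplingRates.get(new Coordinates(coordinates.scope, null));

        // 3. Nothing configured for the scope: fall back to the default.
        return rate != null ? rate : DEFAULT_SAMPLING_RATE;
    }
}
```

With a scope specific entry for TX and a label specific entry for TX with "label1", a lookup with "label1" returns the label specific value, a lookup with any other label returns the TX value, and a lookup for an unconfigured scope returns the default.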

Tracing

Tracing sub-system interface.

Tracing
/**
 * Tracing sub-system interface.
 */
public interface Tracing extends SpanManager {
    /**
     * @return Helper to handle traceable messages.
     */
    public TraceableMessagesHandler messages();

    /**
     * Returns the {@link TracingConfigurationManager} instance that allows to
     * <ul>
     *     <li>Configure tracing parameters such as sampling rate for the specific tracing coordinates
     *          such as scope and label.</li>
     *     <li>Retrieve the most specific tracing parameters for the specified tracing coordinates (scope and label)</li>
     *     <li>Restore the tracing parameters for the specified tracing coordinates to the default.</li>
     *     <li>List all pairs of tracing configuration coordinates and tracing configuration parameters.</li>
     * </ul>
     * @return {@link TracingConfigurationManager} instance.
     */
    public @NotNull TracingConfigurationManager configuration();
}

MTC

Although MTC is not an interface, it is an extremely important class: it encapsulates the logic of the thread-local span storage.

MappedTracingContext
public class MTC {
    /**
     * @return Span attached to the current thread, or null if none is set.
     */
    @NotNull public static Span span() {
        return span.get();
    }

    /**
     * Attaches the given span to the current thread if it isn't null, or attaches an empty span if it is null.
     * When {@link TraceSurroundings#close()} is called, detaches the given span, closes it and restores the previous span.
     *
     * @param startSpan Span which should be added to current thread.
     * @return {@link TraceSurroundings} for managing the span life cycle.
     */
    public static TraceSurroundings support(Span startSpan) {
        ...
    }

    /**
     * Support initial span.
     *
     * @param startSpan Span which should be added to current thread.
     */
    public static void supportInitial(Span startSpan) {
        ...
    }

    /**
     * Attaches the given span to the current thread if it isn't null, or attaches an empty span if it is null.
     *
     * @param startSpan Span which should be added to current thread.
     * @return {@link TraceSurroundings} for managing the span life cycle.
     */
    public static TraceSurroundings supportContinual(Span startSpan) {
        ...
    }
}

Examples

Below are a few examples of tracing configuration and span creation.

Tracing configuration example

Tracing Configuration Example
ignite.tracingConfiguration().set(
    new TracingConfigurationCoordinates.Builder(Scope.DISCOVERY).build(),
    new TracingConfigurationParameters.Builder().withSamplingRate(SAMPLING_RATE_ALWAYS).build());

ignite.tracingConfiguration().set(
    new TracingConfigurationCoordinates.Builder(Scope.TX).withLabel("Some specific label").build(),
    new TracingConfigurationParameters.Builder()
        .withSamplingRate(0.1)
        .withIncludedScopes(Collections.singleton(Scope.COMMUNICATION))
        .build());

where

  • TracingConfigurationCoordinates - specifies which traces a given configuration applies to. In other words, it's a sort of tracing configuration locator.
  • TracingConfigurationParameters - a set of tracing configuration parameters, such as sampling rate or included scopes.
  • Scope - tracing span scope, e.g.
    • DISCOVERY
    • EXCHANGE
    • COMMUNICATION
    • TX
  • Label - optional label of a traced operation. At the time of writing, only transactions have labels; they can be set with client.transactions().withLabel("label1").
  • SamplingRate - a number between 0 and 1 that more or less reflects the probability of sampling a specific trace. 0 and 1 have special meaning here: 0 means never, 1 means always. The default value is 0 (never).
  • IncludedScopes - a set of Scopes that defines which sub-traces will be included in a given trace. In other words, if a child span's scope is equal to the parent's scope or belongs to the parent span's included scopes, the child span will be attached to the current trace; otherwise it'll be skipped.
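
The IncludedScopes rule above can be expressed as a single predicate. The following is a minimal sketch only: the Scope enum is a hypothetical stand-in for the proposal's type, and IncludedScopesRule and shouldAttach are illustrative names that are not part of the proposed API.

```java
import java.util.Set;

/** Hypothetical stand-in for the proposal's Scope enum. */
enum Scope { DISCOVERY, EXCHANGE, COMMUNICATION, TX }

/** Illustrative helper; the name and signature are assumptions made for this sketch. */
final class IncludedScopesRule {
    private IncludedScopesRule() {
        // Static helper, no instances.
    }

    /**
     * @param parentScope Scope of the parent span.
     * @param parentIncludedScopes Included scopes configured for the parent span.
     * @param childScope Scope of the child span.
     * @return {@code true} if the child span should be attached to the parent's trace.
     */
    static boolean shouldAttach(Scope parentScope, Set<Scope> parentIncludedScopes, Scope childScope) {
        // Same scope as the parent, or explicitly included by the parent's configuration.
        return childScope == parentScope || parentIncludedScopes.contains(childScope);
    }
}
```

For a TX parent whose included scopes contain only COMMUNICATION, child spans with scope TX or COMMUNICATION are attached, while a DISCOVERY child span is skipped.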

Span creation example

Span creation example
Span txRootSpan = cctx.kernalContext().tracing().create(/* Span type */ TX, /* parent span */ null, /* label */ lb);

Span rootSpan = tracing.create(TraceableMessagesTable.traceName(msg.getClass()))
	.addTag(SpanTags.tag(SpanTags.EVENT_NODE, SpanTags.ID), getLocalNodeId().toString())
	.addTag(SpanTags.tag(SpanTags.EVENT_NODE, SpanTags.CONSISTENT_ID), locNode.consistentId().toString())
	.addTag(SpanTags.MESSAGE_CLASS, ((CustomMessageWrapper)evt).delegate().getClass().getSimpleName())
	.addLog("Created");

where

  • SpanType is one of the predefined span types.
  • SpanTag, SpanLog - additional span metadata.


Span storage example

A span can be stored either in the Mapped Tracing Context (MTC) or in an ordinary field. The first approach is preferred because it provides much of the span maintenance out of the box. Here's an example:

private void proceedMapping() throws IgniteCheckedException {
    try (TraceSurroundings ignored =
        MTC.support(cctx.kernalContext().tracing().create(SpanType.TX_MAP_PROCEED, MTC.span()))) {
        ...
    }
}
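
To show why the try-with-resources form takes care of span maintenance, here is a minimal, self-contained sketch of a thread-local span holder. It is not the actual MTC implementation: DemoSpan, ThreadLocalSpans and Surroundings are hypothetical names, and the behaviour (attach on entry, end the span and restore the previous one on close) is modelled only on the Javadoc of MTC.support() above.

```java
/** Hypothetical, minimal span used only for this sketch. */
final class DemoSpan {
    final String name;
    boolean ended;

    DemoSpan(String name) {
        this.name = name;
    }
}

/** Hypothetical miniature of a thread-local span holder. */
final class ThreadLocalSpans {
    /** Empty span used when nothing is attached, so callers never see null. */
    static final DemoSpan NOOP = new DemoSpan("noop");

    private static final ThreadLocal<DemoSpan> CURRENT = ThreadLocal.withInitial(() -> NOOP);

    private ThreadLocalSpans() {
        // Static helper, no instances.
    }

    /** @return Span attached to the current thread, or the empty span if none is attached. */
    static DemoSpan span() {
        return CURRENT.get();
    }

    /** Attaches the span and returns a handle that ends it and restores the previous span on close. */
    static Surroundings support(DemoSpan startSpan) {
        DemoSpan prev = CURRENT.get();
        DemoSpan attached = startSpan != null ? startSpan : NOOP;

        CURRENT.set(attached);

        return () -> {
            // The shared empty span is never marked as ended.
            if (attached != NOOP)
                attached.ended = true;

            CURRENT.set(prev);
        };
    }

    /** AutoCloseable without a checked exception, so it fits try-with-resources cleanly. */
    interface Surroundings extends AutoCloseable {
        @Override void close();
    }
}
```

Inside a try block opened with ThreadLocalSpans.support(span), span() returns the attached span; once the block exits, the span is marked as ended and the previously attached span is visible again, with no explicit cleanup code in the caller.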

Tracing Configuration via control.sh

It seems useful to be able to manage the tracing configuration via control.sh. The following interface is proposed:

  Print tracing configuration:
    control.(sh|bat) --tracing-configuration

  Print tracing configuration:
    control.(sh|bat) --tracing-configuration get_all [--scope DISCOVERY|EXCHANGE|COMMUNICATION|TX]

  Print specific tracing configuration based on specified --scope and --label:
    control.(sh|bat) --tracing-configuration get (--scope DISCOVERY|EXCHANGE|COMMUNICATION|TX) [--label]

  Reset all specific tracing configuration to the default. If --scope is specified, then remove all label specific configuration for the given scope and reset given scope specific configuration to the default, if --scope is skipped then reset all tracing configurations to the default. Print tracing configuration.
    control.(sh|bat) --tracing-configuration reset_all [--scope DISCOVERY|EXCHANGE|COMMUNICATION|TX]

  Reset specific tracing configuration to the default. If both --scope and --label are specified then remove given configuration, if only --scope is specified then reset given configuration to the default. Print the reset configuration.
    control.(sh|bat) --tracing-configuration reset (--scope DISCOVERY|EXCHANGE|COMMUNICATION|TX) [--label]

  Set new tracing configuration. If both --scope and --label are specified then add or override label specific configuration, if only --scope is specified, then override scope specific configuration. Print applied configuration.
    control.(sh|bat) --tracing-configuration set (--scope DISCOVERY|EXCHANGE|COMMUNICATION|TX) [--label] [--sampling-rate Decimal value between 0 and 1, where 0 means never and 1 means always. More or less reflects the probability of sampling specific trace.] [--supported-scopes Set of scopes with comma as separator DISCOVERY|EXCHANGE|COMMUNICATION|TX]

Span serialization protocol

  • 1 byte: special flags;
    • The first byte of the serialized span is reserved for special flags.
  • 1 byte: SPI type;
    • Used to check whether the span was serialized with the same SPI that is used on the node that is going to deserialize it.
  • 2 bytes: major protocol version;
    • Spans with different major protocol versions are considered incompatible.
  • 2 bytes: minor protocol version;
    • Should be incremented when new fields are added, etc. Spans with the same major protocol version but different minor protocol versions are considered compatible.
  • 4 bytes: SPI-specific serialized span length;
  • n bytes: SPI-specific serialized span body;
  • 4 bytes: span type;
  • 4 bytes: included scopes size;
  • 2 * included scopes size bytes: included scopes items, one by one.
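
The layout above can be sketched with java.nio.ByteBuffer. This is an illustration under stated assumptions, not the proposed implementation: the class and method names, the version constants and the big-endian byte order (the ByteBuffer default) are all assumptions, since the proposal does not fix them.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoder for the span layout listed above. */
final class SpanWireFormat {
    /** Assumed protocol version values, chosen only for this sketch. */
    static final short MAJOR_VERSION = 0;
    static final short MINOR_VERSION = 0;

    private SpanWireFormat() {
        // Static helper, no instances.
    }

    static byte[] serialize(byte flags, byte spiType, byte[] spiSpan, int spanType, short[] includedScopes) {
        ByteBuffer buf = ByteBuffer.allocate(
            1 + 1 + 2 + 2 + 4 + spiSpan.length + 4 + 4 + 2 * includedScopes.length);

        buf.put(flags);                    // 1 byte: special flags
        buf.put(spiType);                  // 1 byte: SPI type
        buf.putShort(MAJOR_VERSION);       // 2 bytes: major protocol version
        buf.putShort(MINOR_VERSION);       // 2 bytes: minor protocol version
        buf.putInt(spiSpan.length);        // 4 bytes: SPI-specific span length
        buf.put(spiSpan);                  // n bytes: SPI-specific span body
        buf.putInt(spanType);              // 4 bytes: span type
        buf.putInt(includedScopes.length); // 4 bytes: included scopes size

        for (short scope : includedScopes)
            buf.putShort(scope);           // 2 bytes per included scope

        return buf.array();
    }

    /**
     * @return {@code true} if the SPI type matches the local one and the major versions are equal.
     *      The minor version is deliberately ignored, as differing minor versions are compatible.
     */
    static boolean canDeserialize(byte[] serialized, byte locSpiType) {
        ByteBuffer buf = ByteBuffer.wrap(serialized);

        buf.get(); // Skip special flags.

        byte spiType = buf.get();
        short major = buf.getShort();

        return spiType == locSpiType && major == MAJOR_VERSION;
    }
}
```

A receiving node would call a check like canDeserialize() before touching the SPI-specific body, which is how the layout addresses the rolling upgrade risk: a span written by a different SPI or a different major protocol version is rejected up front instead of being misread.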

Trace Tree Example

Below is an example of a trace generated by a pessimistic serializable transaction.

Risks and Assumptions

  • Performance drop. Careful benchmarking is required, and probably many optimizations.
  • Span inconsistencies during rolling upgrade if nodes have different versions of TracingManagers or Tracing Service Provider Interfaces. The span serialization protocol must take this into account.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets

// Links or report with relevant JIRA tickets.

