ID: IEP-84
Author:
Sponsor:
Created:
Status: DRAFT

Motivation

It is important to have a uniform way of reporting exceptions and error messages to end users. The goal of this design is to draft the exception classes to be used in the public API and to set out the main expectations for their use.

Requirements

  1. A public exception must have an error code. Each error code will be documented. This makes it easier to guide a user through troubleshooting.
  2. An internal exception can have an error code.
  3. Nevertheless, it is recommended to use standard Java exceptions where applicable (for both public and internal APIs).
  4. In general, unchecked exceptions are preferred in the public API. Checked exceptions are allowed in the public API where the API forces the user to handle them, e.g. for retries. For internal APIs, the choice between checked and unchecked exceptions is left to the developers.
  5. An error code consists of two parts:
    1. An error group - a text identifier that is unique and specific for some module (vendor, functional domain).
    2. An error identifier - an integer numeric identifier that is unique for a particular error group.
  6. An error code implementation must be extensible by modules, vendors, etc., and must not require modification of core modules in order to introduce a new error code.
  7. An exception must also provide a message which specifies the context and conditions under which the error occurred. 
  8. An exception should provide additional information about the error as an exception’s cause.
  9. Under normal conditions, we should avoid transferring stack traces over the network. But it must be possible to turn on some kind of debug mode that transfers stack traces from node to node in order to simplify development and debugging.
  10. An important concept is error traceability. It must be possible to track an error across the cluster. This can be achieved by introducing a unique error ID that is passed from one exception to another and is also printed in the log. Such an approach simplifies troubleshooting and log analysis.
  11. Since some programming languages do not support exceptions, it is the client/extension developers’ responsibility to translate Java exceptions from the public API into the language-specific error handling system.

Description

Error groups and error codes

The first proposed abstraction is the concept of error groups. It is similar to what was called an ErrorScope in the Devlist Discussion.

...

So, the numeric error code includes both the group code and a code that is unique within the group. These codes should be stored in constants and documented. Please refer to the code examples for specifics: ErrorGroup and RaftErrors.
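
The packing scheme can be illustrated with a small sketch. It assumes the layout used by makePublicCode in the ErrorGroup draft (group code in the upper 16 bits, group-local error code in the lower 16 bits) and assumes that both parts are limited to the range 0..65535, which this document does not state explicitly. The class and method names (ErrorCodes, pack, groupCode, localCode) are hypothetical and not part of the proposal.

```java
// Hypothetical helper, for illustration only; not part of the proposed API.
final class ErrorCodes {
    private ErrorCodes() {
    }

    // Combines a group code and a group-local error code into one int:
    // the upper 16 bits hold the group, the lower 16 bits hold the error.
    static int pack(int groupCode, int localCode) {
        checkRange(groupCode, "groupCode");
        checkRange(localCode, "localCode");
        return (groupCode << 16) | (localCode & 0xFFFF);
    }

    // Extracts the group code; the unsigned shift keeps the result non-negative.
    static int groupCode(int publicCode) {
        return publicCode >>> 16;
    }

    // Extracts the group-local error code.
    static int localCode(int publicCode) {
        return publicCode & 0xFFFF;
    }

    // Assumed range for both parts: 0..65535.
    private static void checkRange(int value, String what) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException(what + " must be in [0, 65535]: " + value);
        }
    }
}
```

For example, with group code 10 and error code 2, pack returns 655362 (0x000A0002), and groupCode and localCode recover 10 and 2 from it.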

Exceptions tracing

Transferring a stack trace to thin clients or other server nodes is not always necessary. Not only does it pollute logs, it also puts pressure on the network, and there may be other reasons to avoid it.

...

The originating node id could also help locate the problem, unless it is a routine error such as a “column already exists” exception. For those, the trace id and originating node id are optional.
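
As a rough sketch of how a trace id might travel from one exception to another (requirement 10), a wrapping exception could reuse the trace id of its cause when the cause already carries one. The TracedException class below is hypothetical and only stands in for the exception classes proposed in this document; the inheritance strategy itself is an assumption, not a decided design.

```java
import java.util.UUID;

// Hypothetical stand-in for an exception that carries a trace id.
class TracedException extends RuntimeException {
    private final UUID traceId;

    // Starts a new trace: no cause is available to inherit an id from.
    TracedException(String message) {
        this(message, null);
    }

    // Reuses the trace id of the nearest traced cause, if any, so that
    // all exceptions in the chain share one id in the logs.
    TracedException(String message, Throwable cause) {
        super(message, cause);
        UUID inherited = findTraceId(cause);
        this.traceId = inherited != null ? inherited : UUID.randomUUID();
    }

    UUID traceId() {
        return traceId;
    }

    // Walks the cause chain looking for an exception that has a trace id.
    private static UUID findTraceId(Throwable t) {
        while (t != null) {
            if (t instanceof TracedException) {
                return ((TracedException) t).traceId();
            }
            t = t.getCause();
        }
        return null;
    }
}
```

With this approach, the id also survives when a traced exception is wrapped in a standard Java exception somewhere in the middle of the chain.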

Exception classes

We need exception classes that carry error code information; let’s call them IgniteException and IgniteCheckedException. These classes, by themselves or via subclasses, should be thrown to users in the public API. See IgniteException for a draft of the final implementation.

...

Examples of how specific exception classes could be integrated with the described model can be found in the Specific Exceptions section.

Exceptions serialization

Since there are languages other than Java, we should have a generic way to convert exceptions between different representations. The current error serialization design is described in IEP-76 Thin Client Protocol for Ignite 3.0.

...

  • What should we do with standard Java exceptions, like TimeoutException or IllegalArgumentException, or even NPE? For now, it seems best to have a reserved error group for them and to assign specific codes to all "known" types.
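
One possible shape for such a reserved group is sketched below. Everything specific in it is an assumption made for illustration: the StandardErrors class, the reserved group code 1, and the individual error codes are invented here and are not defined anywhere in this proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Hypothetical mapping of "known" standard exceptions to codes in a reserved group.
final class StandardErrors {
    // Assumed reserved group code; the real value is not defined in this document.
    static final int STD_GROUP_CODE = 1;

    // Assumed codes, packed the same way as in ErrorGroup.makePublicCode.
    static final int UNKNOWN_ERR = publicCode(0);
    static final int TIMEOUT_ERR = publicCode(1);
    static final int ILLEGAL_ARGUMENT_ERR = publicCode(2);
    static final int NULL_POINTER_ERR = publicCode(3);

    private static final Map<Class<? extends Throwable>, Integer> CODES = new LinkedHashMap<>();

    static {
        CODES.put(TimeoutException.class, TIMEOUT_ERR);
        CODES.put(IllegalArgumentException.class, ILLEGAL_ARGUMENT_ERR);
        CODES.put(NullPointerException.class, NULL_POINTER_ERR);
    }

    private StandardErrors() {
    }

    // Returns the code of the first known type the throwable is an instance of,
    // or UNKNOWN_ERR for everything else.
    static int codeFor(Throwable t) {
        for (Map.Entry<Class<? extends Throwable>, Integer> e : CODES.entrySet()) {
            if (e.getKey().isInstance(t)) {
                return e.getValue();
            }
        }
        return UNKNOWN_ERR;
    }

    private static int publicCode(int localCode) {
        return (STD_GROUP_CODE << 16) | (localCode & 0xFFFF);
    }
}
```

Because the lookup uses isInstance, subclasses of a known type get the code of their parent, which may or may not be the desired behavior.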

Guidelines and restrictions

As you see, all error groups are only added at runtime. There’s no compile-time validation that there are no collisions. This comes with a set of problems:

  • Late collision detection: we have to rely on tests to find collisions. Such checks can only be performed once the full set of error groups is registered; we have integration tests for this.
  • Difficulties in maintaining collision-free lists of errors between releases. Let’s say that a developer creates a patch for version 3.0.x with a new error code “IGN-ABC-123”. There’s no way to prevent the same code from being introduced for a different error in version 3.1.x (for example). This can only be resolved by a good set of compatibility tests (which is still hard for not-yet-released master versions) or by maintaining a golden list of errors somewhere independently of the source code, as was done for the IgniteFeatures class in Ignite 2.x.
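
A check against a golden list could look roughly like the sketch below. It assumes that both the golden list and the current set of errors are available as maps from a human-readable code (such as “IGN-ABC-123”) to the name of the error it denotes; the GoldenListCheck class, the findConflicts method and this map format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical compatibility check between a golden list and the current errors.
final class GoldenListCheck {
    private GoldenListCheck() {
    }

    // Returns the codes whose meaning in 'current' differs from 'golden'.
    // Codes that exist only in 'current' are new errors, not conflicts.
    static List<String> findConflicts(Map<String, String> golden, Map<String, String> current) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String goldenName = golden.get(e.getKey());
            if (goldenName != null && !goldenName.equals(e.getValue())) {
                conflicts.add(e.getKey());
            }
        }
        return conflicts;
    }
}
```

A test in each release branch could then fail whenever findConflicts returns a non-empty list, which would catch the 3.0.x versus 3.1.x scenario described above as long as both branches update the same golden list.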

Implementation draft

ErrorGroup

public class ErrorGroup {
    // Private constructor protects from arbitrary group creation.
    private ErrorGroup(int code, String name) {
        // ...
    }

    public int code() {
        // ...
    }

    // I'd suggest forcing the regex check, something like "^[A-Z0-9]{3,7}$".
    public String name() {
        // ...
    }

    public int makePublicCode(int code) {
        // Check code range.
        return (code() << 16) | (code & 0xFFFF);
    }

    public static synchronized ErrorGroup newGroup(int code, String name) {
        // Range check for the code.
        // Regex check for the name.
        // Uniqueness check for both name and code.
        return new ErrorGroup(code, name);
    }
}
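
The draft above leaves the validation logic out. One way it might be filled in is sketched below, under several assumptions: codes are limited to 0..65535, the name format follows the regex suggested in the comment, and uniqueness is tracked in two static maps. The class is named ErrorGroupSketch to make clear that it is an illustration and not the proposed implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative variant of the ErrorGroup draft with the checks filled in.
final class ErrorGroupSketch {
    // Name format taken from the suggestion in the draft's comment.
    private static final Pattern NAME_PATTERN = Pattern.compile("^[A-Z0-9]{3,7}$");

    // Registered groups by code and by name, used for uniqueness checks.
    private static final Map<Integer, ErrorGroupSketch> BY_CODE = new HashMap<>();
    private static final Map<String, ErrorGroupSketch> BY_NAME = new HashMap<>();

    private final int code;
    private final String name;

    private ErrorGroupSketch(int code, String name) {
        this.code = code;
        this.name = name;
    }

    int code() {
        return code;
    }

    String name() {
        return name;
    }

    int makePublicCode(int errorCode) {
        checkRange(errorCode, "Error code");
        return (code << 16) | (errorCode & 0xFFFF);
    }

    static synchronized ErrorGroupSketch newGroup(int code, String name) {
        checkRange(code, "Group code");
        if (name == null || !NAME_PATTERN.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid group name: " + name);
        }
        if (BY_CODE.containsKey(code) || BY_NAME.containsKey(name)) {
            throw new IllegalArgumentException("Group already registered: " + code + ", " + name);
        }
        ErrorGroupSketch group = new ErrorGroupSketch(code, name);
        BY_CODE.put(code, group);
        BY_NAME.put(name, group);
        return group;
    }

    // Assumed range for group codes and error codes: 0..65535.
    private static void checkRange(int value, String what) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException(what + " is out of range: " + value);
        }
    }
}
```

With this sketch, registering a second group under an already used name or code fails with an IllegalArgumentException at registration time instead of silently producing ambiguous codes.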

RaftErrors

// Usage example:
public class RaftErrors {
    // This is the error group for the RAFT.
    public static final ErrorGroup RAFT_ERR_GROUP = ErrorGroup.newGroup(10, "RFT");

    // Group names must be unique, so a second group needs its own code and name.
    public static final ErrorGroup OTHER_ERR_GROUP = ErrorGroup.newGroup(11, "OTHER");

    // These are public constants for users to check in their catch blocks.
    public static final int SPLIT_BRAIN_ERR = RAFT_ERR_GROUP.makePublicCode(1);

    public static final int TIMEOUT_ERR = RAFT_ERR_GROUP.makePublicCode(2);

    public static final int TX_ERR = RAFT_ERR_GROUP.makePublicCode(3);
}
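
A sketch of how a user might check such constants in a catch block is shown below. To keep it self-contained, CodedException is a minimal hypothetical stand-in for IgniteException, and the constants are computed inline with the same bit layout that makePublicCode uses for group code 10. The handle method and its return values are invented for the example.

```java
// Minimal hypothetical stand-in for IgniteException, just enough for the example.
class CodedException extends RuntimeException {
    private final int errorCode;

    CodedException(int errorCode, String message) {
        super(message);
        this.errorCode = errorCode;
    }

    int errorCode() {
        return errorCode;
    }
}

final class RaftClientExample {
    // The values RAFT_ERR_GROUP.makePublicCode(1) and makePublicCode(2)
    // would yield for a group with code 10.
    static final int SPLIT_BRAIN_ERR = (10 << 16) | 1;
    static final int TIMEOUT_ERR = (10 << 16) | 2;

    private RaftClientExample() {
    }

    // Decides how to react to a failed operation based on its error code.
    static String handle(Runnable operation) {
        try {
            operation.run();
            return "ok";
        } catch (CodedException e) {
            if (e.errorCode() == TIMEOUT_ERR) {
                return "retry";
            }
            return "fail";
        }
    }
}
```

Note that the constants in RaftErrors are computed by a method call during class initialization, so they are not compile-time constants and cannot be used as case labels in a switch statement; if/else comparisons as above would be needed.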

IgniteException

import java.util.UUID;

// This is a draft implementation of public runtime exceptions.
public class IgniteException extends RuntimeException {
    private final ErrorGroup group;
    private final int publicCode;

    // Trace id is a unique exception identifier that should help locating
    // the error message in logs.
    private final UUID traceId;

    // This constructor is only an example. Of course, there will be a
    // variety of constructors for different cases - with or without a
    // cause, different trace id generation strategies, etc.
    public /* ? */ IgniteException(
        ErrorGroup group, int code, String message, UUID traceId
    ) {
        super(makeMessage(group, code, message, traceId));

        // Check that error group from the code matches passed group.

        this.group = group;
        this.publicCode = code;
        this.traceId = traceId;
    }

    // Accessor that's used by the end user. Returns the constant previously
    // generated by "makePublicCode".
    public int errorCode() {
        return publicCode;
    }

    public UUID traceId() {
        return traceId;
    }

    private static String makeMessage(
        ErrorGroup group, int code, String message, UUID traceId
    ) {
        return "IGN-" + group.name()
            + "-" + (code & 0xFFFF) + ": " + message
            + ". Trace id: " + traceId;
    }

    // This method might be useful, but I can't think of any specific
    // usages right now.
    public String humanReadableCode() {
        return "IGN-" + group.name() + "-" + (publicCode & 0xFFFF);
    }
}

Specific Exceptions

public class SqlCheckedException extends IgniteCheckedException {
    // Constructors for specific types of exceptions should not specify
    // the error group, because it's always the same.
    public SqlCheckedException(<params>) {
        super(SqlErrors.SQL_ERR_GROUP, <params>);
    }
}

public class SqlTxRollbackCheckedException extends SqlCheckedException {
    // Some exception types might fix the error code and differ
    // by message only. This way we could still have a flexible hierarchy
    // that makes sense from an error code standpoint.
    public SqlTxRollbackCheckedException(<params>) {
        super(SqlErrors.SQL_ERR_TX_ROLLBACK, <params>);
    }
}

public class IgniteInternalCodedException extends IgniteInternalException {
    ...
}

Open Tickets

ASF JIRA query: project = IGNITE AND (labels = iep-84) AND status NOT IN (resolved, closed)

Closed Tickets

ASF JIRA query: project = IGNITE AND (labels = iep-84) AND status IN (resolved, closed)