...
ID | IEP-84 |
Author | |
Sponsor | |
Created | |
Status | DRAFT |
Motivation
It is important to have a uniform way to provide exceptions or error messages to end users. The goal of this design is to draft the exception classes to be used in the public API and to set out the main expectations for their usage.
Requirements
- A public exception must have an error code. Each error code will be documented. This makes it easier to guide a user through troubleshooting.
- An internal exception can have an error code.
- Nevertheless, it is recommended to use standard Java exceptions in cases where applicable (for public and internal APIs).
- In general, unchecked exceptions are preferred in the public API. Checked exceptions are allowed in the public API where the API forces the user to handle them, e.g. for retries. For internal APIs, the choice between checked and unchecked exceptions is left to developers.
- An error code consists of two parts:
- An error group - a text identifier that is unique and specific for some module (vendor, functional domain).
- An error identifier - an integer numeric identifier that is unique for a particular error group.
- An error code implementation must be extensible for modules, vendors, etc., and must not require modification of core modules in order to introduce a new error code.
- An exception must also provide a message which specifies the context and conditions under which the error occurred.
- An exception should provide additional information about the error as an exception’s cause.
- Under normal conditions, we should avoid transferring stack traces over the network. But it must be possible to turn on some kind of debug mode which will lead to transferring stack traces from node to node in order to simplify development and debugging.
- An important concept is error traceability. It must be possible to track the error on the cluster. It can be achieved by introducing a unique error ID which should be passed from one exception to another and also should be printed in a log. Such an approach simplifies troubleshooting and log analysis.
- Since some programming languages do not support exceptions, it is the client/extension developers’ responsibility to translate Java exceptions from the public API into the language-specific error handling system.
Description
Error groups and error codes
The first proposed abstraction is the concept of error groups. It is similar to what was called an ErrorScope in the Devlist Discussion.
...
So, the numeric error code includes both the group code and a code that is unique within the group. These codes should be stored in constants and documented. Please refer to the code examples for specifics: see the ErrorGroup and RaftErrors sections.
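The packing of a group code and a group-local code into one integer can be illustrated with a small sketch. It assumes the 16-bit layout used by `makePublicCode` in the ErrorGroup draft; the class name `ErrorCodes` and its methods `pack`, `groupCode`, and `errorCode` are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: ErrorCodes and its methods are hypothetical names,
// not part of the proposed API.
final class ErrorCodes {
    private ErrorCodes() {
    }

    // Combines a group code (upper 16 bits) with a group-local code (lower 16 bits).
    static int pack(int groupCode, int errorCode) {
        if (groupCode < 0 || groupCode > 0xFFFF || errorCode < 0 || errorCode > 0xFFFF) {
            throw new IllegalArgumentException("Both codes must fit into 16 bits");
        }
        return (groupCode << 16) | (errorCode & 0xFFFF);
    }

    // Extracts the group code; the unsigned shift keeps the result non-negative.
    static int groupCode(int fullCode) {
        return fullCode >>> 16;
    }

    // Extracts the group-local error code.
    static int errorCode(int fullCode) {
        return fullCode & 0xFFFF;
    }
}
```

With this layout, a full code can always be split back into its group and its group-local part, which is what a human-readable form such as "IGN-RFT-2" relies on.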
Exceptions tracing
Transferring a stack trace to thin clients or other server nodes is not always necessary. It not only pollutes logs but also puts pressure on the network, among other drawbacks.
...
The originating node id could also help locate the problem when it is not a regular “column already exists” exception. For such regular exceptions, the trace id and originating node id are optional.
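One way to keep a single trace id across a chain of exceptions is to inherit it from the cause when wrapping. The following is a minimal sketch under that assumption; `TracedException` is a hypothetical stand-in for the proposed exception classes, and the originating node id is left out for brevity.

```java
import java.util.UUID;

// Illustrative sketch only: TracedException is a hypothetical class.
class TracedException extends RuntimeException {
    private final UUID traceId;

    TracedException(String message, Throwable cause) {
        super(message, cause);
        this.traceId = inheritOrCreate(cause);
    }

    UUID traceId() {
        return traceId;
    }

    // Walks the cause chain and reuses the first trace id found, so that every
    // exception raised for the same failure carries the same id in the logs.
    // A new id is generated only when no cause carries one.
    private static UUID inheritOrCreate(Throwable cause) {
        for (Throwable t = cause; t != null; t = t.getCause()) {
            if (t instanceof TracedException) {
                return ((TracedException) t).traceId();
            }
        }
        return UUID.randomUUID();
    }
}
```

Searching the logs for one trace id then returns every record related to the same failure, even when the exception was rewrapped several times.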
Exception classes
We need exception classes that carry error code information; let’s call them IgniteException and IgniteCheckedException. These classes, by themselves or via subclasses, should be thrown to users in the public API. See the IgniteException section for a draft of the final implementation.
...
Examples of how specific exception classes could be integrated with the described model can be found in the Specific Exceptions section.
Exceptions serialization
Since there are languages other than Java, we should have a generic way to convert exceptions between different representations. The current error serialization design is described here: IEP-76 Thin Client Protocol for Ignite 3.0
...
- What should we do with standard Java exceptions, like TimeoutException or IllegalArgumentException, or even NPE? For now, it is better to have a reserved error group for them and assign specific codes to all "known" types.
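A reserved group for standard Java exceptions could be sketched as a lookup from exception type to code. Everything specific here is an assumption made for illustration: the class name `StandardExceptionCodes`, the reserved group code `1`, and the numeric codes themselves are placeholders, not values proposed by this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: the group code and all numeric codes are placeholders.
final class StandardExceptionCodes {
    // Hypothetical code of the group reserved for standard Java exceptions.
    private static final int RESERVED_GROUP = 1;

    static final int UNKNOWN_ERR = publicCode(1);
    static final int TIMEOUT_ERR = publicCode(2);
    static final int ILLEGAL_ARGUMENT_ERR = publicCode(3);
    static final int NULL_POINTER_ERR = publicCode(4);

    // Insertion order is preserved, so more specific types can be listed first.
    private static final Map<Class<? extends Throwable>, Integer> KNOWN = new LinkedHashMap<>();

    static {
        KNOWN.put(TimeoutException.class, TIMEOUT_ERR);
        KNOWN.put(IllegalArgumentException.class, ILLEGAL_ARGUMENT_ERR);
        KNOWN.put(NullPointerException.class, NULL_POINTER_ERR);
    }

    private StandardExceptionCodes() {
    }

    // Returns the code of the first known type the throwable is an instance of,
    // or UNKNOWN_ERR when the type is not registered.
    static int codeFor(Throwable t) {
        for (Map.Entry<Class<? extends Throwable>, Integer> e : KNOWN.entrySet()) {
            if (e.getKey().isInstance(t)) {
                return e.getValue();
            }
        }
        return UNKNOWN_ERR;
    }

    // Same bit layout as makePublicCode in the ErrorGroup draft.
    private static int publicCode(int code) {
        return (RESERVED_GROUP << 16) | (code & 0xFFFF);
    }
}
```

Because the lookup uses `isInstance`, subclasses such as NumberFormatException fall back to the code of their closest registered parent.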
Guidelines and restrictions
As you see, all error groups are only added at runtime. There’s no compile-time validation that there are no collisions. This comes with a set of problems:
- Late collision detection: we have to rely on tests to find collisions. Such checks can only be performed when the full set of error groups is registered; we have integration tests for this.
- Difficulties in maintaining collision-free lists of errors between releases. Let’s say that a developer creates a patch for version 3.0.x with a new error code “IGN-ABC-123”. Nothing prevents the same code from being introduced for another error in version 3.1.x (for example). This could only be resolved by a good set of compatibility tests (which is still hard for not yet released master versions) or by maintaining a golden standard list of errors somewhere independently from the source code, as was done for the IgniteFeatures class in Ignite 2.x.
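The runtime checks that stand in for compile-time validation might look like the sketch below. It assumes a 16-bit group code and the name pattern suggested in the ErrorGroup draft; `ErrorGroupRegistry` and `register` are hypothetical names used for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only: ErrorGroupRegistry is a hypothetical class.
final class ErrorGroupRegistry {
    // Name pattern taken from the suggestion in the ErrorGroup draft.
    private static final Pattern NAME = Pattern.compile("^[A-Z0-9]{3,7}$");

    private final Map<Integer, String> namesByCode = new HashMap<>();
    private final Map<String, Integer> codesByName = new HashMap<>();

    // Registers a group, failing fast when the code or the name is already taken.
    synchronized void register(int code, String name) {
        if (code < 0 || code > 0xFFFF) {
            throw new IllegalArgumentException("Group code out of range: " + code);
        }
        if (name == null || !NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid group name: " + name);
        }
        if (namesByCode.containsKey(code)) {
            throw new IllegalStateException(
                "Group code " + code + " is already used by " + namesByCode.get(code));
        }
        if (codesByName.containsKey(name)) {
            throw new IllegalStateException(
                "Group name " + name + " is already used by code " + codesByName.get(name));
        }
        namesByCode.put(code, name);
        codesByName.put(name, code);
    }
}
```

A test that loads every module and registers all groups would surface a collision as an exception, but only for the set of groups present in that build, which is exactly the limitation described above.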
Implementation draft
ErrorGroup
Code Block |
---|
|
public class ErrorGroup {
    private final int code;
    private final String name;
    // Private constructor protects from arbitrary group creation.
    private ErrorGroup(int code, String name) {
        this.code = code;
        this.name = name;
    }
    public int code() {
        return code;
    }
    // I’d suggest forcing the regex check, something like “^[A-Z0-9]{3,7}$”
    public String name() {
        return name;
    }
    // Packs the group code into the upper 16 bits and the given code into the lower 16 bits.
    public int makePublicCode(int code) {
        // Check code range.
        return (code() << 16) | (code & 0xFFFF);
    }
    public static synchronized ErrorGroup newGroup(int code, String name) {
        // Range check for the code.
        // Regex check for the name.
        // Uniqueness check for both name and code.
        return new ErrorGroup(code, name);
    }
} |
RaftErrors
Code Block |
---|
|
// Usage example:
public class RaftErrors {
    // This is the error group for the RAFT.
    public static final ErrorGroup RAFT_ERR_GROUP = ErrorGroup.newGroup(10, "RFT");
    // Group names must be unique, so a second group needs its own name ("OTH" is a placeholder).
    public static final ErrorGroup OTHER_ERR_GROUP = ErrorGroup.newGroup(11, "OTH");
    // These are public constants for users to check in their catch blocks.
    public static final int SPLIT_BRAIN_ERR = RAFT_ERR_GROUP.makePublicCode(1);
    public static final int TIMEOUT_ERR = RAFT_ERR_GROUP.makePublicCode(2);
    public static final int TX_ERR = RAFT_ERR_GROUP.makePublicCode(3);
} |
IgniteException
Code Block |
---|
|
import java.util.UUID;

// This is a draft for public runtime exceptions implementation.
public class IgniteException extends RuntimeException {
    private final ErrorGroup group;
    private final int publicCode;
    // Trace id is a unique exception identifier that should help locating
    // the error message in logs.
    private final UUID traceId;
    // This constructor is only an example. Of course, there will be a
    // variety of constructors for different cases - with or without a
    // cause, different trace id generation strategies, etc.
    public /* ? */ IgniteException(
        ErrorGroup group, int code, String message, UUID traceId
    ) {
        super(makeMessage(group, code, message, traceId));
        // Check that error group from the code matches passed group.
        this.group = group;
        this.publicCode = code;
        this.traceId = traceId;
    }
    // Accessor that’s used by the end user. Returns constant, previously
    // generated by “makePublicCode”.
    public int errorCode() {
        return publicCode;
    }
    public UUID traceId() {
        return traceId;
    }
    private static String makeMessage(
        ErrorGroup group, int code, String message, UUID traceId
    ) {
        return "IGN-" + group.name()
            + "-" + (code & 0xFFFF) + ": " + message
            + ". Trace id: " + traceId;
    }
    // This method might be useful, but I can’t think of any specific
    // usages right now.
    public String humanReadableCode() {
        return "IGN-" + group.name() + "-" + (publicCode & 0xFFFF);
    }
} |
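On the caller side, the error code accessor allows branching in a catch block, for example to retry on timeouts only. The sketch below is a hedged illustration: `CodedException` is a minimal hypothetical stand-in for IgniteException, `RetryingCaller` and `callWithRetry` are invented names, and `TIMEOUT_ERR` mirrors the value that `RAFT_ERR_GROUP.makePublicCode(2)` would produce for group code 10.

```java
import java.util.function.Supplier;

// Illustrative sketch only: a minimal stand-in for IgniteException.
class CodedException extends RuntimeException {
    private final int code;

    CodedException(int code, String message) {
        super(message);
        this.code = code;
    }

    int errorCode() {
        return code;
    }
}

// Illustrative sketch only: RetryingCaller is a hypothetical helper.
final class RetryingCaller {
    // Mirrors RaftErrors.TIMEOUT_ERR from the draft: group code 10, error code 2.
    static final int TIMEOUT_ERR = (10 << 16) | 2;

    private RetryingCaller() {
    }

    // Runs the action and retries it only when the failure is a timeout.
    // Any other error code is rethrown immediately; the last timeout is
    // rethrown once all attempts are used up.
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        CodedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (CodedException e) {
                if (e.errorCode() != TIMEOUT_ERR) {
                    throw e;
                }
                last = e;
            }
        }
        throw last;
    }
}
```

Comparing against a documented constant keeps user code independent of the exception message text, which may change between releases.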
Specific Exceptions
Code Block |
---|
|
public class SqlCheckedException extends IgniteCheckedException {
    // Constructor for specific types of exceptions should not specify
    // error group, because it’s always the same.
    public SqlCheckedException(<params>) {
        super(SqlErrors.SQL_ERR_GROUP, <params>);
    }
}
public class SqlTxRollbackCheckedException extends SqlCheckedException {
    // Some exception types might specify an error code and differ
    // by message only. This way we could still have a flexible hierarchy
    // that makes sense from an error code standpoint.
    public SqlTxRollbackCheckedException(<params>) {
        super(SqlErrors.SQL_ERR_TX_ROLLBACK, <params>);
    }
}
public class IgniteInternalCodedException extends IgniteInternalException {
    ...
} |
Open Tickets
ASF JIRA issues matching: project = IGNITE AND (labels = iep-84) AND status NOT IN (resolved, closed)
Closed Tickets
ASF JIRA issues matching: project = IGNITE AND (labels = iep-84) AND status IN (resolved, closed)