This page describes ideas, thoughts, and planning around the Apache Ignite 3.0 release.


Release Manager

Valentin Kulichenko

Development Process

All development for Apache Ignite 3.x happens in a dedicated repository: https://github.com/apache/ignite-3

TeamCity is currently set up to run all available JUnit tests (this will be changed in the future): https://ci.ignite.apache.org/project/ignite3

TeamCity triggers a test run for a pull request when it is created or updated. Upon completion, the PR is updated with the corresponding status, which is shown as a successful or failed check.

The basic process to make a change is the following:

  1. A contributor develops in their own fork and creates a PR when done.
  2. TeamCity triggers a test run and updates the PR with the status. The same happens if the PR is updated.
  3. A committer reviews the code and merges the PR directly from GitHub. The merge is allowed only if the test check has passed.

Meetups

During the week of Sep 14th, we had two virtual events where the community discussed the proposed changes.

In English:

In Russian:

Slides are available here.

Major Changes and Features

Schema-first Approach

One of the biggest challenges with the current version of Ignite is related to how it works with data schemas. Here are the main issues:

  • Ignite maintains schemas for serialization (a.k.a. "Binary Metadata"), as well as the SQL schemas. These two are independent of each other, which creates a lot of confusion and can cause unpredictable behavior if one of the schemas is updated.
  • At the same time, caches are essentially schemaless – they can store multiple data types, as well as multiple versions of the same type.
  • Schema updates for binary objects happen automatically when new objects are serialized, and users have no control over this. Conversely, the SQL schema is updated manually using DDL.
  • The binary object format is used not only for the data stored in caches but for anything that is serialized within the system. This unnecessarily complicates the (de)serialization process, which has both usability and performance implications.
  • The SQL schema can be defined either using DDL or using query entities within an XML file. The latter can then be further updated using DDL, which can cause unpredictable behavior in the case of node restarts.

To solve this, we should switch to the "schema-first" approach, which means that the schema is defined for a cache/table before its creation. The proposed changes are:

  • Enforce a one-to-one mapping between a schema and a cache. The schema for a cache is defined during its creation and can be dynamically updated going forward.
  • The schema unequivocally describes the data stored in the cache – the data type, list of columns and indexes, etc. Multiple concurrent versions of the schema must be supported, though.
  • Any cache can only have one data type.
  • There should be a unified API for schema updates, which can be used directly, as well as internally for DDL, or within external tools.
  • The current binary format should only be used for data records stored in caches or tables. All other (de)serialization should be switched to a different protocol (probably, the OptimizedMarshaller can be reused for this).
  • The BinaryObject API is replaced with a Record API (the naming can be discussed, of course). Here, a record is essentially a binary tuple that represents a set of fields stored in a cache entry or a table row. Similar to current binary objects, records have a "field" method to extract individual fields. A record can also be deserialized to any subset of fields.
  • Schemas are persisted on disk, regardless of whether persistence is enabled or not. Any updates to a schema are persisted as well so that the cluster is restarted with the latest version of this schema.
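The proposed model can be illustrated with a small sketch. This is not the Ignite 3 API: every type and method name below (Schema, DataRecord, Table, addColumn, newRecord) is hypothetical and exists only for illustration. The sketch shows a table that cannot be created without a schema, keeps every schema version, routes all schema updates through a single method, and exposes rows as records with a "field" accessor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are part of the Ignite API.
final class SchemaFirstSketch {
    /** A versioned, immutable description of the columns of one table. */
    static final class Schema {
        final int version;
        final Map<String, Class<?>> columns; // column name -> Java type

        Schema(int version, Map<String, Class<?>> columns) {
            this.version = version;
            this.columns = Collections.unmodifiableMap(new LinkedHashMap<>(columns));
        }

        /** Returns the next schema version with one extra column. */
        Schema withColumn(String name, Class<?> type) {
            Map<String, Class<?>> next = new LinkedHashMap<>(columns);
            next.put(name, type);
            return new Schema(version + 1, next);
        }
    }

    /** A row: a tuple of field values tagged with the schema version that wrote it. */
    static final class DataRecord {
        final int schemaVersion;
        private final Map<String, Object> values;

        DataRecord(int schemaVersion, Map<String, Object> values) {
            this.schemaVersion = schemaVersion;
            this.values = values;
        }

        /** Extracts a single field by name. */
        @SuppressWarnings("unchecked")
        <T> T field(String name) {
            return (T) values.get(name);
        }
    }

    /** A table always has exactly one schema history, starting at creation. */
    static final class Table {
        private final List<Schema> versions = new ArrayList<>();

        Table(Schema initial) {
            versions.add(initial);
        }

        Schema currentSchema() {
            return versions.get(versions.size() - 1);
        }

        /** Single entry point for schema updates (DDL or tools would call this). */
        void addColumn(String name, Class<?> type) {
            versions.add(currentSchema().withColumn(name, type));
        }

        /** Builds a record, rejecting fields that do not match the current schema. */
        DataRecord newRecord(Map<String, Object> values) {
            Schema schema = currentSchema();
            for (Map.Entry<String, Object> e : values.entrySet()) {
                Class<?> type = schema.columns.get(e.getKey());
                if (type == null || !type.isInstance(e.getValue()))
                    throw new IllegalArgumentException("Field does not match schema: " + e.getKey());
            }
            return new DataRecord(schema.version, new LinkedHashMap<>(values));
        }
    }

    public static void main(String[] args) {
        Map<String, Class<?>> cols = new LinkedHashMap<>();
        cols.put("id", Integer.class);
        cols.put("name", String.class);
        Table person = new Table(new Schema(1, cols));

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "Ann");
        DataRecord rec = person.newRecord(row);

        // The schema moves to version 2; the record written with version 1 stays readable.
        person.addColumn("age", Integer.class);
        String name = rec.field("name");
        System.out.println(name + " was written with schema v" + rec.schemaVersion
            + ", current schema is v" + person.currentSchema().version);
    }
}
```

Tagging each record with its schema version is one possible way to support multiple concurrent schema versions; the actual mechanism is still open for discussion.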

Dynamic Configuration

Another pain point is the fact that all the configuration in Ignite is static. Almost any update to the configuration requires a cluster restart. It is also not obvious which parameters can be updated and which cannot, because all of them are located within the same set of beans with no separation.

The proposal is to go through all the parameters and create a clear separation between:

  1. Static and dynamic parameters (the latter can be changed at runtime);
  2. Node-level and cluster-level parameters.

Once this separation is done, we will need to update internal logic to allow changes for the dynamic parameters – such changes must be possible without any node restarts. The set of dynamic parameters can be minimal in the first version, as long as there is a way to convert static parameters to dynamic going forward.
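As a rough illustration of this separation, the sketch below classifies each parameter by mutability and scope, and only lets dynamic parameters change at runtime. All names here (Mutability, Scope, Parameter, Registry, and the example parameter names) are hypothetical and are not taken from Ignite.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: types and parameter names are illustrative, not Ignite's.
final class DynamicConfigSketch {
    enum Mutability { STATIC, DYNAMIC }

    enum Scope { NODE, CLUSTER }

    /** Declares one parameter together with its classification. */
    static final class Parameter {
        final String name;
        final Mutability mutability;
        final Scope scope;

        Parameter(String name, Mutability mutability, Scope scope) {
            this.name = name;
            this.mutability = mutability;
            this.scope = scope;
        }
    }

    /** Holds current values and enforces the static/dynamic rule on updates. */
    static final class Registry {
        private final Map<String, Parameter> declared = new HashMap<>();
        private final Map<String, Object> values = new HashMap<>();
        private final Map<String, Consumer<Object>> listeners = new HashMap<>();

        void declare(Parameter p, Object initialValue) {
            declared.put(p.name, p);
            values.put(p.name, initialValue);
        }

        /** Registers internal logic that reacts to a runtime change. */
        void onChange(String name, Consumer<Object> listener) {
            listeners.put(name, listener);
        }

        /** Applies a runtime update; static parameters are rejected. */
        void update(String name, Object value) {
            Parameter p = declared.get(name);
            if (p == null)
                throw new IllegalArgumentException("Unknown parameter: " + name);
            if (p.mutability == Mutability.STATIC)
                throw new IllegalStateException(name + " is static and requires a restart");
            values.put(name, value);
            Consumer<Object> listener = listeners.get(name);
            if (listener != null)
                listener.accept(value);
        }

        Object get(String name) {
            return values.get(name);
        }
    }

    public static void main(String[] args) {
        Registry cfg = new Registry();
        cfg.declare(new Parameter("rebalanceThreads", Mutability.DYNAMIC, Scope.CLUSTER), 4);
        cfg.declare(new Parameter("workDirectory", Mutability.STATIC, Scope.NODE), "/tmp/work");

        cfg.onChange("rebalanceThreads", v -> System.out.println("Resizing pool to " + v));
        cfg.update("rebalanceThreads", 8); // applied without a restart

        try {
            cfg.update("workDirectory", "/tmp/other");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Converting a static parameter to a dynamic one later would then amount to changing its classification and registering the logic that applies the change.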

In addition, the current Spring XML format seems to be outdated, and it is no longer the recommended approach even for Spring itself. Not to mention that we have to add the whole Spring dependency solely for configuration conversion. We suggest using the Lightbend Config library based on the HOCON format: https://github.com/lightbend/config. This format is designed specifically for configuration purposes, is compatible with JSON and Java Properties, and allows merges. The latter is very useful for dynamic configuration updates.

SQL API

Currently, the query API is attached to the cache API, i.e., to execute a query, one needs to create a cache first. This is incompatible with SQL-only use cases where all the tables are created via DDL. In this use case, executing a SQL query requires that the user creates a dummy cache with no data and uses this cache as an API gateway. This is very counterintuitive. Instead, we should have a designated API for SQL, which would be located on the top level (e.g. ignite.sql().executeQuery(..) ).
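A minimal sketch of what such a top-level entry point could look like is shown below. Only the call shape ignite.sql().executeQuery(..) comes from the proposal; the interfaces, the return type, and the stub implementation are assumptions made for illustration, and the stub does not execute any SQL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the ignite.sql().executeQuery(..) shape is from the proposal.
final class SqlApiSketch {
    /** Top-level SQL entry point that does not depend on any cache. */
    interface IgniteSql {
        List<List<Object>> executeQuery(String sql, Object... args);
    }

    /** Stand-in for the main API facade. */
    interface Ignite {
        IgniteSql sql();
    }

    /** Stub that only records statements; a real node would plan and run them. */
    static final class RecordingIgnite implements Ignite {
        final List<String> executed = new ArrayList<>();

        @Override
        public IgniteSql sql() {
            return (sql, args) -> {
                executed.add(sql);
                return new ArrayList<>(); // no rows: the stub has no data
            };
        }
    }

    public static void main(String[] args) {
        RecordingIgnite ignite = new RecordingIgnite();

        // In Ignite 2.x the same statement has to go through a cache, roughly:
        //   ignite.getOrCreateCache("dummy").query(new SqlFieldsQuery("SELECT ..."))
        ignite.sql().executeQuery("CREATE TABLE person (id INT PRIMARY KEY, name VARCHAR)");
        ignite.sql().executeQuery("SELECT name FROM person WHERE id = ?", 1);

        System.out.println("Statements sent without any cache: " + ignite.executed.size());
    }
}
```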

Modularization

Ignite consists of many modules, most of which are optional and are not used by the majority of our users. Still, we deliver binaries as a single package, which the user has to modify manually by moving folders around. In addition, the package contains configuration files, as well as multiple startup scripts. One of the biggest issues is related to upgrades: if anything within the package is modified (one of the scripts, a configuration file, a set of enabled modules, etc.), these changes then must be somehow merged with the new version of the package. There is no clear standard way of doing this, which makes the whole process very counterintuitive and error-prone.

We need to come up with a better way of modularizing the platform and apply it to all available ways of delivery (downloadable ZIP, RPM, DEB, etc.). Can we utilize Java modularization somehow?

Some of the modules (mainly, integration components and thin clients) can be isolated into separate projects with independent lifecycles.

GraalVM Native Image Support

GraalVM is gaining a lot of popularity, especially in the context of serverless environments and frameworks like Micronaut and Quarkus. Currently, even the Ignite thin client blocks software from being built into a native GraalVM image, and this needs to be fixed. Preferably, we should support this for thick clients as well, although this is a lower priority and might require a much bigger effort.

Cleanup

Ignite 3.0 is a major release, which gives us a unique opportunity to make incompatible changes. This can be used to do a cleanup and remove deprecated APIs and features. Additionally, all APIs should be revisited from this point of view – anything that is not relevant anymore should be removed from the project; some APIs can be reworked and modernized.

List of proposed removals (WIP):

  • *Resource annotations except for the IgniteInstanceResource 
  • Job scheduler API
  • Messaging API
  • Gridify/AOP
  • On-heap cache and eviction policies (question)
  • IGFS and Hadoop Accelerator
  • Indexing SPI
  • Communication and discovery SPIs (should be internal components not exposed to the public API)
  • Checkpoint SPI
  • Mesos and Yarn integrations
  • OSGi integration
  • Local caches
  • Scalar
  • Explicit locks (question)
  • GAR file deployment
  • "Force server" mode for client nodes
  • CacheRebalanceMode and rebalance delay: rebalancing should always work in the ASYNC mode (current default)
  • Daemon nodes
  • Custom affinity functions
  • Visor (should be replaced with a unified tool)
  • Daemon mode
  • Unnecessary rebalance modes (only ASYNC makes sense) and parameters

Other changes

  • Replace IgniteFuture with Java's CompletableFuture
  • The consistent ID should be a String, not a generic Serializable object
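To illustrate the first item, the sketch below shows how asynchronous operations compose when they return the standard CompletableFuture. The AsyncStore class and its putAsync/getAsync methods are hypothetical stand-ins for an asynchronous cache or table API; only the CompletableFuture calls are real JDK APIs.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: AsyncStore is a stand-in, not an Ignite type.
final class FutureSketch {
    /** In-memory store whose operations complete asynchronously. */
    static final class AsyncStore {
        private final Map<String, Integer> data = new ConcurrentHashMap<>();

        CompletableFuture<Void> putAsync(String key, int value) {
            return CompletableFuture.runAsync(() -> data.put(key, value));
        }

        CompletableFuture<Integer> getAsync(String key) {
            return CompletableFuture.supplyAsync(() -> data.get(key));
        }
    }

    /** Reads two keys concurrently and combines the results without blocking. */
    static CompletableFuture<Integer> sum(AsyncStore store, String a, String b) {
        return store.getAsync(a).thenCombine(store.getAsync(b), Integer::sum);
    }

    public static void main(String[] args) {
        AsyncStore store = new AsyncStore();

        // Chain two writes and a combined read using only standard JDK operators.
        int total = store.putAsync("a", 1)
            .thenCompose(ignored -> store.putAsync("b", 2))
            .thenCompose(ignored -> sum(store, "a", "b"))
            .join();

        System.out.println("Sum: " + total);
    }
}
```

Because the return type is the JDK's own, callers can use thenCompose, thenCombine, and other standard operators directly, and the futures work with any library that accepts a CompletionStage.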

Issues

The ignite-3 label is used for any issues in JIRA that are related to Ignite 3.x.


1 Comment

  1. I'm not sure it is OK to blindly remove explicit locks.

    Compared to tx locks, they have one great feature: tryLock, which has no direct counterpart in the tx API.