...

  • Enable profiling mode.
  • Execute an arbitrary workload.
  • Collect profiling info.
  • Run the tool that creates a report containing the workload statistics.

Proposed Changes

Ignite will provide a public facade to manage profiling mode:

ignite.profiling().enable(); // Turns on profiling mode.
ignite.profiling().disable(); // Turns off profiling mode.

ignite.profiling().isEnabled(); // Returns true if profiling mode is turned on.
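To make the facade contract concrete, here is a minimal Java sketch of the three calls. The IgniteProfiling interface name and the single-node LocalProfiling class are hypothetical; the real facade would propagate the mode to every node rather than keep a local flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical shape of the object returned by ignite.profiling(). */
interface IgniteProfiling {
    /** Turns on profiling mode. */
    void enable();

    /** Turns off profiling mode. */
    void disable();

    /** Returns true if profiling mode is turned on. */
    boolean isEnabled();
}

/** Hypothetical single-node stand-in: keeps the mode in a local thread-safe flag. */
class LocalProfiling implements IgniteProfiling {
    private final AtomicBoolean enabled = new AtomicBoolean();

    @Override public void enable() { enabled.set(true); }

    @Override public void disable() { enabled.set(false); }

    @Override public boolean isEnabled() { return enabled.get(); }
}
```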

Profiling mode can also be managed from the CLI (and JMX):

control.sh --profiling  // Prints current profiling mode status.
control.sh --profiling enable // Turns on profiling mode.
control.sh --profiling disable // Turns off profiling mode.
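The three CLI forms map one-to-one onto the facade calls. The sketch below only illustrates that mapping; the ProfilingCommand class, the boolean flag standing in for the cluster-wide mode, and the status message text are all hypothetical, not the actual control.sh implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical dispatcher for the three "control.sh --profiling" forms. */
class ProfilingCommand {
    /** Stand-in for the cluster-wide profiling mode (assumption: a simple flag). */
    private final AtomicBoolean enabled = new AtomicBoolean();

    /**
     * Executes the command given as args, e.g. {"--profiling"} or {"--profiling", "enable"},
     * and returns the text that would be printed to the console.
     */
    String execute(String... args) {
        if (args.length == 0 || args.length > 2 || !"--profiling".equals(args[0]))
            throw new IllegalArgumentException("Usage: --profiling [enable|disable]");

        if (args.length == 1)
            return status(); // No subcommand: just report the current mode.

        switch (args[1]) {
            case "enable":
                enabled.set(true);
                return status();
            case "disable":
                enabled.set(false);
                return status();
            default:
                throw new IllegalArgumentException("Unknown subcommand: " + args[1]);
        }
    }

    private String status() {
        return "Profiling is " + (enabled.get() ? "enabled" : "disabled");
    }
}
```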

Ignite will provide a public SPI interface (ProfilingSpi) to log statistics. It can be configured via IgniteConfiguration and declares the following methods:

  • startProfiling(); // Starts profiling.
  • stopProfiling(); // Stops profiling.
  • log(String info); // Logs operation statistics.
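A minimal sketch of the SPI contract, assuming the methods have exactly the signatures listed above. InMemoryProfilingSpi is a hypothetical implementation (useful in tests, for example) that keeps records only between startProfiling() and stopProfiling().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of the proposed SPI; method names follow the list above. */
interface ProfilingSpi {
    /** Starts profiling. */
    void startProfiling();

    /** Stops profiling. */
    void stopProfiling();

    /** Logs operation statistics. */
    void log(String info);
}

/** Hypothetical in-memory SPI: records entries only while profiling is active. */
class InMemoryProfilingSpi implements ProfilingSpi {
    private final List<String> entries = Collections.synchronizedList(new ArrayList<>());
    private volatile boolean active;

    @Override public void startProfiling() { active = true; }

    @Override public void stopProfiling() { active = false; }

    @Override public void log(String info) {
        if (active)
            entries.add(info); // Records outside a profiling session are dropped.
    }

    /** Returns a snapshot copy of the recorded entries. */
    List<String> entries() {
        synchronized (entries) {
            return new ArrayList<>(entries);
        }
    }
}
```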

An internal processor (ProfilingProcessor) will manage profiling across the whole cluster. It will be available from KernalContext.

The new ignite-profiling module will contain:

  • The default implementation (LogProfilingSpiImpl), based on asynchronous logging to the configured file.
  • The script that collects logs from nodes and builds the report: report.sh(bat)
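The default implementation is described only as asynchronous logging to a file. One common way to achieve that, sketched here under that assumption, is a queue drained by a single background writer thread, so operation threads never wait on disk I/O. AsyncFileProfilingLog is a hypothetical name; LogProfilingSpiImpl may be structured differently.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical async logger: callers only enqueue; one background thread writes the file. */
class AsyncFileProfilingLog {
    /** Marker instance telling the writer thread to finish; compared by identity. */
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Path file;
    private Thread writer;

    AsyncFileProfilingLog(Path file) { this.file = file; }

    /** Starts the background writer thread. */
    synchronized void startProfiling() {
        writer = new Thread(this::drain, "profiling-writer");
        writer.start();
    }

    /** Enqueues one statistics record; never blocks on disk I/O. */
    void log(String info) {
        queue.add(info);
    }

    /** Lets the writer flush all pending records, then waits for it to exit. */
    synchronized void stopProfiling() {
        if (writer == null)
            return;
        queue.add(STOP);
        try {
            writer.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        writer = null;
    }

    private void drain() {
        // try-with-resources flushes and closes the file before the thread exits.
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (true) {
                String rec = queue.take();
                if (rec == STOP)
                    break;
                out.write(rec);
                out.newLine();
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e); // A real SPI would report this via the node log.
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

An unbounded queue keeps log() non-blocking but can grow without limit under heavy load; a bounded queue that drops or samples records is a possible alternative trade-off.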

Performance report

The performance report will be produced in a human-readable text format (and later as an HTML page) and should contain:

  • Ignite and plugin versions, topology changes, profiling start/end time
  • Query (SQL, scan, ...) timings and resources:
    • Queries that took up the most time
    • Slowest queries
    • Most frequent queries
    • Failing queries
    • Queries count by type
    • Queries that took up the most CPU/IO/Disk
  • User task timings and resources (similar to queries)
  • Cache operation statistics (similar to queries):
    • Get
    • Put
    • Remove
    • RemoveAndGet
    • PutAndGet
    • Invoke
    • Lock
    • Create/destroy caches
  • Workload by node
  • Transaction commit/rollback timings
  • Checkpoint statistics
  • PME statistics

Statistics will also be aggregated per node.
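The report tool has to group raw records per node as well as per operation type. A sketch of that grouping step, assuming each record has already been parsed into a node id, an operation type, and a duration in microseconds (the OpRecord format and the class names are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One parsed profiling record (hypothetical format). */
class OpRecord {
    final String node;
    final String type;
    final long durationMicros;

    OpRecord(String node, String type, long durationMicros) {
        this.node = node;
        this.type = type;
        this.durationMicros = durationMicros;
    }
}

/** Running count, total, and maximum duration for one (node, operation type) pair. */
class OpStats {
    long count;
    long totalMicros;
    long maxMicros;

    void add(long durationMicros) {
        count++;
        totalMicros += durationMicros;
        maxMicros = Math.max(maxMicros, durationMicros);
    }
}

class ReportAggregator {
    /** Groups records by node, then by operation type; TreeMaps keep report order stable. */
    static Map<String, Map<String, OpStats>> aggregate(List<OpRecord> records) {
        Map<String, Map<String, OpStats>> byNode = new TreeMap<>();
        for (OpRecord r : records)
            byNode.computeIfAbsent(r.node, n -> new TreeMap<>())
                .computeIfAbsent(r.type, t -> new OpStats())
                .add(r.durationMicros);
        return byNode;
    }
}
```

Cluster-wide figures can then be obtained by merging the per-node OpStats for the same operation type.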

Additional investigation is required to gather the following statistics:

  • Query parse time
  • Lock waiting time
  • User time
  • Message processing timings

These statistics will provide:

  • Top queries/operations by CPU time
  • Top queries/operations by I/O time
  • Which operations use the most resources
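Per-operation CPU time, one of the statistics above that still needs investigation, could be measured with the JDK's ThreadMXBean, which reports CPU time consumed by the current thread. The CpuTimer helper below is a hypothetical sketch, not an existing Ignite class; it records both wall-clock and CPU time around a single operation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.function.Supplier;

/** Hypothetical helper: measures wall-clock and CPU time of one operation on the current thread. */
class CpuTimer {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    /** Result of a timed operation; cpuNanos is -1 if thread CPU time is unavailable. */
    static final class Timed<T> {
        final T value;
        final long wallNanos;
        final long cpuNanos;

        Timed(T value, long wallNanos, long cpuNanos) {
            this.value = value;
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }
    }

    static <T> Timed<T> time(Supplier<T> op) {
        // CPU time can be unsupported by the JVM or disabled at runtime.
        boolean cpuOn = MX.isCurrentThreadCpuTimeSupported() && MX.isThreadCpuTimeEnabled();
        long cpu0 = cpuOn ? MX.getCurrentThreadCpuTime() : -1;
        long wall0 = System.nanoTime();

        T value = op.get();

        long wall = System.nanoTime() - wall0;
        long cpu = cpuOn ? MX.getCurrentThreadCpuTime() - cpu0 : -1;
        return new Timed<>(value, wall, cpu);
    }
}
```

The gap between wall-clock and CPU time roughly approximates time the thread spent blocked (on I/O or locks), although it also includes time the thread was simply not scheduled. Each CPU-time query has a cost of its own, so measuring every operation adds to the profiling overhead.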

Phase 1

In the first phase, the following will be implemented:

  • Profiling public API and default implementation
  • Profiling management via the Java API, CLI, and JMX
  • Gathering overall and timing statistics for queries, tasks, cache operations, checkpoints, and PMEs
  • Tool to create the report

Phase 2

In the second phase, the following will be investigated and implemented:

  • Gathering CPU time per operation
  • Gathering I/O wait time and read/write counts
  • Lock time per operation
  • Display of these statistics in the report

Proposed Changes

Ignite will log some additional internal statistics using a separate log category of IgniteLogger.

A new ignite-profiling module will be introduced. It will contain:

  • The script to collect logs from nodes
  • The tool to build the report: report.sh(bat)


Public API changes

A new interface will be added: ProfilingSpi.

A new Ignite facade will be added: ignite.profiling().

A new module will be created: ignite-profiling.

Corner cases

Node leaves during profiling

A node leaving the cluster will not affect the cluster-wide profiling mode.

Node joins during profiling

A joining node will set its profiling mode from the DiscoveryDataBag provided by the cluster.
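A minimal model of the join hand-off described above: an existing node writes the current mode into the discovery data, and the joining node adopts it. A plain Map stands in for DiscoveryDataBag, and the key and method names are hypothetical, not Ignite's actual API.

```java
import java.util.Map;

/** Hypothetical model of the profiling-mode hand-off on node join. */
class ProfilingModeNode {
    /** Hypothetical key under which the mode travels in the discovery data. */
    static final String PROFILING_KEY = "profiling.enabled";

    private volatile boolean enabled;

    ProfilingModeNode(boolean enabled) { this.enabled = enabled; }

    /** On an existing node: puts the current mode into the data sent to the joining node. */
    void collectDiscoveryData(Map<String, Object> data) {
        data.put(PROFILING_KEY, enabled);
    }

    /** On the joining node: adopts the cluster's mode, keeping the local default if absent. */
    void onDiscoveryDataReceived(Map<String, Object> data) {
        Object v = data.get(PROFILING_KEY);
        if (v instanceof Boolean)
            enabled = (Boolean) v;
    }

    boolean isEnabled() { return enabled; }
}
```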

Risks and Assumptions

Enabling profiling mode will cause performance degradation.

...