
Motivation

Currently, Ignite has no built-in profiling tool for user operations and internal processes. Such a tool would collect performance statistics and produce a human-readable report, helping users analyze their workload and tune configuration and applications.

Examples of similar tools in other products: AWR [1] [2] [3] (Oracle); pgbadger [4], pgmetrics [5], powa [6] (PostgreSQL).

Description

We should provide a way to profile the cluster. Consider the following scenario:

  • Enable profiling mode.
  • Execute an arbitrary workload.
  • Collect the profiling information.
  • Run the tool that builds a report containing workload statistics.

Performance report

The performance report will be in a human-readable format (an HTML page) and should contain:

  • Ignite and plugin versions, topology changes, profiling start/end time
  • Query (SQL, scan, etc.) timings and resources:
    • Queries that took up the most time
    • Slowest queries
    • Most frequent queries
    • Failing queries
    • Query count by type
    • Queries that consumed the most CPU/IO/disk
  • User task timings and resources:
    • Jobs of the slowest tasks
  • Cache and cache operation statistics:
    • Get/Put/Remove
    • Transactions
    • Invoke
    • Lock
    • Cache creation/destruction
  • Workload by node:
    • CPU/IO/disk resources
  • Checkpoint statistics
  • WAL statistics
  • PME (partition map exchange) statistics
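Most of the query sections above boil down to a few per-query aggregates. The sketch below is minimal and uses hypothetical QueryRecord and QueryStats classes, not Ignite's actual types. One pass groups executions by query type and text. Each report ranking is then a sort on a different metric: total time, maximum duration, or execution count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One logged query execution; these fields are an illustrative assumption. */
final class QueryRecord {
    final String type;       // e.g. "SQL", "SCAN"
    final String text;       // query text (or cache name for scans)
    final long durationMs;
    final boolean success;

    QueryRecord(String type, String text, long durationMs, boolean success) {
        this.type = type;
        this.text = text;
        this.durationMs = durationMs;
        this.success = success;
    }
}

/** Per-query aggregates feeding the report sections. */
final class QueryStats {
    final String type;
    final String text;
    long count;      // "most frequent queries"
    long totalMs;    // "queries that took up the most time"
    long maxMs;      // "slowest queries"
    long failures;   // "failing queries"

    QueryStats(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class QueryReport {
    /** Groups executions by (type, text) and accumulates all metrics in a single pass. */
    static List<QueryStats> aggregate(List<QueryRecord> records) {
        Map<List<String>, QueryStats> byQuery = new HashMap<>();
        for (QueryRecord r : records) {
            QueryStats s = byQuery.computeIfAbsent(Arrays.asList(r.type, r.text),
                k -> new QueryStats(r.type, r.text));
            s.count++;
            s.totalMs += r.durationMs;
            s.maxMs = Math.max(s.maxMs, r.durationMs);
            if (!r.success)
                s.failures++;
        }
        return new ArrayList<>(byQuery.values());
    }

    /** Top-N distinct queries by the given metric, highest first. */
    static List<QueryStats> top(List<QueryStats> stats, Comparator<QueryStats> metric, int n) {
        List<QueryStats> sorted = new ArrayList<>(stats);
        sorted.sort(metric.reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Distinct queries that failed at least once. */
    static List<QueryStats> failing(List<QueryStats> stats) {
        List<QueryStats> res = new ArrayList<>();
        for (QueryStats s : stats)
            if (s.failures > 0)
                res.add(s);
        return res;
    }

    /** Number of executions per query type, sorted by type name. */
    static Map<String, Long> countByType(List<QueryRecord> records) {
        Map<String, Long> counts = new TreeMap<>();
        for (QueryRecord r : records)
            counts.merge(r.type, 1L, Long::sum);
        return counts;
    }
}
```

Keeping the raw aggregates separate from the rankings means that the same pass over the profiling data can feed every "top" table in the report.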

Additional investigation is required to gather the following statistics:

  • Query parse time
  • Lock waiting time
  • User time
  • Message processing timings

Proposed Changes

Ignite will log additional internal statistics to profiling files through the IgniteProfiling interface. The file format is similar to that of the WAL.

A single disk-writer thread and an off-heap memory buffer will be used to minimize the impact on performance. The maximum file size and buffer size can be configured when profiling is started.
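The sketch below is a minimal version of this write path, assuming a hypothetical fixed record layout: one type byte, a start timestamp and a duration. User threads only copy bytes into an off-heap (direct) buffer under a short lock, and they drop the record if the buffer is full. A single daemon thread periodically moves the buffered bytes to the file. It skips a batch if writing it would exceed the configured maximum file size. The class name, record layout, drop-on-overflow policy and 10 ms polling interval are all assumptions, and the flushBatchSize parameter from the JMX API is not modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: user threads append to an off-heap buffer, one thread writes it to disk. */
final class ProfilingFileWriter implements AutoCloseable {
    static final int RECORD_SIZE = 1 + Long.BYTES + Long.BYTES; // type + start time + duration

    private final FileChannel channel;
    private final long maxFileSize;
    private final ByteBuffer buf;   // off-heap buffer filled by user threads (guarded by this)
    private final ByteBuffer out;   // off-heap staging buffer, used only by the writer thread
    private final Thread writer;
    private long written;           // bytes written so far (writer thread only)
    private long dropped;           // records rejected because the buffer was full (guarded by this)
    private volatile boolean stopped;

    ProfilingFileWriter(Path file, long maxFileSize, int bufferSize) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
            StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        this.maxFileSize = maxFileSize;
        buf = ByteBuffer.allocateDirect(bufferSize);
        out = ByteBuffer.allocateDirect(bufferSize);
        writer = new Thread(this::drainLoop, "profiling-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called on user threads: never touches the disk, drops the record if the buffer is full. */
    synchronized boolean logOperation(byte type, long startTime, long durationNanos) {
        if (buf.remaining() < RECORD_SIZE) {
            dropped++;
            return false;
        }
        buf.put(type).putLong(startTime).putLong(durationNanos);
        return true;
    }

    synchronized long droppedRecords() {
        return dropped;
    }

    private void drainLoop() {
        try {
            while (!stopped) {
                flush();
                Thread.sleep(10);
            }
            flush(); // drain whatever was logged before close()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void flush() throws IOException {
        synchronized (this) { // hold the lock only for an in-memory copy
            buf.flip();
            out.clear();
            out.put(buf);
            buf.clear();
        }
        out.flip();
        int len = out.remaining();
        if (len == 0 || written + len > maxFileSize)
            return; // nothing to write, or the batch would exceed the size limit: discard it
        while (out.hasRemaining())
            channel.write(out);
        written += len;
    }

    @Override
    public void close() {
        stopped = true;
        try {
            writer.join();
            channel.close();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Dropping records rather than blocking keeps user operations from ever waiting on disk I/O. The cost is incomplete statistics under heavy load. Exposing droppedRecords() would let the report flag such gaps.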

A new ignite-profiling module will be introduced. It will contain the report-building tool, profiling.sh (profiling.bat on Windows). Aggregated statistics are stored in JSON format and then rendered in the report.
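The sketch below is a minimal, dependency-free version of the JSON step. It assumes a hypothetical ReportJson class and a per-query shape of count plus total time; the actual module may well use a JSON library instead. The only subtle part is escaping the query text, which can contain quotes, backslashes and newlines that would otherwise break the generated document.

```java
import java.util.Map;

/** Hypothetical sketch: serializes aggregated per-query totals into JSON for the report page. */
final class ReportJson {
    /** Aggregated totals for one query (illustrative fields only). */
    static final class QueryTotals {
        final long count;
        final long totalMs;

        QueryTotals(long count, long totalMs) {
            this.count = count;
            this.totalMs = totalMs;
        }
    }

    /** Encodes a string as a JSON string literal, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20)
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    else
                        sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Produces {"<query text>":{"count":N,"totalMs":T},...} in the map's iteration order. */
    static String toJson(Map<String, QueryTotals> totals) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, QueryTotals> e : totals.entrySet()) {
            if (!first)
                sb.append(',');
            first = false;
            sb.append(quote(e.getKey()))
              .append(":{\"count\":").append(e.getValue().count)
              .append(",\"totalMs\":").append(e.getValue().totalMs)
              .append('}');
        }
        return sb.append('}').toString();
    }
}
```

Passing a LinkedHashMap (for example, one already sorted by total time) keeps the report's ordering stable in the generated JSON.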

The report is based on the Bootstrap library and can be viewed in a browser.

Profiling management

1) JMX:

  • void startProfiling() // Start profiling in the cluster.
  • void startProfiling(long maxFileSize, int bufferSize, int flushBatchSize) // Start profiling in the cluster with custom parameters.
  • void stopProfiling() // Stop profiling in the cluster.
  • boolean profilingEnabled() // True if profiling is enabled.

2) The control.sh utility, offering the same functionality as the JMX bean.
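The sketch below shows how the JMX operations listed above could be exposed with the standard javax.management API. Several names and values in it are hypothetical: the interface name ProfilingMXBean, the default parameter values, and the single-node LocalProfiling implementation. A real bean would propagate start and stop to every node in the cluster. control.sh support is not shown.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch of a profiling MXBean exposing the operations listed above. */
final class ProfilingJmx {
    /** MXBean interfaces must be public; the name ProfilingMXBean is an assumption. */
    public interface ProfilingMXBean {
        /** Start profiling with default parameters. */
        void startProfiling();

        /** Start profiling with custom parameters. */
        void startProfiling(long maxFileSize, int bufferSize, int flushBatchSize);

        /** Stop profiling. */
        void stopProfiling();

        /** True if profiling is enabled. */
        boolean profilingEnabled();
    }

    /** Single-node stand-in: a real bean would start/stop profiling on every cluster node. */
    static final class LocalProfiling implements ProfilingMXBean {
        // Illustrative defaults, not values taken from Ignite.
        static final long DFLT_MAX_FILE_SIZE = 32L * 1024 * 1024;
        static final int DFLT_BUFFER_SIZE = 8 * 1024 * 1024;
        static final int DFLT_FLUSH_BATCH_SIZE = 1024 * 1024;

        private boolean enabled;
        long maxFileSize;
        int bufferSize;
        int flushBatchSize;

        @Override public synchronized void startProfiling() {
            startProfiling(DFLT_MAX_FILE_SIZE, DFLT_BUFFER_SIZE, DFLT_FLUSH_BATCH_SIZE);
        }

        @Override public synchronized void startProfiling(long maxFileSize, int bufferSize, int flushBatchSize) {
            if (maxFileSize <= 0 || bufferSize <= 0 || flushBatchSize <= 0 || flushBatchSize > bufferSize)
                throw new IllegalArgumentException("Invalid profiling parameters");
            if (enabled)
                throw new IllegalStateException("Profiling is already enabled");
            this.maxFileSize = maxFileSize;
            this.bufferSize = bufferSize;
            this.flushBatchSize = flushBatchSize;
            enabled = true;
        }

        @Override public synchronized void stopProfiling() {
            enabled = false;
        }

        @Override public synchronized boolean profilingEnabled() {
            return enabled;
        }
    }

    /** Registers the bean in the platform MBean server and returns a client-side MXBean proxy. */
    static ProfilingMXBean register(ProfilingMXBean bean, String objectName) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(objectName);
        server.registerMBean(bean, name);
        return JMX.newMXBeanProxy(server, name, ProfilingMXBean.class);
    }
}
```

For example, register(new ProfilingJmx.LocalProfiling(), "org.apache.ignite.example:type=Profiling") returns a proxy on which startProfiling(), profilingEnabled() and stopProfiling() behave as a JMX client such as JConsole would see them. The object name here is purely illustrative.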

Public API changes

A new module will be created: ignite-profiling.

A script to build the report: profiling.sh (profiling.bat on Windows).

A JMX bean to manage cluster profiling.

Risks and Assumptions

Running with profiling mode enabled will cause performance degradation.

Discussion Links

Dev-list discussion.

Report example





Reference Links


  1. https://docs.oracle.com/cd/E11882_01/server.112/e41573/autostat.htm#PFGRF94176
  2. http://www.dba-oracle.com/t_sample_awr_report.htm
  3. http://expertoracle.com/2018/02/06/performance-tuning-basics-15-awr-report-analysis/
  4. https://github.com/darold/pgbadger
  5. https://pgmetrics.io/docs/index.html#example
  6. https://powa.readthedocs.io/en/latest/






