IDIEP-27
Author Iurii Gerzhedovich
Sponsor Vladimir Ozerov Ozerov
Created 26 Sep 2018
StatusDRAFT


Motivation

IO statistics is used by all database vendors to have ability monitoring and tuning of performance a queries and found bottle neck related to IO operations.

The proposed solution is to implement gathering IO statistics for logical (memory) read pages and physical (storage) read/write pages. The statistics can be used at least for :

  1. Analyze IO performance of a query.
  2. Have ability to understand why performance goes down in case it related to IO. For example on concrete node or cache.
  3. Evaluate effectiveness of use indexes. Find unused indexes.
  4. See most IO aggressive queries.

Competitive analysis

Industrial  RDBMS vendors (Oracle, Postgres, MySQL)  collect detailed IO statistics in dimensions such as queries, database objects (tables and indexes), files, sessions, users, event types etc. 

  • Oracle

    Have information about IO operations on file/segment level, IO wait time on query level, Logical and Physical reads. Historical IO statistics.

    • v$filestatThis view displays the number of physical reads and writes done and the total number of single-block and multiblock I/Os done at file level

    • v$segment_statistics displays information about segment-level statistics.

    • v$sqlarea -  lists statistics on shared SQL area and contains one row per SQL string. It provides statistics on SQL statements that are in memory, parsed, and ready for execution.

    • v$sysstat - displays system statistics. 

    • v$sesstat displays user session statistics. 

    • AWR - historical statistics.

    • Logical I/O and Physical I/O in Oracle

  • PostgreSQL

    Have information about physical and logical  read pages for indexes and tables separately and for whole database. With the same granularity has time spent for physical operations. Also statistics available on SQL statements level.
    • pg_stat_database - Contain one row for each database in the cluster, showing database-wide statistics.

    • pg_statio_all_tables - Contain one row for each table in the current database (including TOAST tables), showing statistics about I/O on that specific table.

    • pg_statio_all_indexes Contain one row for each index in the current database, showing statistics about I/O on that specific index.

    • pg_stat_all_indexes - Contain one row for each index in the current database, showing statistics about accesses to that specific index. 

    • pg_stat_all_tables - Contain one row for each table in the current database, showing statistics about accesses to that specific table.

    • pg_stat_statements -  Tracking execution statistics of all SQL statements executed by a server.


  • MySQL

    Have wait time for read and write operations splitted by table/indexes. Number and count of bytes of read/write operations on file level and globally. IO operations and latency grouped by  initialized host/user.

Proposed Changes


  1. Implement gathering local node statistics with two levels of detalization: fine-grain by type of page types (about 35 such types) and aggregated statistics: INDEX, DATA, OTHER. Need to support 3 types of IO operations: logical/physical read and physical write (only as global statistics).
  2. Implement collecting local node statistics with the following dimensions: cache, index, query.
  3. implement exchange IO statistics by nodes to have ability expose cluster statistics (e.g. IO statistics for distributed SQL). 
  4. expose IO statistics interfaces for SQL, JMX, Visor, CMD, Web Console

Reference Links

// Links to various reference documents, if applicable.

TBD

  • Cluster wide statistics, can lead performance issue
  • which of interfaces should be exposed: SQL, JMX, Visor, CMD, Web Console ....
  • reset of statistics. It can be by request, by time, some metrics value, .... or may be reset by time and keep history of metrics.
  • .....

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels