IDIEP-13
Author

Anton Vinogradov

Aleksey Plekhanov

SponsorAleksey Plekhanov
Created24 Jan 2017
StatusDRAFT


Motivation

For Ignite diagnostic usually it’s helpful to get some Ignite internals information. But currently, in my opinion, there are no convenient tools for this purpose:

  • Some issues can be solved by analyzing log files. Log files are useful for dumps, but sometimes they are difficult to read. Also interesting metrics can’t be received runtime by request, we need to wait until Ignite will write these metrics by timeout or other events.
  • JMX is useful for scalar metrics. Complex and table data can also be received, but it’s difficult to read, filter and sort them without processing by specialized external tools. For most frequently used cases almost duplicating metrics are created to show data in an easy-to-read form.
  • Web-console is able to show table and complex data. Perhaps, someday  web-console will contain all necessary dashboards for most problem investigation, but some non-trivial queries will not be covered anyway. Also web-console needs additional infrastructure to work.
  • External “home-made” tools can be used for non-trivial cases. They cover highly specialized cases and usually can’t be used as general purpose tools.

Sometimes we are forced to use more than one tool and join data by hands (for example, current thread dump and data from logs).

Often RDBMS for diagnostic purposes provides system views (for example, DBA_% and V$% in Oracle), which can be queried by SQL. This solution makes all internal diagnostic information available in a readable form (with all possible filters and projections) without using any other internal or external tools. My proposal is to create similar system views in Ignite.

Description

Aleksey Plekhanov implemented working prototype (PR:[1]). It contains views:

IGNITE_SYSTEM_VIEWS

Registered system views

IGNITE_INSTANCE

Ignite instance

IGNITE_JVM_THREADS

JVM threads

IGNITE_JVM_RUNTIME

JVM runtime

IGNITE_JVM_OS

JVM operating system

IGNITE_CACHES

Ignite caches

IGNITE_CACHE_CLUSTER_METRICS

Ignite cache cluster metrics

IGNITE_CACHE_NODE_METRICS

Ignite cache node metrics

IGNITE_CACHE_GROUPS

Cache groups

IGNITE_NODES

Nodes in topology

IGNITE_NODE_HOSTS

Node hosts

IGNITE_NODE_ADDRESSES

Node addresses

IGNITE_NODE_ATTRIBUTES

Node attributes

IGNITE_NODE_METRICS

Node metrics

IGNITE_TRANSACTIONS

Active transactions

IGNITE_TRANSACTION_ENTRIES

Cache entries used by transaction

IGNITE_TASKS

Active tasks

IGNITE_PART_ASSIGNMENT

Partition assignment map

IGNITE_PART_ALLOCATION

Partition allocation map

 

There are much more useful views can be implemented (executors diagnostic, SPIs diagnostic).

Some usage examples:

Cache groups and their partitions, which used by transaction more than 5 minutes long:

SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS ENTITIES_CNT
FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION

 

Average CPU load on server nodes grouped by operating system:

SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
FROM INFORMATION_SCHEMA.IGNITE_NODES n
JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID AND na.NAME = 'os.name'
JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
WHERE n.IS_CLIENT = false
GROUP BY na.VALUE

 

Top 5 nodes by puts to cache ‘cache’:

SELECT cm.NODE_ID, cm.CACHE_PUTS FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
WHERE cm.CACHE_NAME = 'cache'
ORDER BY cm.CACHE_PUTS DESC
LIMIT 5

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-diagnostic-SQL-system-views-td26349.html

Reference Links

// Links to various reference documents, if applicable.

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels