
ID: IEP-35
Author: Nikolay Izhikov
Sponsor:
Created:
Status: IN PROGRESS - Phase 1, 2 implemented.



Motivation

Currently, the Ignite monitoring API is incomplete and fragmented. The existing APIs use different protocols, such as JMX, Java API, SQL system views, text logs, etc.

...

1. We should add some entities to Ignite:

  1. MetricRegistry - an Ignite subsystem that provides a set of sensors and lists:
    1. Cache,
    2. Compute,
    3. ServiceGrid,
    4. etc.
  2. Metric - a named number with a well-defined algorithm for calculating its value at any given moment in time.

    import java.util.Collection;
    import java.util.Map;

    class Metric {
        String name; // EntryCount, MemoryAvailable, etc.
        long value; // or double
        Collection<Map.Entry<String, String>> labels; // hostName, cacheName, etc.
    }

    class LongMetric extends Metric {
        long ts; // Timestamp of the last value update.
    }


  3. SystemView - a named list of strings that contains information about Ignite objects. Examples: list of caches, list of transactions, list of nodes, list of running queries, last N queries, etc.
  4. MonitoringEvent - generated when user-defined code violates a threshold.

    class MonitoringEvent<T> {
        MonitoringEventType type; // Event type.
        T info; // Event info. The type of info differs between event types.
    }
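To make the relationship between these entities concrete, here is a minimal sketch of a registry that groups the metrics of one subsystem. All class and method names (`LongMetricSketch`, `MetricRegistrySketch`, `longMetric`, `find`) are hypothetical and chosen for illustration only; they are not the Ignite API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical metric: a named raw value with labels and the time of the last update. */
class LongMetricSketch {
    final String name;
    final Map<String, String> labels;
    private final AtomicLong value = new AtomicLong();
    private volatile long ts;

    LongMetricSketch(String name, Map<String, String> labels) {
        this.name = name;
        this.labels = Collections.unmodifiableMap(new LinkedHashMap<>(labels));
    }

    /** Called by the owning subsystem when the underlying event happens. */
    void add(long delta) {
        value.addAndGet(delta);
        ts = System.currentTimeMillis();
    }

    long value() { return value.get(); }

    long lastUpdateTs() { return ts; }
}

/** Hypothetical registry: holds the metrics of one subsystem (Cache, Compute, ...). */
class MetricRegistrySketch {
    final String name;
    private final Map<String, LongMetricSketch> metrics = new ConcurrentHashMap<>();

    MetricRegistrySketch(String name) { this.name = name; }

    /** Returns the existing metric with this name or registers a new one. */
    LongMetricSketch longMetric(String metricName, Map<String, String> labels) {
        return metrics.computeIfAbsent(metricName, n -> new LongMetricSketch(n, labels));
    }

    /** Returns the metric with this name, or null if none is registered. */
    LongMetricSketch find(String metricName) { return metrics.get(metricName); }
}

class RegistryDemo {
    public static void main(String[] args) {
        MetricRegistrySketch cache = new MetricRegistrySketch("cache");
        LongMetricSketch entries = cache.longMetric("EntryCount", Map.of("cacheName", "orders"));

        entries.add(3);
        entries.add(-1);

        System.out.println(cache.name + "." + entries.name + " = " + entries.value());
    }
}
```

The registry only stores and returns metrics; the code that owns the subsystem is responsible for updating them.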


2. GridMetricManager, GridSystemViewManager

  1. GridMetricManager - should be able to store and query Ignite metrics.
  2. GridSystemViewManager - should be able to store and export SystemViews.

3. Exporters:

Specific admin interfaces will be supported through exporters.
Exporters should work only with a read-only version of GridMetricManager and should not rely on other knowledge about Ignite internals.
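The read-only boundary described above could look roughly like the following sketch. The names `ReadOnlyMetrics`, `MetricStoreSketch`, `MetricExporterSketch`, and `TextExporterSketch` are assumptions made for illustration and are not taken from the Ignite code base.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical read-only view of the metric storage: values can be read but not changed. */
interface ReadOnlyMetrics {
    Map<String, Long> snapshot();
}

/** Hypothetical stand-in for the metric manager; only internal code would call set(). */
class MetricStoreSketch implements ReadOnlyMetrics {
    private final Map<String, Long> values = new TreeMap<>();

    synchronized void set(String name, long value) { values.put(name, value); }

    /** Returns an unmodifiable copy, sorted by metric name. */
    @Override public synchronized Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(values));
    }
}

/** Hypothetical exporter contract: the read-only view is all an exporter gets to see. */
interface MetricExporterSketch {
    String export(ReadOnlyMetrics metrics);
}

/** Renders metrics as "name value" lines, e.g. for a plain-text HTTP endpoint. */
class TextExporterSketch implements MetricExporterSketch {
    @Override public String export(ReadOnlyMetrics metrics) {
        StringBuilder sb = new StringBuilder();

        for (Map.Entry<String, Long> e : metrics.snapshot().entrySet())
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');

        return sb.toString();
    }
}

class ExporterDemo {
    public static void main(String[] args) {
        MetricStoreSketch store = new MetricStoreSketch();
        store.set("cache.EntryCount", 42L);
        store.set("cache.Hits", 7L);

        System.out.print(new TextExporterSketch().export(store));
    }
}
```

Because the exporter depends only on the narrow interface, a JMX, HTTP, or SQL exporter can be added without touching the code that updates the values.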

...

Examples of exporters:

  1. JMX
  2. HTTP
  3. SQL System View
  4. JavaLog
  5. etc.
  6. PushExposer - this type of exporter should export sensors and lists to an external system on a configured schedule.
    1. LogExposer
    2. Integration with a proprietary monitoring system can be implemented as a PushExposer.
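A push-style exporter can be sketched with a standard `ScheduledExecutorService`. The class name `PushExporterSketch`, its constructor parameters, and the sample metric string are hypothetical; the source and sink stand in for the metric storage and the external system.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical push exporter: reads a snapshot on a fixed schedule and hands it to a sink. */
class PushExporterSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /**
     * @param source Supplies the current snapshot (assumed to be a cheap read).
     * @param sink Receives each snapshot, e.g. a logger or a client of an external system.
     * @param periodMs Export period in milliseconds.
     */
    PushExporterSketch(Supplier<String> source, Consumer<String> sink, long periodMs) {
        scheduler.scheduleAtFixedRate(() -> sink.accept(source.get()), 0, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Stops the schedule; no further snapshots are pushed. */
    @Override public void close() { scheduler.shutdownNow(); }
}

class PushDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> received = new CopyOnWriteArrayList<>();

        try (PushExporterSketch exporter = new PushExporterSketch(() -> "cache.EntryCount 42", received::add, 50)) {
            Thread.sleep(200);
        }

        System.out.println("pushed " + received.size() + " snapshots");
    }
}
```

A log-based variant would pass a logger call as the sink; an integration with an external monitoring system would pass its client instead.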

...

Lists of Ignite objects/entities that should be listed in Phase

...

2

  1. Compute tasks:
    1. Closures
    2. Map-reduce jobs
    3. ComputeJob
    4. Scheduled tasks
  2. Service grid:
    1. Services with deployment status
  3. Caches
  4. Cache groups
  5. Cluster nodes
  6. SQL objects
    1. Schemas
    2. Tables
    3. Views
    4. Table columns
    5. View columns
    6. Indexes
  7. Queries:
    1. SQL
    2. Scan
    3. Text
    4. ContinuousQuery
  8. IgniteCache#invoke
  9. put, get, remove, replace, clear operations
  10. Transactions with lock list
  11. DataStreamers
  12. Explicit locks (IgniteCache#lock)
  13. DataStructures
    1. Queue
    2. Set
    3. AtomicLong
    4. AtomicReference
    5. CountDownLatch
    6. Sequence
    7. Semaphore
  14. Message topics (IgniteMessaging)
  15. Thin client connections.
  16. Machine Learning - ???

...

We should consider implementing this IEP as Ignite 3.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/IEP-35-Monitoring-amp-Profiling-Proof-of-concept-td41904.html

http://apache-ignite-developers.2346864.n4.nabble.com/IEP-35-Monitoring-amp-Profiling-Current-API-Analysis-td41823.html

http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-IEP-35-Metrics-configuration-td42478.html

http://apache-ignite-developers.2346864.n4.nabble.com/IEP-35-GridJobProcessorMetrics-migration-td42415.html#a42441

http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-IEP-35-Replace-RunningQueryManager-with-GridSystemViewManager-td43794.html

Gap analysis

Current monitoring APIs availability:

...

  1. Cache
    1. PDS + offheap memory
      1. Ignite#dataRegionMetrics
      2. Ignite#dataStorageMetrics
      3. Ignite#persistentStoreMetrics
    2. Queries
      1. IgniteCache#queryMetrics
      2. IgniteCache#queryDetailMetrics
      3. QueryHistoryMetrics
    3. IgniteCache#mxBean
    4. IgniteCache#localMxBean
  2. SQL
    1. LOCAL_SQL_RUNNING_QUERIES
    2. INDEXES
  3. Transactions
    1. JMX - TransactionMetricsMxBean
    2. JMX - TransactionMXBean
  4. ThinClients
    1. JMX - ClientProcessorMXBean
  5. IoStatisticsManager, IoStatisticsHolder
  6. GridJobMetricsProcessor
  7. IgniteMBeansManager
  8. IgniteSpiManagementMBean

Design Principles

  1. Sensors should contain only raw values. No aggregation of numeric metrics on the Ignite side.
    Min, max, avg, and other functions are a matter for the external monitoring system.
  2. Every user task should have an ID or name, provided by the user at start time, that allows monitoring info to be associated with user code.
    Users should be able to find their code reflected in monitoring.
  3. Every user task should carry a "connectionID" ("sessionID", "clientID") or a similar identifier.
    Users should be able to tell that a specific task was triggered by a specific connection (session, client).
  4. No computation to get current values. We should change sensor and list values when specific events occur.
    When a sensor is queried, we should only read its value from internal storage. No additional computation is involved.
  5. Users should be able to enable/disable any Sensor group/List at runtime. Ignite should provide some administrator interface(s) to enable/disable each Sensor Group or List separately.
    No performance penalty for disabled sensors and lists.
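Principles 4 and 5 can be illustrated with a small sketch: the value is changed on the event path, reading it is a plain lookup, and a runtime switch turns updates off. The class `SwitchableCounterSketch` and its methods are hypothetical. The single volatile flag check is an assumption about how "no penalty" might be approximated, not a statement about the actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sensor that is updated when events occur and can be switched off at runtime. */
class SwitchableCounterSketch {
    private final AtomicLong value = new AtomicLong();
    private volatile boolean enabled = true;

    /** Called from the code path where the event happens, e.g. on each cache put. */
    void onEvent() {
        // A disabled sensor does nothing beyond this single flag check.
        if (enabled)
            value.incrementAndGet();
    }

    /** Returns the stored raw value; nothing is computed or aggregated on read. */
    long value() { return value.get(); }

    /** Runtime switch, e.g. driven by an administrator interface. */
    void enabled(boolean enabled) { this.enabled = enabled; }
}

class SwitchableCounterDemo {
    public static void main(String[] args) {
        SwitchableCounterSketch puts = new SwitchableCounterSketch();

        puts.onEvent();
        puts.onEvent();

        puts.enabled(false);
        puts.onEvent(); // Ignored while the sensor is disabled.

        System.out.println("puts = " + puts.value());
    }
}
```

Min, max, and averages are deliberately absent: under principle 1 they would be derived from the raw counter by the external monitoring system.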

...

https://github.com/darold/pgbadger

Tickets

ASF JIRA issues labeled IEP-35 (JQL query: labels = IEP-35).