
Description

To be a useful monitoring tool, Eagle should provide monitoring information beyond basic alert definitions. An alerting policy already defines how to aggregate the monitoring input stream, so Eagle should provide a similar definition for stream aggregation and for storing the aggregated metrics. This page covers the design of:

...

  • How the Eagle UI should be designed for monitoring

Case Study

1. Hadoop JMX metric aggregation and plotting

...

This is a typical case where Eagle adds value through distributed aggregation over multiple data sources. Such features are not covered by single-source alerting: Zabbix is configured per host and lacks information from other sources; ES (Elasticsearch) provides centralized log collection and search but lacks a stream processing framework; Druid provides storage and some pre-aggregation but lacks user-defined stream processing on top of it. None of them supports joining multiple streams.
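To make the multi-source aggregation concrete, the sketch below merges JMX samples reported by several hosts into one cluster-level value per metric and per minute. The JmxSample type, its fields and the one-minute bucket are assumptions for illustration, not Eagle's metric schema.

```scala
// Hypothetical JMX sample: one metric value reported by one host at one time (epoch millis).
final case class JmxSample(host: String, metric: String, timestamp: Long, value: Double)

object ClusterAggregation {
  // Assumed aggregation bucket of one minute.
  val BucketMillis: Long = 60 * 1000L

  // Sum each metric across all hosts within the same time bucket, so samples
  // from different sources contribute to a single cluster-level value.
  def sumAcrossHosts(samples: Seq[JmxSample]): Map[(String, Long), Double] =
    samples
      .groupBy(s => (s.metric, s.timestamp / BucketMillis * BucketMillis))
      .map { case (key, group) => key -> group.map(_.value).sum }
}
```

For example, samples of the same metric from two DataNodes in the same minute collapse into one summed point, which a single-host tool cannot produce on its own.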

 

Requirement

1. Users should be able to store metrics aggregated over a customized time window, using a timeBatch window.

...

4. DSL evaluation should be flexible enough to support building a window from historical data (for large time windows). This could be the same feature that is used in alerting.
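Requirements 1 and 4 together suggest a tumbling (timeBatch-style) window that can be pre-filled from history. The sketch below illustrates the idea on event timestamps, so replayed historical events and live events fall into the same batches. TimeBatchWindow, Batch and the in-order arrival assumption are illustrative only, not Eagle or Siddhi APIs.

```scala
// One closed batch: window start (epoch millis), number of events and their sum.
final case class Batch(start: Long, count: Long, sum: Double)

// Hypothetical tumbling window keyed on event time.
// Events are assumed to arrive in timestamp order.
final class TimeBatchWindow(sizeMillis: Long) {
  private var currentStart: Option[Long] = None
  private var count = 0L
  private var sum = 0.0

  // Adds one event; returns the previous batch if this event opens a new window.
  def add(timestamp: Long, value: Double): Option[Batch] = {
    val start = timestamp / sizeMillis * sizeMillis
    val closed = currentStart match {
      case Some(s) if s != start =>
        val b = Batch(s, count, sum)
        count = 0L
        sum = 0.0
        Some(b)
      case _ => None
    }
    currentStart = Some(start)
    count += 1
    sum += value
    closed
  }

  // Replays historical events (e.g. read back from metric storage) before live
  // events, so a large window does not start empty after deployment or restart.
  def seed(history: Seq[(Long, Double)]): Seq[Batch] =
    history.flatMap { case (ts, v) => add(ts, v) }
}
```

Seeding and live evaluation share the same `add` path, which is the point of requirement 4: the window logic does not need a separate mode for historical data.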

Design

Analytic DSL Definition

Currently, alert definitions use the Siddhi DSL as their dialect. The analytic DSL will keep the same user experience.

...

Code Block
// ec: execution context created elsewhere in the application (not shown)
ec.fromKafka[AuditLog]
    .groupBy(_.user)   // group the stream by user before evaluating the query
    .query("""
        |from hdfsAuditLogEventStream[(src == '/tmp/private')]#window.externalTime(timestamp, 10 min)
        |select user, count(timestamp) as aggValue
        |group by user
        |having aggValue >= 5
        |insert into anotherAlertStream;
        |""".stripMargin)
// hdfsAuditLogEventStream -> anotherAlertStream
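To spell out what the query computes, here is a plain-Scala sketch of its semantics: events are filtered on `src`, each user keeps a sliding 10-minute window on event time, and `(user, count)` is emitted whenever the count reaches 5. AuditEvent is a hypothetical simplification of AuditLog, and the exact window boundary is an assumption of this sketch.

```scala
import scala.collection.mutable

// Hypothetical, simplified audit event; the real AuditLog type has more fields.
final case class AuditEvent(user: String, src: String, timestamp: Long)

// Emulates the query: filter on src, sliding 10-minute event-time window per user,
// emit (user, count) whenever count >= Threshold (the "having" clause).
object PrivateDirAccess {
  val WindowMillis: Long = 10 * 60 * 1000L
  val Threshold = 5

  def run(events: Seq[AuditEvent]): Seq[(String, Int)] = {
    val windows = mutable.Map.empty[String, mutable.Queue[Long]]
    events.filter(_.src == "/tmp/private").flatMap { e =>
      val q = windows.getOrElseUpdate(e.user, mutable.Queue.empty[Long])
      q.enqueue(e.timestamp)
      // Drop timestamps that fell out of the window ending at this event.
      while (q.nonEmpty && q.head <= e.timestamp - WindowMillis) q.dequeue()
      if (q.size >= Threshold) Some(e.user -> q.size) else None
    }
  }
}
```

The sketch does not model Siddhi's expired-event output or out-of-order timestamps; it only makes the filter, window and `having` semantics concrete.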

Partition

Partitioning is the main concern in CEP processing. The analytic DSL above does not itself incorporate partitioning.

...

Map/reduce performed by the framework in this way requires that the analytic logic itself can be expressed as a simple map/reduce. Users may therefore need a more careful understanding of how to write their logic in our DSL.
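A rough illustration of why the logic must decompose into map/reduce: if each shard of the stream computes a partial aggregate and a reducer merges the partials, count and sum merge directly, but an average must be carried as (sum, count). Averaging per-shard averages would give a wrong result. Partial and TwoPhaseAverage are hypothetical names used only for this sketch.

```scala
// Hypothetical partial aggregate that can be merged across shards.
final case class Partial(sum: Double, count: Long) {
  def merge(other: Partial): Partial = Partial(sum + other.sum, count + other.count)
  def average: Double = if (count == 0) 0.0 else sum / count
}

object TwoPhaseAverage {
  // Map phase: one partial aggregate per key within a single shard of the stream.
  def partials(shard: Seq[(String, Double)]): Map[String, Partial] =
    shard.groupBy(_._1).map { case (k, kvs) => k -> Partial(kvs.map(_._2).sum, kvs.size.toLong) }

  // Reduce phase: merge the partials of all shards per key, then finish the average.
  // This is correct only because (sum, count) merges associatively.
  def combine(shards: Seq[Map[String, Partial]]): Map[String, Double] =
    shards.flatten.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2).reduce(_ merge _).average }
}
```

With shards `[(u1, 1), (u1, 3)]` and `[(u1, 8)]`, the merged average for `u1` is 12 / 3 = 4, whereas averaging the shard averages (2 and 8) would give 5.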

 

State management

State management stores and restores the monitoring state during stream processing. It covers a couple of aspects:

...

  1. Whole-topology state restore: the state of the whole topology is used to restore it when the topology is restarted.
  2. Single bolt state restore:
    When a bolt starts, it tries to load from the snapshot store the snapshots that match the policies the current bolt accepts.
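The two restore modes can be sketched with a snapshot store keyed by policy id: a restarted topology restores every snapshot, while a single bolt restores only the snapshots for the policies it accepts. Snapshot, SnapshotStore and the version field are hypothetical, and the in-memory map stands in for a durable store.

```scala
import scala.collection.mutable

// Hypothetical snapshot of one policy's evaluation state (e.g. serialized CEP state).
final case class Snapshot(policyId: String, version: Long, state: Array[Byte])

// Hypothetical in-memory stand-in for a durable snapshot store (single-threaded use).
final class SnapshotStore {
  private val latest = mutable.Map.empty[String, Snapshot]

  // Keep only the newest snapshot per policy; stale or duplicate versions are ignored.
  def save(s: Snapshot): Unit = latest.get(s.policyId) match {
    case Some(old) if old.version >= s.version => ()
    case _ => latest.update(s.policyId, s)
  }

  // Single bolt restore: only snapshots whose policy the bolt accepts.
  def restoreFor(acceptedPolicies: Set[String]): Map[String, Snapshot] =
    latest.filter { case (id, _) => acceptedPolicies.contains(id) }.toMap

  // Whole-topology restore: every stored snapshot.
  def restoreAll(): Map[String, Snapshot] = latest.toMap
}
```

Keying by policy id means a bolt that is reassigned a different set of policies after a restart still finds the right state, since the match is on policy acceptance rather than on the bolt's identity.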

Exactly once semantic

// TBD

 

Persistence

As a general monitoring tool, Eagle is not meant to store every metric data point. Users have to define a time window to reduce the granularity of the metrics; this is the user-written CEP-QL shown above.

...

Info

{
  "name": "hbase-default",
  "type": "hbase",
  "connectionString": "",
  "props": {...}
}

 

Metric API

Currently, the metric API is tightly coupled with the underlying storage: for HBase metrics, use the Eagle metric API; for Druid, use the Druid query API. Ideally, users would be able to query metrics with a SQL-style query.
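One way to reduce that coupling is a storage-independent query interface that each backend adapter implements, so callers and a future SQL-style front end target one shape. The sketch below assumes such an interface; MetricQuery, MetricPoint, MetricReader and the in-memory reader are hypothetical, and real HBase or Druid adapters are not shown.

```scala
// Hypothetical storage-independent query: one metric over the time range [start, end).
final case class MetricQuery(metric: String, start: Long, end: Long)
final case class MetricPoint(metric: String, timestamp: Long, value: Double)

// Hypothetical interface that an HBase adapter (over the Eagle metric API) or a
// Druid adapter (over the Druid query API) could implement.
trait MetricReader {
  def read(q: MetricQuery): Seq[MetricPoint]
}

// In-memory implementation, used here only to exercise the interface.
final class InMemoryMetricReader(points: Seq[MetricPoint]) extends MetricReader {
  def read(q: MetricQuery): Seq[MetricPoint] =
    points
      .filter(p => p.metric == q.metric && p.timestamp >= q.start && p.timestamp < q.end)
      .sortBy(_.timestamp)
}
```

A SQL-style query such as a metric name plus a timestamp range in the WHERE clause could then be translated into a MetricQuery once, instead of once per storage backend.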