Application Metrics (Perfcounter | Performance Metrics | Operational data | Monitoring)

1 - About

This section covers the collection and calculation of metrics used for real-time event:

  • reporting
  • and alerting

known as:

  • monitoring
  • or operational intelligence.

A monitoring system implements operational intelligence: it provides a picture of what is currently happening within a system (event view), whereas business intelligence is data gathered to analyze trends over time (business process view).

Applications that collect and analyze this kind of data are known as event-data applications. For instance:

  • Machine data (IoT)
  • Network telemetry

They are produced by applications (the operating system and third-party applications) in order to:

See also: Observability

These counters (numbers) are averages, or averages will be derived from them. From the data alone, there is no way to tell whether the value moved from one point to the next along a flat, constant line.
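The point about averages can be illustrated with a small sketch. The sample values below are hypothetical and chosen only so that both windows share the same mean:

```python
from statistics import mean

# Two hypothetical windows of latency samples (milliseconds).
flat = [50, 50, 50, 50, 50, 50]      # constant, horizontal line
spiky = [10, 10, 10, 10, 10, 250]    # mostly low, with one spike

flat_avg = mean(flat)
spiky_avg = mean(spiky)

# Both windows report the same average, so the average alone
# cannot tell a flat line from a spike.
print(flat_avg, spiky_avg)
print(max(flat), max(spiky))
```

If only the average is stored, the spike in the second window is lost. Keeping an extra statistic such as the maximum is one way to retain that information.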

2 - Type

Primitive metric types are called statistical collectors.

3 - Characteristics

3.1 - Registry

They are generally locally grouped in a registry in order to batch the data collection.
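A minimal sketch of a local registry, assuming a single counter type. The `Counter` and `Registry` classes and their method names are hypothetical and do not mirror any particular library:

```python
class Counter:
    """A monotonically increasing count."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def increment(self, amount=1):
        self.value += amount


class Registry:
    """Holds every metric of the process so they can be read in one batch."""

    def __init__(self):
        self._counters = {}

    def counter(self, name):
        # Return the existing counter for this name, or create it once.
        if name not in self._counters:
            self._counters[name] = Counter(name)
        return self._counters[name]

    def snapshot(self):
        # One call collects all values, which is what makes batching possible.
        return {name: c.value for name, c in self._counters.items()}


registry = Registry()
registry.counter("http.server.requests").increment()
registry.counter("http.server.requests").increment()
registry.counter("jobs.completed").increment(5)
batch = registry.snapshot()
```

A reporter would then ship `batch` in one operation instead of sending each increment individually.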

3.2 - Dimensionality

Time series data may be classified via:

  • dimension - the event is enriched with tag key/value pairs. (AppOptics, Atlas, Azure Monitor, Cloudwatch, Datadog, Datadog StatsD, Dynatrace, Elastic, Humio, Influx, KairosDB, New Relic, Prometheus, SignalFx, Sysdig StatsD, Telegraf StatsD, Wavefront)
  • hierarchy - the name is a flat hierarchical metric name (Graphite, Ganglia, JMX, Etsy StatsD)
  • or both

Dimensions are also known as tags.
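The difference between the two models can be sketched as follows. The sample layout and the flattening scheme (tag values appended in sorted key order) are assumptions made for illustration; real systems each define their own mapping:

```python
# Dimensional model: one metric name plus tag key/value pairs.
dimensional = {
    "name": "http.server.requests",
    "tags": {"method": "GET", "status": "200"},
    "value": 42,
}


def flatten(sample):
    """Fold tag values into one dot-separated hierarchical name (hypothetical scheme)."""
    tag_values = [sample["tags"][key] for key in sorted(sample["tags"])]
    return ".".join([sample["name"]] + tag_values)


# Hierarchical model: the same information encoded in the name itself.
hierarchical_name = flatten(dimensional)
print(hierarchical_name)
```

With tags, a query can filter or group by `status` without knowing its position in the name. With a hierarchy, the position of each segment carries the meaning.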

Hierarchy example:

  • Atlas (camelCase) - httpServerRequests
  • Graphite (dot separator) - http.server.requests
  • InfluxDB and Prometheus (underscore separator) - http_server_requests
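The three naming styles above can be derived from one canonical name. This is a sketch that assumes the dot-separated form is the canonical one; the function names are hypothetical:

```python
def to_dot(name):
    """Graphite style: keep the dot-separated name as is."""
    return name


def to_camel_case(name):
    """Atlas style: first segment lowercase, later segments capitalized."""
    head, *tail = name.split(".")
    return head + "".join(part.capitalize() for part in tail)


def to_snake_case(name):
    """InfluxDB and Prometheus style: segments joined by underscores."""
    return name.replace(".", "_")


canonical = "http.server.requests"
print(to_camel_case(canonical))
print(to_dot(canonical))
print(to_snake_case(canonical))
```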

3.3 - Aggregation Processing

The aggregation of a set of samples over a prescribed time interval (rate aggregation) may be performed:

  • client side (AppOptics, Atlas, Azure Monitor, Datadog, Elastic, Graphite, Ganglia, Humio, Influx, JMX, Kairos, New Relic, all StatsD flavors, SignalFx)
  • or server side (Prometheus, Wavefront)

Example: conversion of discrete samples (such as counts) to a rate.

Not all measurements are reported or best viewed as a rate. For example, gauge values are not rates.
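The conversion of counts to a rate can be sketched as below. It assumes samples are `(timestamp in seconds, cumulative count)` pairs and that the counter never resets; the sample values are made up:

```python
def rates(samples):
    """Turn cumulative counter samples into per-second rates.

    samples: list of (timestamp_seconds, cumulative_count), ordered by time.
    Returns one (timestamp, rate) pair per interval between two samples.
    """
    result = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        result.append((t1, (c1 - c0) / (t1 - t0)))
    return result


# A hypothetical request counter sampled once per minute.
samples = [(0, 0), (60, 120), (120, 300)]
per_second = rates(samples)
print(per_second)
```

A gauge such as a temperature reading would be reported as is, since the difference between two readings divided by time is not a meaningful rate of events.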

3.4 - Metrics Collection

The collection of metrics may be done:

  • client side via client pushes (AppOptics, Atlas, Azure Monitor, Datadog, Elastic, Graphite, Ganglia, Humio, Influx, JMX, Kairos, New Relic, SignalFx, Wavefront)
  • server side via server polls (Prometheus, all StatsD flavors)
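The two collection models differ in who initiates the transfer. The in-process sketch below replaces the network with plain function calls; `push`, `expose` and the variable names are hypothetical:

```python
metrics = {"http.server.requests": 3}

# Push: the client decides when to send and calls the backend itself.
received_by_backend = []


def push(snapshot, send):
    send(dict(snapshot))


push(metrics, received_by_backend.append)


# Pull: the client only exposes a way to read its metrics;
# the server calls it on its own schedule.
def expose(snapshot):
    return lambda: dict(snapshot)


scrape = expose(metrics)
metrics["http.server.requests"] += 1   # activity between two polls
polled_by_server = scrape()
```

In the push case the backend holds the value as of the send time. In the pull case the server sees whatever the value is at the moment it polls.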

4 - Steps / Lifecycle

They are:

Alerting:

Merge with monitoring lifecycle ??

5 - Counter Category

Machine data counter examples:

5.1 - Sensor

  • temperature,
  • speed,
  • voltage,
  • number of printouts

5.2 - Service Metrics

See SLI (Service Level Indicators).

5.3 - Event

Some monitoring systems can also capture events:

  • Changes: Internal code releases, builds, and build failures
  • Alerts: Internally generated alerts or third-party notifications
  • Scaling events: Adding or subtracting hosts

6 - Property

6.1 - Scale and Persistence

The resolution typically becomes coarser as the time window grows, for example:

  • last 2 hours at 1 minute resolution,
  • last 24 hours at 10 minute resolution,
  • last 3 days at 1 hour resolution,
  • last 7 days at 2 hour resolution
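The windows listed above can be expressed as a lookup table. This sketch assumes they are cumulative tiers ordered from finest to coarsest; `RETENTION_TIERS` and `resolution_for` are hypothetical names:

```python
from datetime import timedelta

# (maximum age of the data, resolution), taken from the list above.
RETENTION_TIERS = [
    (timedelta(hours=2), timedelta(minutes=1)),
    (timedelta(hours=24), timedelta(minutes=10)),
    (timedelta(days=3), timedelta(hours=1)),
    (timedelta(days=7), timedelta(hours=2)),
]


def resolution_for(age):
    """Return the finest resolution available for data of the given age."""
    for max_age, resolution in RETENTION_TIERS:
        if age <= max_age:
            return resolution
    return None  # older than the longest window


print(resolution_for(timedelta(minutes=30)))
print(resolution_for(timedelta(days=2)))
```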

X-scale (Minor/Major Tick)
