
Alerting Gateway Architecture


Table of Contents

  • Architecture
  • High Level Overview
  • Alerting Gateway

Alerting Gateway

The gateway-side Alerting server, henceforth called the Alerting plugin, is responsible for managing and applying the following user configurations:

  • endpoints, specifications used by the Alerting Cluster to dispatch alerts
  • conditions, specifications applied to datasources to evaluate observability data
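To make the two configuration types concrete, here is a minimal sketch of their shapes in Go. The struct and field names are illustrative assumptions, not the plugin's actual definitions.

```go
package alerting

// AlertEndpoint describes where the Alerting Cluster dispatches alerts,
// e.g. a Slack channel or an email address. (Hypothetical shape.)
type AlertEndpoint struct {
	Name        string
	Description string
	// Exactly one receiver type is expected to be set.
	Slack *SlackReceiver
	Email *EmailReceiver
}

type SlackReceiver struct {
	WebhookURL string
	Channel    string
}

type EmailReceiver struct {
	To string
}

// AlertCondition describes what to evaluate against a datasource and which
// endpoints to notify when the condition fires. (Hypothetical shape.)
type AlertCondition struct {
	Name        string
	Datasource  string   // "internal" or "metrics"
	EndpointIDs []string // endpoints to route triggered alerts to
}
```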

The alerting plugin is also responsible for connecting to datasources:

  • Aggregating available information from datasources into templates for the client
  • Managing dependencies on the datasources that make the observations required to evaluate conditions

The alerting plugin exposes an API to dynamically install and scale the Alerting Cluster, which delegates the necessary updates to the controller through a cluster driver.
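A minimal sketch of that delegation chain, assuming hypothetical interface and method names rather than the plugin's actual API:

```go
package alerting

import "context"

// ClusterDriver hides how the Alerting Cluster is actually reconciled,
// e.g. by updating a resource watched by the Alerting Controller.
// (Hypothetical interface; the real driver API differs.)
type ClusterDriver interface {
	InstallCluster(ctx context.Context) error
	ScaleCluster(ctx context.Context, replicas int32) error
	UninstallCluster(ctx context.Context) error
}

// OpsServer exposes the driver over the plugin's API surface, so clients can
// install and scale the Alerting Cluster dynamically.
type OpsServer struct {
	driver ClusterDriver
}

func (s *OpsServer) ScaleCluster(ctx context.Context, replicas int32) error {
	// Delegate the update to the controller through the cluster driver.
	return s.driver.ScaleCluster(ctx, replicas)
}
```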

In short, the alerting gateway plugin process can be broken down into:

  • Initialization Phase: setting up the dependencies the core alerting gateway plugin requires to run
  • Backend Components: servers that handle the logic and requests behind Opni Alerting features

Initialization Phase

The initialization phase of the alerting plugin is responsible for setting up the correct adapters to:

  • persistent storage
  • the Alerting Cluster, managed by the Alerting Controller
  • external datasources: Opni backends & internal gateway state

In more detail, when the alerting plugin initializes it must (see the sketch after this list):

  1. Set up the cluster driver, an adapter to the Alerting Controller
  2. Wrap the cluster driver in an API (OpsServer)
  3. Acquire the storage API clients
  4. Set up datasources:
  • 4.1. Acquire gateway-internal streams, and set up persistent streams to watch the data they carry
  • 4.2. Acquire the metrics ops backend client, scrape its status API, and send the results to a persistent stream
  • 4.3. Acquire the metrics admin client to CRUD Cortex rule objects
  5. Reindex (re-apply) user configurations if the external datasource dependencies aren't loaded. For details on how external datasource dependencies are loaded, see Datasource Implementations
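The sketch below condenses those five steps into one hypothetical initialization function; every constructor and field name is a stand-in for illustration, not the plugin's real code.

```go
// initialize wires up the dependencies described in steps 1-5 above.
// All helpers (newClusterDriver, newStorageClientSet, ...) are hypothetical.
func (p *AlertingPlugin) initialize(ctx context.Context) error {
	// 1. Cluster driver: the adapter to the Alerting Controller.
	driver, err := newClusterDriver(ctx)
	if err != nil {
		return err
	}

	// 2. Wrap the cluster driver in the ops API.
	p.opsServer = &OpsServer{driver: driver}

	// 3. Storage API clients for persisted endpoints and conditions.
	p.storage = newStorageClientSet(ctx)

	// 4. Datasources.
	p.internalStreams = watchGatewayStreams(ctx) // 4.1: persistent streams over gateway-internal streams
	p.metricsOps = newMetricsOpsClient(ctx)      // 4.2: status API scraped into a persistent stream
	p.metricsAdmin = newMetricsAdminClient(ctx)  // 4.3: CRUD of Cortex rule objects

	// 5. Re-apply user configurations whose datasource dependencies
	//    were not loaded yet.
	return p.reindexConditions(ctx)
}
```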

Components

Datasource Implementations

There are currently two datasources for Opni Alerting:

  • Internal: system-critical information exposed by the gateway
  • Metrics: information exposed in metrics format by the Opni Metrics backend

The currently supported Opni conditions map to datasources as follows (a sketch of this mapping follows the list):

  • Agent disconnect -> internal datasource
  • Capability unhealthy -> internal datasource
  • Monitoring backend -> internal datasource
  • Prometheus Query -> metrics datasource
  • Kube State -> metrics datasource
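Expressed as data, the mapping looks roughly like the following; the string keys are the informal condition names used above, not the exact identifiers in the codebase.

```go
package alerting

// conditionDatasources maps each supported condition type to the datasource
// that evaluates it. (Illustrative only.)
var conditionDatasources = map[string]string{
	"Agent disconnect":     "internal",
	"Capability unhealthy": "internal",
	"Monitoring backend":   "internal",
	"Prometheus Query":     "metrics",
	"Kube State":           "metrics",
}
```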

Internal

Alerting conditions backed by the internal datasource are evaluated using custom internal evaluator objects. These conditions are not evaluated using metrics because we need a way to observe the Opni system with as few assumptions as possible.

Each internal datasource sets up a persistent stream that scrapes information from a stream/unary API exposed by the gateway.

The information on these persistent streams is backed by a durable consumer with a small buffer to replay information.
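As one way to picture this, the sketch below sets up a stream with a small message buffer and a durable consumer using NATS JetStream; the stream name, subject, and consumer name are made up, and the plugin's actual wiring may differ.

```go
package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Small buffer: keep only the most recent messages so a restarting
	// evaluator can replay a short window of health information.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "AGENT_HEALTH",
		Subjects: []string{"alerting.internal.health.>"},
		MaxMsgs:  64,
	}); err != nil {
		log.Fatal(err)
	}

	// Durable consumer: the subscription survives restarts and resumes
	// where it left off.
	if _, err := js.Subscribe("alerting.internal.health.>", func(m *nats.Msg) {
		fmt.Printf("health update on %s: %s\n", m.Subject, string(m.Data))
	}, nats.Durable("alerting-internal-evaluator"), nats.DeliverAll()); err != nil {
		log.Fatal(err)
	}

	select {} // block and keep consuming
}
```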

Internal evaluator

  • Internal data APIs are read and then grouped by a given key; typically this key is a unique cluster ID. Grouping lets internal conditions scale across multiple downstream clusters.
  • All changes to internal evaluators are propagated through an evaluation context (see the sketch after this list). The values in the evaluation context are responsible for:
    • Telling the subscriber where to read its messages from AND how to convert those messages to a format the inner evaluator understands.
    • Telling the evaluator when to trigger/resolve alerts based on messages received from the subscriber over time.
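A simplified sketch of that pattern follows; all type and field names are hypothetical. The evaluation context carries the read location, the decoder, and the timing rule, while the evaluator loop decides when to fire or resolve.

```go
package alerting

import (
	"context"
	"time"
)

// healthMsg is the decoded form the inner evaluator understands. (Hypothetical.)
type healthMsg struct {
	clusterID string
	healthy   bool
}

// evaluationContext tells the subscriber where to read messages from and how
// to decode them, and gives the evaluator its trigger/resolve timing rule.
type evaluationContext struct {
	subject string                         // where the subscriber reads from
	decode  func([]byte) (healthMsg, bool) // convert raw messages for the inner evaluator
	timeout time.Duration                  // no message for this long => trigger
}

// runEvaluator consumes decoded messages (already grouped by cluster ID
// upstream) and decides when to fire or resolve the alert.
func runEvaluator(ctx context.Context, ec evaluationContext, msgs <-chan healthMsg, fire, resolve func(clusterID string)) {
	timer := time.NewTimer(ec.timeout)
	defer timer.Stop()
	var last healthMsg
	for {
		select {
		case <-ctx.Done():
			return
		case m := <-msgs:
			last = m
			if m.healthy {
				resolve(m.clusterID)
			} else {
				fire(m.clusterID)
			}
			timer.Reset(ec.timeout)
		case <-timer.C:
			// No updates within the window: treat the cluster as disconnected.
			if last.clusterID != "" {
				fire(last.clusterID)
			}
			timer.Reset(ec.timeout)
		}
	}
}
```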

Metrics

Alerting conditions backed by the metrics datasource are evaluated using Prometheus rule objects.

More specifically, Opni Alerting manages these dependent objects via a CRUD rules API exposed by the metrics admin API.

Multiple rule objects are CRUDed (and organized into groups) depending on the template of the metrics-backed condition.
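The sketch below shows one way a metrics-backed condition template could be expanded into a Prometheus rule group for the metrics admin API to create or update; the struct shapes mirror Prometheus rule-file YAML but are simplified, and the names are assumptions.

```go
package alerting

import "time"

type ruleGroup struct {
	Name  string `yaml:"name"`
	Rules []rule `yaml:"rules"`
}

type rule struct {
	Alert  string            `yaml:"alert"`
	Expr   string            `yaml:"expr"`
	For    time.Duration     `yaml:"for"`
	Labels map[string]string `yaml:"labels"`
}

// ruleGroupForCondition expands one metrics-backed condition into a rule
// group, keyed by the condition ID so it can later be updated or deleted.
// (Hypothetical helper for illustration.)
func ruleGroupForCondition(conditionID, promQL string, holdFor time.Duration) ruleGroup {
	return ruleGroup{
		Name: "opni-alerting-" + conditionID,
		Rules: []rule{{
			Alert:  "OpniAlertingCondition",
			Expr:   promQL,
			For:    holdFor,
			Labels: map[string]string{"opni_condition_id": conditionID},
		}},
	}
}
```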
