Monitoring

Services Monitoring Workflow

The services monitoring workflow consists of three main components: a collection agent, a metrics server, and dashboards.

A typical workflow, including the most common components:

The monitoring agent collects node metrics.
cAdvisor collects container and pod metrics.
The monitoring aggregation service collects data from its own agent and from cAdvisor.
The data is stored in the monitoring system’s storage.
The monitoring aggregation service exposes the metrics through APIs and dashboards.
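The five steps above can be modeled as a toy pipeline. The Python sketch below is illustrative only. The functions `node_agent` and `cadvisor_stub`, the `Aggregator` class, and every metric name and value are hypothetical stand-ins, not the API of any real agent or monitoring server. An in-memory dictionary takes the place of the monitoring system's storage, and `query_sum` plays the role of the query API that a dashboard would call.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (metric name, labels, value); all names and values below are hypothetical.
Sample = Tuple[str, Dict[str, str], float]


def node_agent() -> List[Sample]:
    # Monitoring agent: reports node-level metrics.
    return [("node_cpu_utilisation", {"node": "node-a"}, 0.42)]


def cadvisor_stub() -> List[Sample]:
    # Stand-in for cAdvisor: reports container and pod metrics.
    return [
        ("container_memory_bytes", {"node": "node-a", "pod": "web-1"}, 5.0e7),
        ("container_memory_bytes", {"node": "node-a", "pod": "web-2"}, 6.0e7),
    ]


class Aggregator:
    """Hypothetical aggregation service: collects, stores, and serves samples."""

    def __init__(self, sources: List[Callable[[], List[Sample]]]) -> None:
        self.sources = sources
        # In-memory stand-in for the monitoring system's storage.
        self.storage: Dict[str, List[Tuple[Dict[str, str], float]]] = defaultdict(list)

    def collect(self) -> None:
        # Gather data from its own agent and from cAdvisor.
        for source in self.sources:
            for name, labels, value in source():
                self.storage[name].append((labels, value))

    def query_sum(self, name: str) -> float:
        # Minimal query API: sum of all stored values for one metric.
        return sum(value for _, value in self.storage[name])


agg = Aggregator([node_agent, cadvisor_stub])
agg.collect()
print(agg.query_sum("container_memory_bytes"))  # 110000000.0
```

A real aggregation service adds timestamps, retention, and a query language on top of this shape; the sketch keeps only the collect, store, and expose responsibilities.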

A few notes:

Prometheus is the de facto standard monitoring server in the CNCF ecosystem; it was incubated by the CNCF and has since graduated. It integrates directly with cAdvisor, so you don’t need to install a third-party agent to retrieve additional metrics about your containers. However, if you need deeper insights into each node, you need to install an agent of your choice; see the Prometheus exporters and integrations page (for example, node_exporter covers host-level metrics).
Almost all monitoring systems piggyback on Kubernetes scheduling and orchestration. For example, their agents are installed as DaemonSets and depend on the Kubernetes scheduler to have an instance scheduled on each node.
Most monitoring agents depend on the kubelet to collect container metrics, and the kubelet in turn relies on cAdvisor, which is built into it. Very few agents collect container details independently. Most monitoring aggregation services depend on agents pushing metrics to them. Prometheus is an exception: it pulls (scrapes) metrics from the installed agents.
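In the pull model, Prometheus periodically sends an HTTP request to each target's metrics endpoint and receives samples in its plain-text exposition format (`metric_name{label="value",...} value [timestamp]`). The sketch below parses such a response. It assumes the HTTP fetch has already happened, and the scraped text is a hypothetical sample: `node_load1` is a node_exporter metric and `container_cpu_usage_seconds_total` is a cAdvisor metric, but the label values and numbers are made up. It deliberately ignores escaped quotes inside label values and other corner cases that a real client library handles.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} value [timestamp]
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this sketch cannot parse
        name, label_str, value = match.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))  # timestamp is dropped
    return samples


# Hypothetical body of a scrape response.
scraped = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.58
container_cpu_usage_seconds_total{namespace="shop",pod="web-1",container="app"} 12.5 1700000000000
"""

for name, labels, value in parse_exposition(scraped):
    print(name, labels, value)
```

Because the server initiates each scrape, it also learns when a target stops responding, which is one practical benefit of pulling over waiting for pushes.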

What to monitor

An ideal services monitoring workflow depends on two factors:

Collection of relevant metrics.
Perception of continuous changes inside the Kubernetes cluster.

A good pipeline should focus on collecting relevant metrics. Plenty of agents can collect OS- and process-level metrics, but very few can collect details about the containers running on a given node, such as the number of running containers, container state, and Docker Engine metrics. cAdvisor is the best agent for this job.
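To illustrate the container-level details mentioned above, the sketch below groups container states by node. The `ContainerRecord` fields and the sample records are hypothetical and do not mirror cAdvisor's actual output schema; the point is only the kind of per-node summary (running versus exited containers) that OS-level agents typically do not report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ContainerRecord:
    """Hypothetical container record, as a container-aware agent might report it."""
    node: str
    container_id: str
    state: str  # e.g. "running", "exited"


def states_per_node(records: Iterable[ContainerRecord]) -> Dict[str, Counter]:
    """Count containers in each state, grouped by node."""
    summary: Dict[str, Counter] = {}
    for rec in records:
        summary.setdefault(rec.node, Counter())[rec.state] += 1
    return summary


records = [
    ContainerRecord("node-a", "c1", "running"),
    ContainerRecord("node-a", "c2", "running"),
    ContainerRecord("node-a", "c3", "exited"),
    ContainerRecord("node-b", "c4", "running"),
]
summary = states_per_node(records)
print(summary["node-a"]["running"])  # 2
```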

Perception of continuous changes means that the monitoring pipeline is aware of individual pods and container instances and can relate them to their parent entities, e.g. Deployments, StatefulSets, and Namespaces. It also means that the metrics server is aware of cluster-wide metrics that should be visible to users, such as the number of pending pods and node status.
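Relating pods to their parent entities usually means walking `metadata.ownerReferences`: a Deployment's pods are owned by a ReplicaSet, which is in turn owned by the Deployment. The sketch below follows the controller owner references up to the top-level owner, counts pods per owner, and counts pods whose `status.phase` is `Pending`. The objects are plain dicts shaped like Kubernetes API objects; the helper `meta` and all object names are hypothetical, and a real pipeline would obtain these objects from the Kubernetes API server rather than from literals.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (namespace, kind, name)


def controller_ref(obj: dict) -> Optional[dict]:
    """Return the ownerReference marked controller=True, if any."""
    for ref in obj["metadata"].get("ownerReferences", []):
        if ref.get("controller"):
            return ref
    return None


def top_owner(obj: dict, index: Dict[Key, dict]) -> Tuple[str, str]:
    """Follow controller ownerReferences up to the top-level (kind, name)."""
    namespace = obj["metadata"]["namespace"]
    kind, name = obj["kind"], obj["metadata"]["name"]
    while True:
        ref = controller_ref(obj)
        if ref is None:
            return kind, name
        kind, name = ref["kind"], ref["name"]
        parent = index.get((namespace, kind, name))
        if parent is None:
            return kind, name  # owner not in this snapshot; stop here
        obj = parent


def pods_per_owner(pods: List[dict], owners: List[dict]) -> Counter:
    """Count pods per top-level owner, keyed by (namespace, kind, name)."""
    index = {
        (o["metadata"]["namespace"], o["kind"], o["metadata"]["name"]): o
        for o in owners
    }
    counts: Counter = Counter()
    for pod in pods:
        kind, name = top_owner(pod, index)
        counts[(pod["metadata"]["namespace"], kind, name)] += 1
    return counts


def pending_pods(pods: List[dict]) -> int:
    """Cluster-wide metric: number of pods whose status.phase is Pending."""
    return sum(1 for p in pods if p.get("status", {}).get("phase") == "Pending")


def meta(name: str, owner_kind: str = "", owner_name: str = "", ns: str = "shop") -> dict:
    """Hypothetical helper that builds a minimal metadata block."""
    m: dict = {"name": name, "namespace": ns}
    if owner_kind:
        m["ownerReferences"] = [{"kind": owner_kind, "name": owner_name, "controller": True}]
    return m


owners = [
    {"kind": "Deployment", "metadata": meta("web")},
    {"kind": "ReplicaSet", "metadata": meta("web-5d9c", "Deployment", "web")},
    {"kind": "StatefulSet", "metadata": meta("db")},
]
pods = [
    {"kind": "Pod", "metadata": meta("web-5d9c-aaaaa", "ReplicaSet", "web-5d9c"), "status": {"phase": "Running"}},
    {"kind": "Pod", "metadata": meta("web-5d9c-bbbbb", "ReplicaSet", "web-5d9c"), "status": {"phase": "Pending"}},
    {"kind": "Pod", "metadata": meta("db-0", "StatefulSet", "db"), "status": {"phase": "Running"}},
]
print(pods_per_owner(pods, owners))
print(pending_pods(pods))  # 1
```

Attributing metrics to the top-level owner rather than to the ReplicaSet keeps dashboards stable across rollouts, since each new rollout creates a new ReplicaSet while the Deployment name stays the same.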

TL;DR

You need to differentiate between the core metrics pipeline and the services pipeline.
Pick the pipeline that best fits your needs.
Prometheus is the community’s de facto standard metrics collector.
Use Grafana dashboards for visualization, but not for alerting.
