Tech Guide

Monitoring Stack: Grafana + Prometheus for Small Teams

12 March 2026 · 9 min read

A monitoring stack shows how your systems behave over time and warns you before a full disk or a saturated CPU turns into an outage. This guide covers a minimal, production-ready setup with Prometheus and Grafana.

Architecture Overview

Prometheus: Scrapes metrics from endpoints, stores time-series data
Grafana: Visualizes Prometheus data, manages alerting rules
Node Exporter: Collects host-level metrics (CPU, disk, memory)
Alertmanager: Handles alert routing and deduplication

Prometheus Configuration

Set up a prometheus.yml scrape config:

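scrape_configs:
  # Node Exporter: host-level metrics (default port 9100)
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

  # Docker daemon metrics; requires "metrics-addr" to be set in daemon.json
  - job_name: 'docker'
    static_configs:
      - targets: ['localhost:9323']
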
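  # Port 8006 is the Proxmox web UI/API and does not serve Prometheus metrics.
  # Assumption: prometheus-pve-exporter runs on proxmox-host (default port 9221,
  # metrics served at /pve); adjust if you use a different exporter.
  - job_name: 'proxmox'
    metrics_path: /pve
    params:
      target: ['proxmox-host']
    static_configs:
      - targets: ['proxmox-host:9221']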
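
The scrape jobs are only one section of prometheus.yml. As a rough sketch, the remaining top-level sections that connect Prometheus to its rule files and to Alertmanager might look like this. The 15-second intervals, the rules path, and the Alertmanager address are assumptions, not values taken from this setup:

```yaml
# Sketch only: intervals, paths, and addresses are assumed values.
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alerting and recording rules are evaluated

rule_files:
  - /etc/prometheus/rules/*.yml   # hypothetical location for rule files

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # Alertmanager's default port
```

Running promtool check config prometheus.yml before a reload should catch syntax errors and missing rule files.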

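Store metrics on a dedicated, high-IOPS volume. Prometheus applies a single retention period to all local data, set with the --storage.tsdb.retention.time flag (30d for 30 days of raw data). It does not downsample on its own, so keeping aggregates for 1 year means precomputing them with recording rules and shipping them to a separate long-term store.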
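
One way to approximate the aggregates tier is to precompute a coarse series with a recording rule and forward only the recorded series to a long-term store. The sketch below assumes a remote-write-compatible backend at a placeholder URL; the file name, rule name, and 5-minute interval are illustrative choices. First, a rules file:

```yaml
# rules/aggregates.yml (hypothetical file name)
groups:
  - name: aggregates
    interval: 5m   # evaluate less often than the raw scrapes
    rules:
      # Per-instance CPU utilisation as a 0-1 ratio, averaged over 5 minutes
      - record: instance:node_cpu_utilisation:avg5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```

Then, in prometheus.yml, a remote_write section that keeps only the recorded series:

```yaml
remote_write:
  - url: https://long-term-store.example/api/v1/write   # placeholder URL
    write_relabel_configs:
      # Forward only recorded aggregates (names starting with "instance:")
      - source_labels: [__name__]
        regex: 'instance:.*'
        action: keep
```

The long-term store then applies its own 1-year retention, while local Prometheus data still expires after 30 days.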

Grafana Dashboards

Import pre-built dashboards from the community or build custom ones:

System health (CPU, memory, disk usage)
Network throughput and error rates
Application response times
Container and VM resource usage
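
Dashboards need a data source before they show anything. Grafana can load both from provisioning files at startup, which keeps the setup reproducible. A minimal sketch, assuming Prometheus listens on localhost:9090 and that dashboard JSON files live in a directory of your choosing (the path below is hypothetical). The data source file, typically placed under /etc/grafana/provisioning/datasources/:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                # Grafana's backend makes the requests
    url: http://localhost:9090   # assumed Prometheus address
    isDefault: true
```

And a dashboard provider, typically placed under /etc/grafana/provisioning/dashboards/:

```yaml
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards   # hypothetical directory of dashboard JSON
```

Community dashboards downloaded as JSON can be dropped into that directory instead of being imported by hand.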

Configure notification channels (called contact points in current Grafana versions):

Slack for critical alerts
Email for weekly summaries
PagerDuty for on-call escalation
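
If alerts are routed through Alertmanager rather than Grafana, the same split can be sketched as a routing tree. This is an illustration, not the setup described above: the severity label values, receiver names, addresses, and keys are all placeholders, and email acts here as the fallback receiver rather than a weekly summary, which Alertmanager does not produce.

```yaml
# alertmanager.yml sketch; every name, address, and key is a placeholder.
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'

route:
  receiver: team-email                 # fallback for anything not matched below
  group_by: ['alertname', 'instance']
  routes:
    - matchers:
        - severity="page"              # assumed label value for on-call escalation
      receiver: oncall-pagerduty
    - matchers:
        - severity="critical"          # assumed label value for critical alerts
      receiver: team-slack

receivers:
  - name: team-email
    email_configs:
      - to: 'ops@example.com'
  - name: team-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#alerts'
  - name: oncall-pagerduty
    pagerduty_configs:
      - routing_key: 'REPLACE_ME'
```

Routing on a severity label only works if the alert rules set that label, so each rule would need a labels block with a matching value.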

Alerting Rules

Define alert thresholds based on your SLA in a Prometheus rules file:

groups:
  - name: infrastructure
    rules:
      - alert: HighCPU
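        # node_cpu_seconds_total is a counter, so alert on its rate, not its raw value:
        # the fraction of non-idle CPU time per instance over the last 5 minutes.
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8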
        for: 5m
        annotations:
          summary: "High CPU on {{ $labels.instance }}"

Test alerts regularly to ensure they reach the right teams.
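
One common way to exercise the delivery path continuously is an always-firing heartbeat alert. The sketch below is an assumption about how you might set this up; the group name, alert name, and severity value are illustrative.

```yaml
groups:
  - name: meta
    rules:
      - alert: Watchdog
        expr: vector(1)    # always true, so the alert fires continuously
        labels:
          severity: none   # assumed label value, used to route it away from people
        annotations:
          summary: "Alerting pipeline heartbeat"
```

If the heartbeat stops arriving at its receiver, something between Prometheus and the notification channel is broken. It does not prove that each real alert is routed to the right team, so an occasional manual test per channel is still worthwhile.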

Capacity Planning

Review Prometheus storage needs monthly. For a typical small team setup:

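Series: 20 targets × 1,000 metrics each = 20,000 active series
Storage: 20,000 series ÷ 15 s scrape interval × 30 days × 1-2 bytes per sample ≈ 3.5 to 7 GB (assuming a 15-second scrape interval)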
CPU: modest (2 cores sufficient)
Memory: 4 GB minimum, 8 GB recommended
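
For the monthly review, measuring actual ingestion is more reliable than estimating it. The recording rules below are a sketch that assumes Prometheus scrapes its own metrics endpoint (a job not shown in the scrape config above) and uses 2 bytes per sample, the upper end of the 1 to 2 bytes that the Prometheus documentation cites. The rule names are illustrative.

```yaml
groups:
  - name: capacity
    rules:
      # Samples ingested per second, averaged over the last hour
      - record: prometheus:tsdb_samples_appended:rate1h
        expr: rate(prometheus_tsdb_head_samples_appended_total[1h])
      # Estimated bytes needed for 30 days at an assumed 2 bytes per sample
      - record: prometheus:tsdb_disk_bytes:estimate30d
        expr: prometheus:tsdb_samples_appended:rate1h * 2 * 30 * 86400
```

Graphing the estimate next to the volume's actual usage in Grafana shows whether the retention period still fits the disk. The result depends heavily on the scrape interval and on how well the data compresses, so treat it as a trend rather than an exact figure.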