Beavoyce (Be-a-voyce)
Everyday Australian's Political Commentary
Tech Guide

Monitoring Stack: Grafana + Prometheus for Small Teams

12 March 2026 · 9 min read

Monitoring · Grafana · Prometheus

A monitoring stack is essential for understanding system behavior. This guide covers a minimal, production-ready setup with Prometheus and Grafana.

Architecture Overview

  • Prometheus: Scrapes metrics from endpoints, stores time-series data
  • Grafana: Visualizes Prometheus data, manages alerting rules
  • Node Exporter: Collects host-level metrics (CPU, disk, memory)
  • Alertmanager: Handles alert routing and deduplication

Prometheus Configuration

Set up a prometheus.yml scrape config:

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
  
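  # Docker only serves metrics on this port once "metrics-addr" is set in daemon.json.
  - job_name: 'docker'
    static_configs:
      - targets: ['localhost:9323']

  # Port 8006 is the Proxmox web UI/API and does not serve Prometheus metrics.
  # Assumes prometheus-pve-exporter runs on proxmox-host (default port 9221, path /pve).
  - job_name: 'proxmox'
    metrics_path: /pve
    static_configs:
      - targets: ['proxmox-host:9221']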
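
The scrape jobs above only cover collection. For alert rules to be evaluated and delivered to Alertmanager, prometheus.yml also needs a few top-level sections. A minimal sketch follows; the 15-second intervals, the rules/ directory name, and Alertmanager running on localhost at its default port 9093 are all assumptions rather than settings from this guide:

```yaml
# Additional top-level sections for prometheus.yml (values are assumptions).
global:
  scrape_interval: 15s       # how often targets are scraped
  evaluation_interval: 15s   # how often alerting/recording rules are evaluated

# Hypothetical location for rule files such as the alerting rules shown later.
rule_files:
  - 'rules/*.yml'

# Where Prometheus sends firing alerts; 9093 is Alertmanager's default port.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
```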

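Store metrics on a dedicated, high-IOPS local volume; Prometheus does not support NFS for its data directory. Set raw-data retention to 30 days with --storage.tsdb.retention.time=30d. Prometheus applies a single retention period to all local data and does not downsample, so keeping aggregates for 1 year means writing them to a separate long-term store (for example via remote_write) or to a second Prometheus instance with longer retention.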
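
One common way to produce the aggregates worth keeping long term is a recording rule, which stores the result of an expression as a new, much smaller series. The sketch below is illustrative only: the group name, the rule names, and the 5-minute interval are hypothetical, and the metrics assume Node Exporter on Linux.

```yaml
# rules/aggregates.yml (hypothetical file name)
groups:
  - name: aggregates
    interval: 5m   # evaluate less often than raw scrapes to keep the series small
    rules:
      # Per-host CPU utilisation as a 0-1 ratio, averaged across cores.
      - record: instance:node_cpu_utilisation:avg_rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Per-host memory utilisation as a 0-1 ratio.
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
```

Only the recorded series would then need to go to long-term storage, instead of every raw per-core and per-device series.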

Grafana Dashboards

Import pre-built dashboards from the community or build custom ones:

  • System health (CPU, memory, disk usage)
  • Network throughput and error rates
  • Application response times
  • Container and VM resource usage
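
Dashboards show nothing until Grafana has a Prometheus data source. It can be added in the UI, or provisioned from a file so the setup is reproducible. A minimal sketch, assuming Prometheus is reachable from Grafana at localhost on its default port 9090 and that the file is placed in Grafana's provisioning/datasources directory:

```yaml
# provisioning/datasources/prometheus.yml (file name is an assumption)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                # Grafana's backend makes the queries
    url: http://localhost:9090   # assumed Prometheus address
    isDefault: true
```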

Configure notification channels (called contact points in current Grafana releases):

  • Slack for critical alerts
  • Email for weekly summaries
  • PagerDuty for on-call escalation
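
If alerts are routed through Alertmanager rather than Grafana, the Slack and PagerDuty destinations could be sketched as below. The severity label and its "page" value are a hypothetical convention that alert rules would need to set, and the webhook URL, channel, and routing key are placeholders. Weekly email summaries are left out because Alertmanager sends alert notifications, not scheduled reports.

```yaml
# alertmanager.yml: a sketch with placeholder credentials
route:
  receiver: slack-alerts              # default destination for every alert
  group_by: ['alertname', 'instance']
  routes:
    # Hypothetical convention: rules labelled severity="page" go to on-call.
    - matchers:
        - severity="page"
      receiver: pagerduty-oncall

receivers:
  - name: slack-alerts
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#alerts'
        send_resolved: true
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: 'REPLACE_ME'     # PagerDuty Events API v2 integration key
```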

Alerting Rules

Define alert thresholds based on your SLA:

groups:
  - name: infrastructure
    rules:
      - alert: HighCPU
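        # node_cpu_seconds_total is a counter, so alert on the non-idle rate, not the raw value.
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8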
        for: 5m
        annotations:
          summary: "High CPU on {{ $labels.instance }}"

Test alerts regularly to ensure they reach the right teams.
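
Part of that testing can be automated with promtool's rule unit tests, which replay synthetic series against a rule file without touching live systems. The sketch below assumes the HighCPU rule is saved as alerts.yml (a hypothetical name) next to the test file, and that its expression is the idle-rate form 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8 with no extra labels:

```yaml
# alerts_test.yml (hypothetical name); run with: promtool test rules alerts_test.yml
rule_files:
  - alerts.yml
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Idle time grows by only 6 s per minute, i.e. the CPU is about 90% busy.
      - series: 'node_cpu_seconds_total{instance="host1", cpu="0", mode="idle"}'
        values: '0+6x20'
    alert_rule_test:
      # Well past the 5m "for" window, so the alert should be firing.
      - eval_time: 15m
        alertname: HighCPU
        exp_alerts:
          - exp_labels:
              instance: host1
            exp_annotations:
              summary: "High CPU on host1"
```

This checks the rule logic only. Delivery to Slack or PagerDuty still needs an occasional end-to-end test.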

Capacity Planning

Review Prometheus storage needs monthly. For a typical small team setup:

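  • Storage: 20 targets × 1000 series each = 20,000 series. Assuming a 15-second scrape interval, that is about 1,333 samples per second, or roughly 3.5 billion samples over 30 days. At the 1 to 2 bytes per sample Prometheus typically needs, this comes to about 3.5 to 7 GB; allow a few times that for the WAL, compaction, and growth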
  • CPU: modest (2 cores sufficient)
  • Memory: 4 GB minimum, 8 GB recommended