Jan Novak dda6a9d032 vms: add monitoring stack and node-exporter for docker host
utility-101-shadow:
- Add full monitoring stack (Prometheus + Blackbox Exporter + Alertmanager)
  with Docker Compose and a systemd unit (monitoring.service)
- Prometheus scrapes: itself, blackbox-exporter, and node-exporter on
  the docker host (docker:9100); blackbox probes cover HTTPS endpoints
  with TLS cert monitoring
- Alertmanager routes warnings to Slack/Discord, critical alerts also
  to email (Gmail SMTP); inhibit rule suppresses SSLCertExpiringSoon
  when SSLCertExpired already fires
- Alert rules: 11 node-exporter alerts (host down, CPU, memory, disk
  fill/prediction, iowait, OOM kill, systemd failed units) + 3 blackbox
  alerts (probe failed, SSL expiring, SSL expired)
- readme: add services list and Docker Engine installation steps
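
The scrape layout described above can be sketched as a prometheus.yml along the following lines. This is a minimal sketch, not the committed file: the intervals, the `blackbox-https` job name, the `http_2xx` module name, and the `https://example.com` probe target are placeholders assumed for illustration. Only the three scrape targets, the rule file path, and the service names come from this commit.

```yaml
# Hypothetical prometheus.yml sketch; values marked "assumed" are not from the commit.
global:
  scrape_interval: 30s        # assumed
  evaluation_interval: 30s    # assumed

rule_files:
  - /etc/prometheus/alerts.yml   # path mounted by the compose file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  # Prometheus scraping itself
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # The exporter's own metrics endpoint
  - job_name: blackbox-exporter
    static_configs:
      - targets: ['blackbox-exporter:9115']

  # node-exporter on the docker host
  - job_name: node-exporter
    static_configs:
      - targets: ['docker:9100']

  # HTTPS probes routed through the blackbox exporter
  - job_name: blackbox-https          # assumed job name
    metrics_path: /probe
    params:
      module: [http_2xx]              # assumed module, must exist in blackbox.yml
    static_configs:
      - targets:
          - https://example.com       # placeholder endpoint
    relabel_configs:
      # Pass the probed URL as ?target=... and keep it as the instance label
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter instead of the endpoint
      - target_label: __address__
        replacement: blackbox-exporter:9115
```

The relabeling in the last job is the usual blackbox pattern: the listed URL becomes the probe parameter and the `instance` label, while the request itself goes to the exporter. TLS expiry alerts would then typically be built on the `probe_ssl_earliest_cert_expiry` metric that the HTTP prober exposes.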

docker host:
- Add node-exporter container running with host pid/network and
  read-only mounts of /proc, /sys, / for full host metrics visibility
- Enable --collector.systemd for systemd unit state metrics
- Add systemd unit (node-exporter.service) to manage the container
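
For orientation, the container described above might look roughly like the Compose service below. This is a sketch under assumptions: the mount points inside the container (`/host/proc`, `/host/sys`, `/rootfs`) and the image tag are illustrative, and the D-Bus socket mount is a guess at what the systemd collector needs, since the commit message does not say how the container reaches systemd.

```yaml
# Hypothetical compose sketch for node-exporter on the docker host.
services:
  node-exporter:
    image: prom/node-exporter:latest   # assumed tag
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host                 # host network, so metrics are served on host port 9100
    pid: host                          # host PID namespace
    volumes:
      - /proc:/host/proc:ro            # container-side paths are assumed
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      # Assumption: the systemd collector talks to systemd over D-Bus,
      # so a socket mount like this may be required.
      # - /run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.systemd'
```

With `network_mode: host` no `ports:` mapping is needed, which matches the `docker:9100` scrape target used by Prometheus.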

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 23:07:44 +01:00


version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./alerts.yml:/etc/prometheus/alerts.yml:ro
      - ./data:/prometheus
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=60d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring-network

  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    restart: unless-stopped
    ports:
      - "9115:9115"
    volumes:
      - ./blackbox.yml:/etc/blackbox_exporter/config.yml:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    networks:
      - monitoring-network

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - ./alertmanager-data:/alertmanager
      - ./smtp_password:/run/secrets/smtp_password:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring-network

networks:
  monitoring-network:
    driver: bridge
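
The alertmanager service mounts ./alertmanager.yml and ./smtp_password, but that config is not shown here. As a rough illustration of the routing and inhibition the commit message describes, an alertmanager.yml could be shaped like the sketch below. Everything except the two alert names and the secret path is assumed: the `severity` label, receiver names, addresses, webhook URL, and the `equal` label are placeholders. A Slack receiver stands in for the Slack/Discord target.

```yaml
# Hypothetical alertmanager.yml sketch; placeholders throughout.
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'              # placeholder
  smtp_auth_username: 'alerts@example.com'     # placeholder
  smtp_auth_password_file: /run/secrets/smtp_password   # path from the compose mount

route:
  receiver: chat                    # default: warnings go to chat only
  routes:
    - matchers:
        - 'severity="critical"'     # assumed label name and value
      receiver: chat-and-email      # critical alerts also reach email

receivers:
  - name: chat
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder
        channel: '#alerts'                                       # placeholder
  - name: chat-and-email
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder
        channel: '#alerts'
    email_configs:
      - to: 'oncall@example.com'    # placeholder

inhibit_rules:
  # Suppress the "expiring soon" warning once the cert has actually expired
  - source_matchers:
      - 'alertname="SSLCertExpired"'
    target_matchers:
      - 'alertname="SSLCertExpiringSoon"'
    equal: ['instance']             # assumed: match per probed endpoint
```

`smtp_auth_password_file` requires a reasonably recent Alertmanager release; on older versions the password would have to be set inline with `smtp_auth_password`. Without `equal`, an expired certificate on one endpoint would mute the expiring-soon warning for every endpoint, which is why the sketch scopes the rule to matching `instance` labels.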