Database Performance Monitoring Setup (pg_stat_statements/slow query log)
Database monitoring is not a dashboard for its own sake. It's a system that in 30 seconds answers: "What exactly slowed down and why". Without properly configured monitoring, degradation leads to guessing — looking in the wrong places, losing hours.
PostgreSQL: pg_stat_statements
Extension pg_stat_statements accumulates statistics for each unique query:
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000 # how many unique queries to keep
pg_stat_statements.track = all # top — only top-level, all — nested too
pg_stat_statements.track_utility = off # don't track COPY, VACUUM etc.
After PostgreSQL restart:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Useful queries for analysis:
-- Queries consuming most total time
SELECT
left(query, 120) AS query,
calls,
round(total_exec_time::numeric / 1000, 1) AS total_sec,
round(mean_exec_time::numeric, 1) AS avg_ms,
round(stddev_exec_time::numeric, 1) AS stddev_ms,
round(rows::numeric / nullif(calls, 0), 0) AS rows_per_call
FROM pg_stat_statements
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
AND calls > 10
ORDER BY total_exec_time DESC
LIMIT 20;
-- Queries with high variance — unstable (sometimes fast, sometimes slow)
SELECT left(query, 120) AS query, calls,
round(mean_exec_time::numeric, 1) AS avg_ms,
round(stddev_exec_time::numeric, 1) AS stddev_ms,
round(stddev_exec_time / nullif(mean_exec_time, 0) * 100, 1) AS cv_pct
FROM pg_stat_statements
WHERE calls > 100
ORDER BY cv_pct DESC
LIMIT 10;
PostgreSQL: auto_explain
pg_stat_statements shows what is slow, auto_explain — why: automatically logs execution plan for slow queries.
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = 500 # ms: log queries > 500ms
auto_explain.log_analyze = true # real execution time, not just estimate
auto_explain.log_buffers = true # how many pages hit/read
auto_explain.log_format = json # for structured parsing
auto_explain.log_nested_statements = true
Output goes to PostgreSQL main log (/var/log/postgresql/). Convenient to analyze via pgBadger.
MySQL/MariaDB: slow query log
# /etc/mysql/conf.d/slow-log.cnf
[mysqld]
slow_query_log = ON
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1 # seconds
log_queries_not_using_indexes = ON
min_examined_row_limit = 1000 # ignore queries examining < 1000 rows
log_slow_rate_limit = 100 # MariaDB: log every 100th such query
Enable without restart:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = 'ON';
Analysis via Percona pt-query-digest:
# Top queries from last hour slow.log
pt-query-digest \
--since="1h ago" \
--limit 20 \
--output report \
/var/log/mysql/slow.log
# Output: for each unique query — count, avg/max time, rows examined
Prometheus + Grafana: Metric Monitoring
For PostgreSQL — postgres_exporter:
# docker-compose.yml
services:
postgres_exporter:
image: quay.io/prometheuscommunity/postgres-exporter:latest
environment:
DATA_SOURCE_NAME: "postgresql://monitoring:password@postgres:5432/mydb?sslmode=disable"
ports:
- "9187:9187"
# prometheus.yml — scrape config
- job_name: postgresql
static_configs:
- targets: ['postgres_exporter:9187']
scrape_interval: 15s
Key metrics and alerts:
# alerting rules
groups:
- name: postgresql
rules:
- alert: PostgreSQLSlowQueries
expr: rate(pg_stat_statements_total_exec_time_seconds_total[5m]) > 10
for: 2m
annotations:
summary: "High total query execution time"
- alert: PostgreSQLHighConnections
expr: pg_stat_activity_count > pg_settings_max_connections * 0.8
for: 1m
annotations:
summary: "Using > 80% of connections"
- alert: PostgreSQLReplicationLag
expr: pg_replication_lag > 30
for: 1m
annotations:
summary: "Replica is {{ $value }} seconds behind"
For MySQL — mysqld_exporter:
mysqld_exporter \
--config.my-cnf=/etc/mysql/monitoring.cnf \
--collect.global_status \
--collect.info_schema.innodb_metrics \
--collect.perf_schema.eventsstatements \
--collect.perf_schema.eventsstatementssum
pgBadger: PostgreSQL Log Analysis
# Installation
apt-get install pgbadger
# Analyze for a day
pgbadger \
--format=stderr \
--outfile=/var/www/reports/pgbadger_$(date +%Y%m%d).html \
/var/log/postgresql/postgresql-2025-01-15_*.log
# Generate daily via cron
0 6 * * * /usr/bin/pgbadger --incremental --outdir /var/www/reports/pgbadger /var/log/postgresql/postgresql.log
pgBadger builds HTML report: top slow queries, queries by time of day, lock events, checkpoint activity.
Grafana Dashboards
Ready dashboards from Grafana Dashboard Repository:
- PostgreSQL: ID 9628 (postgres_exporter dashboard) — connections, transactions/sec, cache hit rate, replication lag
- MySQL: ID 7362 (MySQL Overview) — InnoDB buffer pool, queries/sec, slow queries
- pgBouncer: ID 13474
Import via Grafana UI: Dashboards → Import → enter ID.
What to Monitor Mandatory
| Metric | Alert Threshold |
|---|---|
| Cache hit rate (PG) | < 99% |
| Active connections | > 80% of max_connections |
| Replication lag | > 30 seconds |
| Table size + bloat | > 30% bloat |
| Checkpoint duration | > 30 seconds |
| Dead tuples | > 10% of live |
| Slow queries count/min | increasing trend |
| Disk usage | > 80% |
Setting up complete stack (postgres_exporter + Prometheus + Grafana + Telegram alerts) takes 1–2 days.







