Infrastructure Monitoring | 11 min read | March 24, 2026

Database Monitoring: Essential Health Checks for MySQL and PostgreSQL

Your application is only as reliable as its database. Learn how to monitor MySQL and PostgreSQL for connection health, query performance, replication lag, and storage capacity before problems become outages.

Tags: database monitoring, MySQL monitoring, PostgreSQL monitoring, query performance, database health checks

UptimeMonitorX Team

Published March 24, 2026

Every application outage investigation starts with the same question: "Is the database okay?" Databases are the foundation of almost every web application, and when they struggle, everything built on top of them fails. Slow queries make pages time out. Connection pool exhaustion causes intermittent errors. Replication lag means your read replicas serve stale data. Running out of disk space crashes the entire server with no graceful degradation.

The challenge is that databases fail gradually. They do not go from healthy to dead in an instant. Performance degrades over days or weeks as data grows, queries become less efficient, and connection patterns change. Without proactive monitoring, these gradual degradations suddenly cross a threshold and become a full outage - usually at peak traffic when you can least afford it.

Connection Health Monitoring

The most fundamental database check is whether your application can actually connect to the database and execute a query. This sounds obvious, but many monitoring setups only check if the database server is reachable via TCP, which misses authentication failures, max connection limits, and application-level issues.

Implement an active connection check that opens a connection, executes a lightweight query (SELECT 1 works on both MySQL and PostgreSQL), and verifies the result. This confirms that the database server is running, accepting connections, authenticating correctly, and processing queries. Time the entire operation - if a trivial query takes more than 100 milliseconds from a host on the same network, something is likely wrong even if the check technically passes.
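The check described above can be sketched as follows. This is a minimal illustration, not a production probe: the `check_connection` helper and its `connect` parameter are hypothetical names, and SQLite is used only as a stand-in so the example runs without a server. In practice you would pass your MySQL or PostgreSQL driver's connect call.

```python
import sqlite3
import time


def check_connection(connect, max_latency_ms=100.0):
    """Open a connection, run SELECT 1, and time the whole round trip.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection (hypothetical wiring; pass your own driver's connect call).
    """
    started = time.perf_counter()
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            row = cur.fetchone()
        finally:
            conn.close()
    except Exception as exc:  # any driver error counts as a failed check
        return {"ok": False, "slow": None, "latency_ms": None, "error": str(exc)}
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {
        "ok": row is not None and row[0] == 1,  # verify the result, not just the connect
        "slow": latency_ms > max_latency_ms,    # passes, but worth investigating
        "latency_ms": latency_ms,
        "error": None,
    }


# Stand-in demo: SQLite in memory instead of a real MySQL/PostgreSQL server.
result = check_connection(lambda: sqlite3.connect(":memory:"))
```

Timing starts before the connect call on purpose, so authentication delays and connection-limit stalls show up in the measured latency rather than only query execution time.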

Monitor connection pool utilization. Your application likely uses a connection pool (e.g., PgBouncer for PostgreSQL or built-in pooling in your ORM). Track the ratio of active connections to maximum pool size. Alert when utilization exceeds 80% - this gives you time to investigate before the pool is exhausted and new requests start failing.

Track the total number of active connections on the database server itself. MySQL defaults to a max of 151 connections (max_connections). PostgreSQL defaults to 100. If you have multiple application servers, microservices, and background workers all connecting to the same database, you can hit these limits faster than expected. Set up alerts at 70% of your configured maximum.
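As a rough sketch of the server-side check, the queries below read the current and maximum connection counts, and a small helper turns them into a utilization ratio. The helper name and the 70% default are illustrative assumptions taken from the threshold above. Note that on recent PostgreSQL versions pg_stat_activity also lists background processes, so the count may slightly overstate client sessions.

```python
# Queries to run on each server (kept as strings; execute with your own driver).
PG_CONNECTIONS = """
SELECT count(*) AS used,
       current_setting('max_connections')::int AS max_conn
FROM pg_stat_activity
"""
MYSQL_USED = "SHOW STATUS LIKE 'Threads_connected'"
MYSQL_MAX = "SHOW VARIABLES LIKE 'max_connections'"


def connection_utilization(used, max_connections, alert_at=0.70):
    """Return (ratio, should_alert) for server-side connection usage."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    ratio = used / max_connections
    return ratio, ratio >= alert_at


# Example: 110 sessions against MySQL's default limit of 151.
ratio, should_alert = connection_utilization(110, 151)
```

The same helper works for pool utilization if you pass the pool's active count and maximum size with `alert_at=0.80`.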

Query Performance Monitoring

Slow queries are the most common cause of database-related application issues. A single poorly optimized query can consume enough resources to slow down every other query running on the server.

Enable slow query logging. In MySQL, set slow_query_log = 1 and long_query_time = 1 (seconds) to log queries taking more than 1 second. In PostgreSQL, set log_min_duration_statement = 1000 (milliseconds). These logs are your primary tool for identifying problematic queries.

Monitor response-time percentiles for your most critical queries. Identify the 10-20 queries that your application executes most frequently or that support your most important user flows (login, checkout, dashboard loading). Track their P50, P95, and P99 response times rather than a plain average, which hides the slow tail. A P95 that gradually creeps up from 50ms to 200ms over a month usually indicates growing data volumes or a missing index.
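To make the percentile tracking concrete, here is a small sketch that computes P50, P95, and P99 from a window of latency samples using the nearest-rank method. The function names and the sample data are hypothetical. Most metrics systems compute percentiles for you, so treat this as an illustration of what the numbers mean.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize one query's latencies as P50/P95/P99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Hypothetical latencies (ms) for one critical query over a sampling window.
summary = latency_summary(list(range(1, 101)))
```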

Watch for lock contention. Both MySQL and PostgreSQL use locking mechanisms to ensure data consistency. When multiple queries compete for locks on the same rows or tables, they queue up and wait. Monitor the number of queries in a waiting state. In PostgreSQL, query pg_stat_activity for rows where wait_event_type = 'Lock'. In MySQL, check SHOW ENGINE INNODB STATUS for lock wait information.
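For PostgreSQL, the lock-wait check might look like the sketch below. The query uses the pg_stat_activity columns mentioned above. The `lock_waiters` helper and its `max_waiting` threshold are hypothetical and should be tuned to your workload.

```python
PG_LOCK_WAITERS = """
SELECT pid, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
"""


def lock_waiters(cursor, max_waiting=5):
    """Run the lock-wait query on a DB-API cursor.

    Returns (rows, should_alert), where should_alert is True when more
    than `max_waiting` sessions are currently blocked on a lock.
    """
    cursor.execute(PG_LOCK_WAITERS)
    rows = cursor.fetchall()
    return rows, len(rows) > max_waiting
```

Returning the rows alongside the flag keeps the blocked queries available for the alert message, which usually shortens the investigation.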

Replication Monitoring

If you use read replicas (and most production databases do), replication health is critical to monitor:

Replication lag measures how far behind a replica is from the primary server. In MySQL, check Seconds_Behind_Master from SHOW SLAVE STATUS (from MySQL 8.0.22 onward, these are named Seconds_Behind_Source and SHOW REPLICA STATUS). In PostgreSQL, compare the write-ahead log (WAL) position between primary and replica. Acceptable lag depends on your application - for a status dashboard, 30 seconds might be fine. For a banking application, any lag is concerning.

Replication status - is replication actually running? A replica can stop replicating due to network issues, disk space problems, or conflicting transactions. Monitor the replication thread status in MySQL (Slave_IO_Running and Slave_SQL_Running, or Replica_IO_Running and Replica_SQL_Running on 8.0.22 and later, should both be "Yes") and the streaming replication state in PostgreSQL (the state column of pg_stat_replication on the primary should read "streaming").

Alert immediately when replication breaks. A broken replica that is not detected can silently serve increasingly stale data for hours or days. When a failover is needed, you discover the replica is days behind and cannot serve as a replacement for the primary.
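The replication checks above can be sketched with two small helpers. For PostgreSQL, this assumes you read `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica (the function names used in PostgreSQL 10 and later) and compare the two positions. For MySQL, it assumes you have one status row as a dictionary. All helper names here are hypothetical.

```python
def parse_lsn(lsn):
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_lag_bytes(primary_lsn, replica_lsn):
    """Bytes of WAL the replica has not yet replayed (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_lsn))


def mysql_replication_problems(status):
    """Inspect one SHOW REPLICA STATUS / SHOW SLAVE STATUS row given as a dict.

    Accepts both the newer and the older column names. An empty list
    means both threads are running and lag is reported.
    """
    def pick(new_name, old_name):
        return status.get(new_name, status.get(old_name))

    problems = []
    if pick("Replica_IO_Running", "Slave_IO_Running") != "Yes":
        problems.append("IO thread not running")
    if pick("Replica_SQL_Running", "Slave_SQL_Running") != "Yes":
        problems.append("SQL thread not running")
    if pick("Seconds_Behind_Source", "Seconds_Behind_Master") is None:
        problems.append("lag is NULL")
    return problems
```

A NULL lag value is treated as a problem rather than as zero, because MySQL reports NULL when the replica cannot determine how far behind it is, which is exactly the silent failure described above.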

Storage and Capacity Monitoring

Running out of disk space is one of the most preventable causes of database outages:

Monitor disk space utilization on your database server volumes. Set warning alerts at 70% and critical alerts at 85%. Database disk usage grows predictably - track the growth rate and project when you will need to expand storage. A database that grows by 2 GB per week and has 20 GB free gives you roughly 10 weeks to plan a storage expansion.

Monitor table size growth for your largest tables. In PostgreSQL, query pg_total_relation_size() for your top tables. In MySQL, check information_schema.TABLES for DATA_LENGTH + INDEX_LENGTH. Unexpectedly rapid growth in a specific table often indicates a bug (runaway logging, failed cleanup jobs, or data import loops).

Track WAL/binlog disk usage separately. PostgreSQL write-ahead logs and MySQL binary logs can consume significant disk space, especially during high-write periods. If WAL archiving falls behind or binlog purging is not configured, these files can fill up the disk even when your actual data size is stable.

Monitor table bloat in PostgreSQL. PostgreSQL's MVCC architecture creates dead tuples when rows are updated or deleted. The autovacuum process cleans these up, but if it falls behind, tables bloat with dead data. Monitor the ratio of dead tuples to live tuples in pg_stat_user_tables. A table where dead tuples exceed 20% of live tuples needs attention.
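Two of the storage checks above reduce to simple arithmetic, sketched here under stated assumptions: disk growth is treated as linear, and the dead-tuple rows come from the pg_stat_user_tables query shown in the constant. The helper names and the 20% default are illustrative.

```python
# Run on PostgreSQL to produce the rows consumed by bloated_tables().
PG_TUPLE_STATS = "SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables"


def weeks_until_full(free_gb, growth_gb_per_week):
    """Linear projection of remaining runway; None if usage is not growing."""
    if growth_gb_per_week <= 0:
        return None
    return free_gb / growth_gb_per_week


def bloated_tables(rows, max_dead_ratio=0.20):
    """Return names of tables whose dead/live tuple ratio exceeds the limit.

    `rows` is an iterable of (relname, n_live_tup, n_dead_tup) tuples.
    """
    flagged = []
    for relname, live, dead in rows:
        ratio = dead / max(live, 1)  # avoid dividing by zero on empty tables
        if ratio > max_dead_ratio:
            flagged.append(relname)
    return flagged


# The example from the text: 20 GB free, growing 2 GB per week.
runway = weeks_until_full(free_gb=20, growth_gb_per_week=2)  # 10.0 weeks
```

The linear projection is deliberately naive. If your growth accelerates, recompute the rate from a recent window rather than the lifetime average.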

Memory and Buffer Monitoring

Database performance depends heavily on how effectively it uses memory:

Buffer cache hit ratio measures how often requested data is found in memory versus being read from disk. In PostgreSQL, calculate this from pg_stat_database: blks_hit / (blks_hit + blks_read). In MySQL, check the InnoDB buffer pool hit rate from SHOW ENGINE INNODB STATUS. On a warmed-up server, a hit ratio that stays below 95% usually indicates your buffer pool is too small for your working dataset.

Monitor sort and temporary table usage. Queries that cannot complete their sorting or grouping operations in memory spill to disk, which is dramatically slower. In MySQL, track the Sort_merge_passes and Created_tmp_disk_tables status counters over time. In PostgreSQL, track temp_files and temp_bytes in pg_stat_database. A sudden increase indicates new queries that need optimization or additional work_mem allocation.
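The hit ratio formulas above can be written out as a short sketch. The PostgreSQL version uses the pg_stat_database counters named in the text. The MySQL version is an alternative to parsing SHOW ENGINE INNODB STATUS and assumes you read the Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads status counters instead. Function names are hypothetical.

```python
PG_BLOCK_STATS = """
SELECT blks_hit, blks_read
FROM pg_stat_database
WHERE datname = current_database()
"""


def pg_cache_hit_ratio(blks_hit, blks_read):
    """blks_hit / (blks_hit + blks_read); None before any blocks were requested."""
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


def innodb_hit_ratio(read_requests, disk_reads):
    """1 - (reads that went to disk / logical read requests)."""
    if read_requests == 0:
        return None
    return 1 - disk_reads / read_requests


ratio = pg_cache_hit_ratio(blks_hit=990, blks_read=10)
```

These counters are cumulative since the last statistics reset, so for alerting it is usually more informative to compute the ratio over the difference between two samples than over the lifetime totals.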

Backup Verification Monitoring

A backup that has never been tested is not a backup - it is a hope. Include backup verification in your monitoring:

Monitor backup completion. Verify that your automated backups (pg_dump, mysqldump, or continuous archiving) complete successfully and on schedule. Alert if a backup job has not completed within its expected window.

Monitor backup size trends. A backup that is significantly smaller than the previous one might indicate a truncated or corrupted backup. A backup that is dramatically larger might indicate unexpected data growth.

Automate restore testing. At minimum, run a monthly automated restore of your latest backup to a test database and verify that a sample query returns expected results. If your backup monitoring only confirms that a file was created, you could discover during a real disaster that the backup is corrupted or incomplete.
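The completion and size-trend checks above could be sketched like this. The helper names, the 24-hour schedule, the one-hour grace period, and the 0.5x and 2x size bounds are all illustrative assumptions. Pick values that match your own backup cadence and data growth.

```python
from datetime import datetime, timedelta


def backup_overdue(last_completed, now,
                   expected_every=timedelta(hours=24),
                   grace=timedelta(hours=1)):
    """True if no backup finished within the expected window plus grace."""
    return now - last_completed > expected_every + grace


def backup_size_anomaly(previous_bytes, current_bytes,
                        shrink_below=0.5, grow_above=2.0):
    """Compare the latest backup size with the previous one."""
    if previous_bytes <= 0:
        return "no baseline"
    ratio = current_bytes / previous_bytes
    if ratio < shrink_below:
        return "suspiciously small"   # possible truncation or corruption
    if ratio > grow_above:
        return "unexpectedly large"   # possible runaway data growth
    return "ok"


# Hypothetical example: last backup finished 26 hours ago.
overdue = backup_overdue(datetime(2026, 3, 23, 2, 0), datetime(2026, 3, 24, 4, 0))
```

Neither check replaces restore testing. They only tell you that a file of plausible size appeared on time.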

Setting Up Effective Database Alerts

Database monitoring generates a lot of data. The key is configuring alerts that surface actionable information without creating noise:

Set warning thresholds for gradual degradation: connection utilization above 70%, disk usage above 70%, replication lag above 10 seconds, buffer cache hit ratio below 97%. These give you time to investigate proactively.

Set critical thresholds for imminent problems: connection utilization above 90%, disk usage above 85%, replication broken, primary server unreachable. These require immediate response.

Avoid alerting on individual slow queries. Instead, alert when the P95 query response time for a critical query exceeds your threshold for 5 consecutive minutes. This filters out one-off slow queries caused by temporary load spikes and surfaces genuine performance degradation.
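A minimal sketch of the two alerting ideas above: a two-level threshold classifier and a "sustained breach" rule that fires only after several consecutive bad minutes. The function names are hypothetical, and the example values are the thresholds listed in this section.

```python
def level(value, warning, critical):
    """Classify a metric where higher is worse (utilization, lag, latency)."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def sustained_breach(p95_by_minute_ms, threshold_ms, minutes=5):
    """True if the most recent `minutes` per-minute P95 values all exceed the threshold."""
    if len(p95_by_minute_ms) < minutes:
        return False
    return all(v > threshold_ms for v in p95_by_minute_ms[-minutes:])


# Connection utilization of 75% with the 70% / 90% thresholds from the text.
connection_level = level(0.75, warning=0.70, critical=0.90)

# One slow minute in an otherwise healthy window does not fire the alert.
spike_only = sustained_breach([120, 130, 900, 125, 118], threshold_ms=200)
```

For metrics where lower is worse, such as the buffer cache hit ratio, invert the value (for example, pass the miss ratio) before calling `level`.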

Conclusion

Database monitoring is not optional for any production application. Connection health, query performance, replication status, storage capacity, and memory utilization are the five pillars of database reliability. Start with basic connection checks and disk space alerts, then progressively add query performance tracking and replication monitoring. The databases that crash are almost always the ones that were showing warning signs for weeks before anyone noticed.
