Last year I got called in to debug a SaaS app where the monthly AWS bill had crept from $8K to $47K over six months. Nobody could figure out where the money was going. The answer was a single PostgreSQL query — a dashboard endpoint that ran 4,000 times per hour, doing a sequential scan on a 38-million-row table. The fix was one index. Took four minutes to write, twelve minutes to build. The AWS bill dropped to $11K the next month.
That query was costing them roughly $40,000 per month. And nobody knew it existed.
This happens everywhere. Missing indexes are the most common cause of slow queries in PostgreSQL. But what most people don't realize is that wrong indexes — unused, bloated, or poorly structured — are almost as expensive. They slow down writes, waste storage, and consume cache that should be serving your actual workload.
How a Missing Index Costs $40K/Month
Let me walk through the math, because the numbers matter.
The query in question was simple enough:
```sql
SELECT user_id, plan_type, status, last_active_at
FROM subscriptions
WHERE status = 'active'
  AND plan_type = 'enterprise'
ORDER BY last_active_at DESC
LIMIT 50;
```
Without an index, PostgreSQL performs a sequential scan — reading every single row in the table to find the matching ones. On a 38-million-row table, that means reading roughly 30 GB of data. Every. Single. Time.
On Aurora Standard, I/O costs $0.20 per million requests, and each 8 KB page read counts as one request. Scanning 30 GB works out to about 3.9 million page reads per execution. At 4,000 executions per hour, that's 15.6 billion page reads per hour — on the order of 11 trillion per month if every read went to storage.

At $0.20 per million, that's a theoretical ceiling of over $2 million per month. The buffer pool absorbed the vast majority of those reads, which is why the real damage was "only" about $40K: the I/O that did reach storage, plus the compute cost of churning through 38 million rows per query on an oversized instance.

With the right index, the same query reads about 50 index entries plus 50 heap fetches. Maybe 200 I/O operations total. The monthly I/O cost for this query dropped to essentially zero.

Here's the before and after:

| Metric | Without Index | With Index |
|---|---|---|
| Rows scanned per query | 38,000,000 | ~50 |
| Pages read per query | ~3,900,000 | ~200 |
| Query time | ~2,400 ms | ~3 ms |
| Monthly page reads (uncached worst case) | ~11 trillion | ~576 million |
| Monthly I/O cost (uncached worst case) | ~$2.2M | ~$115 |
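The per-execution arithmetic is easy to sanity-check. Here's a back-of-envelope sketch in Python, using the page size and I/O price stated above — an estimate, not a billing model:

```python
# Back-of-envelope Aurora I/O math for the sequential-scan query.
# Assumptions from the text: 30 GB scanned per execution, 8 KB pages,
# $0.20 per million I/O requests, 4,000 executions per hour.

PAGE_BYTES = 8 * 1024
PRICE_PER_MILLION_IO = 0.20

def pages_per_execution(bytes_scanned: int) -> int:
    """Pages touched by one run, assuming every page is read once."""
    return bytes_scanned // PAGE_BYTES

def monthly_io_cost(pages_per_exec: int, execs_per_hour: int, hours: int = 720) -> float:
    """Worst-case monthly I/O charge if every page read hit storage."""
    total_reads = pages_per_exec * execs_per_hour * hours
    return total_reads / 1_000_000 * PRICE_PER_MILLION_IO

seq_scan_pages = pages_per_execution(30 * 2**30)   # ~3.9 million pages per run
indexed_pages = 200                                # ~50 index entries + 50 heap fetches

print(f"pages per seq scan: {seq_scan_pages:,}")   # 3,932,160
print(f"indexed monthly I/O cost: ${monthly_io_cost(indexed_pages, 4_000):,.2f}")  # $115.20
```

The indexed side of the table falls out directly: 200 pages × 4,000 executions × 720 hours is 576 million reads, about $115 a month.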
The index that fixed it:
```sql
CREATE INDEX CONCURRENTLY idx_subscriptions_status_plan_active
    ON subscriptions (status, plan_type, last_active_at DESC)
    WHERE status = 'active';
```
A partial composite index with the sort column included. Three columns, one WHERE clause, four minutes of work.
The Three Index Mistakes I See Everywhere
After years of debugging PostgreSQL performance issues, the same three mistakes show up in maybe 80% of cases.
Mistake 1: No Indexes Where You Need Them
This is the obvious one, but it's surprisingly common in production. Your dev database has 500 rows. A sequential scan on 500 rows takes microseconds. Nobody notices. Then your production database hits 10 million rows and everything falls apart.
Here's how to find the tables that are bleeding:
```sql
-- Tables with high sequential scan ratios (potential missing indexes)
SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    CASE WHEN (seq_scan + coalesce(idx_scan, 0)) > 0
         THEN round(100.0 * seq_scan / (seq_scan + coalesce(idx_scan, 0)), 1)
         ELSE 0
    END AS seq_scan_pct,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND pg_relation_size(relid) > 10485760  -- > 10 MB
ORDER BY seq_tup_read DESC
LIMIT 20;
```
If you see a table with a seq_scan_pct above 90% and a table_size in the gigabytes, that's your smoking gun. Tables with high sequential scan counts and large seq_tup_read values are almost always missing an index.
Mistake 2: Too Many Indexes (Yes, This Is Also Bad)
Here's the counterintuitive part: unused indexes on actively written tables still consume cache because every INSERT and non-HOT UPDATE must modify all indexes, forcing index pages into memory.
More indexes means:
- Slower writes (every INSERT/UPDATE touches every index)
- More WAL generation (each index change produces WAL records)
- Slower VACUUM (it must process all indexes, even unused ones)
- Wasted shared_buffers (indexes compete for cache with your actual data)
- More index bloat from dead tuples that accumulate over time
Best practice is to stick to 5-10 well-chosen indexes per table. Here's how to find the ones doing nothing:
```sql
-- Unused indexes wasting resources
-- (excludes unique/PK indexes, which enforce constraints even when never scanned)
SELECT
    s.schemaname || '.' || s.relname AS table_name,
    s.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    s.idx_scan AS times_used
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
  AND pg_relation_size(s.indexrelid) > 1048576  -- > 1 MB
ORDER BY pg_relation_size(s.indexrelid) DESC
LIMIT 15;
```
If an index shows idx_scan = 0 after a month of usage data, it's a candidate for removal. Check the stats_reset timestamp in pg_stat_database to make sure you're looking at enough history — you want at least a full business cycle (typically a month) before dropping anything.
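To see how far back those counters go, you can query the reset timestamp directly. A quick sketch (run against your own database; columns are from pg_stat_database):

```sql
-- How long have the cumulative statistics been accumulating?
SELECT datname,
       stats_reset,
       now() - stats_reset AS stats_age
FROM pg_stat_database
WHERE datname = current_database();
```

If stats_age is only a few days, wait before drawing conclusions — a monthly report job might be the only consumer of that "unused" index.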
Mistake 3: Wrong Index Type for the Job
PostgreSQL ships with six built-in index types. Most developers only use one: B-tree. That's like owning a Swiss Army knife and only using the bottle opener.
| Index Type | Best For | Don't Use When |
|---|---|---|
| B-tree (default) | Equality, range queries, sorting | Full-text search, array/JSONB |
| GIN | Arrays, JSONB, full-text search | Simple equality lookups on scalars |
| GiST | Geometric data, range types, full-text | Standard equality/range on integers |
| BRIN | Huge tables with naturally ordered data (timestamps, IDs) | Randomly ordered data |
| Hash | Pure equality lookups (no range) | Anything with ORDER BY or range |
| SP-GiST | IP addresses, phone numbers, quad-trees | General-purpose queries |
The most underused type? BRIN. If you have a time-series table with hundreds of millions of rows ordered by timestamp, a BRIN index is vastly smaller and cheaper to maintain than a B-tree. It stores block-range summaries instead of individual row pointers. On a 100-million-row events table, a B-tree on created_at might be 2 GB. The equivalent BRIN index? Maybe 500 KB.
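The size gap is easy to approximate from first principles. Here's a rough estimator in Python — the per-entry sizes and rows-per-page are ballpark assumptions, not exact PostgreSQL internals:

```python
# Rough size comparison: B-tree vs. BRIN on a timestamp column.
# Assumptions: ~24 bytes per B-tree leaf entry (key + item pointer + overhead),
# ~50 rows per 8 KB heap page, default BRIN pages_per_range = 128,
# ~32 bytes per BRIN range summary (min/max timestamp + item overhead).

ROWS = 100_000_000
BTREE_ENTRY_BYTES = 24
ROWS_PER_HEAP_PAGE = 50
PAGES_PER_RANGE = 128
BRIN_RANGE_BYTES = 32

btree_bytes = ROWS * BTREE_ENTRY_BYTES                       # leaf level only
heap_pages = ROWS // ROWS_PER_HEAP_PAGE
brin_bytes = (heap_pages // PAGES_PER_RANGE) * BRIN_RANGE_BYTES

print(f"B-tree ~ {btree_bytes / 2**30:.1f} GB")   # ~2.2 GB
print(f"BRIN   ~ {brin_bytes / 2**10:.0f} KB")    # ~488 KB
```

The B-tree carries one entry per row; the BRIN index carries one tiny summary per 128-page block range, which is why it's four orders of magnitude smaller.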
The most misused type? B-tree on JSONB columns. If you're querying data->>'email', a B-tree on a generated column works. But if you're doing data @> '{"tags": ["premium"]}', you need a GIN index. The B-tree simply won't be used.
The Composite Index Trap
Composite indexes (multi-column indexes) are the most powerful and most misunderstood tool in the PostgreSQL indexing toolkit.
The cardinal rule: column order matters. PostgreSQL can only use a composite index efficiently if the query starts with the first column or matches the first few columns in order. It's lexicographic ordering — rows are sorted by the first column, then by the second within each first-column group, and so on.
This means:
```sql
-- Index: (status, plan_type, created_at)

-- ✅ Uses the index efficiently
SELECT * FROM subscriptions WHERE status = 'active';
SELECT * FROM subscriptions WHERE status = 'active' AND plan_type = 'pro';
SELECT * FROM subscriptions WHERE status = 'active' AND plan_type = 'pro'
  AND created_at > '2026-01-01';

-- ❌ Cannot use the index efficiently (skips the leading column)
SELECT * FROM subscriptions WHERE plan_type = 'pro';
SELECT * FROM subscriptions WHERE created_at > '2026-01-01';
```
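The lexicographic rule is the same reason you can't binary-search a phone book by first name. A small Python sketch with sorted tuples makes it concrete — the data is hypothetical, and bisect stands in for a B-tree descent:

```python
# A composite B-tree is essentially a sorted list of tuples:
# (status, plan_type, created_at). Binary search only helps when the
# leading column(s) are pinned to a value.
from bisect import bisect_left, bisect_right

index = sorted([
    ("active", "enterprise", "2026-01-03"),
    ("active", "pro", "2026-01-01"),
    ("active", "pro", "2026-02-10"),
    ("canceled", "pro", "2025-12-20"),
    ("trial", "enterprise", "2026-01-15"),
])

# WHERE status = 'active' AND plan_type = 'pro': two bisects, O(log n)
lo = bisect_left(index, ("active", "pro"))
hi = bisect_right(index, ("active", "pro", "\uffff"))
print(index[lo:hi])  # both ('active', 'pro', ...) rows, found without a full scan

# WHERE plan_type = 'pro' alone: 'pro' rows are scattered across every
# status group, so the only option is walking the entire list.
pro_rows = [row for row in index if row[1] == "pro"]
print(len(pro_rows))  # 3
```

The first lookup lands on a contiguous slice; the second has to inspect every element — exactly the index-scan vs. sequential-scan split the planner faces.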
The best practice for ordering columns in a composite index: put the most selective (discriminative) column first. If status has 3 distinct values and plan_type has 15, you might want plan_type first — unless the queries always filter on status first, in which case the query pattern takes priority.
Here's my decision process:
- Look at the actual queries (not what you think the queries are)
- Columns used in WHERE equality conditions go first
- Columns used in range conditions (BETWEEN, >, etc.) go next
- Columns used in ORDER BY go last
- If in doubt, check with EXPLAIN ANALYZE
Covering Indexes: The 40x Speed Trick
This is the single most impactful optimization that most developers don't know about.
A covering index includes all the columns a query needs, so PostgreSQL can answer the query entirely from the index without touching the table at all. This is called an index-only scan, and it can be up to 40x faster than a regular index scan.
PostgreSQL supports this with the INCLUDE clause:
```sql
-- Regular index: still needs to visit the heap for name, email, last_login
CREATE INDEX idx_users_status ON users (status);

-- Covering index (PostgreSQL 11+): can answer the query from the index alone
CREATE INDEX idx_users_status_covering ON users (status)
    INCLUDE (name, email, last_login);
```
Now a query like:
```sql
SELECT name, email, last_login FROM users WHERE status = 'active';
```
...performs an index-only scan. No table access at all. In high-concurrency environments, this can reduce I/O overhead by 90%.
The trade-off: covering indexes are larger than regular indexes because they store the extra column data. And they work best on tables where data is mostly static or append-only, because frequently updated rows invalidate the Visibility Map and force heap fetches anyway.
My rule of thumb: if a query runs thousands of times per hour and selects the same handful of columns, a covering index is almost always worth it.
One important gotcha: index-only scans only work when the Visibility Map shows that the heap pages are "all-visible." If your table gets frequent updates, PostgreSQL can't trust the index alone and has to check the heap anyway — which defeats the purpose. The fix is simple: make sure VACUUM runs frequently enough to keep the Visibility Map up to date. On heavily updated tables, consider tuning autovacuum_vacuum_scale_factor down from the default 0.2 to something like 0.05 or even 0.01.
You can check how well index-only scans are working by looking at the Heap Fetches number in EXPLAIN ANALYZE output. If heap fetches are close to the total rows returned, your Visibility Map is stale and you need more aggressive vacuuming.
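A per-table override is usually safer than changing the global default. A sketch (the table name is hypothetical, and the thresholds are starting points to tune, not recommendations):

```sql
-- Vacuum this hot table after ~2% of rows change, instead of the 20% default
ALTER TABLE user_sessions SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```

This keeps the Visibility Map fresh on the one table that needs it without making autovacuum more aggressive everywhere.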
Partial Indexes: The Index Nobody Uses (But Should)
A partial index only covers rows matching a WHERE clause. It's smaller, faster to maintain, and faster to scan than a full index.
Real example: you have an orders table with 50 million rows. Only 2% of orders have status = 'pending'. A full B-tree on status indexes all 50 million rows, including the 49 million you'll never query. A partial index:
```sql
-- Full index: indexes all 50M rows (wasteful)
CREATE INDEX idx_orders_status ON orders (status);

-- Partial index: indexes only the ~1M pending rows
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';
```
The partial index is roughly 50x smaller. It's faster to maintain during writes because it only updates when a row enters or leaves the pending state. And it scans faster because there's less data to read.
I worked on a job queue table once — 200 million rows, 99.8% of them completed. The team had a B-tree on (status, created_at) that was 14 GB. We replaced it with a partial index WHERE status IN ('pending', 'running') that was 8 MB. Eight megabytes. The write throughput for new jobs increased by 35% because the index maintenance dropped to almost nothing.
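The arithmetic behind that 8 MB figure is simple. A rough sketch — the ~20 bytes per entry is a ballpark assumption for key plus tuple pointer plus overhead:

```python
# Why the partial index collapsed from gigabytes to megabytes:
# it only carries entries for rows matching the WHERE clause.
ROWS = 200_000_000
ACTIVE_FRACTION = 0.002          # 99.8% of jobs are completed
ENTRY_BYTES = 20                 # rough per-entry cost (assumption)

partial_entries = int(ROWS * ACTIVE_FRACTION)        # 400,000 rows
partial_mb = partial_entries * ENTRY_BYTES / 2**20

print(f"partial index ~ {partial_mb:.1f} MB")        # ~7.6 MB
```

The full index paid that per-entry cost for all 200 million rows (plus accumulated bloat); the partial index pays it for 400 thousand.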
Partial indexes are perfect for:
- Status fields (WHERE status = 'active')
- Soft deletes (WHERE deleted_at IS NULL)
- Feature flags (WHERE is_premium = true)
- Queue tables (WHERE processed = false)
The PostgreSQL docs explicitly recommend partial indexes for these patterns, yet I rarely see them in production. Everyone defaults to full B-tree indexes on every column. It's a massive waste.
The Index Health Check: A Step-by-Step Playbook
Here's the exact process I use when I audit a PostgreSQL database. You can do this in under an hour.
Step 1: Enable pg_stat_statements (If You Haven't)
```sql
-- Add to postgresql.conf or via ALTER SYSTEM
-- (note: this overwrites any existing shared_preload_libraries list)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Restart required, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```
This gives you the execution statistics for every SQL statement hitting your database. The overhead is roughly 1% CPU — negligible.
Step 2: Find Your Worst Queries
```sql
-- Top 10 queries by total execution time
SELECT
    round(total_exec_time::numeric, 2) AS total_time_ms,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_of_total,
    left(query, 100) AS query_preview
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 10;
```
The top 3-5 queries usually account for 60-80% of total database time. Focus there first.
Step 3: Run EXPLAIN ANALYZE on the Top Offenders
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT user_id, plan_type, status
FROM subscriptions
WHERE status = 'active' AND plan_type = 'enterprise'
ORDER BY last_active_at DESC LIMIT 50;
```
Look for:
- Seq Scan on large tables (missing index)
- Sort with Sort Method: external merge (the ORDER BY column belongs in an index, or work_mem is too low)
- High Heap Fetches in index-only scans (stale visibility map; needs VACUUM)
- Row estimate mismatches (actual rows far from planned rows means stale statistics; run ANALYZE)
PostgreSQL's default random_page_cost is 4.0, meaning it assumes random reads are 4x slower than sequential reads. If you're on SSDs, change this to 1.0-1.5. This alone can change the planner from choosing sequential scans to index scans.
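The change itself is one setting. A sketch (1.1 is a common value for SSD/NVMe storage; verify against your own workload before committing):

```sql
-- Tell the planner random I/O is nearly as cheap as sequential (SSD/NVMe)
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```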
Step 4: Find Missing Indexes
Run the sequential scan detection query from earlier. Any table with over 90% sequential scans and significant size needs investigation.
Step 5: Find Unused Indexes
Run the unused index detection query. Drop anything with idx_scan = 0 after verifying you have at least a month of statistics.
```sql
-- Always use CONCURRENTLY to avoid taking a blocking lock
DROP INDEX CONCURRENTLY idx_that_nobody_uses;
```
Step 6: Check for Index Bloat
```sql
-- Largest indexes: candidates to check for bloat
-- (for a real bloat estimate, use the pgstattuple extension)
SELECT
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan,
    idx_tup_read
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 104857600  -- > 100 MB
ORDER BY pg_relation_size(indexrelid) DESC;
```
If an index is much larger than you'd expect for the data it covers, it's likely bloated from dead tuples. Fix with:
```sql
-- Rebuild the index without locking the table (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_bloated_index;
```
Step 7: Set Up Ongoing Monitoring
Don't let this be a one-time exercise. Use pganalyze (paid, excellent), pg_stat_monitor (free), or at minimum, set up auto_explain to log slow query plans:
```sql
-- auto_explain must be loaded before its settings take effect
ALTER SYSTEM SET session_preload_libraries = 'auto_explain';

-- Log explain plans for queries taking > 1 second
ALTER SYSTEM SET auto_explain.log_min_duration = '1000ms';
ALTER SYSTEM SET auto_explain.log_analyze = true;
ALTER SYSTEM SET auto_explain.log_buffers = true;
SELECT pg_reload_conf();
```
The Cheat Sheet
Here's my quick-reference for index decisions:
| Query Pattern | Index Strategy |
|---|---|
| WHERE col = value | B-tree on col |
| WHERE col1 = x AND col2 = y | Composite B-tree (col1, col2) — most selective first |
| WHERE status = 'active' (status has few values) | Partial index WHERE status = 'active' |
| WHERE col BETWEEN x AND y | B-tree on col |
| WHERE jsonb_col @> '{"key": "val"}' | GIN on jsonb_col |
| WHERE col LIKE 'prefix%' | B-tree with text_pattern_ops |
| WHERE col LIKE '%search%' | GIN with pg_trgm extension |
| WHERE tsv @@ to_tsquery('search') | GIN on tsvector column |
| Time-series data, 100M+ rows | BRIN on timestamp column |
| SELECT a, b FROM t WHERE a = x | Covering index (a) INCLUDE (b) |
| Soft-deleted records | Partial index WHERE deleted_at IS NULL |
What I Actually Think
I've worked with enough production PostgreSQL databases to have a strong opinion here: most performance problems aren't hardware problems or schema problems. They're indexing problems. And they're indexing problems because nobody audits indexes after the initial schema migration.
Teams spend weeks choosing between Aurora and Cloud SQL, debating instance types, configuring connection pools — and then never look at pg_stat_user_indexes once in two years. They throw money at bigger instances when the real fix is a three-line CREATE INDEX statement.
Here's what bothers me most: the knowledge gap isn't about intelligence. It's about awareness. Smart, experienced developers write queries that run 1,000x slower than they should, not because they can't write a good index, but because they never learned to check if one was needed. They test against dev databases with 500 rows, deploy to production with 50 million rows, and wonder why the app is slow.
Three things would fix this for most teams:
1. Make pg_stat_statements mandatory. If you're running PostgreSQL in production without it, you're flying blind. The 1% CPU overhead is nothing compared to the cost of undetected slow queries.
2. Run an index audit quarterly. It takes an hour. Find the missing indexes, drop the unused ones, rebuild the bloated ones. This single practice would prevent most of the performance crises I've been called to fix.
3. Learn partial and covering indexes. These two features alone can reduce query times by 90%+ in common patterns. They're built into PostgreSQL, free, and dramatically underused.
The database is almost never the bottleneck. The indexes — or lack of them — are. Fix your indexes before you upgrade your instance. I promise you'll be surprised how much money you save.
Sources
- DEV Community — How to Find and Fix Missing Indexes in PostgreSQL
- PostgreSQL Documentation — Index Types
- PostgreSQL Documentation — Partial Indexes
- PostgreSQL Documentation — Index-Only Scans and Covering Indexes
- PostgreSQL Documentation — auto_explain
- PostgreSQL Documentation — Using EXPLAIN
- Neon — PostgreSQL Index Types
- Percona — PostgreSQL Indexes Can Hurt You
- Stormatics — Optimizing PostgreSQL with Composite and Partial Indexes
- Medium/ThreadSafe — Unleashing the Power of Composite Indexes
- Kendra Little — Index Bloat in Postgres
- PostgresAI — Why Keep Your Index Set Lean
- Mydbops — PostgreSQL Index Best Practices
- MyTechMantra — Mastering PostgreSQL Index-Only Scans: 40x Speed
- Crunchy Data — Why Covering Indexes Are Incredibly Helpful
- pgDash — Finding Unused Indexes in PostgreSQL
- Medium — Detecting Missing Indexes with pg_stat_user_tables
- Medium — Detecting and Removing Unused Indexes
- InterDB — Cost Estimation in Single-Table Query
- pganalyze — Query Performance
- pganalyze — FAQ
- Haider ZDBRE — PostgreSQL Index Health Check
- AWS — Amazon Aurora Pricing
- Cybertec — Index Scan vs. Bitmap Scan vs. Sequential Scan
- Medium — PostgreSQL Performance Tuning with pg_stat_statements