You added an index. Your query is still slow. Now what?
This is the most common database performance frustration in backend engineering. A query takes 3 seconds. You look at the WHERE clause, create an index on the relevant column, deploy, and check again. Still 3 seconds. Or worse - the query got slower. The EXPLAIN output looks like hieroglyphics. The query planner is ignoring your index entirely and doing a full table scan on 50 million rows.
The problem isn't indexes. The problem is the assumption that index = faster queries. That's only true when the index matches your query pattern, the query planner decides to use it, the data distribution makes it worthwhile, and the index fits in memory. Miss any one of those conditions, and your index is dead weight - consuming disk space, slowing down writes, and doing absolutely nothing for reads.
This article is about the debugging mindset. Why indexes fail in real systems, how to diagnose the problem, and how to fix it. Not "what is a B-tree." You can Google that.
The Mental Model: What an Index Actually Does
A 30-second refresher, because everything that follows depends on getting this right.
An index is a separate data structure that maps column values to row locations. When the database needs to find rows matching a condition, it can either scan the entire table (sequential scan) or look up matching values in the index and jump directly to those rows (index scan).
Without index (sequential scan):
Query: WHERE user_id = 12345
Table: Row 1 → check → no
Row 2 → check → no
Row 3 → check → no
... (50 million rows later)
Row 38,291,003 → check → YES
... (continues scanning remaining rows)
Time: O(n) - checks every single row
With index (index scan):
Query: WHERE user_id = 12345
Index (B-tree):
[1-25000] → [12000-12500] → [12345] → Row pointer
Table: Jump directly to Row 38,291,003
Time: O(log n) - a few tree traversals

The critical point most developers miss: the index exists as a separate structure. It has its own storage footprint, its own memory requirements, and it must be updated on every write to the indexed columns. Creating an index isn't free - it's a trade-off between read speed and write overhead.
An index that the query planner doesn't use is just wasted space and write penalty. This happens far more often than most developers realize.
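You can watch the planner make this choice even in an embedded database. Here's a minimal sketch using Python's sqlite3 standard library (the table and data are made up for illustration); EXPLAIN QUERY PLAN is SQLite's equivalent of EXPLAIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether the planner scans or uses an index
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# No index yet: the only option is a full scan
p_before = plan("SELECT * FROM users WHERE user_id = 1234")
print(p_before)  # e.g. SCAN users

conn.execute("CREATE INDEX idx_users_user_id ON users(user_id)")

# With the index: a direct lookup
p_after = plan("SELECT * FROM users WHERE user_id = 1234")
print(p_after)  # e.g. SEARCH users USING INDEX idx_users_user_id
```

The same query, the same data - only the available access path changed. Everything in the rest of this article is about why that second plan sometimes refuses to appear.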
Why Your Index Is Not Being Used
These are the actual reasons indexes get ignored in production. Not theoretical edge cases - these show up in real code reviews and production debugging sessions constantly.
1. Functions or Expressions on Indexed Columns
This is the most common reason indexes are silently ignored. If you apply a function to an indexed column in your WHERE clause, the database can't use the index on that column.
-- Index on 'email' column
CREATE INDEX idx_users_email ON users(email);
-- ❌ THIS WILL NOT USE THE INDEX
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- ❌ THIS WILL NOT USE THE INDEX EITHER
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
-- ❌ ALSO BROKEN - any function wrapping the column
SELECT * FROM users WHERE TRIM(email) = 'john@example.com';
SELECT * FROM users WHERE CONCAT(first_name, ' ', last_name) = 'John Doe';
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';

Why this happens: A B-tree index stores values in sorted order. The index on email contains sorted email strings. When you ask for LOWER(email), the database would need to apply LOWER() to every index entry to check if it matches - which is no better than a full table scan. So the query planner skips the index entirely.
The fix - expression indexes (PostgreSQL) or generated columns (MySQL):
-- PostgreSQL: Create an index on the expression
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Now this query USES the index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- MySQL: Use a generated column + index
ALTER TABLE users ADD email_lower VARCHAR(255)
GENERATED ALWAYS AS (LOWER(email)) STORED;
CREATE INDEX idx_users_email_lower ON users(email_lower);
-- Now query against the generated column
SELECT * FROM users WHERE email_lower = 'john@example.com';

-- Common date function trap and fix
-- ❌ BROKEN: Function on column prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- ✅ FIXED: Range query uses index on created_at
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- ❌ BROKEN: DATE() function on column
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';
-- ✅ FIXED: Range query on the original column
SELECT * FROM orders
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16';

This single fix - rewriting function calls as range queries - resolves more slow query issues than any other optimization. Audit every slow query for functions wrapping indexed columns. It's the first thing to check.
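This behavior is easy to reproduce locally: SQLite also supports expression indexes, and its planner follows the same rule. A sketch with Python's sqlite3 and a hypothetical users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(f"User{i}@Example.com", f"user {i}") for i in range(5_000)])

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A plain index on the column is invisible to a query that wraps it in lower()
conn.execute("CREATE INDEX idx_users_email ON users(email)")
before = plan("SELECT * FROM users WHERE lower(email) = 'user42@example.com'")

# An expression index on lower(email) matches the query's expression exactly
conn.execute("CREATE INDEX idx_users_email_lower ON users(lower(email))")
after = plan("SELECT * FROM users WHERE lower(email) = 'user42@example.com'")

print(before)  # full scan, despite idx_users_email existing
print(after)   # search using idx_users_email_lower
```

The expression in the index must match the expression in the query exactly - that's true in PostgreSQL as well.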
2. Leading Wildcards in LIKE Queries
A B-tree index sorts values left to right. It can quickly find all values starting with a prefix, but it cannot efficiently search for patterns in the middle or end of a string.
-- Index on 'name' column
CREATE INDEX idx_users_name ON users(name);
-- ✅ THIS USES THE INDEX (prefix match)
SELECT * FROM users WHERE name LIKE 'John%';
-- ❌ THIS DOES NOT USE THE INDEX (leading wildcard)
SELECT * FROM users WHERE name LIKE '%John';
-- ❌ THIS DOES NOT USE THE INDEX EITHER (wildcard on both sides)
SELECT * FROM users WHERE name LIKE '%John%';

Why: Think of a phone book. You can quickly find everyone whose last name starts with "Sm" - just go to the "Sm" section. But finding everyone whose last name contains "oh" means reading every single entry. That's what a leading wildcard does to an index.
The fix depends on your use case:
For suffix searches, store the reversed value and use prefix matching:
-- Need to find emails by domain? (@gmail.com)
ALTER TABLE users ADD email_reversed VARCHAR(255)
GENERATED ALWAYS AS (REVERSE(email)) STORED;
CREATE INDEX idx_users_email_reversed ON users(email_reversed);
-- Find addresses ending in @gmail.com → prefix search on the reversed value
SELECT * FROM users WHERE email_reversed LIKE REVERSE('%@gmail.com');
-- This becomes: WHERE email_reversed LIKE 'moc.liamg@%' → uses the index

For substring searches at scale, use full-text search instead of LIKE:
-- PostgreSQL: Full-text search with GIN index
CREATE INDEX idx_products_search ON products USING GIN(to_tsvector('english', name));
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'wireless & headphones');
-- Or use trigram indexes for partial matching
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING GIN(name gin_trgm_ops);
-- Now LIKE '%john%' CAN use the trigram index
SELECT * FROM users WHERE name LIKE '%john%';

The honest take: If you're doing a lot of substring searches, LIKE '%term%' on a B-tree index is the wrong approach entirely. Move to full-text search (PostgreSQL tsvector, MySQL FULLTEXT, Elasticsearch) or trigram indexes. Don't fight the B-tree - use the right data structure.
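The reversal trick can be sanity-checked in a few lines of Python with sqlite3. SQLite has no REVERSE() SQL function, so the reversed value is computed in the application here (a generated column would do this in MySQL); whether LIKE itself hits the index depends on collation settings, so the point of this sketch is the correctness of the transformation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, email_reversed TEXT)")
emails = [f"user{i}@gmail.com" for i in range(3_000)] + \
         [f"user{i}@example.com" for i in range(3_000)]
# Store the reversed value at write time
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(e, e[::-1]) for e in emails])
conn.execute("CREATE INDEX idx_users_email_reversed ON users(email_reversed)")

# Suffix search on email becomes a prefix search on email_reversed
pattern = "%@gmail.com"[::-1]  # 'moc.liamg@%'
gmail_count = conn.execute(
    "SELECT count(*) FROM users WHERE email_reversed LIKE ?", (pattern,)
).fetchone()[0]
print(gmail_count)  # 3000
```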
3. Low Selectivity - The Index Isn't Worth Using
Selectivity measures how many distinct values a column has relative to the total number of rows. A column with high selectivity (many unique values) benefits from indexing. A column with low selectivity (few unique values) often doesn't.
-- High selectivity - good index candidate
-- email: 10,000,000 unique values out of 10,000,000 rows
-- Selectivity: 1.0 (perfect)
CREATE INDEX idx_users_email ON users(email); -- ✅ Useful
-- Low selectivity - bad index candidate
-- status: 3 unique values ('active', 'inactive', 'pending') out of 10,000,000 rows
-- Selectivity: 0.0000003 (terrible)
CREATE INDEX idx_users_status ON users(status); -- ❌ Usually useless
-- Low selectivity - boolean column
-- is_verified: 2 unique values (true/false) out of 10,000,000 rows
CREATE INDEX idx_users_verified ON users(is_verified); -- ❌ Waste of space

Why the database ignores low-selectivity indexes: If 60% of your users have status = 'active', an index scan for WHERE status = 'active' would return 6 million rows. The database has to read the index entries, then jump to each row in the table (random I/O). For 6 million rows, a sequential table scan is actually faster because sequential I/O is dramatically cheaper than random I/O.
The math:
10,000,000 row table
Query: WHERE status = 'active' (matches 6,000,000 rows = 60%)
Sequential scan: Read entire table sequentially
→ ~10,000,000 × sequential read cost = fast (sequential I/O is cheap)
Index scan: Look up 6,000,000 entries in index, then random-read each row
→ 6,000,000 × random read cost = SLOW (random I/O is 10-100x more expensive)
The query planner correctly chooses sequential scan.
Your index is ignored. This is the right behavior.

The rule of thumb: An index is typically useful when the query returns less than 10-15% of the total rows. Above that, the query planner often prefers a sequential scan. This threshold varies by database, storage type (SSD vs HDD), and data layout, but 10-15% is a reasonable starting assumption.
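Selectivity is cheap to measure before you commit to an index. A sketch in Python over SQLite - the arithmetic (distinct values divided by total rows) is the same on any database:

```python
import sqlite3, random

random.seed(1)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(f"user{i}@example.com", random.choice(["active", "inactive", "pending"]))
     for i in range(100_000)])

def selectivity(column):
    # distinct values / total rows: near 1.0 = strong index candidate
    # (column name interpolated for brevity - fine for a local, trusted sketch)
    distinct, total = conn.execute(
        f"SELECT count(DISTINCT {column}), count(*) FROM users").fetchone()
    return distinct / total

print(selectivity("email"))   # 1.0 - unique per row, great candidate
print(selectivity("status"))  # ~0.00003 - three values, poor candidate
```

Run the same two-count query against your real table before creating an index on a column you're unsure about.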
The fix - partial indexes (when you need the rare values):
-- Instead of indexing all status values...
CREATE INDEX idx_users_status ON users(status); -- ❌ Mostly useless
-- ...index only the values you actually query for
-- If you only query for 'pending' users (which are rare):
CREATE INDEX idx_users_pending ON users(status) WHERE status = 'pending'; -- ✅
-- PostgreSQL: Only 50,000 rows indexed instead of 10,000,000
-- Much smaller, much faster, actually gets used

4. Wrong Column Order in Composite Indexes
This trips up experienced developers too. A composite index on (A, B, C) is not the same as separate indexes on A, B, and C. The column order determines which queries the index can serve.
The leftmost prefix rule: A composite index can be used for queries that filter on the leftmost columns of the index, in order. It's like a phone book sorted by last name, then first name. You can look up by last name alone, or by last name + first name. But you can't look up by first name alone - the book isn't sorted that way.
-- Composite index
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at);
-- ✅ Uses the index (leftmost prefix: user_id)
SELECT * FROM orders WHERE user_id = 123;
-- ✅ Uses the index (leftmost prefix: user_id + status)
SELECT * FROM orders WHERE user_id = 123 AND status = 'shipped';
-- ✅ Uses the index (all three columns)
SELECT * FROM orders WHERE user_id = 123 AND status = 'shipped' AND created_at > '2024-01-01';
-- ❌ DOES NOT USE THE INDEX (missing user_id - the leftmost column)
SELECT * FROM orders WHERE status = 'shipped';
-- ⚠️ PARTIAL USE ONLY (skips status - breaks the prefix chain)
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';
-- (The planner can use the user_id prefix of the index, but created_at
-- becomes a filter rather than an index condition)
-- ❌ DOES NOT USE THE INDEX (none of the leftmost columns)
SELECT * FROM orders WHERE created_at > '2024-01-01';

The last example is the classic mistake. A developer sees slow queries on created_at, notices the composite index includes created_at, and assumes it should be working. It's not, because created_at is the third column and the query doesn't filter on the first two.
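The leftmost-prefix rule is directly observable. A runnable check with Python's sqlite3 (illustrative schema; SQLite's planner obeys the same rule):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders
                (user_id INTEGER, status TEXT, created_at TEXT, total REAL)""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i % 500, "shipped", f"2024-01-{i % 28 + 1:02d}", 9.99)
     for i in range(10_000)])
conn.execute("""CREATE INDEX idx_orders_user_status_date
                ON orders(user_id, status, created_at)""")

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leftmost column present: the composite index is usable
p_prefix = plan("SELECT * FROM orders WHERE user_id = 123")
# Third column alone: no leftmost prefix, back to a full scan
p_no_prefix = plan("SELECT * FROM orders WHERE created_at > '2024-01-15'")
print(p_prefix)
print(p_no_prefix)
```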
How to design composite indexes correctly:
Put equality conditions first, range conditions last. Columns used with = should come before columns used with >, <, BETWEEN, or ORDER BY.
-- Query pattern:
SELECT * FROM orders
WHERE user_id = ? -- equality
AND status = ? -- equality
AND created_at > ? -- range
ORDER BY created_at DESC;
-- ✅ Best index for this query:
CREATE INDEX idx_orders_optimized ON orders(user_id, status, created_at);
-- Equality columns first (user_id, status), range column last (created_at)
-- ❌ Wrong order:
CREATE INDEX idx_orders_bad ON orders(created_at, user_id, status);
-- Range column first breaks the effectiveness for equality lookups

Put the highest-selectivity column first (when all conditions are equality). This narrows the search most quickly.
-- If user_id has 1,000,000 unique values and status has 3:
CREATE INDEX idx_orders ON orders(user_id, status); -- ✅ Better
CREATE INDEX idx_orders ON orders(status, user_id); -- ❌ Worse
-- user_id narrows to ~10 rows, then status filters further
-- vs. status narrows to ~3,333,333 rows, then user_id filters - more work

5. Outdated Table Statistics
The query planner doesn't examine actual data to decide whether to use an index. It uses statistics - precomputed summaries of data distribution, cardinality, and value frequency. If these statistics are wrong, the planner makes wrong decisions.
How statistics go stale:
- Bulk data imports that change the distribution
- Mass updates or deletes
- Tables that grow significantly between statistics refreshes
- Partition additions or changes
-- PostgreSQL: Check when statistics were last updated
SELECT
relname AS table_name,
last_autoanalyze,
last_analyze,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- If last_analyze is days old and you've done major data changes: problem.
-- PostgreSQL: Force statistics update
ANALYZE orders;
-- MySQL: Update statistics
ANALYZE TABLE orders;
-- MySQL: Check cardinality estimates (what the planner sees)
SHOW INDEX FROM orders;

Real scenario where this bites you: You have an orders table with 100,000 rows, mostly with status = 'completed'. The planner's statistics reflect this. Then you run a batch import of 5 million new orders, all with status = 'pending'. The planner still thinks pending is rare (based on old stats) and uses an index scan. But now pending is the majority - a sequential scan would be 10x faster.
-- After bulk imports, always update statistics
LOAD DATA INFILE 'orders_import.csv' INTO TABLE orders;
ANALYZE TABLE orders; -- ← Don't forget this

PostgreSQL auto-analyze runs periodically, but its default thresholds might not trigger quickly enough after large data changes. For critical tables, run ANALYZE explicitly after bulk operations.
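SQLite makes the statistics mechanism unusually visible: ANALYZE writes per-index distribution data into a sqlite_stat1 table that the planner then reads. A sketch with Python's sqlite3 and a made-up schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, status TEXT)")
conn.execute("CREATE INDEX idx_orders_user_id ON orders(user_id)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i % 100, "pending") for i in range(10_000)])

def has_stats():
    # sqlite_stat1 only exists once ANALYZE has run
    return conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'"
    ).fetchone()[0] > 0

had_stats_before = has_stats()
conn.execute("ANALYZE")
row = conn.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE idx = 'idx_orders_user_id'"
).fetchone()
print(had_stats_before, row)
# stat is "<total rows> <avg rows per distinct key>", e.g. '10000 100'
```

Until ANALYZE runs, the planner is guessing - which is exactly the failure mode described above, in miniature.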
6. Implicit Type Casting
A subtle one that's easy to miss. If your query compares a column to a value of a different type, the database may cast the column value, which has the same effect as wrapping the column in a function - the index is bypassed.
-- Column: phone_number VARCHAR(20)
-- Index on phone_number exists
-- ❌ BROKEN: Comparing varchar to integer forces casting
SELECT * FROM users WHERE phone_number = 5551234567;
-- Database does: WHERE CAST(phone_number AS INTEGER) = 5551234567
-- Function on column → index ignored
-- ✅ FIXED: Use the correct type
SELECT * FROM users WHERE phone_number = '5551234567';
-- Column: id BIGINT
-- ❌ BROKEN in some databases: Comparing bigint to string
SELECT * FROM users WHERE id = '12345';
-- Some databases cast the column, not the literal
-- ✅ FIXED: Match the type
SELECT * FROM users WHERE id = 12345;

This is especially common in application code where parameters from HTTP requests (always strings) are passed directly to queries without type conversion. Your ORM might handle this, or it might not. Check.
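At the application layer, the defensive habit is to coerce request parameters to the column's type before they reach the query. A sketch in Python (sqlite3 here; the pattern is the same with any driver, and the helper names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO users VALUES (1, '5551234567')")

def find_user(raw_id):
    # HTTP parameters arrive as strings - cast once, at the edge
    return conn.execute(
        "SELECT phone FROM users WHERE id = ?", (int(raw_id),)).fetchone()

def find_by_phone(raw_phone):
    # The column is text, so the parameter must stay text
    return conn.execute(
        "SELECT id FROM users WHERE phone = ?", (str(raw_phone),)).fetchone()

print(find_user("1"))             # ('5551234567',)
print(find_by_phone(5551234567))  # (1,)
```

Doing the conversion once at the boundary means the types are right everywhere downstream, and the database never has to guess.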
7. OR Conditions Prevent Index Usage
The OR operator can prevent the database from using an index, depending on the database and the specific query structure.
-- Index on user_id, separate index on email
CREATE INDEX idx_users_user_id ON users(user_id);
CREATE INDEX idx_users_email ON users(email);
-- ❌ May not use either index effectively
SELECT * FROM users WHERE user_id = 123 OR email = 'john@example.com';
-- Some databases will bitmap-scan both indexes and merge
-- Others will fall back to sequential scan
-- ✅ REWRITE as UNION for guaranteed index usage
SELECT * FROM users WHERE user_id = 123
UNION
SELECT * FROM users WHERE email = 'john@example.com';
-- Each sub-query uses its own index, results are merged

PostgreSQL is generally smart about OR conditions and can use bitmap index scans to combine results from multiple indexes. MySQL is worse at this - it often falls back to a full scan on OR conditions across different columns. Test with EXPLAIN on your specific database.
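Here's the same comparison runnable locally with sqlite3. SQLite happens to handle this OR with a multi-index strategy - which is exactly why you should check EXPLAIN on your own engine rather than assume:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(i, f"u{i}@example.com", f"user {i}") for i in range(10_000)])
conn.execute("CREATE INDEX idx_users_user_id ON users(user_id)")
conn.execute("CREATE INDEX idx_users_email ON users(email)")

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

or_plan = plan(
    "SELECT * FROM users WHERE user_id = 123 OR email = 'u9@example.com'")
union_plan = plan("""SELECT * FROM users WHERE user_id = 123
                     UNION
                     SELECT * FROM users WHERE email = 'u9@example.com'""")
print(or_plan)     # SQLite: a multi-index OR; other engines may full-scan
print(union_plan)  # each branch searches its own index
```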
Real-World Problems: When Indexes Hurt
Indexes aren't free. Every index you add has costs that show up in write performance, storage, and memory pressure. Here's where the pain appears.
Too Many Indexes = Slow Writes
Every INSERT, UPDATE, or DELETE on a table must update all indexes on that table. A table with 10 indexes means every write operation performs 10 additional index modifications.
Real numbers:
Benchmark: INSERT performance on a table with varying index counts
Table: 10 million rows, PostgreSQL 16, NVMe SSD
Indexes: 0 → 45,000 inserts/second
Indexes: 2 → 32,000 inserts/second (-29%)
Indexes: 5 → 18,000 inserts/second (-60%)
Indexes: 10 → 8,500 inserts/second (-81%)
Indexes: 15 → 4,200 inserts/second (-91%)

Those numbers are illustrative, but the pattern is real. Each index adds roughly a 10-20% write overhead relative to the base. For write-heavy tables (logging, events, metrics), unnecessary indexes are a direct performance tax.
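The shape of that curve is easy to reproduce on a laptop. An illustrative micro-benchmark with SQLite in memory - the absolute numbers are meaningless, the relative slowdown from maintaining extra B-trees is the point:

```python
import sqlite3, time

def insert_throughput(n_indexes, rows=30_000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events "
                 "(a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER)")
    # Each index is one more B-tree that every INSERT must update
    for col in "abcde"[:n_indexes]:
        conn.execute(f"CREATE INDEX idx_events_{col} ON events({col})")
    data = [(i, i + 1, i + 2, i + 3, i + 4) for i in range(rows)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", data)
    conn.commit()
    return rows / (time.perf_counter() - start)

plain = insert_throughput(0)
indexed = insert_throughput(5)
print(f"0 indexes: {plain:,.0f} inserts/s")
print(f"5 indexes: {indexed:,.0f} inserts/s")  # reliably lower
```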
How to find unused indexes:
-- PostgreSQL: Find indexes that are never scanned
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_index_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey' -- Don't drop primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
-- MySQL: Check index usage statistics (MySQL 8.0+, performance_schema)
SELECT
object_schema,
object_name AS table_name,
index_name,
count_star AS rows_read_via_index
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
AND count_star = 0
AND object_schema = 'your_database'
ORDER BY object_name;

Run this query regularly. Any index with zero scans over a reasonable observation period (at least a few weeks, covering all workload patterns) is a candidate for removal. Dropping it will speed up writes and free storage.
Index Bloat
In PostgreSQL, indexes can become "bloated" - they grow larger than necessary because of dead tuples from updates and deletes. MVCC (Multi-Version Concurrency Control) keeps old row versions around until they're vacuumed. The index entries pointing to those dead rows remain until a maintenance operation cleans them up.
-- PostgreSQL: Check index bloat
-- Install pgstattuple extension
CREATE EXTENSION pgstattuple;
SELECT
indexrelid::regclass AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
avg_leaf_density,
leaf_fragmentation
FROM pgstatindex('idx_orders_user_id');
-- avg_leaf_density below 50% suggests significant bloat
-- leaf_fragmentation above 30% suggests the index needs rebuilding

Symptoms of index bloat:
- Index size is disproportionately large relative to the table
- Queries that should use the index are slower than expected
- VACUUM isn't keeping up with the rate of updates/deletes
The fix:
-- PostgreSQL: Rebuild index (locks the table - careful in production)
REINDEX INDEX idx_orders_user_id;
-- PostgreSQL: Rebuild without locking (available since PG 12)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
-- Better: Ensure autovacuum is tuned for your workload
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.02, -- Vacuum when 2% of rows are dead (default 20%)
autovacuum_analyze_scale_factor = 0.01 -- Analyze when 1% of rows change
);

Memory Pressure - Index Doesn't Fit in RAM
An index only provides fast lookups when it's in memory (buffer cache). If the index is too large to fit in RAM, index scans trigger disk I/O - which can be slower than a sequential scan that benefits from OS-level read-ahead.
-- PostgreSQL: Check index sizes vs available memory
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
-- Compare against shared_buffers (your buffer cache size)
SHOW shared_buffers;
-- If your largest index is 8GB and shared_buffers is 4GB, that index
-- will constantly swap in and out of cache

Real impact: A 10GB index on a server with 4GB of buffer cache means every index lookup has a high probability of hitting disk. If you're doing 10,000 index lookups per second, that's potentially 10,000 random disk reads per second. Your SSD might handle it. A spinning disk will not.
Mitigation:
- Increase shared_buffers (PostgreSQL) or innodb_buffer_pool_size (MySQL). As a starting point, set these to 25-40% of total server RAM.
- Use partial indexes to reduce index size - only index the rows you actually query.
- Remove unused indexes to free buffer cache space for indexes that are actually used.
- Consider the working set. If your queries only access the last 30 days of data, a partial index with a fixed cutoff date (e.g. WHERE created_at > '2024-06-01', rebuilt periodically - PostgreSQL rejects volatile functions like NOW() in index predicates) can be dramatically smaller than a full index.
How to Debug Index Problems - The Systematic Approach
Don't guess. Measure. Here's the debugging workflow for every slow query.
Step 1: Run EXPLAIN ANALYZE
This is the single most important tool for database performance debugging. It shows you exactly what the query planner chose to do and how long each step took.
-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

What to look for in the output:
-- ✅ GOOD: Index Scan - the index is being used
Index Scan using idx_orders_user_status on orders
Index Cond: (user_id = 12345 AND status = 'shipped')
Rows Removed by Filter: 0
Buffers: shared hit=8
Planning Time: 0.2ms
Execution Time: 0.05ms
-- ❌ BAD: Seq Scan - full table scan, index ignored
Seq Scan on orders
Filter: (user_id = 12345 AND status = 'shipped')
Rows Removed by Filter: 9,999,990
Buffers: shared hit=425,000
Planning Time: 0.3ms
Execution Time: 3,200ms

Key indicators in EXPLAIN output:
| What you see | What it means |
|---|---|
| Index Scan | Index is used, rows fetched from the table |
| Index Only Scan | Best case - all data comes from the index, no table access |
| Bitmap Index Scan | Index used, but results need a second pass to fetch rows |
| Seq Scan | Full table scan - index not used |
| Rows Removed by Filter | Rows fetched but not matching - high numbers = problem |
| Buffers: shared hit | Pages read from cache - high counts mean the query touches a lot of data |
| Planning Time vs Execution Time | If planning time is high, statistics might be stale |
-- MySQL equivalent
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;
-- Or the classic EXPLAIN (without ANALYZE)
EXPLAIN SELECT ...;
-- Look for:
-- type: ref, range, index → good (index is used)
-- type: ALL → bad (full table scan)
-- key: which index is being used (NULL = none)
-- rows: estimated rows to examine

Step 2: Check Actual Index Usage
Don't just check if the index exists. Check if it's actually being used by your real workload.
-- PostgreSQL: Index usage statistics
SELECT
relname AS table_name,
indexrelname AS index_name,
idx_scan AS number_of_scans,
idx_tup_read AS rows_read_from_index,
idx_tup_fetch AS rows_fetched_from_table,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;

What this tells you:
- idx_scan = 0: the index has never been used. Candidate for removal.
- idx_scan is low but index_size is large: an expensive index with little benefit.
- idx_tup_read >> idx_tup_fetch: many index entries read but few rows actually fetched - a possible low-selectivity issue.
Step 3: Compare Before and After
Never deploy an index change based on a hunch. Measure the actual query performance before and after.
-- Capture baseline
\timing on -- PostgreSQL: enables timing
-- Run the slow query 3 times, note the average
SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Time: 2,450ms, 2,380ms, 2,510ms → Average: ~2,450ms
-- Add the index
CREATE INDEX CONCURRENTLY idx_orders_user_created
ON orders(user_id, created_at);
-- Update statistics
ANALYZE orders;
-- Run the same query 3 times again
SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Time: 1.2ms, 0.8ms, 0.9ms → Average: ~1.0ms
-- Verify with EXPLAIN that the index is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Should show: Index Scan using idx_orders_user_created

Always use CREATE INDEX CONCURRENTLY in PostgreSQL for production databases. The regular CREATE INDEX locks the table for writes for the entire duration of index creation. On a 100-million-row table, that could be minutes of write downtime.
-- ❌ DON'T do this in production - locks the table
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- ✅ DO this - builds index without locking
CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders(user_id, created_at);

How to Fix Common Index Problems
Now that you can diagnose the issue, here are the fixes - from simple to advanced.
Fix 1: Rewrite Queries to Be Index-Friendly
Often the cheapest fix. No schema changes, no new indexes. Just rewrite the query so the planner can use existing indexes.
-- ❌ Function on column → index ignored
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- ✅ Store data in the right format, query without functions
-- (Ensure email is stored lowercase at write time)
SELECT * FROM users WHERE email = 'test@example.com';
-- ❌ Leading wildcard → index ignored
SELECT * FROM users WHERE name LIKE '%john%';
-- ✅ If prefix search is acceptable:
SELECT * FROM users WHERE name LIKE 'john%';
-- ❌ Implicit type cast → index ignored
SELECT * FROM users WHERE phone = 5551234567;
-- ✅ Match the column type:
SELECT * FROM users WHERE phone = '5551234567';
-- ❌ OR condition across columns → may prevent index usage
SELECT * FROM users WHERE user_id = 123 OR email = 'john@example.com';
-- ✅ UNION for guaranteed index usage on each branch:
SELECT * FROM users WHERE user_id = 123
UNION
SELECT * FROM users WHERE email = 'john@example.com';
-- ❌ NOT IN / NOT EXISTS with large lists → may scan
SELECT * FROM orders WHERE status NOT IN ('cancelled', 'refunded', 'archived');
-- ✅ Rewrite as positive condition if possible:
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped', 'delivered');

Fix 2: Design Composite Indexes for Your Query Patterns
Don't create indexes in the abstract. Look at your actual queries and design indexes that serve them.
-- Your most common query:
SELECT * FROM orders
WHERE user_id = ? AND status = ?
ORDER BY created_at DESC
LIMIT 20;
-- The perfect index for this query:
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);
-- Why this order:
-- 1. user_id (equality) → narrows to one user's orders
-- 2. status (equality) → narrows to orders with specific status
-- 3. created_at DESC (ordering) → results are pre-sorted, no extra sort step
-- The planner can:
-- → Use the index to find matching rows (user_id + status)
-- → Return results in order (created_at DESC) without sorting
-- → Stop after 20 rows (LIMIT) without scanning further

One good composite index is better than three single-column indexes. The database can combine single-column indexes using bitmap scans, but it's slower and more resource-intensive than a single composite index tailored to the query.
Fix 3: Partial Indexes for Targeted Queries
If you only query a subset of rows, index only that subset. The index is smaller, faster, and uses less memory.
-- Only 2% of orders are 'pending', but you query them constantly
-- Full index on status - wastes space on 98% of rows you don't query
CREATE INDEX idx_orders_status ON orders(status); -- 500MB
-- Partial index - only the rows you care about
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending'; -- 10MB
-- This index serves:
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- And it's 50x smaller than the full index.

-- Partial index for "soft deleted" records
-- 99% of queries filter to non-deleted records
CREATE INDEX idx_users_active_email ON users(email)
WHERE deleted_at IS NULL;
-- Serves the common case efficiently:
SELECT * FROM users WHERE email = ? AND deleted_at IS NULL;

Partial indexes are PostgreSQL's secret weapon. MySQL doesn't support them natively (though you can approximate with generated columns). If you're on PostgreSQL and not using partial indexes, you're leaving significant performance on the table.
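SQLite supports partial indexes too, so the pattern above can be verified locally. An illustrative sketch - the planner uses the partial index here because the query's WHERE clause implies the index predicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, created_at TEXT, total REAL)")
rows = [("completed", f"2024-01-{i % 28 + 1:02d}", 10.0) for i in range(20_000)]
rows += [("pending", f"2024-02-{i % 28 + 1:02d}", 10.0) for i in range(400)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Only the rare 'pending' rows go into the index
conn.execute("""CREATE INDEX idx_orders_pending ON orders(created_at)
                WHERE status = 'pending'""")

plan_text = " | ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at"))
print(plan_text)  # the partial index serves both the filter and the sort
```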
Fix 4: Covering Indexes (Index-Only Scans)
A covering index includes all columns needed by the query. The database can answer the query entirely from the index without touching the table at all. This is called an "index-only scan" and it's the fastest possible read.
-- Query:
SELECT user_id, status, created_at FROM orders
WHERE user_id = 12345 AND status = 'shipped';
-- Regular index - requires table lookup for each matching row
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Flow: Index → find matching entries → go to table → read user_id, status, created_at
-- Covering index - includes created_at, no table lookup needed
CREATE INDEX idx_orders_covering ON orders(user_id, status, created_at);
-- Flow: Index → find matching entries → return data directly from index
-- (Index-Only Scan in EXPLAIN)
-- PostgreSQL: INCLUDE clause for covering indexes (PG 11+)
-- Better: Only the first columns are used for searching, included columns are just stored
CREATE INDEX idx_orders_covering ON orders(user_id, status) INCLUDE (created_at, total);
-- Searches by user_id + status, but also stores created_at and total in the index
-- Enables index-only scan without making the search tree wider

When to use covering indexes: For queries that run thousands of times per second and only need a few columns. The elimination of table lookups can cut query time in half. But each included column makes the index larger, so be selective.
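SQLite reports index-only scans explicitly as "COVERING INDEX" in its plan output, which makes the effect easy to see. A sketch with sqlite3 and an illustrative schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders
                (user_id INTEGER, status TEXT, created_at TEXT, notes TEXT)""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i % 200, "shipped", "2024-01-01", "note") for i in range(5_000)])

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT user_id, status, created_at FROM orders "
         "WHERE user_id = 42 AND status = 'shipped'")

# Regular index: matching rows still require a trip to the table for created_at
conn.execute("CREATE INDEX idx_orders_user_status ON orders(user_id, status)")
p_regular = plan(query)

# Covering index: every selected column lives in the index itself
conn.execute("""CREATE INDEX idx_orders_covering
                ON orders(user_id, status, created_at)""")
p_covering = plan(query)
print(p_regular)   # SEARCH ... USING INDEX idx_orders_user_status
print(p_covering)  # SEARCH ... USING COVERING INDEX idx_orders_covering
```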
Fix 5: Remove Unused Indexes
This is an optimization most teams never do. They add indexes but never remove them. Over time, the table accumulates 10, 15, 20 indexes. Write performance degrades steadily. Nobody connects the slowdown to the index sprawl because it happened gradually.
-- PostgreSQL: Generate DROP statements for unused indexes
SELECT
'DROP INDEX CONCURRENTLY ' || indexrelname || ';' AS drop_command,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS size_freed
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Before dropping, verify:
- Reset statistics counter and wait at least 2 weeks (covering all workload patterns including monthly jobs).
- Check if the index is used by a rare but critical query (monthly reports, batch jobs).
- Drop with DROP INDEX CONCURRENTLY to avoid locking the table.
-- Reset statistics to start fresh observation
SELECT pg_stat_reset();
-- Wait 2-4 weeks...
-- Then check which indexes have still never been used
SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0;

The Trade-Off Table
Every index decision has costs. Make them explicitly.
| Action | Read Benefit | Write Cost | Storage Cost | Complexity |
|---|---|---|---|---|
| Add single-column index | Faster lookups on that column | ~10-15% write overhead per index | Moderate | Low |
| Add composite index | Faster multi-column queries, possible sort elimination | ~10-15% write overhead | Larger than single | Medium - column order matters |
| Add covering index | Index-only scans - no table lookup | Same write overhead + larger index | Large - stores extra columns | Medium |
| Add partial index | Smaller, targeted, faster scans | Minimal - only indexes subset | Small | Medium - requires WHERE clause design |
| Add expression index | Enables index usage on function calls | Write overhead + function eval on insert | Moderate | Medium |
| Remove unused index | None (it wasn't being used) | Faster writes | Reclaimed space | Low - but verify it's truly unused |
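The covering-index row above is easy to see in a query plan. Here is a minimal sketch using SQLite through Python's sqlite3 module — table and index names are invented for the example, and the mechanism is the same one PostgreSQL reports as an "Index Only Scan":

```python
import sqlite3

# Illustrative only: a covering index demonstrated with SQLite.
# The index stores (customer_id, status), so a query that needs only
# those columns never touches the table itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 100, "paid", i * 1.5) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_orders_cust_status ON orders (customer_id, status)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT status FROM orders WHERE customer_id = 42"
).fetchall()
# SQLite reports "USING COVERING INDEX" when the table lookup is skipped
print(plan[0][3])
```

The moment the query asks for a column the index doesn't store (say, `total`), the plan falls back to index-then-table lookups, which is exactly the extra cost the table above trades away.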
When NOT to Use Indexes
Indexes are not universally good. Here are legitimate cases where you should avoid them.
Small tables. If a table has fewer than 1,000 rows, the database can scan it entirely in microseconds. An index adds overhead without measurable benefit. The planner will usually ignore the index and do a sequential scan anyway.
Write-heavy tables with few reads. Log tables, audit tables, event streams - if the write-to-read ratio is 100:1, every index is taxing the write path for the benefit of very few reads. Consider indexing only after data is moved to an analytical store, or use partial indexes that cover only the specific queries you do run.
Low-selectivity columns without partial index strategy. As covered above, indexing a boolean or status column with 3 values is usually pointless for full indexes. Use partial indexes if you need to query specific values, or skip the index entirely.
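The partial-index alternative can be sketched concretely. This uses SQLite via Python's sqlite3 (table and index names are made up); PostgreSQL partial indexes behave the same way, with the same requirement that the query's WHERE clause imply the index predicate:

```python
import sqlite3

# Illustrative only: a partial index on a low-selectivity status column.
# Instead of indexing all 5,000 rows, we index only the handful of
# 'pending' rows we actually query for.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (status, created_at) VALUES (?, ?)",
    [("done" if i % 1000 else "pending", "2024-01-01") for i in range(5000)],
)
conn.execute(
    "CREATE INDEX idx_jobs_pending ON jobs (created_at) WHERE status = 'pending'"
)

# The query repeats the index predicate, so the planner can use the index
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM jobs "
    "WHERE status = 'pending' AND created_at > '2023-12-31'"
).fetchall()
print(plan[0][3])
```

A full index on `status` here would spend almost all of its space on 'done' rows nobody queries; the partial index stays tiny and cheap to maintain.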
Columns only used in low-priority queries. A monthly report that takes 30 seconds is probably fine. Adding an index that slows down 50,000 writes/second to save 25 seconds on a monthly report is a bad trade.
Temporary or staging tables. Tables that are bulk-loaded and then queried once before being dropped don't benefit from indexes. The cost of building the index exceeds the cost of scanning the data once.
Production Checklist: Index Health
Run through this periodically. Monthly at minimum for production databases.
1. Find unused indexes and drop them. Unused indexes are pure cost - write overhead, storage, and memory wasted. Use pg_stat_user_indexes (PostgreSQL) or performance_schema.table_io_waits_summary_by_index_usage (MySQL) to identify them.
2. Find duplicate indexes. Composite index (A, B) makes a standalone index on (A) redundant. The composite index serves both WHERE A = ? and WHERE A = ? AND B = ? queries.
```sql
-- PostgreSQL: Find potentially redundant indexes
-- (an index whose column list is a prefix of another index on the same table)
SELECT
    a.indexrelid::regclass AS potentially_redundant,
    b.indexrelid::regclass AS covers_it,
    pg_size_pretty(pg_relation_size(a.indexrelid)) AS wasted_size
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
    AND a.indexrelid <> b.indexrelid
    AND (b.indkey::text || ' ') LIKE (a.indkey::text || ' %')
WHERE NOT a.indisprimary;
```
3. Check index bloat and reindex if needed. Bloated indexes waste memory and slow scans. Monitor avg_leaf_density (reported by the pgstattuple extension's pgstatindex()) and rebuild with REINDEX CONCURRENTLY when it drops below 50%.
4. Verify statistics are current. After any large data change (bulk import, mass update, partition rotation), run ANALYZE on affected tables.
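To make the effect of ANALYZE visible, here is a minimal sketch with SQLite via Python's sqlite3 (names invented), where the collected statistics land in the `sqlite_stat1` table; PostgreSQL's ANALYZE similarly refreshes `pg_statistic`, which is what the planner's row estimates are built from:

```python
import sqlite3

# Illustrative only: ANALYZE populates the planner's statistics tables.
# Before ANALYZE runs, the planner is guessing about row counts and
# selectivity - exactly the situation after a bulk import.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [("click",)] * 900 + [("purchase",)] * 100,
)
conn.execute("CREATE INDEX idx_events_kind ON events (kind)")

conn.execute("ANALYZE")  # refresh statistics after the bulk load

# sqlite_stat1 now holds per-index row counts and selectivity estimates
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```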
5. Review EXPLAIN plans for your top 10 slowest queries. Find them in pg_stat_statements (PostgreSQL) or the slow query log (MySQL). Check if they're using indexes as expected.
```sql
-- PostgreSQL: Top 10 slowest queries by total time
SELECT
    substring(query, 1, 100) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_time_ms,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
Summary: What You Should Actually Do
Indexes don't magically fix performance. They only work when aligned with your query patterns.
If a query is slow despite having an index:
- Run `EXPLAIN ANALYZE`. Look for `Seq Scan` where you expected an index scan.
- Check for functions on indexed columns, leading wildcards, type mismatches, or wrong composite index column order.
- Verify table statistics are current with `ANALYZE`.
- Fix the query or the index - in that order. Rewriting the query is cheaper than adding another index.
If you're designing indexes for a new feature:
- Write the queries first. Then design indexes to serve those specific queries.
- Equality columns first, range columns last in composite indexes.
- Consider covering indexes for high-frequency queries that fetch few columns.
- Use partial indexes for queries that target a small subset of rows.
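The "equality first, range last" rule shows up directly in query plans. A minimal sketch with SQLite via Python's sqlite3 (table and index names are invented for the example):

```python
import sqlite3

# Illustrative only: composite index column order. The equality column
# (status) comes first, so the planner seeks straight to the 'open'
# entries and then range-scans created_at within them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO tickets (status, created_at) VALUES (?, ?)",
    [
        ("open" if i % 20 == 0 else "closed", f"2024-01-{i % 28 + 1:02d}")
        for i in range(2000)
    ],
)
conn.execute(
    "CREATE INDEX idx_tickets_status_created ON tickets (status, created_at)"
)
conn.execute("ANALYZE")  # give the planner accurate statistics

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tickets "
    "WHERE status = 'open' AND created_at > '2024-01-15'"
).fetchall()
print(plan[0][3])
```

With the columns reversed, (created_at, status), the range condition on the leading column forces the planner to walk a much wider slice of the index and filter on status afterward.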
For ongoing index health:
- Monitor unused indexes and drop them. They're costing you write performance.
- Monitor index sizes relative to your buffer cache. Indexes that don't fit in memory lose most of their benefit.
- Run `EXPLAIN ANALYZE` on your slowest queries monthly. Query patterns shift. Indexes that made sense six months ago might be irrelevant today.
The best index strategy isn't "index everything." It's indexing deliberately, measuring the impact, and removing what doesn't work. Treat indexes as living infrastructure that needs periodic review - not a one-time fix.

