TechUnfiltered.dev
Databases · #sql #performance #postgresql #mysql #query-optimization

Why Your Database Indexes Are Not Working (And How to Fix Them)

Raunak Gupta
Mar 22, 2026 · 30 min read

You added an index. Your query is still slow. Now what?

This is the most common database performance frustration in backend engineering. A query takes 3 seconds. You look at the WHERE clause, create an index on the relevant column, deploy, and check again. Still 3 seconds. Or worse - the query got slower. The EXPLAIN output looks like hieroglyphics. The query planner is ignoring your index entirely and doing a full table scan on 50 million rows.

The problem isn't indexes. The problem is the assumption that index = faster queries. That's only true when the index matches your query pattern, the query planner decides to use it, the data distribution makes it worthwhile, and the index fits in memory. Miss any one of those conditions, and your index is dead weight - consuming disk space, slowing down writes, and doing absolutely nothing for reads.

This article is about the debugging mindset. Why indexes fail in real systems, how to diagnose the problem, and how to fix it. Not "what is a B-tree." You can Google that.


The Mental Model: What an Index Actually Does

A 30-second refresher, because everything that follows depends on getting this right.

An index is a separate data structure that maps column values to row locations. When the database needs to find rows matching a condition, it can either scan the entire table (sequential scan) or look up matching values in the index and jump directly to those rows (index scan).

text
Without index (sequential scan):
Query: WHERE user_id = 12345

Table: Row 1 → check → no
       Row 2 → check → no
       Row 3 → check → no
       ... (50 million rows later)
       Row 38,291,003 → check → YES
       ... (continues scanning remaining rows)

Time: O(n) - checks every single row


With index (index scan):
Query: WHERE user_id = 12345

Index (B-tree):
       [1-25000] → [12000-12500] → [12345] → Row pointer

Table: Jump directly to Row 38,291,003

Time: O(log n) - a few tree traversals
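The difference is easy to feel in a few lines of Python. This is a toy model - the `math.log2` call stands in for B-tree node visits, and the numbers are illustrative:

```python
import math

# Hypothetical table of 1,000,000 user_ids, stored in insertion order
table = list(range(1_000_000))

def seq_scan_checks(target):
    # Sequential scan: rows examined until the match is found - O(n)
    for i, row in enumerate(table):
        if row == target:
            return i + 1

def index_scan_depth():
    # A B-tree lookup touches ~log2(n) nodes no matter where the row lives
    return math.ceil(math.log2(len(table)))

print(seq_scan_checks(812_345))  # 812346 rows examined
print(index_scan_depth())        # 20 node visits
```

Same row, six orders of magnitude less work - that's the entire value proposition of an index.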

The critical point most developers miss: The index exists as a separate structure. It has its own storage footprint, its own memory requirements, and it must be updated on every write to the indexed columns. Creating an index isn't free - it's a trade-off between read speed and write overhead.

An index that the query planner doesn't use is just wasted space and write penalty. This happens far more often than most developers realize.


Why Your Index Is Not Being Used

These are the actual reasons indexes get ignored in production. Not theoretical edge cases - these show up in real code reviews and production debugging sessions constantly.

1. Functions or Expressions on Indexed Columns

This is the most common reason indexes are silently ignored. If you apply a function to an indexed column in your WHERE clause, the database can't use the index on that column.

sql
-- Index on 'email' column
CREATE INDEX idx_users_email ON users(email);

-- ❌ THIS WILL NOT USE THE INDEX
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- ❌ THIS WILL NOT USE THE INDEX EITHER
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';

-- ❌ ALSO BROKEN - any function wrapping the column
SELECT * FROM users WHERE TRIM(email) = 'john@example.com';
SELECT * FROM users WHERE CONCAT(first_name, ' ', last_name) = 'John Doe';
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';

Why this happens: A B-tree index stores values in sorted order. The index on email contains sorted email strings. When you ask for LOWER(email), the database would need to apply LOWER() to every index entry to check if it matches - which is no better than a full table scan. So the query planner skips the index entirely.
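You can see why in miniature. The toy "index" below is a sorted Python list searched with `bisect`: binary search works on the raw values, but is useless for `LOWER(email)` because lowercasing scrambles the sort order (data here is made up):

```python
import bisect

# A B-tree keeps raw values sorted - in ASCII order, uppercase sorts before lowercase
index = sorted(["Alice@x.com", "bob@x.com", "Carol@x.com", "dave@x.com"])
# → ['Alice@x.com', 'Carol@x.com', 'bob@x.com', 'dave@x.com']

def index_seek(value):
    # Binary search: valid only because entries are sorted on the raw value
    pos = bisect.bisect_left(index, value)
    return pos < len(index) and index[pos] == value

print(index_seek("bob@x.com"))   # True - direct match, O(log n)

# LOWER(email) = 'carol@x.com': the lowercased values are NOT in index order,
# so the only option is to lowercase and check every entry - a full scan
matches = [e for e in index if e.lower() == "carol@x.com"]
print(matches)                   # ['Carol@x.com']
```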

The fix - expression indexes (PostgreSQL) or generated columns (MySQL):

sql
-- PostgreSQL: Create an index on the expression
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now this query USES the index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';


-- MySQL: Use a generated column + index
ALTER TABLE users ADD email_lower VARCHAR(255)
    GENERATED ALWAYS AS (LOWER(email)) STORED;
CREATE INDEX idx_users_email_lower ON users(email_lower);

-- Now query against the generated column
SELECT * FROM users WHERE email_lower = 'john@example.com';
sql
-- Common date function trap and fix
-- ❌ BROKEN: Function on column prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- ✅ FIXED: Range query uses index on created_at
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- ❌ BROKEN: DATE() function on column
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';

-- ✅ FIXED: Range query on the original column
SELECT * FROM orders
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16';

This single fix - rewriting function calls as range queries - resolves more slow query issues than any other optimization. Audit every slow query for functions wrapping indexed columns. It's the first thing to check.


2. Leading Wildcards in LIKE Queries

A B-tree index sorts values left to right. It can quickly find all values starting with a prefix, but it cannot efficiently search for patterns in the middle or end of a string.

sql
-- Index on 'name' column
CREATE INDEX idx_users_name ON users(name);

-- ✅ THIS USES THE INDEX (prefix match)
SELECT * FROM users WHERE name LIKE 'John%';

-- ❌ THIS DOES NOT USE THE INDEX (leading wildcard)
SELECT * FROM users WHERE name LIKE '%John';

-- ❌ THIS DOES NOT USE THE INDEX EITHER (wildcard on both sides)
SELECT * FROM users WHERE name LIKE '%John%';

Why: Think of a phone book. You can quickly find everyone whose last name starts with "Sm" - just go to the "Sm" section. But finding everyone whose last name contains "oh" means reading every single entry. That's what a leading wildcard does to an index.
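The phone-book behavior in miniature: a sorted list answers a prefix search as one narrow slice, while a contains-search has no choice but to read every entry (toy names):

```python
import bisect

names = sorted(["Johnson", "Johns", "Smith", "Smythe", "Thompson", "Upjohn"])

def prefix_search(prefix):
    # LIKE 'John%': jump straight to the prefix range in the sorted structure
    lo = bisect.bisect_left(names, prefix)
    hi = bisect.bisect_left(names, prefix + "\uffff")
    return names[lo:hi]

def contains_search(needle):
    # LIKE '%ohn%': no usable sort order - every entry must be read
    return [n for n in names if needle in n]

print(prefix_search("John"))    # ['Johns', 'Johnson']
print(contains_search("ohn"))   # ['Johns', 'Johnson', 'Upjohn']
```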

The fix depends on your use case:

For suffix searches, store the reversed value and use prefix matching:

sql
-- Need to find emails by domain? (@gmail.com)
ALTER TABLE users ADD email_reversed VARCHAR(255)
    GENERATED ALWAYS AS (REVERSE(email)) STORED;
CREATE INDEX idx_users_email_reversed ON users(email_reversed);

-- Search for @gmail.com → search for reversed prefix
SELECT * FROM users WHERE email_reversed LIKE REVERSE('%@gmail.com');
-- This becomes: WHERE email_reversed LIKE 'moc.liamg@%' → uses index

For substring searches at scale, use full-text search instead of LIKE:

sql
-- PostgreSQL: Full-text search with GIN index
CREATE INDEX idx_products_search ON products USING GIN(to_tsvector('english', name));

SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'wireless & headphones');

-- Or use trigram indexes for partial matching
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING GIN(name gin_trgm_ops);

-- Now LIKE '%john%' CAN use the trigram index
SELECT * FROM users WHERE name LIKE '%john%';

The honest take: If you're doing a lot of substring searches, LIKE '%term%' on a B-tree index is the wrong approach entirely. Move to full-text search (PostgreSQL tsvector, MySQL FULLTEXT, Elasticsearch) or trigram indexes. Don't fight the B-tree - use the right data structure.
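For intuition, here's a toy version of the trigram idea: break each string into 3-character chunks, keep an inverted index from chunk to row ids, intersect the candidate sets, then recheck. Real pg_trgm also pads strings and supports similarity matching - this sketch skips all of that:

```python
def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

rows = {1: "Johnny Cash", 2: "Elton John", 3: "Bob Dylan"}

# Inverted index: trigram -> set of row ids containing it
inverted = {}
for rid, name in rows.items():
    for t in trigrams(name):
        inverted.setdefault(t, set()).add(rid)

def like_infix(term):
    # Intersect the posting lists for the term's trigrams, then recheck the
    # candidates against the real pattern (PostgreSQL rechecks too)
    candidates = set.intersection(*(inverted.get(t, set()) for t in trigrams(term)))
    return sorted(r for r in candidates if term.lower() in rows[r].lower())

print(like_infix("john"))   # [1, 2]
```

The candidate set shrinks to a handful of rows before any string comparison happens - that's what lets `LIKE '%john%'` avoid reading the whole table.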


3. Low Selectivity - The Index Isn't Worth Using

Selectivity measures how many distinct values a column has relative to the total number of rows. A column with high selectivity (many unique values) benefits from indexing. A column with low selectivity (few unique values) often doesn't.
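In code, selectivity is just distinct-count over row-count (toy data, made-up distribution):

```python
def selectivity(values):
    # Distinct values / total rows - the planner's rough "is this index useful" signal
    return len(set(values)) / len(values)

emails = [f"user{i}@example.com" for i in range(10_000)]
statuses = ["active"] * 6_000 + ["inactive"] * 3_950 + ["pending"] * 50

print(selectivity(emails))    # 1.0    - every value unique: great index candidate
print(selectivity(statuses))  # 0.0003 - three distinct values: poor candidate
```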

sql
-- High selectivity - good index candidate
-- email: 10,000,000 unique values out of 10,000,000 rows
-- Selectivity: 1.0 (perfect)
CREATE INDEX idx_users_email ON users(email);  -- ✅ Useful

-- Low selectivity - bad index candidate
-- status: 3 unique values ('active', 'inactive', 'pending') out of 10,000,000 rows
-- Selectivity: 0.0000003 (terrible)
CREATE INDEX idx_users_status ON users(status);  -- ❌ Usually useless

-- Low selectivity - boolean column
-- is_verified: 2 unique values (true/false) out of 10,000,000 rows
CREATE INDEX idx_users_verified ON users(is_verified);  -- ❌ Waste of space

Why the database ignores low-selectivity indexes: If 60% of your users have status = 'active', an index scan for WHERE status = 'active' would return 6 million rows. The database has to read the index entries, then jump to each row in the table (random I/O). For 6 million rows, a sequential table scan is actually faster because sequential I/O is dramatically cheaper than random I/O.

The math:

text
10,000,000 row table
Query: WHERE status = 'active' (matches 6,000,000 rows = 60%)

Sequential scan: Read entire table sequentially
  → ~10,000,000 × sequential read cost = fast (sequential I/O is cheap)

Index scan: Look up 6,000,000 entries in index, then random-read each row
  → 6,000,000 × random read cost = SLOW (random I/O is 10-100x more expensive)

The query planner correctly chooses sequential scan.
Your index is ignored. This is the right behavior.

The rule of thumb: An index is typically useful when the query returns less than 10-15% of the total rows. Above that, the query planner often prefers a sequential scan. This threshold varies by database, storage type (SSD vs HDD), and data layout, but 10-15% is a reasonable starting assumption.
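The crossover falls straight out of a toy cost model. The constants below (100 rows per page, random I/O at 4x sequential - PostgreSQL's default planner ratio) are assumptions for illustration; real planners also model caching, which is part of why real-world thresholds land higher than this crude model predicts:

```python
SEQ_PAGE_COST = 1.0     # cost of one sequential page read (arbitrary unit)
RANDOM_PAGE_COST = 4.0  # random read modeled as 4x sequential
ROWS_PER_PAGE = 100

def seq_scan_cost(total_rows):
    # Read every page of the table, in order
    return (total_rows / ROWS_PER_PAGE) * SEQ_PAGE_COST

def index_scan_cost(matching_rows):
    # Pessimistic model: every matching row triggers one random page read
    return matching_rows * RANDOM_PAGE_COST

total = 10_000_000
print(seq_scan_cost(total))        # 100000.0
print(index_scan_cost(6_000_000))  # 24000000.0 - 240x worse for a 60% match
print(index_scan_cost(10_000))     # 40000.0    - 0.1% match: index wins easily
```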

The fix - partial indexes (when you need the rare values):

sql
-- Instead of indexing all status values...
CREATE INDEX idx_users_status ON users(status);  -- ❌ Mostly useless

-- ...index only the values you actually query for
-- If you only query for 'pending' users (which are rare):
CREATE INDEX idx_users_pending ON users(status) WHERE status = 'pending';  -- ✅

-- PostgreSQL: Only 50,000 rows indexed instead of 10,000,000
-- Much smaller, much faster, actually gets used

4. Wrong Column Order in Composite Indexes

This trips up experienced developers too. A composite index on (A, B, C) is not the same as separate indexes on A, B, and C. The column order determines which queries the index can serve.

The leftmost prefix rule: A composite index can be used for queries that filter on the leftmost columns of the index, in order. It's like a phone book sorted by last name, then first name. You can look up by last name alone, or by last name + first name. But you can't look up by first name alone - the book isn't sorted that way.

sql
-- Composite index
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at);

-- ✅ Uses the index (leftmost prefix: user_id)
SELECT * FROM orders WHERE user_id = 123;

-- ✅ Uses the index (leftmost prefix: user_id + status)
SELECT * FROM orders WHERE user_id = 123 AND status = 'shipped';

-- ✅ Uses the index (all three columns)
SELECT * FROM orders WHERE user_id = 123 AND status = 'shipped' AND created_at > '2024-01-01';

-- ❌ DOES NOT USE THE INDEX (missing user_id - the leftmost column)
SELECT * FROM orders WHERE status = 'shipped';

-- ⚠️ USES THE INDEX ONLY PARTIALLY (skips status - breaks the prefix chain)
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';
-- (Only the user_id prefix narrows the search; created_at is then
--  filtered row-by-row within that range instead of being seeked)

-- ❌ DOES NOT USE THE INDEX (none of the leftmost columns)
SELECT * FROM orders WHERE created_at > '2024-01-01';

The last example is the classic mistake. A developer sees slow queries on created_at, notices the composite index includes created_at, and assumes it should be working. It's not, because created_at is the third column and the query doesn't filter on the first two.
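A composite index behaves like a list of tuples sorted by (user_id, status, created_at). This toy sketch shows why pinning the leading column gives you a contiguous, binary-searchable range, while filtering on a later column alone does not:

```python
import bisect

# Toy composite index: entries kept sorted by (user_id, status, created_at)
index = sorted([
    (123, "pending", "2024-01-03"),
    (123, "shipped", "2024-02-10"),
    (456, "shipped", "2024-01-20"),
    (789, "pending", "2024-03-01"),
])

def seek_user(user_id):
    # Leftmost column pinned: binary search lands on one contiguous range
    lo = bisect.bisect_left(index, (user_id,))
    hi = bisect.bisect_left(index, (user_id + 1,))
    return index[lo:hi]

def filter_status(status):
    # Leading column missing: matches are scattered - every entry gets checked
    return [e for e in index if e[1] == status]

print(seek_user(123))            # two adjacent entries, found via binary search
print(filter_status("shipped"))  # non-adjacent entries - full scan of the index
```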

How to design composite indexes correctly:

Put equality conditions first, range conditions last. Columns used with = should come before columns used with >, <, BETWEEN, or ORDER BY.

sql
-- Query pattern:
SELECT * FROM orders
WHERE user_id = ?          -- equality
  AND status = ?           -- equality
  AND created_at > ?       -- range
ORDER BY created_at DESC;

-- ✅ Best index for this query:
CREATE INDEX idx_orders_optimized ON orders(user_id, status, created_at);
-- Equality columns first (user_id, status), range column last (created_at)

-- ❌ Wrong order:
CREATE INDEX idx_orders_bad ON orders(created_at, user_id, status);
-- Range column first breaks the effectiveness for equality lookups

Put the highest-selectivity column first (when all conditions are equality). This narrows the search most quickly.

sql
-- If user_id has 1,000,000 unique values and status has 3:
CREATE INDEX idx_orders ON orders(user_id, status);     -- ✅ Better
CREATE INDEX idx_orders ON orders(status, user_id);     -- ❌ Worse
-- user_id narrows to ~10 rows, then status filters further
-- vs. status narrows to ~3,333,333 rows, then user_id filters - more work

5. Outdated Table Statistics

The query planner doesn't examine actual data to decide whether to use an index. It uses statistics - precomputed summaries of data distribution, cardinality, and value frequency. If these statistics are wrong, the planner makes wrong decisions.

How statistics go stale:

  • Bulk data imports that change the distribution
  • Mass updates or deletes
  • Tables that grow significantly between statistics refreshes
  • Partition additions or changes
sql
-- PostgreSQL: Check when statistics were last updated
SELECT
    relname AS table_name,
    last_autoanalyze,
    last_analyze,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- If last_analyze is days old and you've done major data changes: problem.

-- PostgreSQL: Force statistics update
ANALYZE orders;

-- MySQL: Update statistics
ANALYZE TABLE orders;

-- MySQL: Check cardinality estimates (what the planner sees)
SHOW INDEX FROM orders;

Real scenario where this bites you: You have an orders table with 100,000 rows, mostly with status = 'completed'. The planner's statistics reflect this. Then you run a batch import of 5 million new orders, all with status = 'pending'. The planner still thinks pending is rare (based on old stats) and uses an index scan. But now pending is the majority - a sequential scan would be 10x faster.
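The same failure in miniature: the toy "planner" below decides from a stored fraction, not the live data, so a stale number produces the wrong plan (the threshold and fractions are made up):

```python
def choose_plan(stats_match_fraction, threshold=0.15):
    # Toy planner: pick an index scan only when statistics say matches are rare
    return "index scan" if stats_match_fraction < threshold else "seq scan"

stale_stats = 0.02   # fraction of 'pending' rows the planner believes (pre-import)
actual = 0.98        # actual fraction after the 5M-row import

print(choose_plan(stale_stats))  # index scan - wrong for the real distribution
print(choose_plan(actual))       # seq scan   - what fresh statistics would allow
```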

sql
-- After bulk imports, always update statistics
LOAD DATA INFILE 'orders_import.csv' INTO TABLE orders;
ANALYZE TABLE orders;  -- ← Don't forget this

PostgreSQL auto-analyze runs periodically, but its default thresholds might not trigger quickly enough after large data changes. For critical tables, run ANALYZE explicitly after bulk operations.


6. Implicit Type Casting

A subtle one that's easy to miss. If your query compares a column to a value of a different type, the database may cast the column value, which has the same effect as wrapping the column in a function: the index is bypassed.

sql
-- Column: phone_number VARCHAR(20)
-- Index on phone_number exists

-- ❌ BROKEN: Comparing varchar to integer forces casting
SELECT * FROM users WHERE phone_number = 5551234567;
-- Database does: WHERE CAST(phone_number AS INTEGER) = 5551234567
-- Function on column → index ignored

-- ✅ FIXED: Use the correct type
SELECT * FROM users WHERE phone_number = '5551234567';


-- Column: id BIGINT
-- ❌ BROKEN in some databases: Comparing bigint to string
SELECT * FROM users WHERE id = '12345';
-- Some databases cast the column, not the literal

-- ✅ FIXED: Match the type
SELECT * FROM users WHERE id = 12345;

This is especially common in application code where parameters from HTTP requests (always strings) are passed directly to queries without type conversion. Your ORM might handle this, or it might not. Check.


7. OR Conditions Prevent Index Usage

The OR operator can prevent the database from using an index, depending on the database and the specific query structure.

sql
-- Index on user_id, separate index on email
CREATE INDEX idx_users_user_id ON users(user_id);
CREATE INDEX idx_users_email ON users(email);

-- ❌ May not use either index effectively
SELECT * FROM users WHERE user_id = 123 OR email = 'john@example.com';
-- Some databases will bitmap-scan both indexes and merge
-- Others will fall back to sequential scan

-- ✅ REWRITE as UNION for guaranteed index usage
SELECT * FROM users WHERE user_id = 123
UNION
SELECT * FROM users WHERE email = 'john@example.com';
-- Each sub-query uses its own index, results are merged

PostgreSQL is generally smart about OR conditions and can use bitmap index scans to combine results from multiple indexes. MySQL is worse at this - it often falls back to a full scan on OR conditions across different columns. Test with EXPLAIN on your specific database.


Real-World Problems: When Indexes Hurt

Indexes aren't free. Every index you add has costs that show up in write performance, storage, and memory pressure. Here's where the pain appears.

Too Many Indexes = Slow Writes

Every INSERT, UPDATE, or DELETE on a table must update all indexes on that table. A table with 10 indexes means every write operation performs 10 additional index modifications.

Real numbers:

text
Benchmark: INSERT performance on a table with varying index counts
Table: 10 million rows, PostgreSQL 16, NVMe SSD

Indexes: 0  → 45,000 inserts/second
Indexes: 2  → 32,000 inserts/second  (-29%)
Indexes: 5  → 18,000 inserts/second  (-60%)
Indexes: 10 → 8,500 inserts/second   (-81%)
Indexes: 15 → 4,200 inserts/second   (-91%)

Those numbers are illustrative, but the pattern is real. Each additional index adds roughly 10-20% write overhead, and the cost compounds. For write-heavy tables (logging, events, metrics), unnecessary indexes are a direct performance tax.
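If you want to play with the shape of that curve, here's the compounding model behind it. The 15% per-index factor is an assumption chosen to roughly match the illustrative table above, not a benchmark:

```python
def modeled_insert_rate(base_rate, n_indexes, per_index_factor=0.85):
    # Assumed model: each index multiplies insert throughput by ~0.85,
    # i.e. ~15% compounding overhead per index - an illustration only
    return base_rate * per_index_factor ** n_indexes

for n in (0, 2, 5, 10, 15):
    print(n, round(modeled_insert_rate(45_000, n)))
# Falls from 45,000 inserts/s with no indexes to roughly 4,000/s with 15
```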

How to find unused indexes:

sql
-- PostgreSQL: Find indexes that are never scanned
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_index_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'  -- Don't drop primary keys
ORDER BY pg_relation_size(indexrelid) DESC;

-- MySQL: Check index usage statistics (MySQL 8.0+, performance_schema)
SELECT
    object_schema,
    object_name AS table_name,
    index_name,
    count_star AS rows_read_via_index
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND count_star = 0
  AND object_schema = 'your_database'
ORDER BY object_name;

Run this query regularly. Any index with zero scans over a reasonable observation period (at least a few weeks, covering all workload patterns) is a candidate for removal. Dropping it will speed up writes and free storage.


Index Bloat

In PostgreSQL, indexes can become "bloated" - they grow larger than necessary because of dead tuples from updates and deletes. MVCC (Multi-Version Concurrency Control) keeps old row versions around until they're vacuumed. The index entries pointing to those dead rows remain until a maintenance operation cleans them up.

sql
-- PostgreSQL: Check index bloat
-- Install pgstattuple extension
CREATE EXTENSION pgstattuple;

SELECT
    indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    avg_leaf_density,
    leaf_fragmentation
FROM pgstatindex('idx_orders_user_id');

-- avg_leaf_density below 50% suggests significant bloat
-- leaf_fragmentation above 30% suggests the index needs rebuilding

Symptoms of index bloat:

  • Index size is disproportionately large relative to the table
  • Queries that should use the index are slower than expected
  • VACUUM isn't keeping up with the rate of updates/deletes

The fix:

sql
-- PostgreSQL: Rebuild index (locks the table - careful in production)
REINDEX INDEX idx_orders_user_id;

-- PostgreSQL: Rebuild without locking (available since PG 12)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;

-- Better: Ensure autovacuum is tuned for your workload
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,   -- Vacuum when 2% of rows are dead (default 20%)
    autovacuum_analyze_scale_factor = 0.01   -- Analyze when 1% of rows change
);

Memory Pressure - Index Doesn't Fit in RAM

An index only provides fast lookups when it's in memory (buffer cache). If the index is too large to fit in RAM, index scans trigger disk I/O - which can be slower than a sequential scan that benefits from OS-level read-ahead.

sql
-- PostgreSQL: Check index sizes vs available memory
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS times_used
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

-- Compare against shared_buffers (your buffer cache size)
SHOW shared_buffers;
-- If your largest index is 8GB and shared_buffers is 4GB, that index
-- will constantly swap in and out of cache

Real impact: A 10GB index on a server with 4GB of buffer cache means every index lookup has a high probability of hitting disk. If you're doing 10,000 index lookups per second, that's potentially 10,000 random disk reads per second. Your SSD might handle it. A spinning disk will not.

Mitigation:

  • Increase shared_buffers (PostgreSQL) or innodb_buffer_pool_size (MySQL). As a starting point, set these to 25-40% of total server RAM.
  • Use partial indexes to reduce index size - only index the rows you actually query.
  • Remove unused indexes to free buffer cache space for indexes that are actually used.
  • Consider the working set. If your queries only access the last 30 days of data, a partial index on WHERE created_at > NOW() - INTERVAL '30 days' can be dramatically smaller than a full index.

How to Debug Index Problems - The Systematic Approach

Don't guess. Measure. Here's the debugging workflow for every slow query.

Step 1: Run EXPLAIN ANALYZE

This is the single most important tool for database performance debugging. It shows you exactly what the query planner chose to do and how long each step took.

sql
-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE user_id = 12345
  AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

What to look for in the output:

sql
-- ✅ GOOD: Index Scan - the index is being used
Index Scan using idx_orders_user_status on orders
  Index Cond: (user_id = 12345 AND status = 'shipped')
  Rows Removed by Filter: 0
  Buffers: shared hit=8
  Planning Time: 0.2ms
  Execution Time: 0.05ms

-- ❌ BAD: Seq Scan - full table scan, index ignored
Seq Scan on orders
  Filter: (user_id = 12345 AND status = 'shipped')
  Rows Removed by Filter: 9,999,990
  Buffers: shared hit=425,000
  Planning Time: 0.3ms
  Execution Time: 3,200ms

Key indicators in EXPLAIN output:

| What you see | What it means |
|---|---|
| Index Scan | Index is used, rows fetched from table |
| Index Only Scan | Best case - all data comes from index, no table access |
| Bitmap Index Scan | Index used, but results need a second pass to fetch rows |
| Seq Scan | Full table scan - index not used |
| Rows Removed by Filter | Rows that were fetched but didn't match - high numbers = problem |
| Buffers: shared hit | Pages read from cache - higher numbers mean more pages touched |
| Planning Time vs Execution Time | If planning is high, statistics might be stale |
sql
-- MySQL equivalent
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
  AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

-- Or the classic EXPLAIN (without ANALYZE)
EXPLAIN SELECT ...;
-- Look for:
--   type: ref, range, index → good (index is used)
--   type: ALL → bad (full table scan)
--   key: which index is being used (NULL = none)
--   rows: estimated rows to examine

Step 2: Check Actual Index Usage

Don't just check if the index exists. Check if it's actually being used by your real workload.

sql
-- PostgreSQL: Index usage statistics
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS number_of_scans,
    idx_tup_read AS rows_read_from_index,
    idx_tup_fetch AS rows_fetched_from_table,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;

What this tells you:

  • idx_scan = 0: Index has never been used. Candidate for removal.
  • idx_scan is low but index_size is large: Expensive index with little benefit.
  • idx_tup_read >> idx_tup_fetch: Many index entries read but few rows actually fetched - possible low-selectivity issue.

Step 3: Compare Before and After

Never deploy an index change based on a hunch. Measure the actual query performance before and after.

sql
-- Capture baseline
\timing on  -- PostgreSQL: enables timing

-- Run the slow query 3 times, note the average
SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Time: 2,450ms, 2,380ms, 2,510ms → Average: ~2,450ms

-- Add the index
CREATE INDEX CONCURRENTLY idx_orders_user_created
    ON orders(user_id, created_at);

-- Update statistics
ANALYZE orders;

-- Run the same query 3 times again
SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Time: 1.2ms, 0.8ms, 0.9ms → Average: ~1.0ms

-- Verify with EXPLAIN that the index is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Should show: Index Scan using idx_orders_user_created

Always use CREATE INDEX CONCURRENTLY in PostgreSQL for production databases. The regular CREATE INDEX locks the table for writes for the entire duration of index creation. On a 100-million-row table, that could be minutes of write downtime.

sql
-- ❌ DON'T do this in production - locks the table
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- ✅ DO this - builds index without locking
CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders(user_id, created_at);

How to Fix Common Index Problems

Now that you can diagnose the issue, here are the fixes - from simple to advanced.

Fix 1: Rewrite Queries to Be Index-Friendly

Often the cheapest fix. No schema changes, no new indexes. Just rewrite the query so the planner can use existing indexes.

sql
-- ❌ Function on column → index ignored
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- ✅ Store data in the right format, query without functions
-- (Ensure email is stored lowercase at write time)
SELECT * FROM users WHERE email = 'test@example.com';

-- ❌ Leading wildcard → index ignored
SELECT * FROM users WHERE name LIKE '%john%';
-- ✅ If prefix search is acceptable:
SELECT * FROM users WHERE name LIKE 'john%';

-- ❌ Implicit type cast → index ignored
SELECT * FROM users WHERE phone = 5551234567;
-- ✅ Match the column type:
SELECT * FROM users WHERE phone = '5551234567';

-- ❌ OR condition across columns → may prevent index usage
SELECT * FROM users WHERE user_id = 123 OR email = 'john@example.com';
-- ✅ UNION for guaranteed index usage on each branch:
SELECT * FROM users WHERE user_id = 123
UNION
SELECT * FROM users WHERE email = 'john@example.com';

-- ❌ NOT IN / NOT EXISTS with large lists → may scan
SELECT * FROM orders WHERE status NOT IN ('cancelled', 'refunded', 'archived');
-- ✅ Rewrite as positive condition if possible:
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped', 'delivered');

Fix 2: Design Composite Indexes for Your Query Patterns

Don't create indexes in the abstract. Look at your actual queries and design indexes that serve them.

sql
-- Your most common query:
SELECT * FROM orders
WHERE user_id = ? AND status = ?
ORDER BY created_at DESC
LIMIT 20;

-- The perfect index for this query:
CREATE INDEX idx_orders_user_status_created
    ON orders(user_id, status, created_at DESC);

-- Why this order:
-- 1. user_id (equality) → narrows to one user's orders
-- 2. status (equality) → narrows to orders with specific status
-- 3. created_at DESC (ordering) → results are pre-sorted, no extra sort step

-- The planner can:
-- → Use the index to find matching rows (user_id + status)
-- → Return results in order (created_at DESC) without sorting
-- → Stop after 20 rows (LIMIT) without scanning further

One good composite index is better than three single-column indexes. The database can combine single-column indexes using bitmap scans, but it's slower and more resource-intensive than a single composite index tailored to the query.

Fix 3: Partial Indexes for Targeted Queries

If you only query a subset of rows, index only that subset. The index is smaller, faster, and uses less memory.

sql
-- Only 2% of orders are 'pending', but you query them constantly
-- Full index on status - wastes space on 98% of rows you don't query
CREATE INDEX idx_orders_status ON orders(status);  -- 500MB

-- Partial index - only the rows you care about
CREATE INDEX idx_orders_pending ON orders(created_at)
    WHERE status = 'pending';  -- 10MB

-- This index serves:
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;

-- And it's 50x smaller than the full index.
sql
-- Partial index for "soft deleted" records
-- 99% of queries filter to non-deleted records
CREATE INDEX idx_users_active_email ON users(email)
    WHERE deleted_at IS NULL;

-- Serves the common case efficiently:
SELECT * FROM users WHERE email = ? AND deleted_at IS NULL;

Partial indexes are PostgreSQL's secret weapon. MySQL doesn't support them natively (though you can approximate with generated columns). If you're on PostgreSQL and not using partial indexes, you're leaving significant performance on the table.

Fix 4: Covering Indexes (Index-Only Scans)

A covering index includes all columns needed by the query. The database can answer the query entirely from the index without touching the table at all. This is called an "index-only scan" and it's the fastest possible read.

sql
-- Query:
SELECT user_id, status, created_at FROM orders
WHERE user_id = 12345 AND status = 'shipped';

-- Regular index - requires table lookup for each matching row
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Flow: Index → find matching entries → go to table → read user_id, status, created_at

-- Covering index - includes created_at, no table lookup needed
CREATE INDEX idx_orders_covering ON orders(user_id, status, created_at);
-- Flow: Index → find matching entries → return data directly from index
-- (Index-Only Scan in EXPLAIN)


-- PostgreSQL: INCLUDE clause for covering indexes (PG 11+)
-- Better: Only the first columns are used for searching, included columns are just stored
CREATE INDEX idx_orders_covering ON orders(user_id, status) INCLUDE (created_at, total);
-- Searches by user_id + status, but also stores created_at and total in the index
-- Enables index-only scan without making the search tree wider

When to use covering indexes: For queries that run thousands of times per second and only need a few columns. The elimination of table lookups can cut query time in half. But each included column makes the index larger, so be selective.
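A toy model makes the heap-fetch difference concrete. Dicts stand in for the heap and the two index shapes, and the row data is made up:

```python
# Toy heap table keyed by row id
table = {
    1: {"user_id": 12345, "status": "shipped", "created_at": "2024-01-05"},
    2: {"user_id": 12345, "status": "shipped", "created_at": "2024-02-11"},
}

# Regular index: (user_id, status) -> row ids; created_at lives only in the heap
regular_index = {(12345, "shipped"): [1, 2]}

# Covering index: each entry carries created_at, so the heap is never touched
covering_index = {(12345, "shipped"): [(1, "2024-01-05"), (2, "2024-02-11")]}

def query_regular(user_id, status):
    heap_fetches, out = 0, []
    for rid in regular_index.get((user_id, status), []):
        heap_fetches += 1                      # one table lookup per matching row
        out.append(table[rid]["created_at"])
    return out, heap_fetches

def query_covering(user_id, status):
    entries = covering_index.get((user_id, status), [])
    return [created for _, created in entries], 0   # index-only: zero heap fetches

print(query_regular(12345, "shipped"))   # (['2024-01-05', '2024-02-11'], 2)
print(query_covering(12345, "shipped"))  # (['2024-01-05', '2024-02-11'], 0)
```

Those eliminated heap fetches are exactly what "Index Only Scan" in EXPLAIN output represents.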

Fix 5: Remove Unused Indexes

This is an optimization most teams never do. They add indexes but never remove them. Over time, the table accumulates 10, 15, 20 indexes. Write performance degrades steadily. Nobody connects the slowdown to the index sprawl because it happened gradually.

sql
-- PostgreSQL: Generate DROP statements for unused indexes
SELECT
    'DROP INDEX CONCURRENTLY ' || indexrelname || ';' AS drop_command,
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size_freed
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
  AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Before dropping, verify:

  1. Reset statistics counter and wait at least 2 weeks (covering all workload patterns including monthly jobs).
  2. Check if the index is used by a rare but critical query (monthly reports, batch jobs).
  3. Drop with CONCURRENTLY to avoid locking the table.
sql
-- Reset statistics to start fresh observation
SELECT pg_stat_reset();

-- Wait 2-4 weeks...

-- Then check which indexes have still never been used
SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0;

The Trade-Off Table

Every index decision has costs. Make them explicit.

| Action | Read Benefit | Write Cost | Storage Cost | Complexity |
|---|---|---|---|---|
| Add single-column index | Faster lookups on that column | ~10-15% write overhead per index | Moderate | Low |
| Add composite index | Faster multi-column queries, possible sort elimination | ~10-15% write overhead | Larger than single | Medium - column order matters |
| Add covering index | Index-only scans - no table lookup | Same write overhead + larger index | Large - stores extra columns | Medium |
| Add partial index | Smaller, targeted, faster scans | Minimal - only indexes subset | Small | Medium - requires WHERE clause design |
| Add expression index | Enables index usage on function calls | Write overhead + function eval on insert | Moderate | Medium |
| Remove unused index | None (it wasn't being used) | Faster writes | Reclaimed space | Low - but verify it's truly unused |

When NOT to Use Indexes

Indexes are not universally good. Here are legitimate cases where you should avoid them.

Small tables. If a table has fewer than 1,000 rows, the database can scan it entirely in microseconds. An index adds overhead without measurable benefit. The planner will usually ignore the index and do a sequential scan anyway.

Write-heavy tables with few reads. Log tables, audit tables, event streams - if the write-to-read ratio is 100:1, every index is taxing the write path for the benefit of very few reads. Consider indexing only after data is moved to an analytical store, or use partial indexes that cover only the specific queries you do run.
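A sketch of the partial-index approach for a write-heavy table (the events table and its columns are assumptions for illustration):

```sql
-- Hypothetical append-heavy log table: millions of writes, one dashboard query
-- Instead of a full index, index only the slice that is ever queried
CREATE INDEX idx_events_errors ON events(created_at)
WHERE level = 'error';

-- Inserts of non-error rows skip this index entirely, so the write tax
-- is proportional to the (small) error rate, while the dashboard query
--   SELECT * FROM events WHERE level = 'error' AND created_at > now() - interval '1 hour'
-- still gets an index scan
```

The WHERE clause in the query must match the index predicate for the planner to use it.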

Low-selectivity columns without partial index strategy. As covered above, indexing a boolean or status column with 3 values is usually pointless for full indexes. Use partial indexes if you need to query specific values, or skip the index entirely.

Columns only used in low-priority queries. A monthly report that takes 30 seconds is probably fine. Adding an index that slows down 50,000 writes/second to save 25 seconds on a monthly report is a bad trade.

Temporary or staging tables. Tables that are bulk-loaded and then queried once before being dropped don't benefit from indexes. The cost of building the index exceeds the cost of scanning the data once.


Production Checklist: Index Health

Run through this periodically. Monthly at minimum for production databases.

1. Find unused indexes and drop them. Unused indexes are pure cost - write overhead, storage, and memory wasted. Use pg_stat_user_indexes (PostgreSQL) or performance_schema.table_io_waits_summary_by_index_usage (MySQL) to identify them.
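For MySQL, a sketch of the equivalent check against performance_schema (assumes the instrumentation is enabled, which it is by default in modern versions):

```sql
-- MySQL: indexes that have never been touched since server start
SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL          -- NULL rows are table scans
  AND index_name <> 'PRIMARY'
  AND count_star = 0                  -- never read or written via this index
  AND object_schema NOT IN ('mysql', 'sys', 'performance_schema')
ORDER BY object_schema, object_name;
```

Note that these counters reset on server restart, so apply the same "observe for weeks" discipline as on the PostgreSQL side.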

2. Find duplicate indexes. Composite index (A, B) makes a standalone index on (A) redundant. The composite index serves both WHERE A = ? and WHERE A = ? AND B = ? queries.

sql
-- PostgreSQL: Find potentially redundant indexes
SELECT
    a.indexrelid::regclass AS potentially_redundant,
    b.indexrelid::regclass AS covers_it,
    pg_size_pretty(pg_relation_size(a.indexrelid)) AS wasted_size
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
    AND a.indexrelid != b.indexrelid
    -- a's column list is a leading prefix of b's (indkey is a space-separated
    -- list of column numbers), so b can serve a's queries
    AND (b.indkey::text || ' ') LIKE (a.indkey::text || ' %')
WHERE NOT a.indisprimary
  AND NOT a.indisunique;  -- never auto-drop unique indexes

3. Check index bloat and reindex if needed. Bloated indexes waste memory and slow scans. Monitor avg_leaf_density (reported by the pgstattuple extension's pgstatindex function) and rebuild when it drops below ~50%.
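A sketch of the bloat check using the pgstattuple extension (the index name is an assumption; REINDEX CONCURRENTLY requires PG 12+):

```sql
-- One-time setup: pgstattuple ships with PostgreSQL as a contrib extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Inspect a B-tree index: avg_leaf_density is the percentage of leaf-page
-- space actually holding entries; low values mean bloat
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_orders_covering');

-- Rebuild without blocking writes once density has dropped too low
REINDEX INDEX CONCURRENTLY idx_orders_covering;
```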

4. Verify statistics are current. After any large data change (bulk import, mass update, partition rotation), run ANALYZE on affected tables.
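A minimal sketch of that step (the orders table name is illustrative):

```sql
-- After a bulk load into orders, refresh the planner's statistics
ANALYZE orders;

-- Sanity check: when were statistics last gathered, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
```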

5. Review EXPLAIN plans for your top 10 slowest queries. Find them in pg_stat_statements (PostgreSQL) or the slow query log (MySQL). Check if they're using indexes as expected.

sql
-- PostgreSQL: Top 10 slowest queries by total time
SELECT
    substring(query, 1, 100) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_time_ms,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Summary: What You Should Actually Do

Indexes don't magically fix performance. They only work when aligned with your query patterns.

If a query is slow despite having an index:

  1. Run EXPLAIN ANALYZE. Look for Seq Scan where you expected an index scan.
  2. Check for functions on indexed columns, leading wildcards, type mismatches, or wrong composite index column order.
  3. Verify table statistics are current with ANALYZE.
  4. Fix the query or the index - in that order. Rewriting the query is cheaper than adding another index.
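The steps above look roughly like this in practice (table, index, and dates are assumptions for illustration):

```sql
-- Step 1-2: the function call on the indexed column disables the index
EXPLAIN ANALYZE
SELECT * FROM orders WHERE DATE(created_at) = '2026-03-01';
-- Typically shows: Seq Scan on orders, Filter: date(created_at) = ...

-- Step 4: rewrite as a range predicate so a plain index on created_at applies
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2026-03-01' AND created_at < '2026-03-02';
-- Typically shows: Index Scan (or Bitmap Index Scan) using the created_at index
```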

If you're designing indexes for a new feature:

  1. Write the queries first. Then design indexes to serve those specific queries.
  2. Equality columns first, range columns last in composite indexes.
  3. Consider covering indexes for high-frequency queries that fetch few columns.
  4. Use partial indexes for queries that target a small subset of rows.
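Putting the design rules together, a sketch of query-first index design (schema and names are assumptions):

```sql
-- 1. The query we need to serve, written first:
--    SELECT id, total FROM orders
--    WHERE user_id = $1 AND status = 'shipped'
--      AND created_at >= $2
--    ORDER BY created_at DESC;

-- 2. Equality columns (user_id, status) lead, the range/sort column comes last,
--    matching the ORDER BY direction to allow sort elimination
CREATE INDEX idx_orders_user_status_created
    ON orders (user_id, status, created_at DESC);
```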

For ongoing index health:

  1. Monitor unused indexes and drop them. They're costing you write performance.
  2. Monitor index sizes relative to your buffer cache. Indexes that don't fit in memory lose most of their benefit.
  3. Run EXPLAIN ANALYZE on your slowest queries monthly. Query patterns shift. Indexes that made sense six months ago might be irrelevant today.

The best index strategy isn't "index everything." It's indexing deliberately, measuring the impact, and removing what doesn't work. Treat indexes as living infrastructure that needs periodic review - not a one-time fix.



Written by

Raunak Gupta

DevOps engineer and technical writer with experience in cloud infrastructure, CI/CD pipelines, and system design. Passionate about making complex engineering topics accessible through clear, practical writing backed by real production experience.
