TechUnfiltered.dev
System Design · #caching #redis #distributed-systems #performance #architecture

Caching Strategies That Work (And When They Fail)

Raunak Gupta
Mar 22, 2026 · 32 min read · Updated 2 days ago

Caching improves performance - until it starts serving stale data or becomes your biggest bottleneck.

Every backend engineer has the same story. System is slow. You add Redis in front of the database. Response times drop from 200ms to 5ms. You celebrate. Then three weeks later, a customer sees a price that was updated an hour ago. Or inventory shows "in stock" for something that sold out. Or two users see different data for the same resource depending on which server handles the request.

Caching is not the problem. Poorly designed caching is. And the gap between "add Redis" and "build a caching layer that works in production" is enormous.

This article is not an introduction to caching. It's about the real decisions you'll face: which strategy to use, when each one breaks, how to handle invalidation without losing your mind, and the failure modes that show up only at scale.


Why "Just Add a Cache" Is Dangerous Advice

The caching pitch is seductive. Database query takes 150ms? Cache the result, serve it in 2ms. Problem solved.

Here's what actually happens next:

Stale data. The database gets updated, but the cache still holds the old value. Users see outdated information. For a blog post, nobody cares. For inventory counts, pricing, or account balances - it's a production incident.

Cache invalidation complexity. You now need to figure out when to update the cache. Every write path needs to know about the cache. Miss one, and you have inconsistency. Your simple caching layer becomes a distributed consistency problem.

Inconsistent reads. With multiple application instances, some might have local caches that are stale while others have fresh data. User A sees one thing, user B sees another. This is brutal to debug because it's intermittent and depends on which server handles the request.

The hidden contract. When you add a cache, you've made an implicit decision: you're trading consistency for speed. Every caching strategy is a different point on that trade-off curve. If you don't choose deliberately, production will choose for you - usually at the worst possible time.


Types of Caching - Where the Data Lives

Before choosing a strategy, understand the layers available to you. Each has different latency characteristics, capacity, and invalidation complexity.

In-Memory (Application Level)

Data lives in the application process's memory. A dictionary, a hash map, a local LRU cache.

Latency: ~0.0001ms (on the order of 100 nanoseconds - it's a memory lookup)
Capacity: Limited by application memory. Usually tens of megabytes at most.
Invalidation: Only the local instance knows about it. Other instances have no idea.

Simple in-memory cache with TTL
from functools import lru_cache
from cachetools import TTLCache

# Option 1: Python's built-in (no TTL - entries live until evicted by the size limit)
@lru_cache(maxsize=1024)
def get_user(user_id):
    return db.query("SELECT * FROM users WHERE id = %s", user_id)  # parameterized, never an f-string

# Option 2: TTL-based (better for production)
cache = TTLCache(maxsize=1024, ttl=300)  # 5-minute TTL

def get_user(user_id):
    if user_id in cache:
        return cache[user_id]
    user = db.query_user(user_id)
    cache[user_id] = user
    return user

When it works: High-read, low-change data. Configuration values, feature flags, reference data that changes once a day.

When it breaks: Multiple application instances. Each maintains its own cache. Update one, and the others serve stale data until their TTL expires. With 10 instances and a 5-minute TTL, you can have up to 5 minutes of inconsistency across servers.
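One common mitigation (my sketch, not something the article prescribes) is broadcasting invalidations over Redis pub/sub so every instance evicts its local copy immediately instead of waiting out the TTL. The channel name and handler shape below are assumptions; the writer/subscriber wiring in the comments uses redis-py's real `publish`/`pubsub` API:

```python
# Each instance keeps a local cache and subscribes to an invalidation channel.
local_cache = {}

def handle_invalidation(message):
    """Pub/sub callback: evict the named key from this instance's local cache.
    With redis-py and decode_responses=True, message["data"] is a str."""
    if message.get("type") != "message":
        return  # ignore subscribe/unsubscribe notifications
    local_cache.pop(message["data"], None)

# Writer side (sketch - assumes a redis-py client named `r`):
#   r.publish("cache-invalidate", f"user:{user_id}")
# Subscriber side, one per instance:
#   p = r.pubsub()
#   p.subscribe(**{"cache-invalidate": handle_invalidation})
#   p.run_in_thread(sleep_time=0.1)
```

The local TTL then becomes a backstop for lost messages rather than the primary invalidation path.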


Distributed Cache (Redis / Memcached)

Data lives in a dedicated cache server shared by all application instances.

Latency: 0.5–2ms (network round-trip)
Capacity: Gigabytes to terabytes. Limited by the cache cluster's memory.
Invalidation: Centralized. Update once, all instances see the change immediately.

Redis vs. Memcached - the actual difference:

| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, hashes, lists, sets, sorted sets | Strings only |
| Persistence | Optional RDB/AOF | None |
| Replication | Built-in primary/replica | None |
| Memory efficiency | Less efficient (overhead per key) | More efficient for simple key-value |
| Eviction | Multiple policies (LRU, LFU, TTL, etc.) | LRU only |

My take: Use Redis unless you're caching massive volumes of simple key-value pairs and need maximum memory efficiency. Redis's data structures and built-in features save enough development time to justify the overhead.
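To make the data-structure difference concrete: Redis hashes let you update one field in place, where Memcached's string-only model forces a get-mutate-set of the whole serialized blob. A sketch using redis-py's `hset`/`hget` call shapes - the `FakeRedis` stand-in exists only so the snippet runs without a server:

```python
class FakeRedis:
    """Minimal in-memory stand-in mimicking the redis-py hash commands used below."""
    def __init__(self):
        self._hashes = {}
    def hset(self, name, mapping=None):
        self._hashes.setdefault(name, {}).update(mapping or {})
    def hget(self, name, field):
        return self._hashes.get(name, {}).get(field)

r = FakeRedis()  # with a real server: r = redis.Redis(decode_responses=True)

# Store user fields as a hash - individual fields update in place
r.hset("user:123", mapping={"email": "a@example.com", "plan": "free"})
r.hset("user:123", mapping={"plan": "pro"})  # partial update; email untouched

# With Memcached you would GET the JSON blob, mutate it, and SET it back -
# a read-modify-write race waiting to happen under concurrency.
```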


CDN / Edge Caching

Data lives on servers geographically close to the user. Cloudflare, AWS CloudFront, Fastly, etc.

Latency: 1–20ms (depending on proximity to edge node)
Capacity: Effectively unlimited (distributed across global PoPs)
Invalidation: Slow. Purging a CDN cache can take seconds to minutes. Some providers charge per purge request.

When it works: Static assets (images, CSS, JS), public API responses that are the same for all users, marketing pages.

When it breaks: Personalized content, frequently updated data, anything that varies per user. Setting wrong Cache-Control headers on authenticated API responses is a security incident - one user's data served to another.


Database Query Cache

The database itself caches query results. MySQL had a built-in query cache (removed in 8.0 because it caused more problems than it solved). PostgreSQL doesn't cache query results but caches execution plans and buffer pages.

Latency: Varies. Buffer cache hits are fast. Query cache hits skip parsing and execution entirely.
Capacity: Managed by the database engine.
Invalidation: Automatic - any write to a cached table invalidates relevant entries.

When it works: Read-heavy workloads with large, complex queries that return the same results frequently.

When it breaks: Write-heavy workloads. MySQL's query cache invalidated the entire table's cached results on any write to that table. One insert killed every cached query for that table. This is why MySQL removed it - under write-heavy loads, the cache was a net negative.

Bottom line: Don't rely on database-level caching as your primary strategy. Treat it as a bonus, not a plan.


Caching Strategies That Actually Work in Production

This is where the real decisions happen. Each strategy defines how data flows between your application, cache, and database.

Strategy 1: Cache-Aside (Lazy Loading)

The most widely used pattern. Your application manages the cache explicitly.

Flow:

plaintext
Read path:
1. Application checks cache
2. Cache hit → return cached data
3. Cache miss → query database → store result in cache → return data

Write path:
1. Application writes to database
2. Application invalidates (deletes) the cache entry
Cache-aside pattern
def get_product(product_id):
    # Step 1: Check cache
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)

    # Step 2: Cache miss - hit database
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)

    # Step 3: Populate cache for next time
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))  # 1hr TTL

    return product

def update_product(product_id, data):
    # Step 1: Update database (source of truth)
    db.execute("UPDATE products SET ... WHERE id = %s", product_id)

    # Step 2: Invalidate cache (NOT update - delete)
    redis.delete(f"product:{product_id}")

Why delete instead of update the cache? Because delete is idempotent and simpler. If you try to update the cache with the new value, you introduce a race condition: two concurrent updates might write to the database in order A→B but update the cache in order B→A. Now the cache holds stale data permanently. Deleting the cache means the next read will fetch the latest data from the database.

When it works: Read-heavy workloads where a cache miss (hitting the database) is acceptable. This covers the majority of web applications.

When it breaks:

  • Cold cache problem. After a deployment or Redis restart, every request is a cache miss. Your database gets hammered with 100% of read traffic simultaneously. This can take down a database that was sized to handle 10% of reads (the remaining 90% normally served by cache).
  • Stale data window. Between a database write and the cache delete, there's a brief window where the cache holds old data. In practice, this window is milliseconds and rarely matters. But for financial data or inventory counts, "rarely" isn't good enough.
  • Write-heavy data. If data changes frequently, the cache is constantly being invalidated. Your cache hit rate drops, and you're paying the overhead of Redis calls without the benefit.

Strategy 2: Write-Through

Every write goes to both the database and the cache simultaneously. The cache is always consistent with the database.

Flow:

plaintext
Write path:
1. Application writes to database
2. Application immediately writes the same data to cache
3. Both succeed → write is complete

Read path:
1. Always read from cache (it's guaranteed to be current)
2. Cache miss (only after eviction) → query database → populate cache
Write-through pattern
def update_product(product_id, data):
    # Write to database
    db.execute("UPDATE products SET ... WHERE id = %s", product_id)

    # Write to cache immediately
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))

def get_product(product_id):
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)

    # Cache miss (rare - only after TTL expiry or eviction)
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))
    return product

When it works: Systems where consistency between cache and database is critical. User sessions, authentication tokens, shopping carts - data that changes frequently and must be read accurately.

When it breaks:

  • Slower writes. Every write operation now includes a cache write. That's an extra 1–2ms per write. At 10,000 writes/second, that adds up.
  • Wasted cache space. You're caching every piece of written data, even if nobody reads it. If you have a write-heavy table where most rows are written once and rarely read, you're filling your cache with data nobody needs.
  • Failure coupling. If Redis is down, do you fail the write? If you skip the cache write, you're back to cache-aside with inconsistency. If you fail the entire operation, your cache dependency has become a write-path dependency - Redis downtime means write outages.
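One way to soften the failure coupling (a sketch of my own, under the assumption that degrading to cache-aside is acceptable): treat the cache write as best-effort and fall back to a delete, so a Redis hiccup costs you hit rate instead of failing writes. All names below are illustrative:

```python
import json
import logging

def update_product_resilient(db, cache, product_id, data):
    """Write-through where the DB write must succeed, but cache failures
    degrade to invalidation (or, failing that, the TTL backstop)."""
    db.execute("UPDATE products SET name = %s WHERE id = %s", data["name"], product_id)
    key = f"product:{product_id}"
    try:
        cache.setex(key, 3600, json.dumps(data))          # normal write-through path
    except Exception:
        logging.warning("cache write failed for %s; falling back to delete", key)
        try:
            cache.delete(key)                             # degrade to cache-aside invalidation
        except Exception:
            logging.warning("cache delete also failed; relying on TTL expiry")

# Stubs to exercise the fallback path:
class DB:
    def __init__(self): self.rows = {}
    def execute(self, sql, name, pid): self.rows[pid] = name

class FlakyCache:
    def __init__(self): self.deleted = []
    def setex(self, k, ttl, v): raise ConnectionError("redis down")
    def delete(self, k): self.deleted.append(k)
```

The key property: the database write never waits on, or fails because of, the cache.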

Strategy 3: Write-Behind (Write-Back)

The most aggressive caching strategy. Writes go to the cache first, and the database is updated asynchronously in the background.

Flow:

plaintext
Write path:
1. Application writes to cache
2. Return success to the client immediately
3. Background process flushes cache writes to database (batched, async)

Read path:
1. Always read from cache (it has the latest data)
Write-behind with background worker
# Simplified write-behind with a background worker
import threading
import queue

write_queue = queue.Queue()

def update_product(product_id, data):
    # Write to cache immediately - this is the "source of truth" temporarily
    redis.setex(f"product:{product_id}", 7200, json.dumps(data))

    # Queue the database write for async processing
    write_queue.put(("product", product_id, data))

    # Return to client - no database latency
    return {"status": "ok"}

def db_writer_worker():
    """Background thread that flushes writes to the database in batches."""
    batch = []
    while True:
        try:
            item = write_queue.get(timeout=1.0)
            batch.append(item)

            # Flush when batch is large enough or queue is empty
            if len(batch) >= 50 or write_queue.empty():
                flush_batch_to_db(batch)
                batch = []
        except queue.Empty:
            if batch:
                flush_batch_to_db(batch)
                batch = []

When it works: High-write-throughput systems where write latency matters more than durability. Gaming leaderboards, activity feeds, analytics counters, real-time dashboards. Anything where losing the last few seconds of data during a crash is acceptable.

When it breaks - and this is important:

  • Data loss on failure. If Redis crashes before the background process flushes to the database, those writes are gone. Not "delayed" - gone. For any system where data loss is unacceptable (payments, orders, financial transactions), write-behind is the wrong strategy. Full stop.
  • Ordering complexity. Batch writes can arrive at the database out of order. If user A updates a record, then user B updates the same record, the batch processor needs to apply them in the correct order. This gets complicated fast.
  • Debugging nightmares. The database is always behind the cache. If you query the database directly (for analytics, admin tools, debugging), you're seeing stale data. This confuses everyone.
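A partial mitigation for the ordering problem (a sketch, assuming last-write-wins per key is acceptable for your data): coalesce the batch before flushing, so only the newest write per key reaches the database, in arrival order of each key's final write:

```python
def coalesce_batch(batch):
    """Collapse queued writes to the last write per (table, row_id).
    Assumes last-write-wins semantics are acceptable - if two updates to the
    same row are queued, only the newer one is flushed."""
    latest = {}
    for table, row_id, data in batch:
        key = (table, row_id)
        if key in latest:
            del latest[key]        # re-insert so the key moves to its newest position
        latest[key] = (table, row_id, data)
    return list(latest.values())
```

This also shrinks the flush itself: a hot counter updated 500 times between flushes becomes one database write.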

My recommendation: Use write-behind only when you've explicitly accepted the data loss risk and the performance benefit justifies it. For most applications, write-through or cache-aside is the safer choice.


Strategy 4: TTL-Based Caching (Time-To-Live)

Not a standalone strategy - more of a parameter applied to other strategies. Every cached value gets an expiration time. After the TTL expires, the entry is evicted, and the next read fetches fresh data from the database.

TTL examples by data type
# TTL examples by data type
TTL_CONFIG = {
    "user_profile":    300,    # 5 minutes - changes occasionally
    "product_listing": 60,     # 1 minute - prices can change
    "feature_flags":   30,     # 30 seconds - needs to propagate quickly
    "static_config":   3600,   # 1 hour - rarely changes
    "search_results":  120,    # 2 minutes - acceptable staleness
    "session_data":    1800,   # 30 minutes - security consideration
}

def get_with_ttl(key, fetch_fn, ttl):
    cached = redis.get(key)
    if cached:
        return json.loads(cached)

    fresh = fetch_fn()
    redis.setex(key, ttl, json.dumps(fresh))
    return fresh

The appeal of TTL: It's simple. No complex invalidation logic. Set it and forget it. The cache self-heals over time as entries expire and get repopulated with fresh data.

Here's the problem: TTL is a guess. You're saying "this data is probably still valid for N seconds." That "probably" is doing a lot of heavy lifting.

Too short: High cache miss rate. You're hitting the database frequently, negating the benefit of caching. A 5-second TTL on a value that changes once a day means 17,280 unnecessary cache misses per day per key.

Too long: Stale data. A 1-hour TTL on product pricing means customers could see outdated prices for up to an hour after a change.

The honest truth about TTL: It works well as a safety net on top of active invalidation. Set a TTL as a backstop, but don't rely on it as your primary invalidation mechanism for data that matters.
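One refinement worth adding when you do lean on TTLs (my suggestion, not the article's): randomize the TTL slightly so keys populated at the same moment don't all expire at the same moment, which softens the synchronized-miss problem:

```python
import random

def ttl_with_jitter(base_ttl, spread=0.1):
    """Return base_ttl +/- (spread fraction), so entries cached together
    expire at slightly different times instead of all at once."""
    delta = base_ttl * spread
    return int(base_ttl + random.uniform(-delta, delta))

# Usage (assuming a redis-py client `r` and a serialized payload):
#   r.setex("product:123", ttl_with_jitter(3600), payload)   # somewhere in 3240-3960s
```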


Where Caching Fails - The Hard Problems

This is the section that matters most. Every caching tutorial shows you how to set a value in Redis. Almost none talk about what happens when things go wrong.

1. Cache Invalidation - The Hardest Problem in Computer Science

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He wasn't joking about the first one.

The core question: When data changes in the database, how do you ensure the cache reflects that change?

Approach A: Invalidation on write (event-driven)

Every write operation explicitly deletes or updates the corresponding cache entries.

Invalidation on write
def update_user_email(user_id, new_email):
    db.execute("UPDATE users SET email = %s WHERE id = %s", new_email, user_id)

    # Invalidate all cache entries that contain this user's data
    redis.delete(f"user:{user_id}")
    redis.delete(f"user_profile:{user_id}")
    redis.delete(f"user_settings:{user_id}")

    # What about cached API responses that include this user?
    # What about search results that show this user's email?
    # What about the team members list that includes this user?
    # ...this is where it gets ugly.

The problem: You need to know every cache key that's affected by every database write. In a complex system, a single row update might affect dozens of cached views. Miss one, and you have stale data. Add a new feature that caches a new view? You need to update every write path that could affect it. This coupling between write paths and cache keys is fragile and scales poorly.
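One way to tame the key-tracking problem is tag-based invalidation: every cached view registers itself against the entities it depends on, and writes invalidate by entity instead of remembering key lists. The sketch below uses plain dicts; in Redis the tag index would be a SET per tag (SADD on write, SMEMBERS + DEL on invalidation). All names are illustrative:

```python
# tag -> set of dependent cache keys (in Redis: one SET per tag)
tag_index = {}
cache = {}

def cache_set(key, value, tags):
    """Store a value and register it under each entity tag it depends on."""
    cache[key] = value
    for tag in tags:
        tag_index.setdefault(tag, set()).add(key)   # SADD tag key

def invalidate_tag(tag):
    """Delete every cache entry that was tagged with this entity."""
    for key in tag_index.pop(tag, set()):           # SMEMBERS tag, then DEL keys
        cache.pop(key, None)

# Views declare their dependencies when they cache themselves:
cache_set("user:42", {"email": "a@b.c"}, tags=["user:42"])
cache_set("team:7:members", ["42", "43"], tags=["user:42", "user:43", "team:7"])

# Any write to user 42 needs exactly one call, no matter how many views exist:
invalidate_tag("user:42")
```

The write path now only needs to know which entity changed; the dependency mapping lives with the code that caches each view.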

Approach B: TTL-based (passive expiration)

Don't invalidate explicitly. Let entries expire naturally.

The problem: Already covered. You're accepting a staleness window equal to the TTL.

Approach C: Event-driven invalidation via change data capture (CDC)

The database emits change events (via binlog, WAL, or a CDC tool like Debezium). A consumer listens to these events and invalidates the relevant cache entries.

plaintext
Database write → Binlog event → Debezium → Kafka → Cache invalidation consumer → Redis DELETE

This is the most robust approach for complex systems. The write path doesn't need to know about cache keys. The invalidation consumer owns the mapping between database changes and cache entries. Adding a new cache? Update the consumer. The write path is untouched.

The trade-off: Added infrastructure complexity. You're now running Kafka (or equivalent) and a consumer service just for cache invalidation. For small systems, this is overkill. For large systems with dozens of cache keys affected by a single write, it's worth it.
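The consumer's core job is a pure mapping from change events to cache keys, which makes it easy to test in isolation. A sketch assuming Debezium-style event payloads (the field names and key patterns are illustrative):

```python
def keys_to_invalidate(event):
    """Given a Debezium-style change event, return the cache keys to DELETE.
    This function is the single place that knows the table -> cache-key mapping."""
    payload = event["payload"]
    table = payload["source"]["table"]
    row = payload.get("after") or payload.get("before") or {}
    if table == "products":
        return [f"product:{row['id']}", f"product_listing:{row.get('category_id')}"]
    if table == "users":
        return [f"user:{row['id']}", f"user_profile:{row['id']}"]
    return []

# Consumer loop (sketch):
#   for msg in kafka_consumer:
#       for key in keys_to_invalidate(json.loads(msg.value)):
#           redis.delete(key)
```

Adding a new cached view means editing this one function, not every write path.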


2. Stale Data in Critical Systems

Not all stale data is equal. A blog post showing the wrong author bio for 30 seconds? Nobody notices. An e-commerce site showing "in stock" for a sold-out item? That's a customer support nightmare and potentially a legal issue.

Systems where stale cache data is dangerous:

  • Inventory counts. Overselling because the cache showed available stock.
  • Pricing. Charging the wrong amount. This can violate consumer protection laws in some jurisdictions.
  • Account balances. Showing incorrect available funds.
  • Permissions and access control. A revoked user retaining access because the permissions cache hasn't updated.
  • Rate limiting (covered in a previous article). Stale counters letting users exceed their limits.

The rule: For any data where staleness has financial, legal, or security implications, either bypass the cache entirely or use write-through with synchronous invalidation. No TTL-only strategies. No "eventual consistency is fine."

Critical vs. non-critical data
# Critical data - bypass cache, read from database directly
def get_account_balance(account_id):
    # NO CACHE. Source of truth only.
    return db.query("SELECT balance FROM accounts WHERE id = %s", account_id)

# Non-critical data - cache is fine
def get_account_display_name(account_id):
    return get_with_ttl(f"account_name:{account_id}",
                        lambda: db.query_display_name(account_id),
                        ttl=300)

Don't be clever about caching data that can't be stale. The 2ms latency win isn't worth the production incident.


3. Cache Stampede (Thundering Herd)

One of the nastiest failure modes in caching. A popular cache entry expires, and hundreds of concurrent requests all see the cache miss simultaneously. All of them query the database at the same time.

The timeline:

plaintext
T=0:    Cache entry for "popular_product:123" expires
T=0.001: Request A → cache miss → queries database
T=0.002: Request B → cache miss → queries database
T=0.003: Request C → cache miss → queries database
...
T=0.010: 500 requests all querying the database for the same row
T=0.150: Database response time spikes from 5ms to 500ms
T=0.200: Downstream services start timing out

At 50,000 requests/second to a popular endpoint, a single expired cache key can generate thousands of simultaneous database queries. If that query is expensive (joins, aggregations), this can take down your database.

Solution 1: Locking (Mutex)

Only the first request that encounters a cache miss queries the database. All other requests wait for the first one to populate the cache.

Cache stampede prevention - locking
def get_with_lock(key, fetch_fn, ttl):
    # Try cache first
    cached = redis.get(key)
    if cached:
        return json.loads(cached)

    # Try to acquire lock
    lock_key = f"lock:{key}"
    acquired = redis.set(lock_key, "1", nx=True, ex=5)  # 5-second lock TTL

    if acquired:
        try:
            # This request fetches from DB and populates cache
            fresh = fetch_fn()
            redis.setex(key, ttl, json.dumps(fresh))
            return fresh
        finally:
            redis.delete(lock_key)
    else:
        # Another request is fetching. Wait and retry from cache.
        time.sleep(0.05)  # 50ms backoff
        cached = redis.get(key)
        if cached:
            return json.loads(cached)

        # If still no cache, fall through to DB (prevents deadlock)
        return fetch_fn()

Solution 2: Request Coalescing (Single Flight)

Similar to locking, but at the application level. Group identical in-flight requests and execute the database query only once. All waiting requests get the same result.

Single flight pattern
# Python implementation using asyncio
import asyncio
from collections import defaultdict

class SingleFlight:
    def __init__(self):
        self._in_flight = {}  # key → Future

    async def do(self, key, fetch_fn):
        if key in self._in_flight:
            # Another request is already fetching this key. Wait for it.
            return await self._in_flight[key]

        # This is the first request. Create a future and fetch.
        future = asyncio.get_running_loop().create_future()
        self._in_flight[key] = future

        try:
            result = await fetch_fn()
            future.set_result(result)
            return result
        except Exception as e:
            future.set_exception(e)
            raise
        finally:
            del self._in_flight[key]

# Usage
flight = SingleFlight()

async def get_product(product_id):
    cached = await redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)

    # Only one request will actually hit the database
    product = await flight.do(
        f"product:{product_id}",
        lambda: db.query_product(product_id)
    )

    await redis.setex(f"product:{product_id}", 3600, json.dumps(product))
    return product

Solution 3: Probabilistic Early Expiration

Refresh the cache before it expires. Each request that reads a cache value has a small, increasing probability of triggering a background refresh as the TTL approaches expiration.

XFetch - probabilistic early refresh
import random
import time

def get_with_early_refresh(key, fetch_fn, ttl, beta=1.0):
    """
    XFetch algorithm - probabilistic early refresh.
    As TTL approaches, probability of refresh increases.
    """
    cached = redis.get(key)
    if cached:
        data = json.loads(cached)
        remaining_ttl = redis.ttl(key)

        # Calculate refresh probability
        # Higher beta = more aggressive early refresh
        # As remaining_ttl approaches 0, probability approaches 1
        if remaining_ttl > 0:
            probability = max(0, 1 - (remaining_ttl / ttl)) * beta
            if random.random() < probability:
                # Trigger background refresh
                refresh_in_background(key, fetch_fn, ttl)

        return data

    # True cache miss
    fresh = fetch_fn()
    redis.setex(key, ttl, json.dumps(fresh))
    return fresh

This eliminates the stampede entirely because the cache is refreshed before it expires. No mass cache miss, no thundering herd.


4. Hot Keys - When One Key Gets All the Traffic

In any system, some data is accessed far more than other data. The homepage product. A viral tweet. A popular user's profile. These "hot keys" concentrate traffic onto a single Redis key.

Why this matters with Redis Cluster: Redis Cluster shards data by key hash. A hot key means one shard handles disproportionate traffic while others sit idle. One shard's CPU spikes. Latency increases for all keys on that shard - not just the hot one.

Real numbers: If you have 16 Redis shards and one key gets 50% of your traffic, that shard is handling 8x the expected load. Your "horizontally scaled" cache has a single point of contention.

Mitigation strategies:

Local caching with short TTL. Cache hot keys in application memory. Even a 1-second local TTL can absorb thousands of requests per instance.

Local cache for hot keys
from cachetools import TTLCache

# Local cache - 1 second TTL, absorbs per-instance traffic
local_cache = TTLCache(maxsize=100, ttl=1)

def get_hot_data(key):
    if key in local_cache:
        return local_cache[key]

    # Fall through to Redis
    cached = redis.get(key)
    if cached:
        data = json.loads(cached)
        local_cache[key] = data
        return data

    data = fetch_from_db(key)
    redis.setex(key, 60, json.dumps(data))
    local_cache[key] = data
    return data

Key replication / fan-out. Store the same value under multiple keys with random suffixes. Distribute reads across the copies.

Key replication for hot keys
import random

NUM_REPLICAS = 5

def set_hot_key(base_key, value, ttl):
    """Write to all replicas."""
    for i in range(NUM_REPLICAS):
        redis.setex(f"{base_key}:r{i}", ttl, value)

def get_hot_key(base_key):
    """Read from a random replica - distributes load across shards."""
    replica = random.randint(0, NUM_REPLICAS - 1)
    return redis.get(f"{base_key}:r{replica}")

This distributes reads across different Redis shards (assuming the key suffixes hash to different slots). Write cost increases by Nx, but for read-heavy hot keys, the trade-off is worth it.


5. Over-Caching - When the Cache Becomes the Problem

Not everything should be cached. This seems obvious, but in practice, teams cache aggressively because "caching = fast" and never question whether the complexity is justified.

Signs you're over-caching:

  • Cache hit rate below 50%. You're spending Redis memory and network round-trips on data that's rarely reused.
  • More code for cache management than for business logic. Your invalidation logic is a tangled mess of event handlers and TTL heuristics.
  • Debugging stale data issues weekly. Every other production incident traces back to "the cache had the wrong value."
  • Redis memory keeps growing. Nobody knows what's in there or why.

The fix is counterintuitive: remove caches. Check your cache hit rate per key pattern. If a category of keys has a hit rate below 60-70%, you're probably better off hitting the database directly. The complexity cost of maintaining the cache exceeds the performance benefit.

Redis health check
# Check hit rate in Redis (requires keyspace notifications or tracking)
redis-cli INFO stats | grep keyspace
# keyspace_hits:12345678
# keyspace_misses:9876543
# Hit rate: 12345678 / (12345678 + 9876543) = 55.6% - might be too low
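The same calculation in code, against the dict that redis-py's `r.info("stats")` returns (the test numbers mirror the redis-cli output above):

```python
def cache_hit_rate(stats):
    """Compute hit rate from the INFO 'stats' section, e.g. r.info('stats')
    from redis-py, which returns keyspace_hits/keyspace_misses as ints."""
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0

rate = cache_hit_rate({"keyspace_hits": 12345678, "keyspace_misses": 9876543})
```

Track this per deployment over time; a slow downward drift usually means your key space is growing faster than your access patterns justify.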

Rule of thumb: If the database query is under 10ms and the data changes frequently, skipping the cache entirely is a legitimate choice. Not everything needs to be fast at the cost of being complex.


The Trade-Offs - Honest Comparison

| Strategy | Consistency | Read Speed | Write Speed | Complexity | Data Loss Risk | Best For |
|---|---|---|---|---|---|---|
| Cache-Aside | Eventual (TTL window) | Fast (on hit) | Unchanged | Low | None | General-purpose, read-heavy |
| Write-Through | Strong | Fast (always cached) | Slower (+cache write) | Medium | None | Session data, auth tokens |
| Write-Behind | Strong (cache is ahead) | Fast | Very fast | High | Yes - on crash | Analytics, counters, feeds |
| TTL-Only | Eventual (up to TTL) | Fast (on hit) | Unchanged | Very low | None | Static-ish data, configs |
| CDN / Edge | Eventual (purge delay) | Very fast | N/A | Low | None | Static assets, public content |

There is no "best" strategy. There's only the right strategy for your specific consistency, latency, and complexity requirements.


Real-World Architecture: Layered Caching

In production, you don't use a single caching strategy. You layer them.

plaintext
User Request
      │
      ▼
┌───────────────────────────┐
│  CDN / Edge Cache         │  ← Layer 1: Static assets, public API responses
│  (Cloudflare / CloudFront │     TTL: minutes to hours
│   Cache-Control headers)  │     Hit rate target: 90%+ for static content
└───────────┬───────────────┘
            │  Cache miss
            ▼
┌───────────────────────────┐
│  Application Memory (L1)  │  ← Layer 2: Hot data, config, feature flags
│  (In-process LRU cache)   │     TTL: 1–30 seconds
│                           │     Size: tens of MB per instance
└───────────┬───────────────┘
            │  Cache miss
            ▼
┌───────────────────────────┐
│  Redis / Memcached (L2)   │  ← Layer 3: Shared cache, sessions, computed results
│  (Distributed cache)      │     TTL: minutes to hours
│                           │     Size: gigabytes
└───────────┬───────────────┘
            │  Cache miss
            ▼
┌───────────────────────────┐
│  Database                 │  ← Source of truth. Always.
│  (PostgreSQL / MySQL)     │
└───────────────────────────┘

How the layers interact:

CDN handles static content and public responses. Configure Cache-Control headers properly. This offloads the majority of bandwidth from your infrastructure.

Cache-Control header examples
# Static assets - cache aggressively
Cache-Control: public, max-age=31536000, immutable

# Public API responses - cache briefly
Cache-Control: public, max-age=60, s-maxage=300

# Authenticated responses - never cache at CDN
Cache-Control: private, no-store

L1 (local memory) absorbs hot key traffic within each application instance. Short TTLs keep staleness manageable. The key insight: even a 1-second local cache on 20 application instances turns 100,000 Redis reads/second into 20 Redis reads/second for that key.

L2 (Redis) serves as the shared cache for all instances. Handles cache-aside and write-through strategies. This is where your business-logic caching lives.

Database remains the source of truth. Every read that misses all cache layers hits the database. Size your database to handle this miss rate, not your total read traffic.


Advanced Techniques

Cache Warming

Pre-populate the cache before it's needed. Critical after deployments, Redis restarts, or scaling events where the cache is cold.

Cache warming on deployment
def warm_cache():
    """Pre-populate cache with frequently accessed data."""
    # Top 1000 most accessed products
    popular = db.query("""
        SELECT product_id FROM access_logs
        WHERE timestamp > NOW() - INTERVAL '1 hour'
        GROUP BY product_id
        ORDER BY COUNT(*) DESC
        LIMIT 1000
    """)

    for product_id in popular:
        product = db.query_product(product_id)
        redis.setex(f"product:{product_id}", 3600, json.dumps(product))

    print(f"Warmed {len(popular)} product cache entries")

# Run on deployment or Redis recovery
# Also run periodically to keep hot data cached

Without warming, a cold cache means 100% of traffic hits the database. If your database was sized to handle 10% of reads (the rest served by cache), a cold start can trigger a cascading failure. Warm the cache before sending traffic.

Background Refresh (Proactive Re-Caching)

Instead of waiting for entries to expire and trigger a cache miss, refresh them in the background before they expire.

Background cache refresher
import asyncio
import json

async def background_refresher(keys_to_watch, fetch_fn, ttl, refresh_at=0.75):
    """
    Refresh cache entries once they've used 75% of their TTL.
    Eliminates cache misses for known hot keys.
    """
    while True:
        for key in keys_to_watch:
            # redis.ttl returns -2 for missing keys and -1 for keys with no
            # expiry; the `0 < remaining_ttl` check skips both cases
            remaining_ttl = redis.ttl(key)
            threshold = ttl * (1 - refresh_at)  # refresh when 25% of TTL remains

            if 0 < remaining_ttl < threshold:
                fresh = await fetch_fn(key)
                redis.setex(key, ttl, json.dumps(fresh))

        await asyncio.sleep(1)  # poll interval; tune relative to your TTLs

This eliminates cache misses for known hot keys, as long as the refresher keeps up. Stampede risk disappears because hot entries never actually expire under load, and staleness is bounded by the refresh interval rather than the full TTL. The downside is maintaining the list of keys to watch and the background worker infrastructure.

Partial Caching

Don't cache entire objects when you only need part of them. Cache fields independently at different TTLs.

Partial caching by field
# Instead of caching the entire user object...
redis.setex("user:123", 300, json.dumps(full_user_object))

# ...cache fields separately based on change frequency
redis.setex("user:123:profile", 3600, json.dumps(profile))      # Rarely changes
redis.setex("user:123:preferences", 300, json.dumps(prefs))     # Changes sometimes
redis.setex("user:123:activity", 30, json.dumps(activity))      # Changes often

The benefit: Different TTLs per field. Profile data that changes once a month gets a long TTL. Activity data that changes every minute gets a short one. You're not invalidating stable data because a volatile field changed.

The cost: More cache keys, more Redis round-trips per read (unless you use Redis pipelines or Lua scripts to batch the reads). Measure whether the improved hit rate justifies the additional complexity.
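Batching the per-field reads with a pipeline keeps partial caching at one network round-trip instead of three. A sketch assuming a redis-py-style client - the client is passed in, so any object exposing `pipeline()` works; `get_user_fields` is a name invented for this example:

```python
import json

def get_user_fields(client, user_id):
    """Fetch the three per-field keys in a single round-trip via a pipeline."""
    pipe = client.pipeline()
    pipe.get(f"user:{user_id}:profile")
    pipe.get(f"user:{user_id}:preferences")
    pipe.get(f"user:{user_id}:activity")
    raw = pipe.execute()  # one round-trip, three results in request order

    # Any field may be None if its TTL expired; handle misses per-field
    return {
        name: json.loads(value) if value is not None else None
        for name, value in zip(("profile", "preferences", "activity"), raw)
    }
```

Fields that come back `None` are misses for that field only - refetch just those from the database instead of rebuilding the whole user object.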


When NOT to Cache

Caching is not always the answer. Here are the cases where adding a cache makes things worse.

Real-time financial data. Stock prices, account balances, transaction statuses. The cost of stale data is too high. Query the source of truth directly. If the database is too slow, optimize the query or the database - don't paper over it with a cache.

Highly dynamic data with low read reuse. If every request is for a different piece of data (long-tail access patterns), cache hit rates will be near zero. You're adding latency (Redis round-trip for the miss) without saving anything.

Low traffic systems. If your database handles 50 requests/second comfortably and your data changes frequently, adding a cache adds complexity without meaningful performance gain. Not every system needs Redis.

Personalized responses with high cardinality. If every user sees completely different data and you have millions of users, the cache memory required becomes prohibitive. A recommendation feed unique to each of 10 million users? That's 10 million cache entries, most of which are accessed once and evicted.

Data with strong consistency requirements. If your application cannot tolerate any staleness - not even milliseconds - caching adds risk without flexibility. Design for direct database reads with read replicas instead.

The test is simple: Calculate your expected cache hit rate. If it's below 60%, question whether caching is worth the complexity. If the data has financial, legal, or security implications when stale, bypass the cache.


Monitoring Your Cache - What to Watch

A cache without monitoring is a liability. You won't know it's broken until users start seeing stale data or your database falls over.

Metrics that matter:

| Metric | What it tells you | Alert threshold |
|---|---|---|
| Hit rate | Effectiveness of caching | Below 70% - investigate |
| Miss rate | Load on database from cache misses | Sudden spike - possible stampede |
| Eviction rate | Cache is full, throwing out entries | Above 0 consistently - add memory |
| Memory usage | How much of Redis capacity is used | Above 80% - scale or optimize keys |
| Latency (p99) | Redis performance under load | Above 5ms - investigate |
| Key count | Total entries in cache | Unexpected growth - possible key leak |
| TTL distribution | Are entries expiring as expected | Large cluster of same-TTL keys - stampede risk |
Quick Redis health check
# Quick Redis health check commands
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human"
redis-cli INFO clients | grep connected_clients
redis-cli DBSIZE

Set up dashboards for these metrics. When something goes wrong with caching, it usually shows up as a gradual degradation, not a sudden failure. Hit rates drop slowly. Memory creeps up. You won't notice without graphs.
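The hit rate itself is a one-line calculation over the counters from `INFO stats`. A sketch, assuming the dict shape returned by redis-py's `r.info("stats")`:

```python
def cache_hit_rate(info_stats):
    """Compute hit rate from the cumulative counters in `INFO stats`.

    `info_stats` is the dict returned by redis-py's r.info("stats"),
    which includes keyspace_hits and keyspace_misses since startup.
    """
    hits = info_stats.get("keyspace_hits", 0)
    misses = info_stats.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0

# To alert on the 70% threshold from the table above:
# rate = cache_hit_rate(r.info("stats"))
# if rate < 0.70:
#     alert(f"cache hit rate {rate:.1%}")
```

One caveat: these counters are cumulative since the last restart. For a live hit rate, sample twice and compute the rate from the deltas, or let your metrics system do it.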


Summary: What You Should Actually Do

If you're adding caching for the first time: Start with cache-aside and Redis. It's the simplest strategy that works at scale. Set reasonable TTLs as a safety net. Delete cache entries on writes - don't update them.

If you need strong consistency: Use write-through for critical data. Accept the write latency cost. For data where staleness is unacceptable, skip the cache entirely and read from the database.

If you're hitting performance limits: Layer your caches. CDN for static content, local memory for hot keys, Redis for shared state. Each layer absorbs traffic before it reaches the next.

If you're experiencing stampedes: Implement locking or request coalescing. Add probabilistic early expiration for your hottest keys. Warm the cache on deployment.

Regardless of your approach:

  • Always have a fallback. If Redis dies, your application should degrade gracefully, not crash. Hit the database directly. It'll be slower but it'll work.
  • Monitor hit rates obsessively. A cache with a 40% hit rate is adding complexity for minimal benefit. Remove it or fix it.
  • Don't cache everything. The best caching strategy for some data is no cache at all.
  • Treat invalidation as a first-class problem. Don't bolt it on later. Design your invalidation strategy before you write your first redis.set().
  • Size your database for cache misses, not for zero traffic. The cache will fail. When it does, the database needs to survive the load.
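The fallback point above can be as simple as a guard around every cache read. A hedged sketch, assuming a redis-py-style client whose calls raise on connection failure - `get_with_fallback` is a name invented for this example:

```python
import json
import logging

logger = logging.getLogger(__name__)

def get_with_fallback(client, key, fetch_from_db, ttl=300):
    """Read through the cache, but degrade to the database if Redis is down."""
    try:
        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)
    except Exception:
        # Redis unreachable: log it and fall through to the database
        logger.warning("cache read failed for %s; falling back to DB", key)

    value = fetch_from_db(key)
    try:
        client.setex(key, ttl, json.dumps(value))
    except Exception:
        pass  # a cache write failure must never fail the request
    return value
```

Slower when Redis is down, but the request still succeeds - which is exactly the degradation you want.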

Caching is powerful. It's also one of the easiest ways to introduce subtle, hard-to-debug data consistency issues into your system. Build it deliberately, monitor it constantly, and don't be afraid to remove it when it's causing more problems than it solves.

Fig 1: A layered caching architecture - each layer absorbs traffic before it reaches the next.

Written by

Raunak Gupta

DevOps engineer and technical writer with experience in cloud infrastructure, CI/CD pipelines, and system design. Passionate about making complex engineering topics accessible through clear, practical writing backed by real production experience.
