Caching improves performance - until it starts serving stale data or becomes your biggest bottleneck.
Every backend engineer has the same story. System is slow. You add Redis in front of the database. Response times drop from 200ms to 5ms. You celebrate. Then three weeks later, a customer sees a price that was updated an hour ago. Or inventory shows "in stock" for something that sold out. Or two users see different data for the same resource depending on which server handles the request.
Caching is not the problem. Poorly designed caching is. And the gap between "add Redis" and "build a caching layer that works in production" is enormous.
This article is not an introduction to caching. It's about the real decisions you'll face: which strategy to use, when each one breaks, how to handle invalidation without losing your mind, and the failure modes that show up only at scale.
Why "Just Add a Cache" Is Dangerous Advice
The caching pitch is seductive. Database query takes 150ms? Cache the result, serve it in 2ms. Problem solved.
Here's what actually happens next:
Stale data. The database gets updated, but the cache still holds the old value. Users see outdated information. For a blog post, nobody cares. For inventory counts, pricing, or account balances - it's a production incident.
Cache invalidation complexity. You now need to figure out when to update the cache. Every write path needs to know about the cache. Miss one, and you have inconsistency. Your simple caching layer becomes a distributed consistency problem.
Inconsistent reads. With multiple application instances, some might have local caches that are stale while others have fresh data. User A sees one thing, user B sees another. This is brutal to debug because it's intermittent and depends on which server handles the request.
The hidden contract. When you add a cache, you've made an implicit decision: you're trading consistency for speed. Every caching strategy is a different point on that trade-off curve. If you don't choose deliberately, production will choose for you - usually at the worst possible time.
Types of Caching - Where the Data Lives
Before choosing a strategy, understand the layers available to you. Each has different latency characteristics, capacity, and invalidation complexity.
In-Memory (Application Level)
Data lives in the application process's memory. A dictionary, a hash map, a local LRU cache.
Latency: well under 0.001ms (hundreds of nanoseconds - it's an in-process memory lookup)
Capacity: Limited by application memory. Usually tens of megabytes at most.
Invalidation: Only the local instance knows about it. Other instances have no idea.
```python
# Simple in-memory caches, with and without TTL
from functools import lru_cache
from cachetools import TTLCache

# Option 1: Python's built-in (no TTL - entries never expire on their own)
@lru_cache(maxsize=1024)
def get_user(user_id):
    # Parameterized query - never interpolate user input into SQL
    return db.query("SELECT * FROM users WHERE id = %s", user_id)

# Option 2: TTL-based (better for production)
cache = TTLCache(maxsize=1024, ttl=300)  # 5-minute TTL

def get_user(user_id):
    if user_id in cache:
        return cache[user_id]
    user = db.query_user(user_id)
    cache[user_id] = user
    return user
```

When it works: High-read, low-change data. Configuration values, feature flags, reference data that changes once a day.
When it breaks: Multiple application instances. Each maintains its own cache. Update one, and the others serve stale data until their TTL expires. With 10 instances and a 5-minute TTL, you can have up to 5 minutes of inconsistency across servers.
Distributed Cache (Redis / Memcached)
Data lives in a dedicated cache server shared by all application instances.
Latency: 0.5–2ms (network round-trip)
Capacity: Gigabytes to terabytes. Limited by the cache cluster's memory.
Invalidation: Centralized. Update once, all instances see the change immediately.
Redis vs. Memcached - the actual difference:
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, hashes, lists, sets, sorted sets | Strings only |
| Persistence | Optional RDB/AOF | None |
| Replication | Built-in primary/replica | None |
| Memory efficiency | Less efficient (overhead per key) | More efficient for simple key-value |
| Eviction | Multiple policies (LRU, LFU, TTL, etc.) | LRU only |
My take: Use Redis unless you're caching massive volumes of simple key-value pairs and need maximum memory efficiency. Redis's data structures and built-in features save enough development time to justify the overhead.
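To make the data-structure difference concrete, here's a sketch of what a single-field update costs in each model. Plain dicts stand in for the two servers so the example runs anywhere; with real clients this is SET/GET (Memcached-style opaque strings) versus HSET/HGET (Redis hashes).

```python
import json

# Illustrative stand-ins (plain dicts) so the sketch runs without a server.
kv_store = {}      # Memcached-style: opaque string values only
hash_store = {}    # Redis-style: per-key field → value maps, like a hash

def update_field_kv(key, field, value):
    """Memcached-style: read the whole blob, modify, write it all back."""
    obj = json.loads(kv_store.get(key, "{}"))
    obj[field] = value
    kv_store[key] = json.dumps(obj)  # full serialize round-trip every time

def update_field_hash(key, field, value):
    """Redis-hash-style: touch only the one field (HSET key field value)."""
    hash_store.setdefault(key, {})[field] = value

kv_store["user:1"] = json.dumps({"name": "Ada", "email": "old@example.com"})
hash_store["user:1"] = {"name": "Ada", "email": "old@example.com"}
update_field_kv("user:1", "email", "new@example.com")
update_field_hash("user:1", "email", "new@example.com")
```

Both end in the same state, but the hash version never deserializes or rewrites the untouched fields - that's the kind of saved work the table above is pointing at.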
CDN / Edge Caching
Data lives on servers geographically close to the user. Cloudflare, AWS CloudFront, Fastly, etc.
Latency: 1–20ms (depending on proximity to edge node)
Capacity: Effectively unlimited (distributed across global PoPs)
Invalidation: Slow. Purging a CDN cache can take seconds to minutes. Some providers charge per purge request.
When it works: Static assets (images, CSS, JS), public API responses that are the same for all users, marketing pages.
When it breaks: Personalized content, frequently updated data, anything that varies per user. Setting wrong Cache-Control headers on authenticated API responses is a security incident - one user's data served to another.
Database Query Cache
The database itself caches query results. MySQL had a built-in query cache (removed in 8.0 because it caused more problems than it solved). PostgreSQL doesn't cache query results but caches execution plans and buffer pages.
Latency: Varies. Buffer cache hits are fast. Query cache hits skip parsing and execution entirely.
Capacity: Managed by the database engine.
Invalidation: Automatic - any write to a cached table invalidates relevant entries.
When it works: Read-heavy workloads with large, complex queries that return the same results frequently.
When it breaks: Write-heavy workloads. MySQL's query cache invalidated the entire table's cached results on any write to that table. One insert killed every cached query for that table. This is why MySQL removed it - under write-heavy loads, the cache was a net negative.
Bottom line: Don't rely on database-level caching as your primary strategy. Treat it as a bonus, not a plan.
Caching Strategies That Actually Work in Production
This is where the real decisions happen. Each strategy defines how data flows between your application, cache, and database.
Strategy 1: Cache-Aside (Lazy Loading)
The most widely used pattern. Your application manages the cache explicitly.
Flow:
Read path:
1. Application checks cache
2. Cache hit → return cached data
3. Cache miss → query database → store result in cache → return data
Write path:
1. Application writes to database
2. Application invalidates (deletes) the cache entry

```python
def get_product(product_id):
    # Step 1: Check cache
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    # Step 2: Cache miss - hit database
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    # Step 3: Populate cache for next time
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))  # 1hr TTL
    return product

def update_product(product_id, data):
    # Step 1: Update database (source of truth)
    db.execute("UPDATE products SET ... WHERE id = %s", product_id)
    # Step 2: Invalidate cache (NOT update - delete)
    redis.delete(f"product:{product_id}")
```

Why delete instead of update the cache? Because delete is idempotent and simpler. If you try to update the cache with the new value, you introduce a race condition: two concurrent updates might write to the database in order A→B but update the cache in order B→A. Now the cache holds stale data permanently. Deleting the cache means the next read will fetch the latest data from the database.
When it works: Read-heavy workloads where a cache miss (hitting the database) is acceptable. This covers the majority of web applications.
When it breaks:
- Cold cache problem. After a deployment or Redis restart, every request is a cache miss. Your database gets hammered with 100% of read traffic simultaneously. This can take down a database that was sized to handle 10% of reads (the remaining 90% normally served by cache).
- Stale data window. Between a database write and the cache delete, there's a brief window where the cache holds old data. In practice, this window is milliseconds and rarely matters. But for financial data or inventory counts, "rarely" isn't good enough.
- Write-heavy data. If data changes frequently, the cache is constantly being invalidated. Your cache hit rate drops, and you're paying the overhead of Redis calls without the benefit.
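For the stale-data window above, one common mitigation is the delayed double delete: delete the key again after a short pause, so any stale repopulation from an in-flight read gets cleaned up. A sketch - a plain dict stands in for Redis, and the helper names are illustrative:

```python
import threading
import time

cache = {}  # plain dict standing in for Redis in this sketch

def delete_key(key):
    cache.pop(key, None)

def update_product(product_id, new_row, db_write, delay=0.1):
    """Cache-aside write with a delayed second delete.

    The second delete catches the race where a concurrent reader loaded
    the old row just before our write and repopulated the cache just
    after our first delete.
    """
    key = f"product:{product_id}"
    db_write(new_row)   # 1. write the source of truth
    delete_key(key)     # 2. first delete
    # 3. delete again after a delay longer than a typical read round-trip
    threading.Timer(delay, delete_key, args=[key]).start()

# Simulate the race: a slow reader repopulates the cache with stale data
update_product(1, {"price": 20}, db_write=lambda row: None, delay=0.05)
cache["product:1"] = {"price": 10}   # stale repopulation sneaks in
time.sleep(0.2)                      # the delayed delete cleans it up
```

The delay is a guess about read latency, so this narrows the window rather than closing it - but it turns "stale forever" into "stale for a bounded interval".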
Strategy 2: Write-Through
Every write goes to both the database and the cache simultaneously. The cache is always consistent with the database.
Flow:
Write path:
1. Application writes to database
2. Application immediately writes the same data to cache
3. Both succeed → write is complete
Read path:
1. Always read from cache (it's guaranteed to be current)
2. Cache miss (only after eviction) → query database → populate cache

```python
def update_product(product_id, data):
    # Write to database
    db.execute("UPDATE products SET ... WHERE id = %s", product_id)
    # Write to cache immediately
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))

def get_product(product_id):
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    # Cache miss (rare - only after TTL expiry or eviction)
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))
    return product
```

When it works: Systems where consistency between cache and database is critical. User sessions, authentication tokens, shopping carts - data that changes frequently and must be read accurately.
When it breaks:
- Slower writes. Every write operation now includes a cache write. That's an extra 1–2ms per write. At 10,000 writes/second, that adds up.
- Wasted cache space. You're caching every piece of written data, even if nobody reads it. If you have a write-heavy table where most rows are written once and rarely read, you're filling your cache with data nobody needs.
- Failure coupling. If Redis is down, do you fail the write? If you skip the cache write, you're back to cache-aside with inconsistency. If you fail the entire operation, your cache dependency has become a write-path dependency - Redis downtime means write outages.
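The real decision in that last point is whether to fail open or fail closed, and it's worth making it explicit in code rather than by accident. A sketch with illustrative stand-in classes (the names and the `CacheDown` exception are assumptions, not a real client API):

```python
class CacheDown(Exception):
    """Stand-in for a cache client connection error."""

class FakeDB:
    def __init__(self):
        self.rows = {}
    def execute(self, pid, data):
        self.rows[pid] = data

class FlakyCache:
    def __init__(self, down=False):
        self.down, self.data = down, {}
    def set(self, key, value):
        if self.down:
            raise CacheDown()
        self.data[key] = value
    def delete(self, key):
        if self.down:
            raise CacheDown()
        self.data.pop(key, None)

def update_product(product_id, data, db, cache, fail_open=True):
    db.execute(product_id, data)  # source of truth first, always
    try:
        cache.set(f"product:{product_id}", data)
    except CacheDown:
        if not fail_open:
            raise  # fail-closed: cache outage becomes a write outage
        # Fail-open: degrade to cache-aside semantics. Try to drop any
        # stale entry; if the cache is fully down, reads will miss and
        # fall through to the database anyway.
        try:
            cache.delete(f"product:{product_id}")
        except CacheDown:
            pass

db, cache = FakeDB(), FlakyCache(down=True)
update_product(1, {"price": 9}, db, cache)  # write succeeds despite outage
```

Fail-open keeps writes available at the cost of a consistency window; fail-closed keeps the cache authoritative at the cost of availability. Pick one on purpose.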
Strategy 3: Write-Behind (Write-Back)
The most aggressive caching strategy. Writes go to the cache first, and the database is updated asynchronously in the background.
Flow:
Write path:
1. Application writes to cache
2. Return success to the client immediately
3. Background process flushes cache writes to database (batched, async)
Read path:
1. Always read from cache (it has the latest data)

```python
# Simplified write-behind with a background worker
import threading
import queue

write_queue = queue.Queue()

def update_product(product_id, data):
    # Write to cache immediately - this is the "source of truth" temporarily
    redis.setex(f"product:{product_id}", 7200, json.dumps(data))
    # Queue the database write for async processing
    write_queue.put(("product", product_id, data))
    # Return to client - no database latency
    return {"status": "ok"}

def db_writer_worker():
    """Background thread that flushes writes to the database in batches."""
    batch = []
    while True:
        try:
            item = write_queue.get(timeout=1.0)
            batch.append(item)
            # Flush when the batch is large enough or the queue drains
            if len(batch) >= 50 or write_queue.empty():
                flush_batch_to_db(batch)
                batch = []
        except queue.Empty:
            if batch:
                flush_batch_to_db(batch)
                batch = []

# Start the flusher once at application startup
threading.Thread(target=db_writer_worker, daemon=True).start()
```

When it works: High-write-throughput systems where write latency matters more than durability. Gaming leaderboards, activity feeds, analytics counters, real-time dashboards. Anything where losing the last few seconds of data during a crash is acceptable.
When it breaks - and this is important:
- Data loss on failure. If Redis crashes before the background process flushes to the database, those writes are gone. Not "delayed" - gone. For any system where data loss is unacceptable (payments, orders, financial transactions), write-behind is the wrong strategy. Full stop.
- Ordering complexity. Batch writes can arrive at the database out of order. If user A updates a record, then user B updates the same record, the batch processor needs to apply them in the correct order. This gets complicated fast.
- Debugging nightmares. The database is always behind the cache. If you query the database directly (for analytics, admin tools, debugging), you're seeing stale data. This confuses everyone.
My recommendation: Use write-behind only when you've explicitly accepted the data loss risk and the performance benefit justifies it. For most applications, write-through or cache-aside is the safer choice.
Strategy 4: TTL-Based Caching (Time-To-Live)
Not a standalone strategy - more of a parameter applied to other strategies. Every cached value gets an expiration time. After the TTL expires, the entry is evicted, and the next read fetches fresh data from the database.
```python
# TTL examples by data type
TTL_CONFIG = {
    "user_profile": 300,      # 5 minutes - changes occasionally
    "product_listing": 60,    # 1 minute - prices can change
    "feature_flags": 30,      # 30 seconds - needs to propagate quickly
    "static_config": 3600,    # 1 hour - rarely changes
    "search_results": 120,    # 2 minutes - acceptable staleness
    "session_data": 1800,     # 30 minutes - security consideration
}

def get_with_ttl(key, fetch_fn, ttl):
    cached = redis.get(key)
    if cached:
        return json.loads(cached)
    fresh = fetch_fn()
    redis.setex(key, ttl, json.dumps(fresh))
    return fresh
```

The appeal of TTL: It's simple. No complex invalidation logic. Set it and forget it. The cache self-heals over time as entries expire and get repopulated with fresh data.
Here's the problem: TTL is a guess. You're saying "this data is probably still valid for N seconds." That "probably" is doing a lot of heavy lifting.
Too short: High cache miss rate. You're hitting the database frequently, negating the benefit of caching. A 5-second TTL on a value that changes once a day means 17,280 unnecessary cache misses per day per key.
Too long: Stale data. A 1-hour TTL on product pricing means customers could see outdated prices for up to an hour after a change.
The honest truth about TTL: It works well as a safety net on top of active invalidation. Set a TTL as a backstop, but don't rely on it as your primary invalidation mechanism for data that matters.
Where Caching Fails - The Hard Problems
This is the section that matters most. Every caching tutorial shows you how to set a value in Redis. Almost none talk about what happens when things go wrong.
1. Cache Invalidation - The Hardest Problem in Computer Science
Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He wasn't joking about the first one.
The core question: When data changes in the database, how do you ensure the cache reflects that change?
Approach A: Invalidation on write (event-driven)
Every write operation explicitly deletes or updates the corresponding cache entries.
```python
def update_user_email(user_id, new_email):
    db.execute("UPDATE users SET email = %s WHERE id = %s", new_email, user_id)
    # Invalidate all cache entries that contain this user's data
    redis.delete(f"user:{user_id}")
    redis.delete(f"user_profile:{user_id}")
    redis.delete(f"user_settings:{user_id}")
    # What about cached API responses that include this user?
    # What about search results that show this user's email?
    # What about the team members list that includes this user?
    # ...this is where it gets ugly.
```

The problem: You need to know every cache key that's affected by every database write. In a complex system, a single row update might affect dozens of cached views. Miss one, and you have stale data. Add a new feature that caches a new view? You need to update every write path that could affect it. This coupling between write paths and cache keys is fragile and scales poorly.
Approach B: TTL-based (passive expiration)
Don't invalidate explicitly. Let entries expire naturally.
The problem: Already covered. You're accepting a staleness window equal to the TTL.
Approach C: Event-driven invalidation via change data capture (CDC)
The database emits change events (via binlog, WAL, or a CDC tool like Debezium). A consumer listens to these events and invalidates the relevant cache entries.
Database write → Binlog event → Debezium → Kafka → Cache invalidation consumer → Redis DELETE

This is the most robust approach for complex systems. The write path doesn't need to know about cache keys. The invalidation consumer owns the mapping between database changes and cache entries. Adding a new cache? Update the consumer. The write path is untouched.
The trade-off: Added infrastructure complexity. You're now running Kafka (or equivalent) and a consumer service just for cache invalidation. For small systems, this is overkill. For large systems with dozens of cache keys affected by a single write, it's worth it.
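The heart of that consumer is a mapping from change events to cache keys, which can be a small pure function. A sketch - the event shape (`{"table": ..., "pk": ...}`) is illustrative; a real Debezium payload is richer and arrives via a Kafka consumer loop:

```python
def keys_for_event(event):
    """Map a CDC change event to the cache keys it invalidates."""
    table, pk = event["table"], event["pk"]
    if table == "users":
        return [f"user:{pk}", f"user_profile:{pk}", f"user_settings:{pk}"]
    if table == "products":
        return [f"product:{pk}"]
    return []   # unknown table: nothing cached depends on it

def handle_event(event, redis_client):
    keys = keys_for_event(event)
    if keys:
        redis_client.delete(*keys)   # one DEL covering all affected keys

keys = keys_for_event({"table": "users", "pk": 42})
```

Keeping the mapping pure makes it trivially unit-testable, which matters: this function is now the single place where "what does this write invalidate?" is answered.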
2. Stale Data in Critical Systems
Not all stale data is equal. A blog post showing the wrong author bio for 30 seconds? Nobody notices. An e-commerce site showing "in stock" for a sold-out item? That's a customer support nightmare and potentially a legal issue.
Systems where stale cache data is dangerous:
- Inventory counts. Overselling because the cache showed available stock.
- Pricing. Charging the wrong amount. This can violate consumer protection laws in some jurisdictions.
- Account balances. Showing incorrect available funds.
- Permissions and access control. A revoked user retaining access because the permissions cache hasn't updated.
- Rate limiting (covered in a previous article). Stale counters letting users exceed their limits.
The rule: For any data where staleness has financial, legal, or security implications, either bypass the cache entirely or use write-through with synchronous invalidation. No TTL-only strategies. No "eventual consistency is fine."
```python
# Critical data - bypass cache, read from database directly
def get_account_balance(account_id):
    # NO CACHE. Source of truth only.
    return db.query("SELECT balance FROM accounts WHERE id = %s", account_id)

# Non-critical data - cache is fine
def get_account_display_name(account_id):
    return get_with_ttl(f"account_name:{account_id}",
                        lambda: db.query_display_name(account_id),
                        ttl=300)
```

Don't be clever about caching data that can't be stale. The 2ms latency win isn't worth the production incident.
3. Cache Stampede (Thundering Herd)
One of the nastiest failure modes in caching. A popular cache entry expires, and hundreds of concurrent requests all see the cache miss simultaneously. All of them query the database at the same time.
The timeline:
```
T=0:     Cache entry for "popular_product:123" expires
T=0.001: Request A → cache miss → queries database
T=0.002: Request B → cache miss → queries database
T=0.003: Request C → cache miss → queries database
...
T=0.010: 500 requests all querying the database for the same row
T=0.150: Database response time spikes from 5ms to 500ms
T=0.200: Downstream services start timing out
```

At 50,000 requests/second to a popular endpoint, a single expired cache key can generate thousands of simultaneous database queries. If that query is expensive (joins, aggregations), this can take down your database.
Solution 1: Locking (Mutex)
Only the first request that encounters a cache miss queries the database. All other requests wait for the first one to populate the cache.
```python
import time

def get_with_lock(key, fetch_fn, ttl):
    # Try cache first
    cached = redis.get(key)
    if cached:
        return json.loads(cached)
    # Try to acquire lock
    lock_key = f"lock:{key}"
    acquired = redis.set(lock_key, "1", nx=True, ex=5)  # 5-second lock TTL
    if acquired:
        try:
            # This request fetches from DB and populates the cache
            fresh = fetch_fn()
            redis.setex(key, ttl, json.dumps(fresh))
            return fresh
        finally:
            redis.delete(lock_key)
    else:
        # Another request is fetching. Wait and retry from cache.
        time.sleep(0.05)  # 50ms backoff
        cached = redis.get(key)
        if cached:
            return json.loads(cached)
        # If still no cache, fall through to DB (prevents deadlock)
        return fetch_fn()
```

Solution 2: Request Coalescing (Single Flight)
Similar to locking, but at the application level. Group identical in-flight requests and execute the database query only once. All waiting requests get the same result.
```python
# Python implementation using asyncio
import asyncio

class SingleFlight:
    def __init__(self):
        self._in_flight = {}  # key → Future

    async def do(self, key, fetch_fn):
        if key in self._in_flight:
            # Another request is already fetching this key. Wait for it.
            return await self._in_flight[key]
        # This is the first request. Create a future and fetch.
        future = asyncio.get_running_loop().create_future()
        self._in_flight[key] = future
        try:
            result = await fetch_fn()
            future.set_result(result)
            return result
        except Exception as e:
            future.set_exception(e)
            raise
        finally:
            del self._in_flight[key]

# Usage
flight = SingleFlight()

async def get_product(product_id):
    cached = await redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    # Only one request will actually hit the database
    product = await flight.do(
        f"product:{product_id}",
        lambda: db.query_product(product_id)
    )
    await redis.setex(f"product:{product_id}", 3600, json.dumps(product))
    return product
```

Solution 3: Probabilistic Early Expiration
Refresh the cache before it expires. Each request that reads a cache value has a small, increasing probability of triggering a background refresh as the TTL approaches expiration.
```python
import random

def get_with_early_refresh(key, fetch_fn, ttl, beta=1.0):
    """
    Probabilistic early refresh, in the spirit of the XFetch algorithm.
    As the TTL approaches expiry, the probability of a refresh increases.
    """
    cached = redis.get(key)
    if cached:
        data = json.loads(cached)
        remaining_ttl = redis.ttl(key)
        # Calculate refresh probability:
        # higher beta = more aggressive early refresh;
        # as remaining_ttl approaches 0, probability approaches beta
        if remaining_ttl > 0:
            probability = max(0, 1 - (remaining_ttl / ttl)) * beta
            if random.random() < probability:
                # Trigger background refresh (helper not shown here)
                refresh_in_background(key, fetch_fn, ttl)
        return data
    # True cache miss
    fresh = fetch_fn()
    redis.setex(key, ttl, json.dumps(fresh))
    return fresh
```

This largely eliminates the stampede because the cache is refreshed before it expires. No mass cache miss, no thundering herd.
4. Hot Keys - When One Key Gets All the Traffic
In any system, some data is accessed far more than other data. The homepage product. A viral tweet. A popular user's profile. These "hot keys" concentrate traffic onto a single Redis key.
Why this matters with Redis Cluster: Redis Cluster shards data by key hash. A hot key means one shard handles disproportionate traffic while others sit idle. One shard's CPU spikes. Latency increases for all keys on that shard - not just the hot one.
Real numbers: If you have 16 Redis shards and one key gets 50% of your traffic, that shard is handling 8x the expected load. Your "horizontally scaled" cache has a single point of contention.
Mitigation strategies:
Local caching with short TTL. Cache hot keys in application memory. Even a 1-second local TTL can absorb thousands of requests per instance.
```python
from cachetools import TTLCache

# Local cache - 1 second TTL, absorbs per-instance traffic
local_cache = TTLCache(maxsize=100, ttl=1)

def get_hot_data(key):
    if key in local_cache:
        return local_cache[key]
    # Fall through to Redis
    cached = redis.get(key)
    if cached:
        data = json.loads(cached)
        local_cache[key] = data
        return data
    data = fetch_from_db(key)
    redis.setex(key, 60, json.dumps(data))
    local_cache[key] = data
    return data
```

Key replication / fan-out. Store the same value under multiple keys with random suffixes. Distribute reads across the copies.
```python
import random

NUM_REPLICAS = 5

def set_hot_key(base_key, value, ttl):
    """Write to all replicas."""
    for i in range(NUM_REPLICAS):
        redis.setex(f"{base_key}:r{i}", ttl, value)

def get_hot_key(base_key):
    """Read from a random replica - distributes load across shards."""
    replica = random.randint(0, NUM_REPLICAS - 1)
    return redis.get(f"{base_key}:r{replica}")
```

This distributes reads across different Redis shards (assuming the key suffixes hash to different slots). Write cost increases by Nx, but for read-heavy hot keys, the trade-off is worth it.
5. Over-Caching - When the Cache Becomes the Problem
Not everything should be cached. This seems obvious, but in practice, teams cache aggressively because "caching = fast" and never question whether the complexity is justified.
Signs you're over-caching:
- Cache hit rate below 50%. You're spending Redis memory and network round-trips on data that's rarely reused.
- More code for cache management than for business logic. Your invalidation logic is a tangled mess of event handlers and TTL heuristics.
- Debugging stale data issues weekly. Every other production incident traces back to "the cache had the wrong value."
- Redis memory keeps growing. Nobody knows what's in there or why.
The fix is counterintuitive: remove caches. Check your cache hit rate per key pattern. If a category of keys has a hit rate below 60-70%, you're probably better off hitting the database directly. The complexity cost of maintaining the cache exceeds the performance benefit.
```shell
# Check hit rate in Redis
redis-cli INFO stats | grep keyspace
# keyspace_hits:12345678
# keyspace_misses:9876543
# Hit rate: 12345678 / (12345678 + 9876543) = 55.6% - might be too low
```

Rule of thumb: If the database query is under 10ms and the data changes frequently, skipping the cache entirely is a legitimate choice. Not everything needs to be fast at the cost of being complex.
The Trade-Offs - Honest Comparison
| Strategy | Consistency | Read Speed | Write Speed | Complexity | Data Loss Risk | Best For |
|---|---|---|---|---|---|---|
| Cache-Aside | Eventual (TTL window) | Fast (on hit) | Unchanged | Low | None | General-purpose, read-heavy |
| Write-Through | Strong | Fast (always cached) | Slower (+cache write) | Medium | None | Session data, auth tokens |
| Write-Behind | Strong (cache is ahead) | Fast | Very fast | High | Yes - on crash | Analytics, counters, feeds |
| TTL-Only | Eventual (up to TTL) | Fast (on hit) | Unchanged | Very low | None | Static-ish data, configs |
| CDN / Edge | Eventual (purge delay) | Very fast | N/A | Low | None | Static assets, public content |
There is no "best" strategy. There's only the right strategy for your specific consistency, latency, and complexity requirements.
Real-World Architecture: Layered Caching
In production, you don't use a single caching strategy. You layer them.
The Recommended Stack
```
User Request
      │
      ▼
┌─────────────────────────┐
│    CDN / Edge Cache     │ ← Layer 1: Static assets, public API responses
│ (Cloudflare / CloudFront│   TTL: minutes to hours
│  Cache-Control headers) │   Hit rate target: 90%+ for static content
└────────────┬────────────┘
             │ Cache miss
             ▼
┌─────────────────────────┐
│ Application Memory (L1) │ ← Layer 2: Hot data, config, feature flags
│ (In-process LRU cache)  │   TTL: 1–30 seconds
│                         │   Size: tens of MB per instance
└────────────┬────────────┘
             │ Cache miss
             ▼
┌─────────────────────────┐
│ Redis / Memcached (L2)  │ ← Layer 3: Shared cache, sessions, computed results
│ (Distributed cache)     │   TTL: minutes to hours
│                         │   Size: gigabytes
└────────────┬────────────┘
             │ Cache miss
             ▼
┌─────────────────────────┐
│        Database         │ ← Source of truth. Always.
│  (PostgreSQL / MySQL)   │
└─────────────────────────┘
```

How the layers interact:
CDN handles static content and public responses. Configure Cache-Control headers properly. This offloads the majority of bandwidth from your infrastructure.
```
# Static assets - cache aggressively
Cache-Control: public, max-age=31536000, immutable

# Public API responses - cache briefly
Cache-Control: public, max-age=60, s-maxage=300

# Authenticated responses - never cache at CDN
Cache-Control: private, no-store
```

L1 (local memory) absorbs hot key traffic within each application instance. Short TTLs keep staleness manageable. The key insight: even a 1-second local cache on 20 application instances turns 100,000 Redis reads/second into 20 Redis reads/second for that key.
L2 (Redis) serves as the shared cache for all instances. Handles cache-aside and write-through strategies. This is where your business-logic caching lives.
Database remains the source of truth. Every read that misses all cache layers hits the database. Size your database to handle this miss rate, not your total read traffic.
Advanced Techniques
Cache Warming
Pre-populate the cache before it's needed. Critical after deployments, Redis restarts, or scaling events where the cache is cold.
```python
def warm_cache():
    """Pre-populate cache with frequently accessed data."""
    # Top 1000 most accessed products
    popular = db.query("""
        SELECT product_id FROM access_logs
        WHERE timestamp > NOW() - INTERVAL '1 hour'
        GROUP BY product_id
        ORDER BY COUNT(*) DESC
        LIMIT 1000
    """)
    for product_id in popular:
        product = db.query_product(product_id)
        redis.setex(f"product:{product_id}", 3600, json.dumps(product))
    print(f"Warmed {len(popular)} product cache entries")

# Run on deployment or Redis recovery
# Also run periodically to keep hot data cached
```

Without warming, a cold cache means 100% of traffic hits the database. If your database was sized to handle 10% of reads (the rest served by cache), a cold start can trigger a cascading failure. Warm the cache before sending traffic.
Background Refresh (Proactive Re-Caching)
Instead of waiting for entries to expire and trigger a cache miss, refresh them in the background before they expire.
```python
import asyncio

async def background_refresher(keys_to_watch, fetch_fn, ttl, refresh_at=0.75):
    """
    Refresh cache entries once they've used 75% of their TTL.
    Eliminates cache misses for known hot keys.
    """
    while True:
        for key in keys_to_watch:
            remaining_ttl = redis.ttl(key)
            threshold = ttl * (1 - refresh_at)  # Refresh when 25% TTL remains
            if 0 < remaining_ttl < threshold:
                fresh = await fetch_fn(key)
                redis.setex(key, ttl, json.dumps(fresh))
        await asyncio.sleep(1)  # Check every second
```

This completely eliminates cache misses for known hot keys. Zero stampede risk. Zero stale data during the refresh window. The downside is maintaining the list of keys to watch and the background worker infrastructure.
Partial Caching
Don't cache entire objects when you only need part of them. Cache fields independently at different TTLs.
```python
# Instead of caching the entire user object...
redis.setex("user:123", 300, json.dumps(full_user_object))

# ...cache fields separately based on change frequency
redis.setex("user:123:profile", 3600, json.dumps(profile))   # Rarely changes
redis.setex("user:123:preferences", 300, json.dumps(prefs))  # Changes sometimes
redis.setex("user:123:activity", 30, json.dumps(activity))   # Changes often
```

The benefit: Different TTLs per field. Profile data that changes once a month gets a long TTL. Activity data that changes every minute gets a short one. You're not invalidating stable data because a volatile field changed.
The cost: More cache keys, more Redis round-trips per read (unless you use Redis pipelines or Lua scripts to batch the reads). Measure whether the improved hit rate justifies the additional complexity.
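Those extra round-trips can be collapsed back to one. A sketch of reading the per-field entries in a single batch - an in-memory dict stands in for Redis here so the example is self-contained; with redis-py the batched read would be `redis.mget(keys)` or a pipeline:

```python
import json

# In-memory stand-in for Redis so the sketch runs anywhere.
store = {
    "user:123:profile": json.dumps({"name": "Ada"}),
    "user:123:preferences": json.dumps({"theme": "dark"}),
    "user:123:activity": json.dumps({"last_seen": "2024-01-01"}),
}

def mget(keys):
    """Stand-in for redis.mget: one round-trip, one value per key."""
    return [store.get(k) for k in keys]

def get_user_view(user_id):
    fields = ["profile", "preferences", "activity"]
    values = mget([f"user:{user_id}:{f}" for f in fields])
    # In real code, None entries (cache misses) would fall back to the DB.
    return {f: json.loads(v) for f, v in zip(fields, values) if v is not None}

view = get_user_view(123)
```

With batching in place, the per-field scheme costs roughly one round-trip per read, same as the single-blob version - the remaining cost is just key-count and bookkeeping.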
When NOT to Cache
Caching is not always the answer. Here are the cases where adding a cache makes things worse.
Real-time financial data. Stock prices, account balances, transaction statuses. The cost of stale data is too high. Query the source of truth directly. If the database is too slow, optimize the query or the database - don't paper over it with a cache.
Highly dynamic data with low read reuse. If every request is for a different piece of data (long-tail access patterns), cache hit rates will be near zero. You're adding latency (Redis round-trip for the miss) without saving anything.
Low traffic systems. If your database handles 50 requests/second comfortably and your data changes frequently, adding a cache adds complexity without meaningful performance gain. Not every system needs Redis.
Personalized responses with high cardinality. If every user sees completely different data and you have millions of users, the cache memory required becomes prohibitive. A recommendation feed unique to each of 10 million users? That's 10 million cache entries, most of which are accessed once and evicted.
Data with strong consistency requirements. If your application cannot tolerate any staleness - not even milliseconds - caching adds risk without flexibility. Design for direct database reads with read replicas instead.
The test is simple: Calculate your expected cache hit rate. If it's below 60%, question whether caching is worth the complexity. If the data has financial, legal, or security implications when stale, bypass the cache.
Monitoring Your Cache - What to Watch
A cache without monitoring is a liability. You won't know it's broken until users start seeing stale data or your database falls over.
Metrics that matter:
| Metric | What it tells you | Alert threshold |
|---|---|---|
| Hit rate | Effectiveness of caching | Below 70% - investigate |
| Miss rate | Load on database from cache misses | Sudden spike - possible stampede |
| Eviction rate | Cache is full, throwing out entries | Above 0 consistently - add memory |
| Memory usage | How much of Redis capacity is used | Above 80% - scale or optimize keys |
| Latency (p99) | Redis performance under load | Above 5ms - investigate |
| Key count | Total entries in cache | Unexpected growth - possible key leak |
| TTL distribution | Are entries expiring as expected | Large cluster of same-TTL keys - stampede risk |
```shell
# Quick Redis health check commands
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human"
redis-cli INFO clients | grep connected_clients
redis-cli DBSIZE
```

Set up dashboards for these metrics. When something goes wrong with caching, it usually shows up as a gradual degradation, not a sudden failure. Hit rates drop slowly. Memory creeps up. You won't notice without graphs.
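Turning those counters into a dashboard-ready number is a one-liner. A sketch - with redis-py, `stats = r.info("stats")` returns a dict containing these fields:

```python
def hit_rate(stats):
    """Hit rate from the INFO stats counters, as a 0.0-1.0 fraction."""
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0

rate = hit_rate({"keyspace_hits": 12345678, "keyspace_misses": 9876543})
# about 0.556 for these counters
```

Note that these counters are cumulative since the last restart or CONFIG RESETSTAT, so for alerting you want the rate over a window: sample periodically and compute the hit rate of the deltas, not of the lifetime totals.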
Summary: What You Should Actually Do
If you're adding caching for the first time: Start with cache-aside and Redis. It's the simplest strategy that works at scale. Set reasonable TTLs as a safety net. Delete cache entries on writes - don't update them.
If you need strong consistency: Use write-through for critical data. Accept the write latency cost. For data where staleness is unacceptable, skip the cache entirely and read from the database.
If you're hitting performance limits: Layer your caches. CDN for static content, local memory for hot keys, Redis for shared state. Each layer absorbs traffic before it reaches the next.
If you're experiencing stampedes: Implement locking or request coalescing. Add probabilistic early expiration for your hottest keys. Warm the cache on deployment.
Regardless of your approach:
- Always have a fallback. If Redis dies, your application should degrade gracefully, not crash. Hit the database directly. It'll be slower but it'll work.
- Monitor hit rates obsessively. A cache with a 40% hit rate is adding complexity for minimal benefit. Remove it or fix it.
- Don't cache everything. The best caching strategy for some data is no cache at all.
- Treat invalidation as a first-class problem. Don't bolt it on later. Design your invalidation strategy before you write your first redis.set().
- Size your database for cache misses, not for zero traffic. The cache will fail. When it does, the database needs to survive the load.
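The fallback point deserves code, because it's the one you'll be glad you wrote at 3 a.m. A sketch of a read path that degrades gracefully when the cache is unreachable - the function names are illustrative:

```python
def get_with_fallback(key, cache_get, db_fetch):
    """Serve from cache when possible; never let a cache outage fail the read."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except Exception:
        # Cache unreachable - worth logging/alerting, but never fatal.
        pass
    return db_fetch(key)  # slower, but the request still succeeds

def broken_cache(_key):
    raise ConnectionError("redis is down")

value = get_with_fallback("user:1", broken_cache, lambda key: {"id": key})
```

One caveat: if the cache normally absorbs most of your reads, this fallback only helps if the database can survive the full load - which is exactly the sizing point above.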
Caching is powerful. It's also one of the easiest ways to introduce subtle, hard-to-debug data consistency issues into your system. Build it deliberately, monitor it constantly, and don't be afraid to remove it when it's causing more problems than it solves.