How I Cut API Response Time by 73% With a Redis Strategy Nobody Talks About
Cache hit rate is a vanity metric. Here is the response-level caching strategy that actually cut our p95 from 380ms to 102ms — and the 5 anti-patterns most teams miss.
Md. Rony Ahmed
· 9 min read
Everyone knows Redis caches data. Nobody tells you the caching strategy that makes your API actually fast.
I spent six months watching our cache hit rate sit at 94% while our p95 response time barely moved. The data was cached. The API was still slow. The problem was not Redis — it was how we used it.
Here is the strategy that cut our API response time from 380ms to 102ms — and the anti-patterns that were silently killing performance.
The Problem: Cache Hit Rate Is a Vanity Metric
We had a typical setup:
- API receives request
- Check Redis → hit? return data
- Miss? Query Postgres, write to Redis, return data
- TTL: 5 minutes
Cache hit rate: 94%. It looked great on dashboards. But our p95 latency? 380ms. Even cache hits were taking 80-120ms. Something was wrong.
The hidden issue: We were caching database query results, not API responses. Every cache hit still required JSON serialization, object mapping, and response construction. The cache saved us the Postgres round-trip — but all the CPU work remained.
The Real Fix: Response-Level Caching with Stale-While-Revalidate
We flipped the approach. Instead of caching raw query results, we cache the final rendered API response — complete JSON payload, ready to serve.
Before (Data-Level Cache)
// Cache stores raw SQL results
const userData = await redis.get(`user:${id}`);
if (userData) {
  // Still need to: parse, filter fields, serialize JSON
  return formatResponse(JSON.parse(userData));
}
const result = await db.query('SELECT * FROM users WHERE id = $1', [id]);
await redis.setex(`user:${id}`, 300, JSON.stringify(result));
return formatResponse(result);
After (Response-Level Cache)
// Cache stores FINAL API response
const cached = await redis.get(`api:user:${id}`);
if (cached) {
  // Direct response — zero processing
  res.setHeader('Content-Type', 'application/json');
  return res.send(cached); // Already stringified JSON
}
const result = await db.query('SELECT * FROM users WHERE id = $1', [id]);
const response = JSON.stringify(formatResponse(result));
// Write-through: cache the final response
await redis.setex(`api:user:${id}`, 300, response);
res.setHeader('Content-Type', 'application/json');
res.send(response);
Result: Cache hits dropped from 80ms to 8ms. The difference? We eliminated JSON parsing, object mapping, and response formatting on every single request.
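That gap is easy to reproduce in isolation. Here is a hypothetical micro-benchmark (the `benchHit` name and the sample payload are mine) comparing the per-hit CPU cost of a data-level hit, which must parse and re-serialize, against a response-level hit, which returns the stored string as-is:

```javascript
// Hypothetical micro-benchmark: per-hit CPU cost of a data-level cache hit
// (parse + re-serialize) vs. a response-level hit (return stored string).
function benchHit(payload, iterations) {
  const stored = JSON.stringify(payload); // what the cache holds either way

  let start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    JSON.stringify(JSON.parse(stored)); // data-level hit: parse, then format again
  }
  const dataLevelMs = Number(process.hrtime.bigint() - start) / 1e6;

  start = process.hrtime.bigint();
  let body;
  for (let i = 0; i < iterations; i++) {
    body = stored; // response-level hit: nothing left to do
  }
  const responseLevelMs = Number(process.hrtime.bigint() - start) / 1e6;

  return { dataLevelMs, responseLevelMs, body };
}

// A payload roughly the shape of a list endpoint response
const sample = {
  user: 1,
  items: Array.from({ length: 500 }, (_, i) => ({ i, name: `row-${i}` }))
};
const timings = benchHit(sample, 1000);
```

The absolute numbers depend on payload shape and hardware; the point is that the data-level path pays serialization CPU on every hit while the response-level path pays nothing.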
The Staleness Budget: How We Handle Cache Invalidation
"Just invalidate the cache when data changes" sounds simple until you have 47 cache keys referencing the same user across different endpoints.
We implemented a staleness budget instead of chasing perfect invalidation:
const STALENESS_BUDGET = {
  'user:profile': 30,     // 30 seconds max staleness
  'user:dashboard': 120,  // 2 minutes — less critical
  'user:analytics': 600   // 10 minutes — historical data
};

// Write with context-aware TTL
async function cacheResponse(key, response, context) {
  const ttl = STALENESS_BUDGET[context] || 60;
  await redis.setex(key, ttl, response);
}
Why this works: Instead of complex invalidation chains, we accept bounded staleness. A user profile might be 30 seconds behind reality. Their analytics dashboard? Up to 10 minutes. Every context gets a tolerance budget.
Trade-off accepted: Slightly stale data for massive performance gains. We document these budgets. Users (internal teams) know the freshness guarantee per endpoint.
The 73% Improvement: Real Production Numbers
Here is what happened when we rolled this out:
Before (Data-Level Cache)
- p50: 85ms
- p95: 380ms
- p99: 890ms
- Cache hit rate: 94%
After (Response-Level + Stale-While-Revalidate)
- p50: 12ms
- p95: 102ms
- p99: 245ms
- Effective cache hit rate: 97% (includes stale-while-revalidate serves)
The 73% p95 improvement came from three changes working together:
1. Response-level caching (biggest impact): Eliminated per-request processing
2. Stale-while-revalidate: Background refresh prevented stampede effects
3. Connection pooling: Persistent Redis connections (not createClient per request)
Implementation: The Complete Pattern
const Redis = require('ioredis');

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  maxRetriesPerRequest: 3,
  // CRITICAL: one long-lived client reused across requests,
  // not a new connection per request
  lazyConnect: false
});

async function getCachedResponse(cacheKey, fetchFn, context = 'default') {
  // 1. Try cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    // Check staleness budget
    const ttl = await redis.ttl(cacheKey);
    const budget = STALENESS_BUDGET[context] || 60;

    // Serve immediately while more than 10% of the TTL remains
    if (ttl > budget * 0.1) {
      return { data: cached, cached: true };
    }

    // Near expiry: serve stale, trigger background refresh
    // Do not await — let it happen in background
    refreshCache(cacheKey, fetchFn, context);
    return { data: cached, cached: true, stale: true };
  }

  // 2. Cache miss: fetch and cache
  const fresh = await fetchFn();
  const response = JSON.stringify(fresh);
  await cacheResponse(cacheKey, response, context);
  return { data: response, cached: false };
}

// Background refresh — no await, fire-and-forget
function refreshCache(cacheKey, fetchFn, context) {
  fetchFn().then(data => {
    const response = JSON.stringify(data);
    return cacheResponse(cacheKey, response, context);
  }).catch(err => {
    console.error('Background cache refresh failed:', err);
  });
}
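To sanity-check this flow without a live Redis, the same serve/refresh logic can be exercised against a tiny in-memory stub. This is a sketch: `FakeRedis` and `getCachedResponseWith` are my names, and the latter is simply the helper above with the client passed in rather than captured from module scope.

```javascript
// Minimal in-memory stand-in for the two Redis calls the helper uses.
// FakeRedis is a test double, not part of the production code above.
class FakeRedis {
  constructor() { this.store = new Map(); }
  async get(key) {
    const entry = this.store.get(key);
    return entry ? entry.value : null;
  }
  async ttl(key) {
    const entry = this.store.get(key);
    return entry ? entry.ttl : -2; // -2 mimics Redis: key does not exist
  }
  async setex(key, ttl, value) { this.store.set(key, { value, ttl }); }
}

// Same serve/refresh decision as getCachedResponse, with the client injected
// so it can run against the stub.
async function getCachedResponseWith(client, cacheKey, fetchFn, budget = 60) {
  const cached = await client.get(cacheKey);
  if (cached) {
    const ttl = await client.ttl(cacheKey);
    if (ttl > budget * 0.1) return { data: cached, cached: true };
    // Near expiry: serve stale, refresh in the background
    fetchFn()
      .then(d => client.setex(cacheKey, budget, JSON.stringify(d)))
      .catch(() => {});
    return { data: cached, cached: true, stale: true };
  }
  const fresh = JSON.stringify(await fetchFn());
  await client.setex(cacheKey, budget, fresh);
  return { data: fresh, cached: false };
}
```

Running it twice against the same key shows the miss-then-hit transition, and shrinking the stored TTL below 10% of the budget flips the response into the stale path.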
5 Caching Anti-Patterns That Cost You Performance
1. Caching database rows instead of responses
- Saves network round-trip but keeps CPU work
- Fix: Cache the final serialized response
2. Using the same TTL everywhere
- User profile (changes often) and analytics (changes rarely) get same expiry
- Fix: Context-aware TTL budgets per endpoint
3. Cache stampede on expiry
- 50 requests hit at once when key expires, all query database
- Fix: Stale-while-revalidate with background refresh
4. Creating new Redis connections per request
- Connection overhead: 5-15ms per request
- Fix: Persistent connection pool, reuse across requests
5. Not monitoring cache efficiency
- Cache hit rate is a vanity metric. Monitor the response time distribution for cached vs. uncached requests
- Fix: Tag metrics with cached: true/false and stale: true/false
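Stale-while-revalidate (#3) protects keys that are already warm, but a cold or freshly evicted key can still stampede. One common guard is a short-lived lock taken with Redis `SET ... NX PX`, so exactly one caller recomputes the value. A sketch, assuming an ioredis-style client; `acquireRefreshLock` and `fetchWithLock` are my names:

```javascript
// Only the caller that wins the lock recomputes; lockTtlMs bounds how long
// a crashed winner can block everyone else.
async function acquireRefreshLock(client, key, lockTtlMs = 5000) {
  // SET lock:<key> 1 PX <ms> NX returns 'OK' for exactly one caller
  const result = await client.set(`lock:${key}`, '1', 'PX', lockTtlMs, 'NX');
  return result === 'OK';
}

async function fetchWithLock(client, key, fetchFn, ttlSeconds = 60) {
  if (await acquireRefreshLock(client, key)) {
    try {
      const response = JSON.stringify(await fetchFn());
      await client.setex(key, ttlSeconds, response);
      return response;
    } finally {
      await client.del(`lock:${key}`);
    }
  }
  // Lost the race: another caller is already refreshing. Serve whatever
  // copy exists (possibly stale) instead of piling onto the database.
  return client.get(key);
}
```

The fallback policy when the lock is lost (wait and re-read, or serve stale) is a per-endpoint choice, just like the staleness budgets above.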
When NOT to Use This Pattern
- Frequently mutating data: If data changes every second, caching adds complexity without benefit
- Large payloads (>1MB): Redis is largely single-threaded, so serving large values blocks other operations
- Strict consistency requirements: Financial transactions, real-time bidding — accept the database cost
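For the large-payload case in particular, a cheap write-time guard keeps oversized values out of Redis entirely. A sketch; the 1 MB ceiling and the `cacheIfSmallEnough` name are mine:

```javascript
const MAX_CACHEABLE_BYTES = 1024 * 1024; // 1 MB, per the guideline above

// Returns true if the response was cached, false if it was skipped.
async function cacheIfSmallEnough(client, key, response, ttlSeconds = 60) {
  const bytes = Buffer.byteLength(response, 'utf8');
  if (bytes > MAX_CACHEABLE_BYTES) {
    // Too big: storing and serving it would stall Redis for other callers
    return false;
  }
  await client.setex(key, ttlSeconds, response);
  return true;
}
```

Callers that get `false` simply serve the freshly computed response and move on; the endpoint stays correct, it just is not cached.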
Monitoring What Actually Matters
We stopped watching cache hit rate and started tracking:
- Cached response time (target: <15ms)
- Stale serve rate (target: <5% of total requests)
- Background refresh failure rate (target: <0.1%)
- Redis memory fragmentation (large values = fragmentation)
// Prometheus metrics example (using the prom-client library)
const client = require('prom-client');
const cacheMetrics = {
  hitDuration: new client.Histogram({ name: 'cache_hit_seconds', help: 'Response time for cache hits' }),
  staleServes: new client.Counter({ name: 'cache_stale_serves_total', help: 'Served stale while refreshing' }),
  refreshFailures: new client.Counter({ name: 'cache_refresh_failures_total', help: 'Background refresh failures' })
};
The Bottom Line
Redis is fast. But most implementations use only a fraction of its potential. The gap between "we use Redis" and "Redis makes our API fast" is in what you cache and how you invalidate.
Cache the final response, not the raw data. Accept bounded staleness. Monitor response times, not hit rates. That is the 73% difference.