Redis Changed How I Think About Databases
Redis is not just a cache. Sorted sets, streams, pub/sub, and HyperLogLog changed how I architect everything.
I thought Redis was a cache. You set a key, you get a key, it lives in memory, it's fast. That was my entire mental model for about three years. Then I needed a real-time leaderboard and wrote this:
```python
import redis

r = redis.Redis(host='localhost', port=6379)
r.zadd('leaderboard', {'alice': 4500, 'bob': 3200, 'charlie': 4100})
top_players = r.zrevrange('leaderboard', 0, 9, withscores=True)
print(top_players)
```
One line to add three scores. One line to get the top 10, already sorted, in under a millisecond. No ORDER BY. No index tuning. No query planner. That was the moment I realized Redis isn't a cache that happens to have data structures. It's a data structure server that happens to be useful as a cache.
That shift in thinking changed how I architect almost everything now.
Redis has been around since 2009 and it's still growing. In the Stack Overflow 2025 Developer Survey, Redis hit 28% usage among professional developers, growing 8% year over year (for comparison, Docker sits at 71.1%). That's not a technology coasting on legacy adoption. That's active, accelerating growth.
The business side is even more telling. Redis passed $300 million in annualized recurring revenue with 12,000 paying customers, including a third of the Fortune 100. When a third of the largest companies on the planet pay for something they could technically run for free, the product is doing something right.
And here's a newer stat that surprised me: 43% of developers building AI agents chose Redis for memory and data storage. Redis is showing up in AI agent architectures as the default choice for fast state management. That's a use case nobody predicted five years ago.
The reason Redis rewired my brain is that it maps directly to programming concepts I already understood. Instead of modeling everything as rows and columns and then writing SQL to transform them, Redis lets you work with the data structure that fits the problem.
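To make that concrete with a structure this post doesn't otherwise cover: plain sets. "Which users liked both post A and post B" is a set intersection, and SINTER computes it server-side in one command. A minimal sketch; the `FakeRedis` stub is something I wrote so the snippet runs without a server, and with redis-py you'd call the same `sadd`/`sinter` methods on a real connection:

```python
# Model "likes" as sets: one set per post, members are user IDs.
# SINTER then answers "who liked both?" server-side: no JOIN, no GROUP BY.

class FakeRedis:
    """Tiny in-memory stand-in for the two commands used (not a real client)."""
    def __init__(self):
        self.data = {}

    def sadd(self, key, *members):
        self.data.setdefault(key, set()).update(members)

    def sinter(self, *keys):
        sets = [self.data.get(k, set()) for k in keys]
        return set.intersection(*sets) if sets else set()

r = FakeRedis()  # with redis-py: r = redis.Redis(decode_responses=True)
r.sadd("likes:post:1", "alice", "bob", "carol")
r.sadd("likes:post:2", "bob", "carol", "dave")

both = r.sinter("likes:post:1", "likes:post:2")
print(sorted(both))  # ['bob', 'carol']
```

The point isn't the stub; it's that the problem statement ("users in both sets") and the command (`SINTER`) are the same shape.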
This is the one that got me. A sorted set is a collection of unique members, each with a floating-point score. Redis keeps them sorted by score automatically. Insertions, removals, and range queries are all O(log N).
Sorted sets are perfect for anything that involves ranking: leaderboards, priority queues, rate limiters using sliding windows, even matchmaking systems in games. Any time you'd reach for ORDER BY score DESC LIMIT 10 in SQL, a sorted set does it faster and with zero query planning overhead.
```python
# Sliding window rate limiter with sorted sets
import time

import redis

r = redis.Redis(host='localhost', port=6379)

def is_rate_limited(user_id: str, max_requests: int = 100, window_seconds: int = 60) -> bool:
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, window_start)  # Remove old entries
    pipe.zadd(key, {f"{now}": now})              # Add current request
    pipe.zcard(key)                              # Count requests in window
    pipe.expire(key, window_seconds)             # Auto-cleanup
    results = pipe.execute()

    request_count = results[2]
    return request_count > max_requests
```
That's a production-grade sliding window rate limiter in about 15 lines. The pipeline sends all four commands in a single round trip. The sorted set automatically keeps requests ordered by timestamp. Old entries get pruned on every call. Try building this with the same performance characteristics in PostgreSQL. You can, but it's going to involve a lot more code and a lot more latency.
Redis Streams are append-only log structures with consumer groups. Think Kafka, but running on a single Redis instance with zero configuration.
Streams support consumer groups, which means multiple consumers can read from the same stream and Redis tracks what each consumer has processed. Failed messages can be claimed by other consumers. It's event sourcing without the ceremony.
```typescript
import Redis from 'ioredis'

const redis = new Redis()

// Producer: add events to a stream
async function publishEvent(stream: string, data: Record<string, string>) {
  const id = await redis.xadd(stream, '*', ...Object.entries(data).flat())
  console.log(`Published event ${id}`)
}

// Consumer: read new events
async function consumeEvents(stream: string, group: string, consumer: string) {
  // Create consumer group (ignore if exists)
  try {
    await redis.xgroup('CREATE', stream, group, '0', 'MKSTREAM')
  } catch (e) {
    // Group already exists, that's fine
  }

  while (true) {
    const results = await redis.xreadgroup(
      'GROUP', group, consumer,
      'COUNT', 10,
      'BLOCK', 5000,
      'STREAMS', stream, '>'
    )
    if (results) {
      for (const [, messages] of results) {
        for (const [id, fields] of messages) {
          console.log(`Processing ${id}:`, fields)
          await redis.xack(stream, group, id)
        }
      }
    }
  }
}
```
I've used this pattern for order processing pipelines. An API endpoint publishes an event to a stream. Three different consumer groups pick it up: one sends a confirmation email, one updates inventory, one fires analytics. Each consumer processes at its own pace. If the email service crashes, the other two keep going, and the email consumer picks up where it left off when it restarts.
Is it Kafka? No. Kafka handles millions of messages per second across distributed clusters. But for most applications doing thousands of events per second, Redis Streams are simpler to run, simpler to debug, and simpler to deploy.
Redis Pub/Sub is fire-and-forget messaging. A publisher sends a message to a channel. Every subscriber listening to that channel gets it immediately. No persistence. No acknowledgments. No replays.
That sounds limiting, and it is, intentionally. Pub/Sub is built for real-time notifications where losing a message isn't catastrophic: chat presence indicators, live scoreboards, cache invalidation across servers, WebSocket fan-out.
```python
# Publisher
import redis

r = redis.Redis(host='localhost', port=6379)
r.publish('notifications', 'user:42:logged_in')
```

```python
# Subscriber
import redis

r = redis.Redis(host='localhost', port=6379)
pubsub = r.pubsub()
pubsub.subscribe('notifications')

for message in pubsub.listen():
    if message['type'] == 'message':
        print(f"Received: {message['data'].decode()}")
```
I use this for cross-server cache invalidation. When one server updates a cached record, it publishes a message. Every other server receives it and drops its local copy. Simple, fast, and it works across any number of servers.
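The handler logic for that invalidation listener is small enough to sketch. This is an illustrative version, not lifted from a real codebase: the function is plain Python so it runs standalone, and the comment shows how it would plug into a redis-py pub/sub loop like the subscriber above.

```python
# Cross-server cache invalidation, sketched: each app server keeps a local
# in-process cache and subscribes to an "invalidate" channel. When a message
# arrives, drop that key from the local copy.

local_cache = {"user:42:profile": {"name": "Alice"}}

def handle_invalidation(message, cache):
    """Drop the named key from this server's local cache. Returns the evicted
    value, or None if there was nothing to evict (or the message wasn't a
    real publish)."""
    if message.get("type") != "message":
        return None  # ignore subscribe/unsubscribe confirmations
    key = message["data"]
    return cache.pop(key, None)

# With redis-py this plugs into the subscriber loop roughly as:
#   for msg in pubsub.listen():
#       handle_invalidation({"type": msg["type"], "data": msg["data"].decode()},
#                           local_cache)

evicted = handle_invalidation({"type": "message", "data": "user:42:profile"},
                              local_cache)
print(evicted)      # {'name': 'Alice'}
print(local_cache)  # {}
```

Filtering on `type == "message"` matters: the first thing `listen()` yields after `subscribe()` is a confirmation, not a publish.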
This one is niche but beautiful. HyperLogLog is a probabilistic data structure that counts unique items using a fixed 12 KB of memory, no matter how many items you count. The tradeoff is a standard error of about 0.81%.
```python
import redis

r = redis.Redis(host='localhost', port=6379)

# Track unique visitors
r.pfadd('unique_visitors:2025-07-07', 'user:1', 'user:2', 'user:3')
r.pfadd('unique_visitors:2025-07-07', 'user:1', 'user:4')  # user:1 is duplicate

count = r.pfcount('unique_visitors:2025-07-07')
print(f"Unique visitors: {count}")  # ~4
```
You could count 100 million unique visitors and it would still use 12 KB. That's it. Try doing SELECT COUNT(DISTINCT user_id) on a table with 100 million rows and see how your database feels about it.
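If you're curious why the memory stays fixed, the core trick fits in a short sketch. This toy version is mine, not Redis's implementation (Redis uses 14 bits of precision, a sparse encoding, and extra bias corrections), but it shows the idea: hash each item, let the first bits of the hash pick one of a fixed array of registers, record the longest run of leading zeros seen in the rest, and estimate cardinality from the registers.

```python
import hashlib
import math

class ToyHLL:
    """Toy HyperLogLog. Illustration only: Redis's real implementation uses
    p=14 (16384 registers, ~12 KB) plus sparse encoding and bias correction."""

    def __init__(self, p: int = 10):
        self.p = p            # precision: 2^p registers
        self.m = 1 << p
        self.registers = [0] * self.m  # memory is fixed up front

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -reg for reg in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on empty registers
            return int(self.m * math.log(self.m / zeros))
        return int(raw)

hll = ToyHLL()
for i in range(10_000):
    hll.add(f"user:{i}")
estimate = hll.count()
# The 1024 registers never grow; the estimate lands within a few percent
# of the true 10,000, and re-adding items changes nothing.
```

Duplicates are free by construction: adding an item you've seen before can only re-set a register to a value it already holds, which is why PFADD of `user:1` twice counted once above.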
Let me walk through the patterns I actually use in production. Not hello-world examples. Real patterns.
Yes, caching. But done right, with a proper cache-aside pattern and TTL:
```python
import redis
import json

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user_profile(user_id: int) -> dict:
    cache_key = f"user:{user_id}:profile"

    # Check cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss: fetch from database
    profile = fetch_from_database(user_id)  # Your DB query here

    # Store in cache with 5-minute TTL
    r.setex(cache_key, 300, json.dumps(profile))
    return profile

def invalidate_user_cache(user_id: int):
    r.delete(f"user:{user_id}:profile")
```
The thing most tutorials skip: cache invalidation. You need invalidate_user_cache called everywhere the user profile changes. Miss one spot and users see stale data. This is the "two hard things in computer science" problem and Redis doesn't solve it for you. You still have to think about it.
Session storage is where Redis replaces something clunky (server-side files, database rows) with something clean:
```typescript
import Redis from 'ioredis'
import { v4 as uuidv4 } from 'uuid'

const redis = new Redis()

interface Session {
  userId: string
  email: string
  role: string
  createdAt: string
}

async function createSession(userId: string, email: string, role: string): Promise<string> {
  const sessionId = uuidv4()
  const session: Session = {
    userId,
    email,
    role,
    createdAt: new Date().toISOString(),
  }
  // Store session with 24-hour TTL
  await redis.setex(`session:${sessionId}`, 86400, JSON.stringify(session))
  return sessionId
}

async function getSession(sessionId: string): Promise<Session | null> {
  const data = await redis.get(`session:${sessionId}`)
  return data ? JSON.parse(data) : null
}

async function destroySession(sessionId: string): Promise<void> {
  await redis.del(`session:${sessionId}`)
}
```
Why Redis and not your database? Two reasons. First, session lookups happen on every single request. That's a lot of reads. Redis handles hundreds of thousands of reads per second without breaking a sweat. Your PostgreSQL connection pool might disagree. Second, sessions are ephemeral. They expire. Redis has built-in TTL. You don't need a cron job to clean up expired sessions.
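One more thing built-in TTLs buy you, sketched below: sliding sessions, where each successful read pushes the expiry forward, so active users stay signed in while idle sessions still lapse. The `StubClient` is a stand-in I added so the logic runs without a server; with redis-py the same `get`/`expire` calls apply on a real connection.

```python
import json

SESSION_TTL = 86400  # 24 hours, matching the setex above

def get_session_sliding(client, session_id: str):
    """Read a session and refresh its TTL, so the window slides on activity."""
    raw = client.get(f"session:{session_id}")
    if raw is None:
        return None  # expired or never existed
    client.expire(f"session:{session_id}", SESSION_TTL)  # push expiry forward
    return json.loads(raw)

class StubClient:
    """Minimal in-memory stand-in for GET/SETEX/EXPIRE (not a real client)."""
    def __init__(self):
        self.data, self.ttls = {}, {}

    def setex(self, key, ttl, value):
        self.data[key], self.ttls[key] = value, ttl

    def get(self, key):
        return self.data.get(key)

    def expire(self, key, ttl):
        self.ttls[key] = ttl

client = StubClient()
client.setex("session:abc", 600, json.dumps({"userId": "42"}))
session = get_session_sliding(client, "abc")
print(session)                     # {'userId': '42'}
print(client.ttls["session:abc"])  # 86400: TTL reset to the full day
```

Doing this in a database means an `UPDATE ... SET expires_at` on every request; in Redis it's one `EXPIRE`.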
BullMQ is a Node.js job queue built entirely on Redis. Sidekiq does the same thing for Ruby. Both are battle-tested in production at scale.
```typescript
import { Queue, Worker } from 'bullmq'
import Redis from 'ioredis'

const connection = new Redis({ maxRetriesPerRequest: null })

// Create a queue
const emailQueue = new Queue('email', { connection })

// Add a job
async function sendWelcomeEmail(userId: string, email: string) {
  await emailQueue.add('welcome', {
    userId,
    email,
    template: 'welcome',
  }, {
    attempts: 3,
    backoff: { type: 'exponential', delay: 2000 },
  })
}

// Process jobs
const worker = new Worker('email', async (job) => {
  const { email, template } = job.data
  console.log(`Sending ${template} email to ${email}`)
  // Actually send the email here
  await sendEmail(email, template)
}, { connection, concurrency: 5 })

worker.on('completed', (job) => {
  console.log(`Job ${job.id} completed`)
})

worker.on('failed', (job, err) => {
  console.log(`Job ${job?.id} failed: ${err.message}`)
})
```
BullMQ gives you retries with exponential backoff, concurrency control, job priorities, rate limiting, and a dashboard (Bull Board). All backed by Redis. I've run this in production handling 50,000 jobs per hour on a single Redis instance. It's boring. Nothing breaks. That's the highest compliment I can give infrastructure.
Here's a more complete leaderboard than the three-liner I opened with:
```python
import redis
from typing import Optional

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

class Leaderboard:
    def __init__(self, name: str):
        self.key = f"leaderboard:{name}"

    def update_score(self, player: str, score: float):
        """Set or update a player's score"""
        r.zadd(self.key, {player: score})

    def increment_score(self, player: str, amount: float = 1):
        """Add to a player's existing score"""
        r.zincrby(self.key, amount, player)

    def get_rank(self, player: str) -> Optional[int]:
        """Get player's rank (1-indexed: highest score = rank 1)"""
        rank = r.zrevrank(self.key, player)
        return rank + 1 if rank is not None else None

    def get_top(self, count: int = 10) -> list:
        """Get top N players with scores"""
        return r.zrevrange(self.key, 0, count - 1, withscores=True)

    def get_around(self, player: str, count: int = 5) -> list:
        """Get players around a specific player"""
        rank = r.zrevrank(self.key, player)
        if rank is None:
            return []
        start = max(0, rank - count)
        end = rank + count
        return r.zrevrange(self.key, start, end, withscores=True)

    def total_players(self) -> int:
        return r.zcard(self.key)

# Usage
lb = Leaderboard('weekly')
lb.update_score('alice', 4500)
lb.update_score('bob', 3200)
lb.increment_score('bob', 150)  # Bob now has 3350

print(f"Bob's rank: {lb.get_rank('bob')}")
print(f"Top 10: {lb.get_top(10)}")
print(f"Around Bob: {lb.get_around('bob', 3)}")
print(f"Total players: {lb.total_players()}")
```
Every operation here is O(log N). With a million players, getting the top 10 takes the same sub-millisecond time as with 100 players. Building this with SQL would mean an indexed ORDER BY query, which is fast but not this fast, and gets slower as the table grows.
This comparison matters because people still ask "why not just use Memcached?" and because the Valkey fork has complicated the decision.
| Feature | Redis | Memcached | Valkey |
|---|---|---|---|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLog | Strings only | Same as Redis |
| Persistence | RDB snapshots + AOF | None | RDB + AOF |
| Replication | Built-in primary/replica | None (use mcrouter) | Built-in |
| Threading | Single-threaded event loop | Multi-threaded | Multi-threaded (new) |
| Pub/Sub | Yes | No | Yes |
| Streams | Yes | No | Yes |
| Lua Scripting | Yes | No | Yes |
| License | SSPL/RSAL + AGPLv3 option | BSD | BSD (3-clause) |
| Performance | 1.5x faster than Valkey | Fast for simple gets/sets | Close to Redis |
| Cloud Default | Azure Cache | AWS ElastiCache (legacy) | AWS ElastiCache (default) |
| Usage (2025) | 28% | Declining | 2.4% |
Memcached is multi-threaded but feature-limited. It's a pure key-value cache, and it does that well. If all you need is get and set for strings, Memcached will happily use all your CPU cores for that. But the moment you need sorted sets, streams, pub/sub, or persistence, you're reaching for Redis or Valkey anyway.
Redis uses a single-threaded event loop, which sounds like a limitation but is actually a feature. No locks. No race conditions. No mutex contention. Every command executes atomically. When you run that rate limiter pipeline from earlier, you know the four commands execute in sequence without another request sneaking in between them.
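You can see the race that atomicity prevents in a few lines. This is a pure-Python illustration of the interleaving, not Redis code: two clients that each read a counter before either writes it back will lose an update, which is exactly what cannot happen to a server-side INCR, because the whole read-and-write runs as one command inside the event loop.

```python
# Simulated lost update: both "clients" GET the counter, then both SET it.
store = {"hits": 0}

def interleaved_read_modify_write() -> int:
    a = store["hits"]      # client A reads 0
    b = store["hits"]      # client B also reads 0, before A writes
    store["hits"] = a + 1  # A writes 1
    store["hits"] = b + 1  # B overwrites with 1: A's increment is lost
    return store["hits"]

def atomic_incr(key: str) -> int:
    # Stand-in for Redis INCR: one indivisible read-and-write. On the real
    # server, no other client's command can run between the read and write.
    store[key] = store.get(key, 0) + 1
    return store[key]

lost = interleaved_read_modify_write()  # 1, not 2
store["hits"] = 0
atomic_incr("hits")
safe = atomic_incr("hits")              # 2, as expected
```

This is also why the article leans on pipelines plus single commands rather than client-side read-modify-write: push the mutation to the server and the race disappears.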
In March 2024, Redis (the company, formerly Redis Labs) switched the Redis license from BSD to dual SSPL/RSAL. BSD is about as permissive as open-source licenses get. SSPL and RSAL are... not. The short version: cloud providers can no longer offer Redis-as-a-service without either paying Redis or open-sourcing their entire stack.
The community did not take this well.
The community forked Redis as Valkey, a Linux Foundation project under a BSD license, backed by AWS, Google, and Oracle. Within a year, Redis lost most of its external contributors. AWS switched ElastiCache to use Valkey as the default engine. 83% of large companies reportedly adopted or began testing Valkey.
Redis clearly felt the pressure. The original creator, Salvatore Sanfilippo, returned in November 2024. Then in May 2025, Redis 8.0 added AGPLv3 as a licensing option, an OSI-approved open-source license. Not as permissive as BSD, but open source in a way SSPL never was.
So what does this mean for you?
If you self-host or use Redis Cloud directly: the license change probably doesn't affect you. RSAL allows internal use. You can run Redis in your own infrastructure, build products on top of it, whatever. The restrictions only hit you if you try to sell Redis itself as a managed service.
If you use AWS, GCP, or Azure: your cloud provider made the choice for you. AWS uses Valkey now. Azure still uses Redis. GCP offers both.
If you're picking for a new project: honestly, both are fine. Redis is still about 1.5x faster than Valkey in benchmarks, but Valkey 8.1 is production-ready and narrowing the gap. Valkey has the BSD license and the cloud providers behind it. Redis has the faster engine and the commercial support.
Here's how I think about this choice now:
Choose Redis if:
- Raw engine performance matters and paid commercial support is a plus
- You're on Azure, where Redis is the managed default
- Your team already knows it

Choose Valkey if:
- A permissive BSD license matters to you or your legal team
- You're on AWS, where Valkey is now the default ElastiCache engine

Choose Memcached if:
- All you need is get and set for strings, and you want every CPU core doing it
For most web applications, the Redis-vs-Valkey choice comes down to which cloud you're on. If AWS, you're getting Valkey whether you ask for it or not. If Azure, you're getting Redis. If self-hosting, pick the one your team knows.
Here are the Redis patterns that show up in almost every project I build:
Cache-aside with stampede protection. When a hot cache key expires, every request hits the database simultaneously. I use Redis's SET NX (set if not exists) to acquire a lock. One request rebuilds the cache. Everyone else gets stale data or waits.
```python
import redis
import json
import time

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_with_lock(key: str, ttl: int, fetch_fn):
    cached = r.get(key)
    if cached:
        return json.loads(cached)

    lock_key = f"{key}:lock"
    if r.set(lock_key, "1", nx=True, ex=5):
        # We got the lock, rebuild cache
        data = fetch_fn()
        r.setex(key, ttl, json.dumps(data))
        r.delete(lock_key)
        return data
    else:
        # Someone else is rebuilding, wait briefly
        time.sleep(0.1)
        cached = r.get(key)
        return json.loads(cached) if cached else fetch_fn()
```
Distributed rate limiting across servers. The sorted set rate limiter from earlier works across any number of application servers because they all talk to the same Redis. No sticky sessions. No shared-nothing complexity. One Redis, one source of truth.
Feature flags. Store them in a Redis hash. Read is sub-millisecond. Update propagates to all servers instantly. No deploy needed.
```python
# Set a feature flag
r.hset('features', 'new_checkout', '1')
r.hset('features', 'dark_mode', '0')

# Check a feature flag (on every request, it's fast enough)
is_enabled = r.hget('features', 'new_checkout') == '1'
```
Here's my honest opinion after running Redis in production for several years: Redis is underused. Most teams treat it like a simple cache and miss 80% of what it can do.
Every time I see someone spin up a separate service for job queues, or add Kafka for a system that processes 500 events per second, or build a rate limiter using database rows, I think: Redis does all of that. Already. With fewer moving parts.
The sorted set is the most underrated data structure in all of web development. The number of times I've seen teams build complex ranking systems with SQL queries, materialized views, and background jobs, when ZADD and ZREVRANGE would have done the job in two commands, is honestly painful.
That said, I'm not saying "use Redis for everything." Don't store your primary data in Redis. Don't use it as your only database. It's memory-first, which means your data set needs to fit in RAM (or at least the hot portion does). Use PostgreSQL or MySQL for your source of truth. Use Redis for the speed layer on top.
The license situation is messy but stabilizing. The AGPLv3 addition in Redis 8.0 was the right move, even if it came too late to prevent the fork. In practice, if you're not building a competing cloud service, neither license restricts you. And if you are on AWS, Valkey at 2.4% adoption is going to grow fast now that it's the default in ElastiCache.
My actual recommendation: learn the data structures. That's the thing that transfers regardless of whether you end up on Redis or Valkey. Sorted sets, streams, pub/sub, HyperLogLog, these are the building blocks. Once you internalize them, you start seeing problems differently. You stop thinking "how do I query this from my database?" and start thinking "what data structure fits this problem?"
That mental shift is what Redis actually gave me. Not just faster reads. A different way of thinking about data.