How to Count Unique Users Without Storing Every ID
Counting unique visitors sounds simple. Add each user ID to a collection, then count the size. But what happens when you have millions of visitors? A naive approach stores every single ID, and memory grows linearly with your user count. Ten million visitors means ten million stored IDs, consuming hundreds of megabytes just to produce one number.
Redis offers two approaches to this problem. The first uses Sets to store every ID, giving you exact counts at the cost of memory. The second uses a clever data structure called HyperLogLog that produces approximate counts using a fixed 12KB of memory regardless of whether you have a thousand visitors or a billion. This guide covers both approaches, including how to merge counts across time periods for daily, weekly, and monthly statistics.
Which Redis data types we will use
Set is the intuitive choice for counting unique items. Add each visitor ID to a set, and duplicates are automatically ignored. The SCARD command returns the exact count. Sets work well when you have bounded cardinality, like unique users per page in the last hour. The tradeoff is memory: each ID is stored, so memory grows with your visitor count.
HyperLogLog is a probabilistic data structure designed specifically for cardinality estimation. It answers "approximately how many unique items have I seen?" using a fixed 12KB of memory, no matter how many items you add. The count is approximate with a standard error of 0.81%, meaning if you have 1 million actual visitors, HyperLogLog will report somewhere between 991,900 and 1,008,100. For analytics and dashboards, this accuracy is usually more than enough.
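As a quick sanity check on that range, the arithmetic is just the count plus or minus one standard error (a minimal sketch; roughly 68% of estimates land inside one standard error):

```python
# One-standard-error range for a HyperLogLog estimate, using the
# 0.81% standard error Redis quotes for its implementation.
true_count = 1_000_000
std_error = 0.0081

low = true_count * (1 - std_error)
high = true_count * (1 + std_error)
print(f"{low:,.0f} .. {high:,.0f}")  # 991,900 .. 1,008,100
```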
The intuitive approach: store every ID
The straightforward way to count unique visitors is to add each visitor ID to a Redis Set. Sets automatically deduplicate, so adding the same ID twice does not increase the count. When you need the total, SCARD returns the set's size in O(1) time.
This approach gives you exact counts and additional capabilities. You can check if a specific user has visited with SISMEMBER, retrieve all visitor IDs with SMEMBERS, or compute intersections and unions across multiple sets. If you need to know not just how many but also who, sets are the right choice.
The problem is scale. Each user ID stored in a set consumes memory proportional to the ID's length. A million UUIDs (36 characters each) in a set use roughly 50-70MB of RAM. For a single counter that is expensive, and if you need daily, weekly, and monthly counts across hundreds of pages, memory adds up fast.
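The linear growth is easy to see even outside Redis. Here is a rough, hypothetical illustration using an in-process Python set of UUID strings (Python object overheads differ from Redis's internal encoding, but the scaling behavior is the same):

```python
import sys
import uuid

def set_memory_bytes(n: int) -> int:
    """Approximate memory to hold n unique 36-char UUID strings in a set."""
    ids = {str(uuid.uuid4()) for _ in range(n)}
    # hash-table overhead plus the string objects themselves
    return sys.getsizeof(ids) + sum(sys.getsizeof(s) for s in ids)

small = set_memory_bytes(100_000)
large = set_memory_bytes(200_000)
# Memory grows roughly linearly with the number of stored IDs
print(f"100k IDs: {small / 1e6:.1f} MB; 200k IDs: {large / 1e6:.1f} MB")
```

Doubling the number of IDs roughly doubles the memory, which is exactly the behavior that makes per-page, per-day set counters expensive at scale.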
Redis:
# Add a visitor to today's set
SADD visitors:2024-01-15 user:abc123
> 1 (added new member)
# Same visitor again, no duplicate
SADD visitors:2024-01-15 user:abc123
> 0 (already exists)
# Add another visitor
SADD visitors:2024-01-15 user:xyz789
> 1
# Get exact unique visitor count
SCARD visitors:2024-01-15
> 2
# Check if a specific user visited
SISMEMBER visitors:2024-01-15 user:abc123
> 1 (yes)
# Set expiration to auto-cleanup old data
EXPIRE visitors:2024-01-15 604800
> 1 (expires in 7 days)
Python:
from datetime import date

def record_visitor(client, user_id: str):
    """Record a unique visitor for today."""
    key = f'visitors:{date.today()}'
    client.sadd(key, user_id)
    # Auto-expire after 7 days
    client.expire(key, 604800)

def get_unique_visitors(client, day: str) -> int:
    """Get exact count of unique visitors for a day."""
    return client.scard(f'visitors:{day}')

def has_visited(client, day: str, user_id: str) -> bool:
    """Check if a specific user visited on a day."""
    return client.sismember(f'visitors:{day}', user_id) == 1
TypeScript:
import Redis from 'ioredis'; // ioredis-style client assumed

function getToday(): string {
  return new Date().toISOString().split('T')[0];
}

async function recordVisitor(client: Redis, userId: string) {
  const key = `visitors:${getToday()}`;
  await client.sadd(key, userId);
  // Auto-expire after 7 days
  await client.expire(key, 604800);
}

async function getUniqueVisitors(client: Redis, day: string): Promise<number> {
  return await client.scard(`visitors:${day}`);
}

async function hasVisited(client: Redis, day: string, userId: string): Promise<boolean> {
  return await client.sismember(`visitors:${day}`, userId) === 1;
}
Go:
// Uses the github.com/redis/go-redis/v9 client.
func recordVisitor(ctx context.Context, client *redis.Client, userID string) {
    key := "visitors:" + time.Now().Format("2006-01-02")
    client.SAdd(ctx, key, userID)
    // Auto-expire after 7 days
    client.Expire(ctx, key, 7*24*time.Hour)
}

func getUniqueVisitors(ctx context.Context, client *redis.Client, day string) (int64, error) {
    return client.SCard(ctx, "visitors:"+day).Result()
}

func hasVisited(ctx context.Context, client *redis.Client, day, userID string) (bool, error) {
    return client.SIsMember(ctx, "visitors:"+day, userID).Result()
}
Counting millions with fixed memory
HyperLogLog solves the memory problem by using a probabilistic algorithm. Instead of storing every ID, it maintains a compact sketch that estimates cardinality. The magic is that this sketch stays at 12KB whether you add a thousand items or a trillion.
The PFADD command adds elements to a HyperLogLog, and PFCOUNT returns the estimated cardinality. Under the hood, HyperLogLog hashes each input and uses patterns in the hash values (specifically, runs of zero bits) to estimate how many unique items it has seen. The math is fascinating, but you do not need to understand it to use the commands. Just know that the standard error is 0.81%, so counts typically land within about 1% of the true value.
The tradeoff is that you lose the ability to check membership or retrieve individual items. HyperLogLog only answers "how many unique?" not "was this specific user included?" or "give me the list." If you need those capabilities, stick with sets. If you just need counts for analytics, HyperLogLog saves enormous amounts of memory.
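To make the mechanism concrete, here is a toy, pure-Python sketch of the idea (an illustration, not Redis's actual implementation; the hash function and register count here are arbitrary choices): hash each item, use the low bits to pick a register, and keep the longest run of low zero bits seen per register. Rare long runs imply many distinct inputs.

```python
import hashlib
import math

M_BITS = 14            # 2**14 = 16384 registers, the same order as Redis uses
M = 1 << M_BITS
registers = [0] * M

def hll_add(item: str) -> None:
    """Hash the item; low bits pick a register, remaining bits give a rank."""
    h = int(hashlib.sha1(item.encode()).hexdigest(), 16)
    idx = h & (M - 1)
    rest = h >> M_BITS
    rank = 1           # position of the first 1-bit, counting from the low end
    while rest & 1 == 0 and rank < 64:
        rank += 1
        rest >>= 1
    registers[idx] = max(registers[idx], rank)

def hll_estimate() -> float:
    """Harmonic-mean estimate over all registers, with small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # linear counting for small cardinalities
    return raw

for i in range(100_000):
    hll_add(f"user:{i}")
    hll_add(f"user:{i}")   # duplicates do not inflate the estimate

print(round(hll_estimate()))  # roughly 100000
```

The registers total a few kilobytes no matter how many items are added, which is where the fixed memory footprint comes from.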
Redis:
# Add a visitor to today's HyperLogLog
PFADD visitors:hll:2024-01-15 user:abc123
> 1 (internal state changed)
# Same visitor again
PFADD visitors:hll:2024-01-15 user:abc123
> 0 (no change, already counted)
# Add more visitors
PFADD visitors:hll:2024-01-15 user:xyz789 user:def456 user:ghi012
> 1
# Get approximate unique visitor count
PFCOUNT visitors:hll:2024-01-15
> 4 (the estimate; exact at such small counts)
# Memory is capped at ~12KB regardless of count
# (small counts use an even more compact sparse encoding)
DEBUG OBJECT visitors:hll:2024-01-15
> ... serializedlength:12304 ...
Python:
from datetime import date

def record_visitor_hll(client, user_id: str):
    """Record a visitor using HyperLogLog (memory efficient)."""
    key = f'visitors:hll:{date.today()}'
    client.pfadd(key, user_id)
    client.expire(key, 604800)

def get_unique_visitors_hll(client, day: str) -> int:
    """Get approximate count of unique visitors."""
    return client.pfcount(f'visitors:hll:{day}')

def record_batch_visitors(client, user_ids: list[str]):
    """Record multiple visitors efficiently."""
    key = f'visitors:hll:{date.today()}'
    client.pfadd(key, *user_ids)
TypeScript:
async function recordVisitorHLL(client: Redis, userId: string) {
  const key = `visitors:hll:${getToday()}`;
  await client.pfadd(key, userId);
  await client.expire(key, 604800);
}

async function getUniqueVisitorsHLL(client: Redis, day: string): Promise<number> {
  return await client.pfcount(`visitors:hll:${day}`);
}

async function recordBatchVisitors(client: Redis, userIds: string[]) {
  const key = `visitors:hll:${getToday()}`;
  await client.pfadd(key, ...userIds);
}
Go:
func recordVisitorHLL(ctx context.Context, client *redis.Client, userID string) {
    key := "visitors:hll:" + time.Now().Format("2006-01-02")
    client.PFAdd(ctx, key, userID)
    client.Expire(ctx, key, 7*24*time.Hour)
}

func getUniqueVisitorsHLL(ctx context.Context, client *redis.Client, day string) (int64, error) {
    return client.PFCount(ctx, "visitors:hll:"+day).Result()
}

func recordBatchVisitors(ctx context.Context, client *redis.Client, userIDs []string) {
    key := "visitors:hll:" + time.Now().Format("2006-01-02")
    args := make([]interface{}, len(userIDs))
    for i, id := range userIDs {
        args[i] = id
    }
    client.PFAdd(ctx, key, args...)
}
Merging counts across time periods
One of HyperLogLog's most powerful features is merging. You can combine multiple HyperLogLogs into one that represents the union of all their elements, and the merged result correctly deduplicates across all inputs. This lets you compute weekly or monthly unique visitors from daily counts without double-counting users who visited on multiple days.
The PFMERGE command combines multiple HyperLogLogs into a destination key. If a user visited on Monday and Wednesday, they appear in both daily HyperLogLogs, but the merged weekly HyperLogLog counts them only once. This is something you cannot do efficiently with regular sets without keeping all the raw IDs around.
This makes HyperLogLog ideal for time-series unique counts. Keep daily HyperLogLogs for 30 days, merge them on demand to answer "unique visitors this week" or "unique visitors this month." The merged result is still just 12KB and the merge operation is fast.
Redis:
# Daily HyperLogLogs exist for each day
PFCOUNT visitors:hll:2024-01-15
> 1500
PFCOUNT visitors:hll:2024-01-16
> 1800
PFCOUNT visitors:hll:2024-01-17
> 1600
# Merge into a weekly count (deduplicates across days)
PFMERGE visitors:hll:week:2024-03 visitors:hll:2024-01-15 visitors:hll:2024-01-16 visitors:hll:2024-01-17
> OK
# Weekly unique visitors (less than sum because of overlap)
PFCOUNT visitors:hll:week:2024-03
> 3200 (not 4900, many users visited multiple days)
# Can also count multiple keys directly without storing merge
PFCOUNT visitors:hll:2024-01-15 visitors:hll:2024-01-16 visitors:hll:2024-01-17
> 3200
Python:
from datetime import datetime, timedelta
import calendar

def get_weekly_unique_visitors(client, year: int, week: int) -> int:
    """Get unique visitors for a week by merging daily HyperLogLogs."""
    # Get the Monday of the given week (week 1 contains January 1)
    jan1 = datetime(year, 1, 1)
    start = jan1 + timedelta(weeks=week - 1, days=-jan1.weekday())
    # Collect keys for each day of the week
    keys = [f'visitors:hll:{(start + timedelta(days=i)).strftime("%Y-%m-%d")}'
            for i in range(7)]
    # PFCOUNT with multiple keys returns deduplicated count
    return client.pfcount(*keys)

def create_monthly_rollup(client, year: int, month: int):
    """Create a merged monthly HyperLogLog from daily data."""
    days_in_month = calendar.monthrange(year, month)[1]
    keys = [f'visitors:hll:{year}-{month:02d}-{day:02d}'
            for day in range(1, days_in_month + 1)]
    dest_key = f'visitors:hll:month:{year}-{month:02d}'
    client.pfmerge(dest_key, *keys)
    client.expire(dest_key, 86400 * 365)  # Keep for a year
TypeScript:
async function getWeeklyUniqueVisitors(client: Redis, startDate: Date): Promise<number> {
  // Generate keys for 7 days starting from startDate
  const keys: string[] = [];
  for (let i = 0; i < 7; i++) {
    const day = new Date(startDate);
    day.setDate(day.getDate() + i);
    keys.push(`visitors:hll:${day.toISOString().split('T')[0]}`);
  }
  // PFCOUNT with multiple keys returns deduplicated count
  return await client.pfcount(...keys);
}

async function createMonthlyRollup(client: Redis, year: number, month: number) {
  const daysInMonth = new Date(year, month, 0).getDate();
  const keys: string[] = [];
  for (let day = 1; day <= daysInMonth; day++) {
    keys.push(`visitors:hll:${year}-${String(month).padStart(2, '0')}-${String(day).padStart(2, '0')}`);
  }
  const destKey = `visitors:hll:month:${year}-${String(month).padStart(2, '0')}`;
  await client.pfmerge(destKey, ...keys);
  await client.expire(destKey, 86400 * 365);
}
Go:
func getWeeklyUniqueVisitors(ctx context.Context, client *redis.Client, startDate time.Time) (int64, error) {
    keys := make([]string, 7)
    for i := 0; i < 7; i++ {
        day := startDate.AddDate(0, 0, i)
        keys[i] = "visitors:hll:" + day.Format("2006-01-02")
    }
    // PFCOUNT with multiple keys returns deduplicated count
    return client.PFCount(ctx, keys...).Result()
}

func createMonthlyRollup(ctx context.Context, client *redis.Client, year int, month time.Month) {
    // Day 0 of the next month is the last day of this month
    daysInMonth := time.Date(year, month+1, 0, 0, 0, 0, 0, time.UTC).Day()
    keys := make([]string, daysInMonth)
    for day := 1; day <= daysInMonth; day++ {
        keys[day-1] = fmt.Sprintf("visitors:hll:%d-%02d-%02d", year, month, day)
    }
    destKey := fmt.Sprintf("visitors:hll:month:%d-%02d", year, month)
    client.PFMerge(ctx, destKey, keys...)
    client.Expire(ctx, destKey, 365*24*time.Hour)
}
Choosing an approach
Use Sets when you need exact counts, when you need to check if specific users are included, or when your cardinality is bounded and small enough that memory is not a concern. Sets also work when you need to retrieve the actual list of visitor IDs, not just count them.
Use HyperLogLog when you only need approximate counts and you are dealing with high cardinality. A dashboard showing "~1.2 million unique visitors this month" is just as useful as an exact count, and HyperLogLog produces that number using 12KB instead of hundreds of megabytes. The ability to merge counts across time periods without double-counting makes it especially powerful for analytics.
For many applications, using both makes sense. Keep HyperLogLogs for site-wide analytics where approximate counts are fine and memory savings matter. Use Sets for smaller, bounded counts where you need exact numbers or membership checks, like unique users per feature or per experiment group.