Skip to main content

How to Check if a Value Exists Without Storing It

Checking if something exists is one of the most common operations in software. Is this username taken? Have I already processed this webhook? Is this IP address on the blocklist? The straightforward approach is to store every value and check against the list. But when that list grows to millions of entries, memory becomes a problem.

Bloom filters solve this by answering existence questions probabilistically. A Bloom filter can tell you "definitely not in the set" or "probably in the set" using a fraction of the memory that storing every value would require. The tradeoff is false positives: sometimes the filter says "probably yes" when the answer is actually no. But it never produces false negatives: if it says "definitely not," you can trust that completely. This guide covers the standard Set approach for exact checks and Bloom filters for memory-efficient probabilistic checks, plus practical applications like [preventing duplicate work](#preventing-duplicate event-processing) and protecting your cache.

Which Redis data types we will use

Set is the exact approach to existence checking. Add values to a set, check membership with SISMEMBER. You get perfect accuracy but memory grows with each value stored. Sets work well when your data is bounded or when you cannot tolerate any false positives.

Bloom Filter is a probabilistic data structure that uses a fixed amount of memory regardless of how many items you add, up to the capacity you configure. When you add an item, the filter sets several bits based on hash functions. When you check an item, it looks at those same bit positions. If any bit is not set, the item is definitely not in the filter. If all bits are set, the item is probably in the filter but might be a false positive caused by other items setting those same bits.

The intuitive approach: store every value

The obvious way to check if a username is taken is to store every username in a Redis Set. The SISMEMBER command returns 1 if the value exists, 0 if it does not. This gives you perfect accuracy and the ability to enumerate all values if needed.

Sets work well for small to medium collections. A million short strings might consume 50-100MB, which is manageable. But if you need to track billions of values, or you have many separate sets, memory adds up quickly. Sets also require storing the actual values, which might be long URLs, email addresses, or other variable-length strings.

# Add a username to the set
SADD usernames "alice"
> 1 (added)

# Check if a username is taken
SISMEMBER usernames "alice"
> 1 (exists)

SISMEMBER usernames "bob"
> 0 (does not exist)

# Check memory usage
MEMORY USAGE usernames
> 72 (bytes, grows with each member)

Checking membership with a Bloom filter

A Bloom filter uses a bit array and multiple hash functions to track set membership. When you add an item, it hashes the item several times and sets the corresponding bits. When you check an item, it verifies all those bits are set. If any bit is zero, the item was never added. If all bits are one, the item was probably added, but there is a chance other items happened to set all those same bits.

The BF.ADD command adds an item to a Bloom filter, creating the filter if it does not exist. BF.EXISTS checks membership. You can configure the filter's expected capacity and desired false positive rate when creating it with BF.RESERVE. A filter configured for 1 million items at 1% false positive rate uses about 1.2MB, far less than storing a million actual values.

The key insight is that false positives are acceptable in many scenarios. If you are checking username availability, a rare false positive just means telling someone a name is taken when it is not. Mildly annoying but not harmful. If you are deduplicating events, a false positive means occasionally skipping an event you could have processed, which might be fine if your system handles duplicates gracefully anyway.

# Create a Bloom filter for 1 million items with 1% false positive rate
BF.RESERVE usernames_bloom 0.01 1000000
> OK

# Add a username
BF.ADD usernames_bloom "alice"
> 1 (added)

# Check if a username might exist
BF.EXISTS usernames_bloom "alice"
> 1 (probably exists)

BF.EXISTS usernames_bloom "bob"
> 0 (definitely does not exist)

# Add multiple usernames at once
BF.MADD usernames_bloom "charlie" "dana" "eve"
> [1, 1, 1]

# Check multiple usernames at once
BF.MEXISTS usernames_bloom "alice" "bob" "charlie"
> [1, 0, 1]

Preventing duplicate event processing

Webhooks and event streams often deliver the same event multiple times. You need to track which events you have already processed to avoid duplicate work. A Bloom filter handles this efficiently when occasional duplicate processing is acceptable.

Before processing an event, check if its ID exists in the filter. If the filter says no, you definitely have not seen it, so process it and add it to the filter. If the filter says yes, skip it. The rare false positive means you might skip an event you have not actually seen, but for many systems this is better than the alternative of storing every event ID forever.

# Check if we've processed this event
BF.EXISTS events:processed "evt_abc123"
> 0 (definitely not processed)

# Process the event, then mark it as done
BF.ADD events:processed "evt_abc123"
> 1

# Same event arrives again
BF.EXISTS events:processed "evt_abc123"
> 1 (probably processed, skip it)

Protecting your cache from unnecessary lookups

Cache penetration happens when requests repeatedly ask for data that does not exist. Each request misses the cache and hits your database, looking for something that is not there. Attackers can exploit this by requesting random non-existent keys, bypassing your cache entirely.

A Bloom filter containing all valid keys blocks these requests. Before querying the database, check if the key exists in the filter. If the filter says definitely not, return a "not found" immediately without touching the database. This protects your backend from both malicious attacks and legitimate traffic patterns that happen to request many non-existent items.

# Populate filter with all valid product IDs during startup
BF.MADD products:valid "prod_001" "prod_002" "prod_003"
> [1, 1, 1]

# Request comes in for a product
BF.EXISTS products:valid "prod_001"
> 1 (might exist, check database)

BF.EXISTS products:valid "prod_fake_xyz"
> 0 (definitely doesn't exist, return 404 immediately)

Choosing an approach

Use Sets when you need exact answers and your data size is manageable. Sets give you perfect accuracy, the ability to remove items, and the option to list all members. For bounded collections under a few million items, the memory cost is often acceptable.

Use Bloom filters when you have high cardinality, memory is constrained, and false positives are tolerable. Username availability checking, duplicate detection, and cache protection all work well with Bloom filters because the cost of a rare false positive is low. Configure your filter's capacity and error rate based on your expected data size and tolerance for false positives.

Remember that Bloom filters cannot remove items. Once you add something, it is there forever unless you rebuild the filter. If you need deletion support, consider a Cuckoo filter instead, which trades slightly more memory for the ability to remove elements.