How to Check if a Value Exists Without Storing It

Checking if something exists is one of the most common operations in software. Is this username taken? Have I already processed this webhook? Is this IP address on the blocklist? The straightforward approach is to store every value and check against the list. But when that list grows to millions of entries, memory becomes a problem.

Bloom filters solve this by answering existence questions probabilistically. A Bloom filter can tell you "definitely not in the set" or "probably in the set" using a fraction of the memory that storing every value would require. The tradeoff is false positives: sometimes the filter says "probably yes" when the answer is actually no. But it never produces false negatives: if it says "definitely not," you can trust that completely. This guide covers the standard Set approach for exact checks and Bloom filters for memory-efficient probabilistic checks, plus practical applications like [preventing duplicate work](#preventing-duplicate event-processing) and protecting your cache.

Which Redis data types we will use

Set is the exact approach to existence checking. Add values to a set, check membership with SISMEMBER. You get perfect accuracy but memory grows with each value stored. Sets work well when your data is bounded or when you cannot tolerate any false positives.

Bloom Filter is a probabilistic data structure that uses a fixed amount of memory regardless of how many items you add, up to the capacity you configure. When you add an item, the filter sets several bits based on hash functions. When you check an item, it looks at those same bit positions. If any bit is not set, the item is definitely not in the filter. If all bits are set, the item is probably in the filter but might be a false positive caused by other items setting those same bits.

The intuitive approach: store every value

The obvious way to check if a username is taken is to store every username in a Redis Set. The SISMEMBER command returns 1 if the value exists, 0 if it does not. This gives you perfect accuracy and the ability to enumerate all values if needed.

Sets work well for small to medium collections. A million short strings might consume 50-100MB, which is manageable. But if you need to track billions of values, or you have many separate sets, memory adds up quickly. Sets also require storing the actual values, which might be long URLs, email addresses, or other variable-length strings.

Redis
Python
TypeScript
Go

# Add a username to the set
SADD usernames "alice"
> 1 (added)

# Check if a username is taken
SISMEMBER usernames "alice"
> 1 (exists)

SISMEMBER usernames "bob"
> 0 (does not exist)

# Check memory usage
MEMORY USAGE usernames
> 72 (bytes, grows with each member)

def register_username(client, username: str) -> bool:
    """Try to register a username. Returns True if successful."""
    # SADD returns 1 if the member was added, 0 if it already existed
    added = client.sadd('usernames', username)
    return added == 1

def is_username_taken(client, username: str) -> bool:
    """Check if a username is already registered."""
    return client.sismember('usernames', username) == 1

async function registerUsername(client: Redis, username: string): Promise<boolean> {
  // SADD returns 1 if the member was added, 0 if it already existed
  const added = await client.sadd('usernames', username);
  return added === 1;
}

async function isUsernameTaken(client: Redis, username: string): Promise<boolean> {
  return await client.sismember('usernames', username) === 1;
}

func registerUsername(ctx context.Context, client *redis.Client, username string) (bool, error) {
    // SADD returns 1 if the member was added, 0 if it already existed
    added, err := client.SAdd(ctx, "usernames", username).Result()
    return added == 1, err
}

func isUsernameTaken(ctx context.Context, client *redis.Client, username string) (bool, error) {
    return client.SIsMember(ctx, "usernames", username).Result()
}

Checking membership with a Bloom filter

A Bloom filter uses a bit array and multiple hash functions to track set membership. When you add an item, it hashes the item several times and sets the corresponding bits. When you check an item, it verifies all those bits are set. If any bit is zero, the item was never added. If all bits are one, the item was probably added, but there is a chance other items happened to set all those same bits.

The BF.ADD command adds an item to a Bloom filter, creating the filter if it does not exist. BF.EXISTS checks membership. You can configure the filter's expected capacity and desired false positive rate when creating it with BF.RESERVE. A filter configured for 1 million items at 1% false positive rate uses about 1.2MB, far less than storing a million actual values.

The key insight is that false positives are acceptable in many scenarios. If you are checking username availability, a rare false positive just means telling someone a name is taken when it is not. Mildly annoying but not harmful. If you are deduplicating events, a false positive means occasionally skipping an event you could have processed, which might be fine if your system handles duplicates gracefully anyway.

Redis
Python
TypeScript
Go

# Create a Bloom filter for 1 million items with 1% false positive rate
BF.RESERVE usernames_bloom 0.01 1000000
> OK

# Add a username
BF.ADD usernames_bloom "alice"
> 1 (added)

# Check if a username might exist
BF.EXISTS usernames_bloom "alice"
> 1 (probably exists)

BF.EXISTS usernames_bloom "bob"
> 0 (definitely does not exist)

# Add multiple usernames at once
BF.MADD usernames_bloom "charlie" "dana" "eve"
> [1, 1, 1]

# Check multiple usernames at once
BF.MEXISTS usernames_bloom "alice" "bob" "charlie"
> [1, 0, 1]

def setup_username_filter(client, expected_users: int = 1000000, error_rate: float = 0.01):
    """Create a Bloom filter for username checking."""
    try:
        client.bf().reserve('usernames_bloom', error_rate, expected_users)
    except Exception:
        pass  # Filter already exists

def add_username_to_filter(client, username: str):
    """Add a username to the Bloom filter."""
    client.bf().add('usernames_bloom', username)

def might_username_exist(client, username: str) -> bool:
    """Check if username might exist. False means definitely not, True means maybe."""
    return client.bf().exists('usernames_bloom', username) == 1

def check_username_availability(client, username: str) -> str:
    """Check username availability using Bloom filter as first pass."""
    if not might_username_exist(client, username):
        return "available"  # Definitely available
    else:
        return "maybe_taken"  # Need to verify against actual database

async function setupUsernameFilter(client: Redis, expectedUsers = 1000000, errorRate = 0.01) {
  try {
    await client.call('BF.RESERVE', 'usernames_bloom', errorRate, expectedUsers);
  } catch {
    // Filter already exists
  }
}

async function addUsernameToFilter(client: Redis, username: string) {
  await client.call('BF.ADD', 'usernames_bloom', username);
}

async function mightUsernameExist(client: Redis, username: string): Promise<boolean> {
  const result = await client.call('BF.EXISTS', 'usernames_bloom', username);
  return result === 1;
}

async function checkUsernameAvailability(client: Redis, username: string): Promise<string> {
  if (!await mightUsernameExist(client, username)) {
    return 'available'; // Definitely available
  }
  return 'maybe_taken'; // Need to verify against actual database
}

func setupUsernameFilter(ctx context.Context, client *redis.Client, expectedUsers int64, errorRate float64) {
    client.Do(ctx, "BF.RESERVE", "usernames_bloom", errorRate, expectedUsers)
}

func addUsernameToFilter(ctx context.Context, client *redis.Client, username string) {
    client.Do(ctx, "BF.ADD", "usernames_bloom", username)
}

func mightUsernameExist(ctx context.Context, client *redis.Client, username string) bool {
    result, _ := client.Do(ctx, "BF.EXISTS", "usernames_bloom", username).Int()
    return result == 1
}

func checkUsernameAvailability(ctx context.Context, client *redis.Client, username string) string {
    if !mightUsernameExist(ctx, client, username) {
        return "available" // Definitely available
    }
    return "maybe_taken" // Need to verify against actual database
}

Preventing duplicate event processing

Webhooks and event streams often deliver the same event multiple times. You need to track which events you have already processed to avoid duplicate work. A Bloom filter handles this efficiently when occasional duplicate processing is acceptable.

Before processing an event, check if its ID exists in the filter. If the filter says no, you definitely have not seen it, so process it and add it to the filter. If the filter says yes, skip it. The rare false positive means you might skip an event you have not actually seen, but for many systems this is better than the alternative of storing every event ID forever.

Redis
Python
TypeScript
Go

# Check if we've processed this event
BF.EXISTS events:processed "evt_abc123"
> 0 (definitely not processed)

# Process the event, then mark it as done
BF.ADD events:processed "evt_abc123"
> 1

# Same event arrives again
BF.EXISTS events:processed "evt_abc123"
> 1 (probably processed, skip it)

def process_event_if_new(client, event_id: str, event_data: dict) -> bool:
    """Process an event only if we haven't seen it before."""
    # Check the Bloom filter first
    if client.bf().exists('events:processed', event_id):
        return False  # Probably already processed

    # Process the event
    handle_event(event_data)

    # Mark as processed
    client.bf().add('events:processed', event_id)
    return True

async function processEventIfNew(client: Redis, eventId: string, eventData: object): Promise<boolean> {
  // Check the Bloom filter first
  const seen = await client.call('BF.EXISTS', 'events:processed', eventId);
  if (seen === 1) {
    return false; // Probably already processed
  }

  // Process the event
  await handleEvent(eventData);

  // Mark as processed
  await client.call('BF.ADD', 'events:processed', eventId);
  return true;
}

func processEventIfNew(ctx context.Context, client *redis.Client, eventID string, eventData map[string]interface{}) bool {
    // Check the Bloom filter first
    seen, _ := client.Do(ctx, "BF.EXISTS", "events:processed", eventID).Int()
    if seen == 1 {
        return false // Probably already processed
    }

    // Process the event
    handleEvent(eventData)

    // Mark as processed
    client.Do(ctx, "BF.ADD", "events:processed", eventID)
    return true
}

Protecting your cache from unnecessary lookups

Cache penetration happens when requests repeatedly ask for data that does not exist. Each request misses the cache and hits your database, looking for something that is not there. Attackers can exploit this by requesting random non-existent keys, bypassing your cache entirely.

A Bloom filter containing all valid keys blocks these requests. Before querying the database, check if the key exists in the filter. If the filter says definitely not, return a "not found" immediately without touching the database. This protects your backend from both malicious attacks and legitimate traffic patterns that happen to request many non-existent items.

Redis
Python
TypeScript
Go

# Populate filter with all valid product IDs during startup
BF.MADD products:valid "prod_001" "prod_002" "prod_003"
> [1, 1, 1]

# Request comes in for a product
BF.EXISTS products:valid "prod_001"
> 1 (might exist, check database)

BF.EXISTS products:valid "prod_fake_xyz"
> 0 (definitely doesn't exist, return 404 immediately)

def get_product(client, db, product_id: str) -> dict | None:
    """Get a product, using Bloom filter to prevent cache penetration."""
    # First check if the product could possibly exist
    if not client.bf().exists('products:valid', product_id):
        return None  # Definitely doesn't exist, skip database

    # Check cache
    cached = client.get(f'product:{product_id}')
    if cached:
        return json.loads(cached)

    # Cache miss, query database
    product = db.query_product(product_id)
    if product:
        client.set(f'product:{product_id}', json.dumps(product), ex=3600)
    return product

async function getProduct(client: Redis, db: Database, productId: string): Promise<object | null> {
  // First check if the product could possibly exist
  const mightExist = await client.call('BF.EXISTS', 'products:valid', productId);
  if (mightExist === 0) {
    return null; // Definitely doesn't exist, skip database
  }

  // Check cache
  const cached = await client.get(`product:${productId}`);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss, query database
  const product = await db.queryProduct(productId);
  if (product) {
    await client.set(`product:${productId}`, JSON.stringify(product), 'EX', 3600);
  }
  return product;
}

func getProduct(ctx context.Context, client *redis.Client, db *Database, productID string) (map[string]interface{}, error) {
    // First check if the product could possibly exist
    mightExist, _ := client.Do(ctx, "BF.EXISTS", "products:valid", productID).Int()
    if mightExist == 0 {
        return nil, nil // Definitely doesn't exist, skip database
    }

    // Check cache
    cached, err := client.Get(ctx, "product:"+productID).Result()
    if err == nil {
        var product map[string]interface{}
        json.Unmarshal([]byte(cached), &product)
        return product, nil
    }

    // Cache miss, query database
    product, err := db.QueryProduct(productID)
    if err == nil && product != nil {
        data, _ := json.Marshal(product)
        client.Set(ctx, "product:"+productID, data, time.Hour)
    }
    return product, err
}

Choosing an approach

Use Sets when you need exact answers and your data size is manageable. Sets give you perfect accuracy, the ability to remove items, and the option to list all members. For bounded collections under a few million items, the memory cost is often acceptable.

Use Bloom filters when you have high cardinality, memory is constrained, and false positives are tolerable. Username availability checking, duplicate detection, and cache protection all work well with Bloom filters because the cost of a rare false positive is low. Configure your filter's capacity and error rate based on your expected data size and tolerance for false positives.

Remember that Bloom filters cannot remove items. Once you add something, it is there forever unless you rebuild the filter. If you need deletion support, consider a Cuckoo filter instead, which trades slightly more memory for the ability to remove elements.

How to Check if a Value Exists Without Storing It

Which Redis data types we will use​

The intuitive approach: store every value​

Checking membership with a Bloom filter​

Preventing duplicate event processing​

Protecting your cache from unnecessary lookups​

Choosing an approach​