Published on

Getting Started with Rust and ClickHouse

Authors
  • avatar
    Name
    James Brown
    Twitter

Cover image

Svix is the enterprise ready webhooks sending service. With Svix, you can build a secure, reliable, and scalable webhook platform in minutes. Looking to send webhooks? Give it a try!

ClickHouse is a columnar, distributed, LSM-based open-source analytical database. Here at Svix, we use it to power dashboards and analytics over our webhook data, allowing our customers to quickly search for messages even if they have tens or hundreds of millions of items in their search corpus. Even though ClickHouse is written in C++1, it's still a great fit for integrating with Rust applications: ClickHouse has a rich type system and it's incredibly fast to crunch huge amounts of data, sustaining many GB/s even on modest hardware.

clickhouse-client scanning at 4.53GBps

In this blog post, we'll walk through a tutorial of using ClickHouse with Rust.

Getting Started

When starting out with ClickHouse and Rust, there are two main choices for client library:

  • clickhouse-rs, the official client
  • klickhouse, an unofficial crate that reverse-engineers the undocumented official binary protocol

Here at Svix, we use the official clickhouse-rs crate, and in the remainder of this post, I'll be using that in my examples. The clickhouse-rs crate wraps the ClickHouse HTTP Interface and uses the very popular reqwest crate under the hood to provide a network transport and connection pooling with an async interface – if you use Rust and HTTP, you're probably already using reqwest!

Connecting to ClickHouse with clickhouse-rs is very straightforward; simply pass a base HTTP URL pointed to your database, a username, password, and optionally a database name and TLS connection information:

let client = clickhouse::Client::default()
    .with_url(url)
    .with_user(user)
    .with_password(password);

Once you have a client, you can execute queries by constructing a clickhouse::Query object from a string.

let value = client
    .query("SELECT 1")
    .fetch_one::<u8>() // <-- type annotation
    .await
    .expect("should fetch");
println!("{}", value);

Note the return type annotation: because you are constructing your SQL from strings, you need to tell Rust about the expected return type. Internally, clickhouse-rs is using the proprietary ClickHouse RowBinary protocol over the wire and the serde library to deserialize the response from ClickHouse. This means that you can easily deserialize complex types into your own structs:

use clickhouse::Row;
use serde::Deserialize;

#[derive(Row, Deserialize, Debug)]
struct DemoRow {
    uint: u64,
    tuple: (String, String),
    array: Vec<u8>,
}

async fn more_types(client: clickhouse::Client) {
    let res = client
        .query("SELECT 1::UInt64 AS uint, tuple('foo', 'bar') AS tuple, [1, 2, 3] AS array")
        .fetch_one::<DemoRow>()
        .await
        .expect("should fetch");
    println!("{res:?}");
}

Of course, it is up to you to make sure that the types in your Rust structs actually match ClickHouse; if you don't, you can look forward to an error like

thread 'main' panicked at .cargo/registry/src/index.crates.io-1949cf8c6b5b557f/clickhouse-0.14.0/src/rowbinary/validation.rs:98:21: While processing column uint: attempting to (de)serialize ClickHouse type UInt128 as u64 which is not compatible

To avoid this, you can use the ch2rs tool to generate Rust structs by introspecting your ClickHouse database.

Query parameters can be bound either on the client side (with .bind()), or the server side (with .param()):

async fn query_parameters(client: clickhouse::Client, key: &str) -> Result<()> {
    // client-side binds are positional and the driver handles escaping
    let client_result = client.query("SELECT id FROM table WHERE key = ?")
        .bind(key)
        .fetch_one::<u64>()
        .await?;
    println!("client-side binding says: {client_result:?}");

    // server-side params are by name and the server handles escaping
    let server_result = client.query("SELECT id FROM table WHERE key = {key:UInt64}")
        .param("key", key)
        .fetch_one::<u64>()
        .await?;
    println!("server-side binding says: {server_result:?}");

    Ok(())
}

Both of these protect you from SQL injection; you should never use format!() or friends to attach user-controller text to a SQL string.

Monitoring

As you roll out ClickHouse, it's important (as with any tool) to keep an eye on performance. clickhouse-rs like most Rust crates, returns Result<T, E> types, so it will slot into your preferred error-handling mechanism2. For performance monitoring, we at Svix are big fans of OpenTelemetry and use a fork of clickhouse-rs which emits OpenTelemetry client spans. This enables us to easily inspect query performance alongside all of our other data in our OpenTelemetry-compatible APM tool.

Surprises and Gotchas

Nullable Values

ClickHouse really doesn't like NULL values. You can make any value nullable by wrapping it in Nullable(T), but performance suffers substantially. Like Go, ClickHouse would prefer to "default-initialize" all values and have them be non-nullable types.

Thankfully, all of the nullable values we would store in ClickHouse here at Svix have an obvious sentinel that we can use to represent null values; for example, all of the strings we would store have no difference between the empty string and the null value, so we can simply use the empty string along with the following serde helper to represent "non-nullable nullable" strings:

pub(crate) mod nnn {
    use std::str::FromStr;

    use serde::{Deserialize, Deserializer, Serialize, Serializer};

    pub(crate) fn serialize<S, T: ToString>(x: &Option<T>, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        x.as_ref()
            .map(|x| x.to_string())
            .unwrap_or_default()
            .serialize(serializer)
    }

    pub(crate) fn deserialize<'de, D, T: FromStr>(deserializer: D) -> Result<Option<T>, D::Error>
    where
        D: Deserializer<'de>,
        <T as FromStr>::Err: std::fmt::Display,
    {
        let s = String::deserialize(deserializer)?;
        if s.is_empty() {
            Ok(None)
        } else {
            let value = s.parse().map_err(serde::de::Error::custom)?;
            Ok(Some(value))
        }
    }
}

Apply this with the annotation #[serde(with = "nnn")] on a Row struct to get the same behavior.

Date / Time Representations

ClickHouse is software of an era. One of the most obvious signs of this is its DateTime type, which stores unsigned seconds since the Unix epoch and thus has a relatively narrow validity window3 and low resolution.

There is a newer DateTime64 type, which encodes variable precision timestamps in a 64-bit integer. For our applications, we find that DateTime64(3, 'UTC') is nearly always the right choice – millisecond precision, and always stored in UTC. This can be encoded and decoded with serde using the clickhouse::serde::chrono::datetime64::millis module. It's especially important to use a wrapper when representing date times because ClickHouse can't parse the common RFC 3339 / ISO8601 serialization format.

Query Construction

Neither of the common query building libraries for Rust (neither diesel nor sqlx / sea_query) support the ClickHouse variant of SQL. You could probably "fake it" using the PostgreSQL driver interface, but the SQL variant is substantially different4 from PostgreSQL's.

Right now, we're using a very basic set of structs to build various query parts and make sure that they get stitched together in the right order with the right bind parameters, but there's lots of room for improvement in this space. Keep an eye on the Svix Github organization for future releases.

UInt128's

Many of our internal IDs are time-sortable UUIDs. Unfortunately, these can't be stored as the native UUID type in ClickHouse right now:

Due to historical reasons, UUIDs are sorted by their second half. UUIDs should therefore not be used directly in a primary key, sorting key, or partition key of a table.

Because of this, we store them as u128's, carefully making sure to encode them with the correct byte-order so that their numeric sort corresponds to their timestamp. ClickHouse does support u128's (under the name UInt128), which is great, but exercise a lot of caution if you use them in queries: the ClickHouse SQL parser doesn't really understand u128's and ends up casting them to a f64. This means that a pair of queries like:

INSERT INTO table(id) VALUES('26164820087713635541294893952713693207');
SELECT * FROM table WHERE id = 26164820087713635541294893952713693207;

will fail because the bare 26164820087713635541294893952713693207 gets interpreted as the IEEE 754 floating point value 2.6164820087713633×1037.

You can work around this by providing type annotations to your queries:

SELECT * FROM table WHERE id = 26164820087713635541294893952713693207::UInt128;

or by passing query input as strings:

SELECT * FROM table WHERE id = '26164820087713635541294893952713693207';

clickhouse-rs works around this by always annotating types when inserting client-side bind values of u128's. Server-side bind values ("parameters") must always have type annotations for all types and are not affected by this issue.


Stay tuned at https://www.svix.com/blog for future posts about our ClickHouse migration! Be sure to follow us on Github or RSS for the latest updates for the Svix webhook service, or join the discussion on our community Slack.

Does this work sound interesting to you? Are you excited about what lies beyond "Getting Started"? Come work with us!

Footnotes

  1. Don't worry, they're working on it

  2. Although do note that any wire-protocol mismatches like incorrect struct types will result in a panic! as opposed to a Result::Err, so it's helpful to have a panic handler installed and appropriately configured, especially during development.

  3. The Unix epoch is January 1, 1970 at 00:00:00Z; ClickHouse (unlike traditional Unixes) uses unsigned 32-bit numbers, so it can represent times until the year 2106, which is a bit better than 2038, but still too close for comfort.

  4. ClickHouse SQL is a fascinating beast! There are the obvious things, like the lack of an UPDATE statement, but some of my favorite unusual features are SELECT ... WITH FILL, PREWHERE, and SAMPLE.