Benchmark Guide

Download dataset

To download a single partition of the dataset (~100MB):

wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_0.parquet -O benchmark/data/hits_0.parquet

To download the entire dataset (~15GB):

wget https://datasets.clickhouse.com/hits_compatible/athena/hits.parquet -O benchmark/clickbench/data/hits.parquet

To download the full partitioned dataset (100 files, ~150MB each), using fish:

for i in (seq 0 99)
    wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_$i.parquet -O benchmark/clickbench/data/partitioned/hits_$i.parquet
end

Or, in bash:

for i in {0..99}; do
    wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_$i.parquet -O benchmark/clickbench/data/partitioned/hits_$i.parquet
done
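
The loops above download every file again on each run, and wget -O fails if the target directory does not exist yet. A minimal sketch of a re-runnable variant follows, assuming the same URLs and target directory as above. The function names (partition_url, partition_path, download_partitions) are hypothetical helpers introduced for illustration and are not part of the repository.

```bash
# Base URL and target directory are taken from the download commands above.
BASE_URL="https://datasets.clickhouse.com/hits_compatible/athena_partitioned"
DEST_DIR="benchmark/clickbench/data/partitioned"

# Print the download URL for partition number $1.
partition_url() {
    printf '%s/hits_%s.parquet\n' "$BASE_URL" "$1"
}

# Print the local path for partition number $1.
partition_path() {
    printf '%s/hits_%s.parquet\n' "$DEST_DIR" "$1"
}

# Download partitions $1..$2 (inclusive), skipping files that already exist.
# The body runs in a subshell so loop variables do not leak into the caller.
download_partitions() (
    mkdir -p "$DEST_DIR" || exit 1
    i=$1
    while [ "$i" -le "$2" ]; do
        out=$(partition_path "$i")
        if [ ! -f "$out" ]; then
            # Download to a temporary name first so an interrupted
            # transfer is never mistaken for a complete file.
            wget "$(partition_url "$i")" -O "$out.part" && mv "$out.part" "$out" || exit 1
        fi
        i=$((i + 1))
    done
)

# Usage (hypothetical): download_partitions 0 99
```

Writing to a .part file and renaming it only after wget succeeds is a design choice of this sketch: a partial file left behind by an interrupted download is retried on the next run instead of being skipped.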

Run benchmarks

Minimal

cargo run --release --bin bench_server
cargo run --release --bin clickbench_client -- --query-path benchmark/clickbench/queries/queries.sql --file benchmark/clickbench/data/hits.parquet

Advanced

env RUST_LOG=info RUST_BACKTRACE=1 RUSTFLAGS='-C target-cpu=native' cargo run --release --bin bench_server
env RUST_LOG=info RUST_BACKTRACE=1 RUSTFLAGS='-C target-cpu=native' cargo run --release --bin clickbench_client -- --query-path benchmark/clickbench/queries/queries.sql --file benchmark/clickbench/data/hits.parquet --query 42
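
The advanced client command runs a single query selected with --query. To run a handful of queries in sequence, something along these lines may help. It is only a sketch: run_query, run_queries, and the DRY_RUN variable are hypothetical names introduced here, while the paths and flags are copied from the command above. The server is assumed to be running already.

```bash
# Paths and flags are copied from the client command above.
QUERY_PATH="benchmark/clickbench/queries/queries.sql"
DATA_FILE="benchmark/clickbench/data/hits.parquet"

# Run the client for one query id ($1).
# With DRY_RUN=1 the command is printed instead of executed.
run_query() {
    # Replace the function's positional parameters with the full command line.
    set -- cargo run --release --bin clickbench_client -- \
        --query-path "$QUERY_PATH" --file "$DATA_FILE" --query "$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        RUST_LOG=info RUST_BACKTRACE=1 RUSTFLAGS='-C target-cpu=native' "$@"
    fi
}

# Run several query ids in sequence, stopping at the first failure.
run_queries() {
    for q in "$@"; do
        run_query "$q" || return 1
    done
}

# Usage (hypothetical): run_queries 0 5 42
```

Setting DRY_RUN=1 prints each command without invoking cargo, which is a cheap way to check the arguments before starting a long run.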

TPCH

Generate data

Make sure you have uv installed; it provides the uvx command used below.

cd benchmark/tpch
uvx --from duckdb python tpch_gen.py --scale 0.1

Run server (same as ClickBench)

cargo run --release --bin bench_server

Run client

env RUST_LOG=info,clickbench_client=debug RUSTFLAGS='-C target-cpu=native' cargo run --release --bin tpch_client -- --query-dir benchmark/tpch/queries/ --data-dir benchmark/tpch/data/sf0.1 --iteration 3 --bench-mode liquid-eager-transcode --answer-dir benchmark/tpch/answers/sf0.1
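
The client command hard-codes the scale factor in two places (the data and answer directories), and its paths are relative to the repository root rather than to benchmark/tpch. The sketch below derives both directories from one scale factor so they cannot drift apart. The sf<scale> directory naming is an assumption inferred from the paths in the command above, and the function names are hypothetical.

```bash
# Directory layout assumed from the client command above
# (benchmark/tpch/data/sf0.1 and benchmark/tpch/answers/sf0.1).
TPCH_ROOT="benchmark/tpch"

# Print the data and answer directories for scale factor $1.
tpch_data_dir()   { printf '%s/data/sf%s\n' "$TPCH_ROOT" "$1"; }
tpch_answer_dir() { printf '%s/answers/sf%s\n' "$TPCH_ROOT" "$1"; }

# Print the client command for scale factor $1 so it can be inspected
# before running it from the repository root.
tpch_client_cmd() {
    printf 'cargo run --release --bin tpch_client -- --query-dir %s/queries/ --data-dir %s --iteration 3 --bench-mode liquid-eager-transcode --answer-dir %s\n' \
        "$TPCH_ROOT" "$(tpch_data_dir "$1")" "$(tpch_answer_dir "$1")"
}

# Fail early if the data for the requested scale factor is missing.
check_tpch_data() {
    if [ ! -d "$(tpch_data_dir "$1")" ]; then
        echo "missing $(tpch_data_dir "$1"); generate the data first" >&2
        return 1
    fi
}

# Usage (hypothetical): check_tpch_data 0.1 && tpch_client_cmd 0.1
```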

Profile

Flamegraph

To collect flamegraphs on the server side, add --flamegraph-dir benchmark/data/flamegraph to the server command, for example:

cargo run --release --bin bench_server -- --flamegraph-dir benchmark/data/flamegraph

The server generates one flamegraph for each query it executes.

Cache stats

To collect cache stats, add --stats-dir benchmark/data/cache_stats to the server command, for example:

cargo run --release --bin bench_server -- --stats-dir benchmark/data/cache_stats

The server writes a Parquet file containing the cache stats for each query it executes. You can use parquet-viewer to view the stats in the browser.
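
Both profiling options are plain flags on the same server binary, so the command line can be assembled from whichever output directories are wanted. The following is a sketch only: server_cmd is a hypothetical helper, and passing both flags in a single invocation is an assumption that the text above does not confirm.

```bash
# Print the server command, adding a profiling flag only when the
# corresponding directory argument is non-empty.
#   $1: flamegraph directory (may be empty)
#   $2: cache stats directory (may be empty)
server_cmd() {
    cmd="cargo run --release --bin bench_server"
    flags=""
    if [ -n "${1:-}" ]; then
        flags="$flags --flamegraph-dir $1"
    fi
    if [ -n "${2:-}" ]; then
        flags="$flags --stats-dir $2"
    fi
    # Cargo needs "--" to separate its own arguments from the binary's.
    if [ -n "$flags" ]; then
        cmd="$cmd --$flags"
    fi
    echo "$cmd"
}

# Usage (hypothetical):
#   mkdir -p benchmark/data/flamegraph benchmark/data/cache_stats
#   server_cmd benchmark/data/flamegraph benchmark/data/cache_stats
```

Creating the output directories up front is harmless if the server creates them itself, and avoids a failed run if it does not.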

Run encoding benchmarks

RUST_LOG=info RUSTFLAGS='-C target-cpu=native' cargo run --release --bin encoding -- --file benchmark/clickbench/data/hits.parquet --column 2

This benchmarks the encoding time of the URL column.