To download a single partition of the dataset (~100MB):
wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_0.parquet -O benchmark/data/hits_0.parquet
To download the entire dataset (~15GB):
wget https://datasets.clickhouse.com/hits_compatible/athena/hits.parquet -O benchmark/clickbench/data/hits.parquet
To download the partitioned dataset (100 files, ~150MB each), using fish:
for i in (seq 0 99)
    wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_$i.parquet -O benchmark/clickbench/data/partitioned/hits_$i.parquet
end
Or, using bash:
for i in {0..99}; do
    wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_$i.parquet -O benchmark/clickbench/data/partitioned/hits_$i.parquet
done
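The loops above restart every file from scratch if they are interrupted. A minimal sketch of a resumable variant follows; the URL and destination directory are copied from the commands above, while `partition_url` and `download_partitions` are hypothetical helper names, not part of the repository. It relies on `wget -c` (continue a partial download) and `wget -P` (choose the target directory).

```bash
# Sketch only: resumable download of the partitioned ClickBench files.
# BASE_URL and DEST_DIR mirror the commands above; the helper names are made up.
BASE_URL="https://datasets.clickhouse.com/hits_compatible/athena_partitioned"
DEST_DIR="${DEST_DIR:-benchmark/clickbench/data/partitioned}"

# Print the URL of partition number $1.
partition_url() {
  printf '%s/hits_%s.parquet\n' "$BASE_URL" "$1"
}

# Download partitions $1..$2 (inclusive) into DEST_DIR.
# wget -c resumes partially downloaded files instead of starting over.
download_partitions() {
  mkdir -p "$DEST_DIR" || return 1
  part=$1
  while [ "$part" -le "$2" ]; do
    wget -c -P "$DEST_DIR" "$(partition_url "$part")" || return 1
    part=$((part + 1))
  done
}
```

After sourcing the snippet, `download_partitions 0 99` fetches all 100 files and `download_partitions 0 0` fetches only the first one.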
cargo run --release --bin bench_server
cargo run --release --bin clickbench_client -- --query-path benchmark/clickbench/queries/queries.sql --file benchmark/clickbench/data/hits.parquet
env RUST_LOG=info RUST_BACKTRACE=1 RUSTFLAGS='-C target-cpu=native' cargo run --release --bin bench_server
env RUST_LOG=info RUST_BACKTRACE=1 RUSTFLAGS='-C target-cpu=native' cargo run --release --bin clickbench_client -- --query-path benchmark/clickbench/queries/queries.sql --file benchmark/clickbench/data/hits.parquet --query 42
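The single-query invocation above can be wrapped in a small loop when you want to run a handful of queries one at a time. This is a sketch under assumptions: it expects bench_server to be running already, it only uses the flags shown above, and it assumes `--query` accepts every id in the range you pass (the commands above only show `--query 42`). `run_query`, `run_queries`, and `DRY_RUN` are hypothetical names introduced for illustration.

```bash
# Sketch only: run ClickBench queries one by one with the client shown above.
QUERY_PATH="benchmark/clickbench/queries/queries.sql"
DATA_FILE="benchmark/clickbench/data/hits.parquet"

# Run query id $1. If DRY_RUN is set to a non-empty value,
# print the command instead of executing it.
run_query() {
  ${DRY_RUN:+echo} cargo run --release --bin clickbench_client -- \
    --query-path "$QUERY_PATH" --file "$DATA_FILE" --query "$1"
}

# Run query ids $1..$2 (inclusive), reporting failures without stopping.
run_queries() {
  q=$1
  while [ "$q" -le "$2" ]; do
    run_query "$q" || echo "query $q failed" >&2
    q=$((q + 1))
  done
}
```

Setting `DRY_RUN=1` before calling `run_queries 40 42` prints the three commands without starting cargo, which is a cheap way to check the arguments first.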
Generate the TPC-H data (make sure you have uv installed):
cd benchmark/tpch
uvx --from duckdb python tpch_gen.py --scale 0.1
cargo run --release --bin bench_server
env RUST_LOG=info,tpch_client=debug RUSTFLAGS='-C target-cpu=native' cargo run --release --bin tpch_client -- --query-dir benchmark/tpch/queries/ --data-dir benchmark/tpch/data/sf0.1 --iteration 3 --bench-mode liquid-eager-transcode --answer-dir benchmark/tpch/answers/sf0.1
To collect flamegraphs on the server side, add --flamegraph-dir benchmark/data/flamegraph to the server command, for example:
cargo run --release --bin bench_server -- --flamegraph-dir benchmark/data/flamegraph
The server generates a flamegraph for each query it executes.
To collect cache stats, add --stats-dir benchmark/data/cache_stats to the server command, for example:
cargo run --release --bin bench_server -- --stats-dir benchmark/data/cache_stats
The server writes a Parquet file containing the cache stats for each query it executes.
You can use parquet-viewer to view the stats in the browser.
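If you prefer the terminal to the browser, the stats can also be inspected with the DuckDB command-line client. The sketch below makes several assumptions: the `duckdb` CLI is installed, the stats files end in `.parquet`, and they share one schema. The column layout of the stats files is not documented here, so the query only asks DuckDB to describe it. `stats_sql` and `show_stats` are hypothetical helper names.

```bash
# Sketch only: list the columns of the cache stats files with the DuckDB CLI.
STATS_DIR="${STATS_DIR:-benchmark/data/cache_stats}"

# Print a DuckDB statement that describes all Parquet files in directory $1.
stats_sql() {
  printf "DESCRIBE SELECT * FROM read_parquet('%s/*.parquet');\n" "$1"
}

# Run the statement against STATS_DIR, if the duckdb CLI is available.
show_stats() {
  if ! command -v duckdb >/dev/null 2>&1; then
    echo "duckdb CLI not found" >&2
    return 1
  fi
  duckdb -c "$(stats_sql "$STATS_DIR")"
}
```

Once the column names are known, the `DESCRIBE` can be swapped for an ordinary `SELECT` over the same `read_parquet` call.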
RUST_LOG=info RUSTFLAGS='-C target-cpu=native' cargo run --release --bin encoding -- --file benchmark/clickbench/data/hits.parquet --column 2
This benchmarks the encoding time of the URL column.