Welcome to LiquidCache! 🚀
LiquidCache is a cache server for DataFusion-based systems. Simply register LiquidCache as the TableProvider to enjoy the performance boost.
Depending on your workload, LiquidCache can cut cost and latency by up to 10x.
Both LiquidCache and DataFusion run on cloud servers within the same region, but they are configured differently:
- LiquidCache servers often have a memory/CPU ratio of 16:1 (e.g., 64 GB of memory and 4 cores)
- DataFusion nodes often have a memory/CPU ratio of 2:1 (e.g., 32 GB of memory and 16 cores)

Multiple DataFusion nodes share the same LiquidCache over the network, and each component can be scaled independently as the workload grows.
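Below is a minimal sketch of that sharing, reusing the client API shown later in this README. The two SessionContexts stand in for two independent DataFusion nodes; the file path, table name, and entry point are the same placeholder values used in the examples below.

```rust
use std::sync::Arc;

use datafusion::prelude::{SessionConfig, SessionContext};
use liquid_cache_client::LiquidCacheTableFactory;
use liquid_common::ParquetMode;
use url::Url;

// One "node": its own SessionContext pointed at the shared cache entry point.
async fn register_node(entry_point: &str) -> datafusion::error::Result<SessionContext> {
    let mut config = SessionConfig::new();
    // Let the cache server evaluate filters before sending data back.
    config.options_mut().execution.parquet.pushdown_filters = true;
    let ctx = SessionContext::new_with_config(config);

    let table_url = Url::parse("file:///examples/nano_hits.parquet").unwrap();
    let table = LiquidCacheTableFactory::open_table(
        entry_point,
        "nano_hits",
        table_url,
        ParquetMode::Liquid,
    )
    .await?;
    ctx.register_table("nano_hits", Arc::new(table))?;
    Ok(ctx)
}

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // In a real deployment these would be separate DataFusion processes on separate
    // machines, all sharing the one LiquidCache server behind this address.
    let node_a = register_node("http://localhost:50051").await?;
    let node_b = register_node("http://localhost:50051").await?;
    node_a.sql("SELECT COUNT(*) FROM nano_hits WHERE \"URL\" <> ''").await?.show().await?;
    node_b.sql("SELECT COUNT(*) FROM nano_hits WHERE \"URL\" <> ''").await?.show().await?;
    Ok(())
}
```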
Under the hood, LiquidCache transcodes and caches Parquet data from the object store, and evaluates filters on the cache server before sending data to DataFusion, effectively reducing both CPU utilization and network data transfer.
git clone https://github.com/XiangpengHao/liquid-cache.git
cd liquid-cache
cargo run --bin bench_server --release
In a different terminal, run the ClickBench client.
cargo run --bin clickbench_client --release -- --query-path benchmark/queries.sql --file examples/nano_hits.parquet
(Note: replace nano_hits.parquet with the real ClickBench dataset; the snippet below shows one way to fetch it.)
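If you do not already have the dataset, the full ClickBench hits data is distributed as a single Parquet file. The URL below is the one published by the upstream ClickBench project and is an assumption on our part, so verify it before relying on it.

```bash
# Assumed download location (from the upstream ClickBench project); verify before use.
wget --continue https://datasets.clickhouse.com/hits_compatible/hits.parquet
# Then point the client at the downloaded file:
cargo run --bin clickbench_client --release -- --query-path benchmark/queries.sql --file hits.parquet
```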
Check out the examples folder for more details. We are working on a crates.io release -- stay tuned! Until then, you can depend on the Git repository directly, as sketched below.
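A Cargo.toml along these lines should work in the meantime. This is a sketch under assumptions: the package names are guessed from the module paths used in the examples below (liquid_cache_server, liquid_cache_client, liquid_common), so check the workspace Cargo.toml for the exact names and for matching versions of datafusion, arrow-flight, tonic, url, and tokio.

```toml
# Sketch only: package names are assumptions derived from the module paths used below.
[dependencies]
liquid-cache-server = { git = "https://github.com/XiangpengHao/liquid-cache.git" }
liquid-cache-client = { git = "https://github.com/XiangpengHao/liquid-cache.git" }
liquid-common = { git = "https://github.com/XiangpengHao/liquid-cache.git" }
# Plus datafusion, arrow-flight, tonic, url, and tokio at versions that match the
# liquid-cache workspace (tokio needs the "macros" and "rt-multi-thread" features).
```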
use arrow_flight::flight_service_server::FlightServiceServer;
use liquid_cache_server::LiquidCacheService;
use tonic::transport::Server;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Address the cache server listens on; clients use it as their entry point.
    let addr = "0.0.0.0:50051".parse()?;

    // Create the LiquidCache service and expose it over Arrow Flight.
    let liquid_cache = LiquidCacheService::try_new()?;
    let flight = FlightServiceServer::new(liquid_cache);

    println!("LiquidCache server listening on {addr:?}");
    Server::builder().add_service(flight).serve(addr).await?;
    Ok(())
}
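The server speaks Arrow Flight on port 50051; the client example below uses the same address as its entry point, so change both together if you pick a different port.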
use std::sync::Arc;

use datafusion::{
    error::Result,
    prelude::{SessionConfig, SessionContext},
};
use liquid_cache_client::LiquidCacheTableFactory;
use liquid_common::ParquetMode;
use url::Url;

#[tokio::main]
pub async fn main() -> Result<()> {
    // Enable filter pushdown so filters are evaluated on the cache server
    // before data is sent back to DataFusion.
    let mut session_config = SessionConfig::from_env()?;
    session_config
        .options_mut()
        .execution
        .parquet
        .pushdown_filters = true;
    let ctx = Arc::new(SessionContext::new_with_config(session_config));

    // Address of the LiquidCache server started above.
    let entry_point = "http://localhost:50051";
    let sql = "SELECT COUNT(*) FROM nano_hits WHERE \"URL\" <> '';";
    let table_url = Url::parse("file:///examples/nano_hits.parquet").unwrap();

    // Open the Parquet file through LiquidCache and register it as a DataFusion table.
    let table = LiquidCacheTableFactory::open_table(
        entry_point,
        "nano_hits",
        table_url,
        ParquetMode::Liquid,
    )
    .await?;
    ctx.register_table("nano_hits", Arc::new(table))?;

    ctx.sql(sql).await?.show().await?;
    Ok(())
}
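Running the client prints the COUNT(*) result. Because pushdown_filters is enabled, the "URL" <> '' predicate is evaluated on the cache server, so only the filtered data crosses the network.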
See dev/README.md
No. While production-ready is our goal, we are still implementing features and polishing the system. LiquidCache started as a research project -- exploring new approaches to building cost-effective caching systems. Like most research projects, it takes time to mature, and we welcome your help!
LiquidCache is a data cache: it caches logically equivalent but physically different data from the object store.
LiquidCache does not cache query results; it only caches data, so the same cache can serve different queries (see the sketch below).
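For illustration, here is a sketch that assumes a SessionContext with the nano_hits table already registered through LiquidCache, as in the client example above: two different queries are both answered from the same cached data.

```rust
use datafusion::{error::Result, prelude::SessionContext};

// Sketch: `ctx` is assumed to have `nano_hits` registered through LiquidCache.
// Neither query's result is cached; both scans reuse the same cached table data.
async fn run_two_queries(ctx: &SessionContext) -> Result<()> {
    ctx.sql("SELECT COUNT(*) FROM nano_hits WHERE \"URL\" <> ''").await?.show().await?;
    ctx.sql("SELECT COUNT(DISTINCT \"URL\") FROM nano_hits").await?.show().await?;
    Ok(())
}
```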
We will use stable Rust once we believe the project is ready for production.
Check out our paper (under submission to VLDB) for more details. In the meantime, we are working on a tech blog that introduces LiquidCache in a more approachable way.
We are always looking for contributors; any feedback or improvement is welcome! Feel free to take a look at the issue list and contribute to the project. If you want to get involved in the research process, feel free to reach out.
LiquidCache is a research project funded by:
- InfluxData
- Taxpayers of the state of Wisconsin and the federal government.
As such, LiquidCache is and will always be open source and free to use.
Your support of science is greatly appreciated!