llama.cpp: add pipeline parallelism support (#6017). Good news: it appears to be high priority and will likely land soon. Once this and the fix for the CUDA memory-release bug are ready, please consider cutting a quick intermediate LLamaSharp release that integrates them. This is important.
Tracking issue for thread safety in llama.cpp; the global inference lock can be removed once it is resolved:
ggml-org/llama.cpp#3960
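For illustration, the workaround mentioned above can be sketched as a single global lock that serializes every call into the non-thread-safe native inference code. This is a minimal Python sketch of the pattern, not LLamaSharp's actual implementation; `native_infer` is a hypothetical stand-in for the native call.

```python
import threading

# One process-wide lock guarding all inference, since the native
# llama.cpp entry points are not thread-safe (ggml-org/llama.cpp#3960).
_inference_lock = threading.Lock()

# Bookkeeping used only to demonstrate that calls never overlap.
_active_calls = 0
max_concurrent = 0

def native_infer(prompt: str) -> str:
    """Hypothetical stand-in for a non-thread-safe native inference call."""
    global _active_calls, max_concurrent
    _active_calls += 1
    max_concurrent = max(max_concurrent, _active_calls)
    result = prompt.upper()  # placeholder for real token generation
    _active_calls -= 1
    return result

def infer(prompt: str) -> str:
    # The global lock ensures at most one thread is inside
    # native_infer at any moment.
    with _inference_lock:
        return native_infer(prompt)

# Drive the wrapper from several threads at once.
threads = [threading.Thread(target=infer, args=(f"prompt {i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Once llama.cpp itself becomes thread-safe, `infer` could call `native_infer` directly and the lock (and the serialization bottleneck it creates) goes away.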