Commit 75ba5ba

llama : pad KV cache size to 32
1 parent ef47ec1 commit 75ba5ba

File tree

1 file changed: +1 -2 lines changed

llama.cpp: +1 -2
@@ -5504,8 +5504,7 @@ static int llama_decode_internal(
     // a heuristic, to avoid attending the full cache if it is not yet utilized
     // after enough generations, the benefit from this heuristic disappears
     // if we start defragmenting the cache, the benefit from this will be more important
-    //kv_self.n = std::max(32, GGML_PAD(llama_kv_cache_cell_max(kv_self), 32)); // TODO: this might be better for CUDA?
-    kv_self.n = std::min((int32_t) cparams.n_ctx, std::max(32, llama_kv_cache_cell_max(kv_self)));
+    kv_self.n = std::min((int32_t) cparams.n_ctx, std::max(32, GGML_PAD(llama_kv_cache_cell_max(kv_self), 32)));
 
     //printf("kv_self.n = %5d, kv_self.used = %5d, kv_self.head = %5d\n", kv_self.n, kv_self.used, kv_self.head);
 
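For context, the padding arithmetic can be illustrated with a small standalone sketch. The context size and cell count below are hypothetical, and GGML_PAD is reproduced here on the assumption that it matches ggml's round-up-to-a-multiple macro (which requires the alignment to be a power of two):

#include <algorithm>
#include <cstdint>
#include <cstdio>

// Assumed to match ggml's GGML_PAD: round x up to the nearest
// multiple of n, where n is a power of two.
#define GGML_PAD(x, n) (((x) + (n) - 1) & ~((n) - 1))

int main() {
    const int32_t n_ctx    = 4096; // hypothetical context size
    const int32_t cell_max = 100;  // hypothetical llama_kv_cache_cell_max() result

    // before this commit: attend to exactly the used cells (minimum 32)
    const int32_t n_before = std::min(n_ctx, std::max<int32_t>(32, cell_max));

    // after this commit: also pad the size up to a multiple of 32
    const int32_t n_after  = std::min(n_ctx, std::max<int32_t>(32, GGML_PAD(cell_max, 32)));

    printf("before: kv_self.n = %d, after: kv_self.n = %d\n", n_before, n_after);
    // prints: before: kv_self.n = 100, after: kv_self.n = 128
}

Rounding kv_self.n up to a multiple of 32 keeps the attended KV range aligned, which the removed TODO comment suggests is friendlier to the CUDA kernels, while the std::min still clamps the result to the context size.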

0 commit comments
