Extend llama_kv_cache_seq_rm to allow matching any sequence #3843

Merged: 2 commits into ggml-org:master on Oct 29, 2023

Conversation

KerfuffleV2 (Collaborator) commented on Oct 29, 2023:

Since llama_seq_id allows for negative values, this extends llama_kv_cache_seq_rm to treat sequence ids < 0 as matching any sequence. Based on #3840, it doesn't seem like llama_kv_cache_tokens_rm can really be used for anything other than just clearing the KV cache, and some of the existing uses seem incorrect as well.

This pull is mergeable but incomplete. If it's on the right track, I'll replace the existing calls to llama_kv_cache_tokens_rm that just clear the cache (llama_kv_cache_tokens_rm(ctx, -1, -1)) with llama_kv_cache_seq_rm(ctx, -1, -1, -1), and calls like llama_kv_cache_tokens_rm(ctx, n_matching_session_tokens, -1), which seem to actually want a position rather than a cell index, with llama_kv_cache_seq_rm(ctx, -1, n_matching_session_tokens, -1).
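For illustration, the intended call-site changes would look roughly like this (a sketch, not the final patch; argument order follows the llama_kv_cache_seq_rm signature of ctx, seq_id, p0, p1):

// Full cache clear: a negative seq_id matches any sequence,
// and negative p0/p1 cover the whole position range.
// before: llama_kv_cache_tokens_rm(ctx, -1, -1);
llama_kv_cache_seq_rm(ctx, -1, -1, -1);

// Truncate a restored session to the matched prefix. The argument here is a
// token position, not a cell index, which is what these call sites actually want.
// before: llama_kv_cache_tokens_rm(ctx, n_matching_session_tokens, -1);
llama_kv_cache_seq_rm(ctx, -1, n_matching_session_tokens, -1);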

llama_kv_cache_tokens_rm could possibly be removed entirely. I can't see how one would actually use it; where would knowing a KV cell index be meaningful? Maybe for something like KV cache defragging, if that ever gets added. We could possibly also add a llama_kv_cache_clear function to clear the KV cache a bit more efficiently than using the new llama_kv_cache_seq_rm method.
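As a minimal sketch, an internal llama_kv_cache_clear could look something like this, assuming the llama_kv_cache fields (size, cells, head) that appear in the snippets later in this thread:

static void llama_kv_cache_clear(struct llama_kv_cache & cache) {
    for (uint32_t i = 0; i < cache.size; ++i) {
        cache.cells[i].pos = -1;       // mark the cell as empty
        cache.cells[i].seq_id.clear(); // drop all sequence memberships
    }
    cache.head = 0; // slot search can restart from the beginning of the cache
}

No per-cell position or sequence checks are needed, which is where the efficiency win over the wildcard llama_kv_cache_seq_rm path would come from.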

Closes #3840

ggerganov (Member) left a comment:

Let's remove llama_kv_cache_tokens_rm, add llama_kv_cache_clear, and extend llama_kv_cache_seq_rm as proposed.
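In llama.h terms, the proposal presumably amounts to something like the following (a sketch; the exact declarations are assumptions based on the existing naming conventions):

// removed:
// LLAMA_API void llama_kv_cache_tokens_rm(struct llama_context * ctx, int32_t c0, int32_t c1);

// added:
LLAMA_API void llama_kv_cache_clear(struct llama_context * ctx);

// extended: seq_id < 0 now matches any sequence
LLAMA_API void llama_kv_cache_seq_rm(struct llama_context * ctx, llama_seq_id seq_id, llama_pos p0, llama_pos p1);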

Commit: Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear

Use llama_kv_cache_clear for cache clearing

Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
KerfuffleV2 (Collaborator, Author) commented:

How about this? Tested, seems to work. Pretty hard change for even me to screw up.

@@ -1492,8 +1487,14 @@ static void llama_kv_cache_seq_rm(
     if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();

     for (uint32_t i = 0; i < cache.size; ++i) {
-        if (cache.cells[i].has_seq_id(seq_id) && cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
-            cache.cells[i].seq_id.erase(seq_id);
+        if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
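Read in full, the loop after this change presumably looks something like the following (reconstructed from the diff context above; the seq_id < 0 branch is the new wildcard behavior):

for (uint32_t i = 0; i < cache.size; ++i) {
    if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
        if (seq_id < 0) {
            cache.cells[i].seq_id.clear();       // wildcard: drop every sequence
        } else if (cache.cells[i].has_seq_id(seq_id)) {
            cache.cells[i].seq_id.erase(seq_id); // drop only the requested sequence
        } else {
            continue;                            // cell not in this sequence; leave it
        }
        if (cache.cells[i].seq_id.empty()) {
            cache.cells[i].pos = -1;             // the cell is now free
            if (new_head == cache.size) new_head = i;
        }
    }
}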
KerfuffleV2 (Collaborator, Author) commented:
Probably not worth the complexity, but we could optimize to:

static void llama_kv_cache_seq_rm(
        struct llama_kv_cache & cache,
                 llama_seq_id   seq_id,
                    llama_pos   p0,
                    llama_pos   p1) {
    uint32_t new_head = cache.size;

    if (p0 < 0) p0 = 0;
    if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();

    if (seq_id < 0) {
        // Wildcard: remove cells belonging to any sequence in [p0, p1).
        // If the range covers the whole cache, clearing it outright is cheaper.
        if (p0 == 0 && p1 >= (llama_pos) cache.size) {
            llama_kv_cache_clear(cache);
            return;
        }
        for (uint32_t i = 0; i < cache.size; ++i) {
            if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
                cache.cells[i].seq_id.clear();
                cache.cells[i].pos = -1;
                if (new_head == cache.size) new_head = i;
            }
        }
    } else {
        // Remove only the given sequence; free the cell once no sequence uses it.
        for (uint32_t i = 0; i < cache.size; ++i) {
            if (cache.cells[i].has_seq_id(seq_id) && cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
                cache.cells[i].seq_id.erase(seq_id);
                if (cache.cells[i].seq_id.empty()) {
                    cache.cells[i].pos = -1;
                    if (new_head == cache.size) new_head = i;
                }
            }
        }
    }

    // If we freed up a slot, set head to it so searching can start there.
    if (new_head != cache.size) cache.head = new_head;
}

This avoids checking seq_id < 0 on every loop iteration, but a single integer test probably wouldn't be noticeable even for huge KV caches.

ggerganov (Member) replied:

Yup, either way would be fine. Either make the change and merge it directly, or just merge it as is.

KerfuffleV2 (Collaborator, Author) replied:

Since you don't have a preference, I'll just leave it as is. I don't think it's worth the added complexity.

KerfuffleV2 merged commit 6e08281 into ggml-org:master on Oct 29, 2023.
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 30, 2023:

Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)

* Extend llama_kv_cache_seq_rm to allow matching any sequence

* Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear

Use llama_kv_cache_clear for cache clearing

Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
KerfuffleV2 deleted the feat-kv_seq_rm_wildcard branch on November 17, 2023.
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 17, 2023
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023: Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 30, 2023
Successfully merging this pull request may close these issues:

llama_kv_cache_tokens_rm functioning on index and not position (#3840)