llama : add option for greedy sampling with probs #3813
Conversation
e274fe3 to 4aa1fb0
```diff
@@ -167,9 +167,13 @@ llama_token llama_sampling_sample(
         llama_sample_grammar(ctx_main, &cur_p, ctx_sampling->grammar);
     }

-    if (temp <= 0) {
-        // greedy sampling
+    if (temp < 0.0) {
```
The result is the same in either case, right? I'm not entirely sure it's worth special-casing this instead of just changing greedy sampling to do:

```cpp
llama_sample_softmax(ctx_main, &cur_p);
id = cur_p.data[0].id;
```

But if you did go that way, you'd probably also want to change the common args parsing to clamp the user-specified temperature to 0.0, so passing a negative value behaves the same. It's only internal callers that would care about probs generated vs. no probs, unless I'm misunderstanding.
It's the same result, yes. The probs are not used only internally - we are using them in `speculative`. Before this PR, we had to resort to the hack of `temp = 0.01f;` to get probs. Now we get them with `temp = 0.0f;`. The user-specified input should not be affected by this change. Technically, the user would normally want to pass `temp = -1.0f` to save the extra softmax compute, but it's probably not something that would affect performance in a measurable way.
Sorry, "internal" was a poor choice of words. I meant it's not something someone running the application and passing `--temp` on the command line would care about. So if they do `--temp -1` for an example that doesn't care about probs, it's kind of weird/unnecessary to turn on generating probs in that case.

What I'm proposing is that the argument handling would do something like:

```cpp
params.sparams.temp = std::max(0.0f, atof(blah));
```

when parsing the command-line arguments, so even if the user does `--temp -1` it's still just 0.0. Then something like `speculative`, which cares about probs in the greedy sampling case, can do:

```cpp
if (params.sparams.temp == 0.0f) {
    params.sparams.temp = -1.0f;
}
```
edit: Actually, you'd also need to reverse the logic for the softmax case a bit: `0.0` = greedy sampling, no softmax; `< 0.0` = greedy sampling with softmax.
Got it. Should be good now
* llama : add option for greedy sampling with probs
* llama : add comment about llama_sample_token_greedy() missing probs
* sampling : temp == 0.0 -> no probs, temp < 0.0 -> probs
On `master`, when using `temp <= 0.0` we get greedy sampling, but we don't have the probs of the tokens. This PR adds an option: when using `temp == 0.0`, do greedy sampling but also apply softmax, so we get the probs.