-
Notifications
You must be signed in to change notification settings - Fork 11.5k
llama : add option for greedy sampling with probs #3813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result is the same in either case right? I'm not entirely sure it's worth special casing this instead of just changing greedy sampling to do:
But if you did go that way, you'd probably also want to change the common args parsing stuff to clamp the user-specified temperature to
0.0
so if they pass a negative value it's the same.It's only internal stuff that would care about probs generated vs no probs unless I'm misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same result yes. The probs are not used only internally - we are using them in
speculative
. Before this PR, we had to do the hack withtemp = 0.01f;
to get probs. Now we get them withtemp = 0.0f;
The user specified input should not be affected by this change. Technically, the user would normally want to pass
temp = -1.0f
to save the extra softmax compute, but it's probably not something that would affect performance in measurable way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, "internal" was a poor choice of words. I meant it's not something someone calling the application and passing
--temp
on the commandline would care about. So if they do--temp -1
for an example that doesn't care about probs then it's kind of weird/unnecessary to turn on generating probs in that case.So what I'm proposing is that the argument handling stuff would do something like:
when parsing the commandline arguments, so even if the user does
--temp -1
it's still just0.0
. Then something likespeculative
which cares about probs in the greedy sampling case can do:edit: Actually, you'd need to reverse the logic for the softmax case a bit also: so 0.0 = greedy sampling, no softmax. < 0.0 = greedy sampling with softmax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Should be good now