community[patch]: llama cpp stream generation and abort by invoke method #4942
Hi! I have recently been using LangChain to develop my own application. When using Llama CPP, I noticed that the streaming generation approach described in the documentation cannot be aborted. However, after consulting the official node-llama-cpp documentation, I found that streaming and aborting can also be achieved through `invoke`: https://withcatai.github.io/node-llama-cpp/guide/chat-session#response-streaming. It only requires a slight modification of the onToken handling to resolve this issue, so I made some changes to the Llama CPP part.
Here are the test results:
