How to cancel AI prompt #29


Closed
JG-Adams opened this issue Nov 14, 2024 · 6 comments

Comments

@JG-Adams

There is a case where, if I set the context limit too high, it fails to deliver a response, but the program is safe and will not crash. However, this means the asynchronous function is still expecting a response from the AI, which requires a restart in order to use it again! How should this be handled?

@jmont-dev
Owner

Hey JG-Adams, I hope that you're well. I'll take a look at this when possible; if you can post a short example that would be helpful.

@JG-Adams
Author

JG-Adams commented Nov 15, 2024

I'm doing well. Thanks! Hope you're well too!
Here is the scenario:
We use a thread so the main program doesn't have to wait for the AI:
new_thread = std::thread( [this, response_callback]{
    ollama::allow_exceptions(true);
    ollama::show_requests(true);
    ollama::setReadTimeout(120);
    ollama::setWriteTimeout(120);
    try {
        // Get our message together.
        request["stream"] = true;                     // streaming mode
        request["options"]["num_ctx"] = contextsize; // adjustable context limit
        ollama::chat(request, response_callback);
    }
    catch(ollama::exception& e) { std::cerr << e.what() << std::endl; }
} );

The main thread checks whether the thread is joinable each frame of the program.
We have the ability to generate any number of AI responses as needed; a sketch of that per-frame check is below.
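
For reference, a minimal sketch of that per-frame pattern might look like this (ai_thread, ai_finished, start_prompt, and per_frame_update are illustrative names, not taken from the actual program):

#include <atomic>
#include <thread>

std::thread ai_thread;
std::atomic<bool> ai_finished{false};

void start_prompt()
{
    ai_finished = false;
    ai_thread = std::thread([]{
        // ... run ollama::chat(...) as in the snippet above ...
        ai_finished = true;   // signal the main loop that the worker is done
    });
}

void per_frame_update()
{
    // joinable() only says the std::thread object still owns a thread;
    // the atomic flag says the work itself has actually finished.
    if (ai_thread.joinable() && ai_finished)
        ai_thread.join();
}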

So in my case I use 8192 as the context limit, no problem.
But let's say some poor soul decides to amp it up to 60k. What happens? Well, it doesn't give me any error message this time; instead it fills up the RAM and kind of just sits there. Meanwhile the thread is waiting for Ollama to finish, but it never will. This blocks the program's ability to do any further operation with Ollama, because the thread must finish before the next prompt can be sent. That's why having the ability to cancel would be helpful. (Some way to tell Ollama, "Hey! You know that thing? Never mind that!")
If you try exiting the program in this state, not all resources will be deallocated, because that one thread is unable to join.

Fortunately it doesn't hurt the system; it just takes time to clean up after itself.

@JG-Adams
Author

I managed to solve the issue. This is what I did.

When the exception gets caught, return;
This makes the thread come to an end, I think.

However, I was also using a std::atomic<bool> busy; flag to prevent multiple requests, so only when it's set to false can the next prompt go through.
So I did this.
catch(ollama::exception& e) { std::cerr << e.what() << std::endl; busy=false; return; }

This fixed the problem and allows the program to use Ollama again, so the user can try a different setting; the program fully cleans everything up too.
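
Putting it together, the worker thread from the earlier snippet with this fix would look roughly like this (same names as before; busy is the std::atomic<bool> flag, and this is a sketch rather than the exact code):

new_thread = std::thread( [this, response_callback]{
    ollama::allow_exceptions(true);
    try {
        request["stream"] = true;
        request["options"]["num_ctx"] = contextsize;
        ollama::chat(request, response_callback);
    }
    catch(ollama::exception& e) {
        std::cerr << e.what() << std::endl;
        busy = false;   // let the next prompt go through
        return;         // end the lambda so the thread can finish and be joined
    }
    busy = false;       // normal completion clears the flag as well
} );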

Ollama actually does cancel the process. It gave me helpful info:
,"model":"llama3.2:3b","options":{"num_ctx":60000},"stream":true}
{"error":"llama runner process has terminated: signal: killed"}
Ollama response returned error: llama runner process has terminated: signal: killed

So this turned out to be a rather interesting bug. I don't know if return; inside a catch block is the most appropriate thing in the world, but my reasoning is that the lambda is a function, so it should force execution to actually reach the end. From what I can tell there appear to be no problems, and the program is able to fully exit.

It would be desirable to support this without exceptions too. So that's next.

@jmont-dev
Owner

Hey JG-Adams,

I'm adding some hooks to do this gracefully now within the stop_streaming branch. This should be merged shortly. I'll include an example in the README showing how to do this once it's added. Thanks for testing and pointing out this issue.

@jmont-dev
Owner

Support for stopping an active stream has been added with #36 and is included in the latest release. This allows the bound callback function to return a bool that determines whether the response being streamed should continue. See the cases added in the tests, which show how this is used.

I've tested this and confirmed it immediately stops generation in Ollama, which should prevent wasted resources when a stream is cancelled.
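
Based on that description, a streaming callback could look roughly like this (cancel_requested is an illustrative name, and this is only a sketch; see the tests for the exact usage):

#include <atomic>
#include <iostream>
#include "ollama.hpp"

std::atomic<bool> cancel_requested{false};

// The bound callback now returns a bool; returning false stops the active stream.
bool response_callback(const ollama::response& partial)
{
    std::cout << partial << std::flush;
    return !cancel_requested;   // keep streaming until a cancel is requested
}

// Elsewhere in the program (e.g. when the user aborts): set cancel_requested = true;
// the next streamed chunk will then return false and end the generation.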

@JG-Adams
Author

Awesome! I'm gonna try this. :)
