Add support for stopping generation during a stream. #36

Merged: 4 commits merged into master on Mar 30, 2025

Conversation

@jmont-dev (Owner) commented Mar 30, 2025

Adds support for gracefully stopping an active stream when using either the generate or chat endpoints. The bound response function has been changed to return a bool, which determines whether or not to continue streaming for this response.

When used asynchronously, a simple atomic variable can be set from the calling thread so that the callback returns false and stops a stream launched in another thread. See the cases added in the tests for an example of doing so.
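As a rough sketch of the new callback shape (the handler and flag names below are illustrative only, not part of the library):

#include <atomic>
#include <functional>
#include <iostream>
#include "ollama.hpp"

// Set this from any thread to request that the stream stop.
std::atomic<bool> stop_requested{false};

// The bound callback now returns a bool: true keeps streaming, false cancels the request.
std::function<bool(const ollama::response&)> on_receive_token =
    [](const ollama::response& response)
    {
        std::cout << response.as_simple_string() << std::flush;
        return !stop_requested.load();
    };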

@jmont-dev jmont-dev merged commit be224a8 into master Mar 30, 2025
1 check passed
@Chadliu0806

Hi jmont-dev

Thanks for the quick feedback. The following is my program scenario:

  1. I package [ollama-hpp] as a DLL.
  2. I write an MFC program that handles these APIs.
  3. I create a thread and call generate(), which returns an ollama::response, to run inference.
  4. I added a stop() function to [ollama.hpp] that executes this->cli->Post("/api/stop"), but when I call it, the REST API call does not stop immediately; it waits about 2~3 seconds before disconnecting. Is this correct?

Thanks a lot.

@jmont-dev (Owner, Author) commented

I don't believe /api/stop is a supported endpoint in Ollama's API. You can view all of the available endpoints here: https://github.com/ollama/ollama/blob/main/docs/api.md.

In order to stop a generation, the client interacting with Ollama needs to cancel the request.

I provide an example of how to do so in the test Chat with Asynchronous Interrupted Streaming Response in test/test.cpp. It looks something like this:

std::atomic<bool> done{false};
std::string streamed_response;

bool on_receive_response(const ollama::response& response)
{   
    streamed_response+=response.as_simple_string();
    if (response.as_json()["done"]==true) done=true;

    // If this is true, continue streaming. If this is false, cancel the request and stop.
    return !done;
}

std::function<bool(const ollama::response&)> response_callback = on_receive_response;  
        
ollama::message message("user", "Why is the sky blue?");       
        
// test_model and options are defined in the surrounding test code in test/test.cpp.
std::thread new_thread( [message, response_callback]{ ollama::chat(test_model, message, response_callback, options); } );

unsigned int microsec_waited = 0;

// Interrupt the stream after two seconds by setting the atomic variable; the callback then returns false.
while (!done)
{
    std::this_thread::sleep_for( std::chrono::microseconds(100) );
    microsec_waited += 100;
    if (microsec_waited == 2000000) { done.store(true); }
}
new_thread.join();

Basically, the new changes allow the bound function used during a generation to return a bool that specifies whether or not to continue the stream. This gets passed to httplib and can be used to cancel the request via the content receiver, causing Ollama to stop immediately if false is returned. You can use a simple atomic variable from another thread as a switch to stop requests, as shown in this example.
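As a rough sketch of that wiring (this is not the library's actual implementation; httplib::ContentReceiver is the real cpp-httplib type and returning false from it aborts the in-flight request, but the ollama::response construction shown and the reuse of response_callback from the example above are assumptions for illustration):

// Hypothetical adapter: forward each streamed chunk to the user's callback
// and propagate its bool so httplib cancels the request when it is false.
httplib::ContentReceiver receiver =
    [&response_callback](const char* data, size_t data_length)
    {
        ollama::response response( std::string(data, data_length) );  // assumed constructor, for illustration
        return response_callback(response);  // false cancels the request
    };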

This example uses the chat endpoint, but the same technique can be used with the generate call. When the request is cancelled using this method, Ollama will stop immediately and will not take any additional time to finish the generation.
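For completeness, the same pattern with the generate endpoint might look like the sketch below. The model name and prompt are placeholders, and the streaming generate overload is assumed to take the same bool-returning callback as the chat call above:

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
#include "ollama.hpp"

std::atomic<bool> done{false};

bool on_receive_response(const ollama::response& response)
{
    std::cout << response.as_simple_string() << std::flush;
    if (response.as_json()["done"]==true) done=true;
    return !done;   // returning false cancels the generation
}

int main()
{
    // Placeholder model and prompt; substitute whatever you use locally.
    std::thread generation_thread( []{ ollama::generate("llama3:8b", "Why is the sky blue?", on_receive_response); } );

    // Let it stream for two seconds, then request cancellation from this thread.
    std::this_thread::sleep_for( std::chrono::seconds(2) );
    done.store(true);

    generation_thread.join();
}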

@Chadliu0806

Hi jmont

Your solution successfully stops the operation during Ollama inference.
Thanks a lot.
