
tool-call: add support for tool-calls using Model Context Protocol #11556

Open · wants to merge 84 commits into master

Conversation


@bandoti bandoti commented Jan 31, 2025

This PR adds support for tool-calls using a --tools switch to llama-cli.

It is currently ⚠Experimental!⚠

To test this, first build llama-cli using something like:

cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug -DLLAMA_CURL=ON -DLLAMA_TOOLCALL=ON
cmake --build build --config Debug

Then run a Model Context Protocol server:

npm install @modelcontextprotocol/server-everything
npx -y supergateway --stdio "npx -y @modelcontextprotocol/server-everything"

In another terminal, launch llama-cli (remove the --single-turn switch to interact):

./build/bin/llama-cli.exe -c 2048 -ngl 8 -cnv --jinja -m 'C:/Users/bandoti/Downloads/Llama-3.2-3B-Instruct-Q6_K.gguf' --tools "http://localhost:8000/sse" -p "What is one plus nine?" --single-turn

Output:

...

{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Adds two numbers",
        "parameters": {
            "properties": {
                "a": {
                    "description": "First number",
                    "type": "number"
                },
                "b": {
                    "description": "Second number",
                    "type": "number"
                }
            },
            "type": "object"
        }
    }
}
...

user

What is one plus nine?assistant

{"name": "add", "parameters": {"a": 1, "b": 9}}Accepted

The sum of 1 and 9 is 10. [end of text]

And the MCP server output:

[supergateway] New SSE connection from ::1
[supergateway] POST to SSE transport (session 0f6ff484-3557-4972-a8e3-451fd4c69f36)
[supergateway] SSE → Child (session 0f6ff484-3557-4972-a8e3-451fd4c69f36): {"jsonrpc":"2.0","id":1,"method":"initialize","params":{"capabilities":{},"clientInfo":{"name":"llama.cpp","version":"1.0.0"},"protocolVersion":"2024-11-05"}}
[supergateway] Child → SSE: {
  result: {
    protocolVersion: '2024-11-05',
    capabilities: { prompts: {}, resources: [Object], tools: {}, logging: {} },
    serverInfo: { name: 'example-servers/everything', version: '1.0.0' }
  },
  jsonrpc: '2.0',
  id: 1
}
[supergateway] POST to SSE transport (session 0f6ff484-3557-4972-a8e3-451fd4c69f36)
[supergateway] SSE → Child (session 0f6ff484-3557-4972-a8e3-451fd4c69f36): {"jsonrpc":"2.0","method":"notifications/initialized"}
[supergateway] POST to SSE transport (session 0f6ff484-3557-4972-a8e3-451fd4c69f36)
[supergateway] SSE → Child (session 0f6ff484-3557-4972-a8e3-451fd4c69f36): {"jsonrpc":"2.0","id":2,"method":"tools/list"}
[supergateway] Child → SSE: {
  result: {
    tools: [ [Object], [Object], [Object], [Object], [Object], [Object] ]
  },
  jsonrpc: '2.0',
  id: 2
}
[supergateway] POST to SSE transport (session 0f6ff484-3557-4972-a8e3-451fd4c69f36)
[supergateway] SSE → Child (session 0f6ff484-3557-4972-a8e3-451fd4c69f36): {"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"arguments":{"a":1,"b":9},"name":"add"}}
[supergateway] Child → SSE: { result: { content: [ [Object] ] }, jsonrpc: '2.0', id: 3 }
[supergateway] SSE connection closed (session 0f6ff484-3557-4972-a8e3-451fd4c69f36)
[supergateway] Client disconnected (session 0f6ff484-3557-4972-a8e3-451fd4c69f36)

Tasks:

Integrating toolcall support with llama-cli

  • Add a --tools option to pass in a JSON tools array (see the sketch after this list)
  • Add a --tool-choice option which defaults to "auto" (see this ref)
  • Add a --tool-parallel switch for parallel tool-calls.
  • Copy remaining logic from oaicompat_completion_params_parse in utils.hpp into common_chat_apply_template (common.cpp).
  • Some other grammar changes in the main.cpp algorithm?
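For the --tools/--tool-choice options, this is roughly the registration I have in mind for common/arg.cpp. It is only a sketch; the common_params members used here (tools, tool_choice) are placeholders for this PR rather than existing fields, and the common_arg overloads may differ:

add_opt(common_arg(
    {"--tools"}, "JSON_OR_URL",
    "JSON array of tool definitions, or an MCP server URL (e.g. http://localhost:8000/sse)",
    [](common_params & params, const std::string & value) {
        params.tools = value; // placeholder member
    }
));
add_opt(common_arg(
    {"--tool-choice"}, "CHOICE",
    "\"auto\" (default), \"required\", or \"none\"",
    [](common_params & params, const std::string & value) {
        params.tool_choice = value; // placeholder member
    }
));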

Implement toolcall handlers for Model Context Protocol (MCP).

  • Add C++ types for base MCP messages (a rough sketch follows this list).
  • Add C++ types and procedures for Lifecycle phase of MCP protocol.
  • Implement Stdio transport.
  • Implement HTTP SSE transport using cURL.
  • Add base types in the common library for abstracting out tool-call handlers. This should include types/functions for translating between the underlying tool-call implementation (OpenAI style) and other formats (MCP in this case). After the template gets applied in common_chat_apply_template via a call to common_chat_params_init, the resulting prompt member of common_chat_params will contain the JSON-formatted tool-calls. This should be translated and dispatched to the registered handlers (if one was specified).
  • Other refactoring to support receiving input from the handlers while simultaneously allowing the user's input/interjection between request/response in the handlers.
  • Add C++ types for MCP utility messages to ping, cancel, and receive progress updates for long-running tool-calls.
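For the base message and lifecycle types, the target shape follows the initialize request visible in the supergateway log above. A rough sketch using the nlohmann::json library already vendored by llama.cpp (the type and member names are illustrative, not the final ones):

#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// Illustrative MCP lifecycle message: the initialize request sent by the client.
struct mcp_initialize_request {
    int         id               = 1;
    std::string protocol_version = "2024-11-05";

    json to_json() const {
        return {
            {"jsonrpc", "2.0"},
            {"id", id},
            {"method", "initialize"},
            {"params", {
                {"capabilities", json::object()},
                {"clientInfo", {{"name", "llama.cpp"}, {"version", "1.0.0"}}},
                {"protocolVersion", protocol_version},
            }},
        };
    }
};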

@bandoti bandoti requested a review from ngxson as a code owner February 4, 2025 19:06
@github-actions bot added the testing and server labels Feb 4, 2025

bandoti commented Feb 4, 2025

@ochafik I am working on adding the tool calls to llama-cli, and at this point I have wired initial support (from what I can tell) into common_chat_apply_template for passing in the templates and the tool array/tool_choice.

However, I need some advice on how to handle the remaining fields of common_chat_params as returned by common_chat_params_init. My basic understanding is that each time the template gets applied, those fields need to be relayed back to the sampling parameters so they can get hooked into the main token-processing routine. Is this correct? If so, do I simply need to tokenize/push the grammar triggers like server.cpp does? At the moment common_chat_apply_template returns a string, but I can change that by adding an out parameter or something.
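Something like this is what I'm picturing, with placeholder field names that I haven't confirmed against common.h:

// Placeholder sketch of what I mean by relaying the template results back to
// the sampling parameters; the member names below are guesses, not the real ones.
common_chat_params chat_params = common_chat_params_init(tmpl, inputs);

sparams.grammar      = chat_params.grammar;       // lazy tool-call grammar
sparams.grammar_lazy = chat_params.grammar_lazy;
for (const auto & trigger : chat_params.grammar_triggers) {
    sparams.grammar_triggers.push_back(trigger);  // watched during sampling
}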

Thank you for your work on the core of this feature; I am excited to get it working on llama-cli! 😊


ochafik commented Feb 5, 2025

Hey @bandoti , sorry for the delay, some quick background questions first:

  • What use case do you have in mind for this? Is it to treat the CLI as a single-shot server?
  • How would you display the output of the tool calls to make it usable (in OpenAI format?). Could you add an example output to the PR description?

Have you considered going one step further and having the CLI call tools? @brucepro is looking into doing tool calls w/ MCP servers from the server's Web UI (ref); maybe you could join forces / do the same in C++ w/ cURL.


bandoti commented Feb 5, 2025

@ochafik I got this working in llama-cli now. Here's the command I ran, followed by the output:

 ./build/bin/llama-cli.exe -c 2048 -ngl 8 -cnv --jinja -m 'C:/Users/mtmcp/Downloads/Llama-3.2-3B-Instruct-Q6_K.gguf' --tools '[
    {
      "type":"function",
      "function":{
        "name":"get_current_weather",
        "description":"Get the current weather in a given location",
        "parameters":{
          "type":"object",
          "properties":{
            "location":{
              "type":"string",
              "description":"The city and state, e.g. San Francisco, CA"
            }
          },
          "required":["location"]
        }
      }
    }
  ]'

system

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 05 Feb 2025

You have access to the following functions. To call a function, please respond with JSON for a function call.Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant


> What is the weather like in Mumbai?
{"name": "get_current_weather", "parameters": {"location": "Mumbai"}}

>
llama_perf_sampler_print:    sampling time =       1.41 ms /    36 runs   (    0.04 ms per token, 25477.71 tokens per second)
llama_perf_context_print:        load time =    1731.11 ms
llama_perf_context_print: prompt eval time =   17904.77 ms /   204 tokens (   87.77 ms per token,    11.39 tokens per second)
llama_perf_context_print:        eval time =    1457.84 ms /    18 runs   (   80.99 ms per token,    12.35 tokens per second)
llama_perf_context_print:       total time =   29930.62 ms /   222 tokens
Interrupted by user


bandoti commented Feb 5, 2025

Hey @bandoti , sorry for the delay, some quick background questions first:

* What use case do you have in mind for this? Is it to treat the CLI as a single-shot server?

* How would you display the output of the tool calls to make it usable (in OpenAI format?). Could you add an example output to the PR description?

Have you considered going one step further and having the CLI call tools? @brucepro is looking into doing tool calls w/ MCP servers from the server's Web UI (ref); maybe you could join forces / do the same in C++ w/ cURL.

@ochafik Good timing, we responded at the exact same time haha. No worries on the delay—here are some general objectives:

  1. Testability. Having llama-cli able to process these function calls lends itself to some really useful automated tests using tools like expect and co. This can quickly validate the logic of the function-call behavior.
  2. I have actually been working on an ongoing effort to wrap llama-cli in a Tcl scripting environment, and the general idea here is that these function calls could be an extremely interesting way to create automation.

In both of these cases, the output can be processed and simply scanned for a valid JSON result. If it's valid, honor the function calls; otherwise, just print it to the console.
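Something along these lines, as a sketch using the nlohmann::json library that llama.cpp already vendors (not actual llama-cli code):

#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

// Returns true if the model output parsed as a function call and was dispatched;
// otherwise the caller just prints the output as normal text.
static bool try_dispatch_toolcall(const std::string & output) {
    auto parsed = nlohmann::json::parse(output, /*cb=*/nullptr, /*allow_exceptions=*/false);
    if (parsed.is_discarded() || !parsed.contains("name") || !parsed.contains("parameters")) {
        return false;
    }
    std::cout << "tool call: " << parsed["name"] << "\n";
    // ... hand parsed["parameters"] to the registered handler here ...
    return true;
}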


bandoti commented Feb 5, 2025

I will track the MCP protocol work; it sounds interesting! I still think there's a lot of need for local-only tools however, and want to ensure these features are workable/testable without standing up endpoints and such. 😊

When you mention adding this capability in cURL, how do you mean? Setting up llama-cli as an MCP client?

EDIT: After reading more on MCP I see the potential flow, where the AI runs and communicates with the resource services. I'd imagine building that on top of the changes here would work well. A series of services can simply be passed into the llama-cli and it could dispatch to them when it needs something (at least that's how I'm understanding it).


brucepro commented Feb 5, 2025

I will track the MCP protocol work; it sounds interesting! I still think there's a lot of need for local-only tools however, and want to ensure these features are workable/testable without standing up endpoints and such. 😊

When you mention adding this capability in cURL, how do you mean? Setting up llama-cli as an MCP client?

For MCP, I am adding the SSE client support into the webui. This link was the best example I found: https://github.com/apify/tester-mcp-client/blob/main/src/mcpClient.ts
Then you can run one of the proxies that allows you to use MCP servers directly. This one seemed promising: https://github.com/punkpeye/mcp-proxy/, although I think writing a Python solution to handle the SSE API calls and just using the Python SDK directly (https://github.com/modelcontextprotocol) is where I will end up. So in the end the WebUI will be able to add any SSE server with a config of:

{
  "mcpServers": {
    "fetch": {
      "name": "Fetch",
      "type": "sse",
      "serverUrl": "http://localhost:8765/sse"
    }
  }
}

Still in progress. Once I hit debug mode I will update my repo and start testing.

@ochafik ochafik self-requested a review February 5, 2025 17:09

bandoti commented Feb 5, 2025

@brucepro thanks for the info on this. It seems to me, in general, a protocol like this is the way to go for the local AI in llama-cli to invoke actions as well. I'll take a closer look and see what it'll take to add it.


bandoti commented Feb 5, 2025

@ochafik As I understand it, the requirement to get this working is that I need to add a "translation" layer between the model's OpenAI function-call request/response and MCP, correct? This shouldn't be too difficult with cURL and the json library.
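Roughly the translation I mean, going off the request shapes in the supergateway log in the PR description (sketch only, using nlohmann::json):

#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Wrap an OpenAI-style call emitted by the model, e.g.
// {"name": "add", "parameters": {"a": 1, "b": 9}},
// into a JSON-RPC tools/call request for the MCP server.
json oai_to_mcp_tools_call(const json & oai_call, int rpc_id) {
    json args = oai_call.contains("parameters") ? oai_call["parameters"] : json::object();
    return {
        {"jsonrpc", "2.0"},
        {"id", rpc_id},
        {"method", "tools/call"},
        {"params", {
            {"name",      oai_call.at("name")},
            {"arguments", args},
        }},
    };
}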

I really like the discovery aspect of the MCP protocol—will make managing a collection of functionality much easier.

So I will start working on it as I think this is an important part of the function call API. We can revisit the other aspects of MCP like prompts and the like—those are very powerful as well, albeit that's a fair amount of work so will have to be done gradually.

@brucepro
Contributor

@ochafik As I understand it, the requirement to get this working is that I need to add a "translation" layer between the model's OpenAI function-call request/response and MCP, correct? This shouldn't be too difficult with cURL and the json library.

I really like the discovery aspect of the MCP protocol—will make managing a collection of functionality much easier.

So I will start working on it as I think this is an important part of the function call API. We can revisit the other aspects of MCP like prompts and the like—those are very powerful as well, albeit that's a fair amount of work so will have to be done gradually.


Did you make any progress on the CLI MCP? I have a super basic React app that seems to work with llama.cpp here: https://github.com/brucepro/llamacppMCPClientDemo I tested with Llama 3.3 70B but not much else. I will be adding prompts and resources next and debugging. Once it is cleaned up, I will work on migrating it to the WebUI.


bandoti commented Feb 11, 2025

@brucepro I'm currently working on adding the types for the MCP protocol and initialization handshake. I have all the types defined; just going to add unit tests on them today.

Working in a different branch but I'll merge that piece in hopefully today.

I added a checklist in the PR description above to track these changes. 😊


bandoti commented Mar 4, 2025

@CISC I merged in your --single-turn changes here. Thanks for adding that, as it works well for the toolcall case. When using a base prompt it didn't make sense to apply the chat templates, so this gives a means to invoke toolcalls non-interactively.

If you would like to test the tool-calls and report any issues, I would also be most grateful. Please see the instructions above in the PR description. With the --single-turn option I ran a quick test like:
Console 1:
npx -y supergateway --stdio "npx -y @modelcontextprotocol/server-everything"

Console 2:
./build/bin/llama-cli.exe -c 2048 -ngl 8 -cnv --jinja -m 'C:/Users/bandoti/Downloads/Llama-3.2-3B-Instruct-Q6_K.gguf' --tools "http://localhost:8000/sse" -p "What is one plus nine?" --single-turn

Output and MCP server communication were identical to the logs shown in the PR description above.

@ochafik /cc @brucepro /cc


brucepro commented Mar 4, 2025

Wasn't able to get it working on my Windows system. It compiled using my win64devkit, but when I called --tools using the SSE server it just halted with no output. Using it without was just fine. Will debug a bit today after I move to my Linux system.


bandoti commented Mar 4, 2025

@brucepro Sounds good, please let me know if I can help. At least we know the client is crashing the server (they're communicating)! 😅


CISC commented Mar 5, 2025

@CISC I merged in your --single-turn changes here. Thanks for adding that, as it works well for the toolcall case. When using a base prompt it didn't make sense to apply the chat templates, so this gives a means to invoke toolcalls non-interactively.

Great!

As far as I can see this relies on the model outputting OAI-compatible JSON responses, right? So models that don't conform (or can't be properly coerced) to that might have issues.

There's a (currently paused) PR over at transformers that will add tool call parsing using jinja with an inverse template (I have a placeholder draft PR here), which will make it easy to handle mixed responses as well as non-JSON (code).

If you would like to test the tool-calls and report any issues, I would also be most grateful. Please see the instructions above in the PR description.

I'll set it up and run some tests this week. :)


bandoti commented Mar 5, 2025

@brucepro, @CISC There is currently a logical issue with how llama-cli is handling the AI response in the templates. I need to update this to use common_chat_parse (as is done in server.cpp:to_json_oaicompat_chat) in order to gain access to the underlying tool_call members. Please stay tuned; I'll let you know when this change is integrated.

There's a (currently paused) PR over at transformers that will add tool call parsing using jinja with an inverse template (I have a placeholder draft PR here), which will make it easy to handle mixed responses as well as non-JSON (code).

Have you taken a look at the minja templating? Just want to make sure no duplicate effort is happening. @ochafik might have already completed this logic and it should be working with server already.

In general, what I've been doing in this PR is creating a new MCP client (it currently only supports tool-calls, but we can later add resources and prompts); translating toolcalls from OpenAI format to MCP format; and porting over the changes that are happening on the server. Come to think of it, after calling common_chat_parse the OpenAI compatibility layer could probably be skipped when dispatching function calls, as the result can be converted directly to an MCP function-call.
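Roughly the flow I mean; treat it as pseudocode, since the exact common_chat_parse signature and member names may differ on master, and dispatch_to_mcp_handler is just a placeholder for the registered toolcall handler:

// Parse the assistant output into structured tool calls, then hand each one
// to the MCP side without round-tripping through the OpenAI JSON envelope.
common_chat_msg msg = common_chat_parse(assistant_output, chat_format);

for (const auto & tc : msg.tool_calls) {
    dispatch_to_mcp_handler(tc.name, tc.arguments);  // placeholder dispatch
}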


CISC commented Mar 5, 2025

Have you taken a look at the minja templating? Just want to make sure no duplicate effort is happening. @ochafik might have already completed this logic and it should be working with server already.

As inverse templates are not a thing yet, minja does not support them, but the hope is certainly that it will be able to once they're ready. :)

In case it wasn't clear, we are talking about parsing responses from the model; having an inverse template means we should be able to structurally recreate the chat messages, response (with the emitted tool calls properly denoted) and all, from the rendered conversation.

bandoti added 4 commits March 5, 2025 12:54
commit 7adfa18
Author: Mason M <[email protected]>
Date:   Thu Mar 6 17:19:09 2025 -0400

    Re-Prompt after toolcall

commit c8843da
Author: Mason M <[email protected]>
Date:   Thu Mar 6 13:41:45 2025 -0400

    Use format to extract toolcalls

bandoti commented Mar 7, 2025

@CISC, @brucepro I fixed the tool-call formatting/calling, and now it "works" with various models/templates. For some reason it's looping over and over though (it keeps calling the function). I'm not exactly sure why, but it could be due to how it re-prompts with tool output. The main improvement though is that the tool-calls are being invoked properly (for the most part—one model passes strings instead of ints to the "add" function, and it crashes the client because the MCP error response is not being handled at the moment).

I am tied up for the next couple of days so won't be able to work on it, but if anyone wants to take a crack at solving the issue, a first spot to step in the debugger is the chat_formatter::result chat_formatter::operator() (const std::string & role, const std::string & content) method (in main.cpp). This is a functor stored in the chat_add_and_format variable. 😉 I will be around to answer quick questions if they come up though.

@ochafik /cc


brucepro commented Mar 8, 2025

Sorry to take so long to get back. The issue on my Windows build box was the curl dev lib. On my Ubuntu system everything worked well according to your instructions. So maybe add a reminder about installing the curl dev lib.


bandoti commented Mar 9, 2025

@ochafik, @brucepro, @CISC, @ngxson, @ggerganov
After stepping back and taking a look at the bigger picture here, I realise that the Model Context Protocol probably is not a good fit for llama-cli (still good for llama.cpp though) and should simply exist as a separate library, acting as middleware between all the MCP-server-exposed resources and the application-level LLM.

While a CLI certainly provides a good user experience, and in fact we COULD directly inject it into llama-cli's main loop (as I've been working through here), I feel that is actually out of scope for that application. In our case, a new CLI application would instead call llama-server for all the LLM sampling requests, and so forth. It would pull resources/tools from its connected MCP servers, and those tools could also request sampling from the application, which it would route to llama-server, and so forth.

What I'm trying to say is that I am thinking to instead move this to a repo outside of llama.cpp and simply make it a C library for MCP clients/servers (because really, it should work with llama.cpp and any other providers).

So in general, applications will use llama-server, and they will link against the MCP library. The library will provide hooks for the user to accept/reject requests by tools to access the sampler, and provide the notification mechanism when prompts change, et cetera. This will provide the intended separation, like their TypeScript/Python SDKs, but in a portable C API. 😊
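Very roughly, the shape of the interface I have in mind; every name below is a placeholder and nothing here is implemented yet:

#include <functional>
#include <string>

struct mcp_client;  // opaque handle, one per connected MCP server

// Connect over stdio or SSE and run the initialize/initialized lifecycle.
mcp_client * mcp_connect(const std::string & endpoint);

// Application-provided hook: approve or reject a sampling request coming from a
// tool, and route it to llama-server (or any other provider) if accepted.
using mcp_sampling_hook = std::function<bool(const std::string & request_json,
                                             std::string & response_json)>;
void mcp_set_sampling_hook(mcp_client * client, mcp_sampling_hook hook);

// Notification hook for when the server's prompts/tools/resources change.
using mcp_notify_hook = std::function<void(const std::string & notification_json)>;
void mcp_set_notify_hook(mcp_client * client, mcp_notify_hook hook);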

I am open to suggestions if anyone sees another possibility here, but this seems the most straightforward outcome of the proof-of-concept.


CISC commented Mar 9, 2025

SGTM, but I think it would still be useful for llama-cli to interface with MCP as well.


bandoti commented Mar 9, 2025

SGTM, but I think it would still be useful for llama-cli to interface with MCP as well.

We can get there but there will have to be some significant refactors. For example, MCP servers can send sampling requests to the client that would occur in parallel. We will have to support that in llama-cli as well, to dispatch requests locally.

So, the capabilities ARE there, but is it in scope to make those changes to llama-cli?


bandoti commented Mar 10, 2025

@brucepro I am going to keep pursuing adding parallel support to get full MCP functionality locally (over time—for now I think all we can do is basic tool-calls).

I realize the whole notion of dispatching between MCP and llama-server is what you're working on in Python/TypeScript, so I don't want to duplicate effort there! But I would like a fully working MCP library in C/C++. We will get there eventually.

@brucepro
Contributor

Great. About 80% done in the webui. Lots of little things to work out, such as getting the resources into additional context, checking props to make sure they're supported, and having a subprocess to run prompts. Tools work pretty well, although the agent loves the echo function too much.
Should be ready for real testing in a day or so.
