tool-call: add support for tool-calls using Model Context Protocol #11556
base: master
Conversation
@ochafik I am working on adding the tool calls to llama-cli, and at this point I have it wired in. However, I need some advice on how to handle the remaining fields. Thank you for your work on the core of this feature, I am excited to get it working on llama-cli! 😊
Hey @bandoti, sorry for the delay, some quick background questions first:
Have you considered going directly one step further and having the CLI call tools? @brucepro is looking into doing tool calls w/ MCP servers from the server's Web UI (ref), maybe you could join forces / do the same in C++ w/ CURL.
@ochafik I got this working in llama-cli now. Here's the command I ran, followed by the output:
@ochafik Good timing, we responded at the exact same time haha. No worries on the delay. Here are some general objectives:
In both of these cases, the output can be processed and simply scanned for a valid JSON result. If it's valid, then honor the function calls; otherwise just print to the console.
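A minimal sketch of that scan-and-dispatch idea (using nlohmann::json, which llama.cpp already vendors; the output shape and the dispatch step are placeholders, not code from this PR):

```cpp
#include <nlohmann/json.hpp>  // vendored in llama.cpp under common/
#include <iostream>
#include <string>

using json = nlohmann::json;

// Scan a model response: if it parses as JSON with tool calls, honor them;
// otherwise just print the text to the console.
void route_model_output(const std::string & output) {
    // Parse without exceptions; a failed parse yields a "discarded" value.
    json parsed = json::parse(output, nullptr, /* allow_exceptions = */ false);

    if (parsed.is_discarded() || !parsed.contains("tool_calls")) {
        std::cout << output << "\n";  // not a tool call: print as-is
        return;
    }

    for (const auto & call : parsed["tool_calls"]) {
        const auto & fn = call["function"];
        std::cout << "honoring tool call: " << fn.value("name", "?") << "\n";
        // dispatch to the registered handler here (placeholder)
    }
}
```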
I will track the MCP protocol work, it sounds interesting! I still think there's a lot of need for local-only tools however, and want to ensure these features are workable/testable without standing up endpoints and such. 😊 When you mention adding this capability in cURL, how do you mean? Setting up llama-cli as an MCP client? EDIT: After reading more on MCP I see the potential flow, where the AI runs and communicates with the resource services. I'd imagine building that on top of the changes here would work well. A series of services can simply be passed into llama-cli and it could dispatch to them when it needs something (at least that's how I'm understanding it).
For MCP, I am adding SSE client support to the webui. This link was the best example I found: https://github.com/apify/tester-mcp-client/blob/main/src/mcpClient.ts
Still in progress. Once I hit debug mode I will update my repo and start testing.
@brucepro thanks for the info on this. It seems to me, in general, a protocol like this is the way to go for the local AI in llama-cli to invoke actions as well. I'll take a closer look and see what it'll take to add it.
@ochafik As I understand it, to get this working I need to add a "translation" layer between the model's OpenAI function-call request/response and MCP, correct? This shouldn't be too difficult with cURL and the json library. I really like the discovery aspect of the MCP protocol; it will make managing a collection of functionality much easier. So I will start working on it, as I think this is an important part of the function call API. We can revisit the other aspects of MCP like prompts and the like (those are very powerful as well), albeit that's a fair amount of work so it will have to be done gradually.
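To make that translation layer concrete, here is a hedged sketch of mapping one OpenAI-style tool call onto an MCP tools/call JSON-RPC request (shapes follow the respective specs; none of this is code from the PR):

```cpp
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Turn one OpenAI tool call into an MCP "tools/call" JSON-RPC request.
json oai_tool_call_to_mcp(const json & oai_call, int request_id) {
    // OpenAI encodes the arguments as a JSON *string*; MCP expects an object.
    const json & fn   = oai_call.at("function");
    json         args = json::parse(fn.value("arguments", "{}"));

    return {
        { "jsonrpc", "2.0" },
        { "id",      request_id },
        { "method",  "tools/call" },
        { "params",  {
            { "name",      fn.at("name") },
            { "arguments", args }
        }}
    };
}
```

Going the other way, the result.content of the MCP response would be folded back into a tool-role message before re-prompting the model.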
Did you get
Did you make any progress on the CLI MCP client? I have a super basic React app that seems to work with llama.cpp here: https://github.com/brucepro/llamacppMCPClientDemo I tested with llama3.3 70b but not much else. Will be adding prompts and resources next and debugging. Once it is cleaned up, I will work on migrating it to the WebUI.
@brucepro I'm currently working on adding the types for the MCP protocol and the initialization handshake. I have all the types defined; just going to add unit tests for them today. Working in a different branch, but I'll merge that piece in hopefully today. I added a checklist in the PR description above to track these changes. 😊
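For anyone following along, the MCP initialization handshake is a JSON-RPC initialize request followed by a notifications/initialized notification once the server has answered. A rough sketch of the two client-side messages (protocol version taken from the 2024-11-05 revision of the spec; client name/version are placeholder values):

```cpp
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// "initialize" request sent first over the transport (stdio or SSE).
json make_initialize_request(int id) {
    return {
        { "jsonrpc", "2.0" },
        { "id",      id },
        { "method",  "initialize" },
        { "params",  {
            { "protocolVersion", "2024-11-05" },
            { "capabilities",    json::object() },
            { "clientInfo",      { { "name", "llama-cli" }, { "version", "0.0.1" } } }
        }}
    };
}

// Sent after the server's initialize response has been received.
json make_initialized_notification() {
    return {
        { "jsonrpc", "2.0" },
        { "method",  "notifications/initialized" }
    };
}
```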
@CISC I merged in your changes. If you would like to test the tool-calls and report any issues, I would also be most grateful. Please see the instructions above in the PR description.
Console 2:
Output:
Communication with MCP server:
Wasn't able to get it working on my Windows system. It compiled using my win64devkit, but then when I called it with --tools using the SSE server it just halted with no output. Using it without was just fine. Will debug a bit today after I move to my Linux system.
@brucepro Sounds good, please let me know if I can help. At least we know the client is crashing the server (they're communicating)! 😅
Great! As far as I can see this relies on the model outputting OAI-compatible JSON responses, right? So models that don't conform (or can't be properly coerced) to that might have issues. There's a (currently paused) PR over at
I'll set it up and run some tests this week. :)
@brucepro, @CISC There is currently a logical issue with how llama-cli is handling the AI response in the templates. I need to update this to use
Have you taken a look at the minja templating? Just want to make sure no duplicate effort is happening. @ochafik might have already completed this logic and it should be working with

In general, what I've been doing in this PR is: create a new MCP client (currently only supports tool-calls, but we can later add resources and prompts); translate toolcalls from OpenAI format to MCP format; and port over the changes that are happening on
As inverse templates are not a thing yet, minja does not support them, but the hope is certainly that it will be able to once they are ready. :) In case it wasn't clear, we are talking about parsing responses from the model; having an inverse template means we should be able to structurally recreate the chat messages, response (with the emitted tool calls properly denoted) and all, from the rendered conversation.
commit 7adfa18
Author: Mason M <[email protected]>
Date:   Thu Mar 6 17:19:09 2025 -0400

    Re-Prompt after toolcall

commit c8843da
Author: Mason M <[email protected]>
Date:   Thu Mar 6 13:41:45 2025 -0400

    Use format to extract toolcalls
@CISC, @brucepro I fixed the tool-call formatting/calling; now it "works" with various models/templates. For some reason it's looping over and over though (keeps calling the function). I'm not exactly sure why, but it could be due to how it's re-prompting with the tool output. The main improvement though is that the tool-calls are being invoked properly (for the most part; one model passes strings instead of ints to the "add" function, and it crashes the client because the MCP error response is not being handled at the moment). I am tied up for the next couple of days so I won't be able to work on it, but if anyone wants to take a crack at solving the issue, a first spot to step in with the debugger is the
/cc @ochafik
Sorry to take so long to get back. The issue on my Windows build box was the curl dev lib. On my Ubuntu system everything worked well according to your instructions. So maybe add a reminder about installing the curl dev lib.
@ochafik, @brucepro, @CISC, @ngxson, @ggerganov While a CLI certainly provides a good user experience, and in fact we COULD directly inject it into llama-cli's main loop (as I've been working through here), I feel that is actually out of scope for that application. In our case, a new CLI application would instead call "llama-server" for all the LLM sampling requests, and so forth. It would pull resources/tools from its connected MCP servers, and those tools could also request sampling from the application, which it would route to llama-server, and so forth.

What I'm trying to say is that I am thinking of instead moving this to a repo outside of llama.cpp and simply making it a C library for MCP clients/servers (because really, it should work with llama.cpp and any other providers). So in general, applications will use llama-server, and they will link against the MCP library. The library will provide hooks for the user to accept/reject requests by tools to access the sampler, provide the notification mechanism when prompts change, et cetera. This will provide the intended separation like their TypeScript/Python SDKs, but in a portable C API. 😊

I am open to suggestions if anyone sees another possibility here, but this seems the most straightforward outcome of the proof-of-concept.
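To make the idea concrete, a very rough sketch of what such a library surface could look like, written as a C-compatible header; every identifier below is hypothetical, not an existing interface:

```cpp
/* Hypothetical C API sketch for a standalone MCP client library. */

typedef struct mcp_client mcp_client;  /* opaque handle */

/* Called when a connected MCP server asks for a completion ("sampling").
 * The application routes this to llama-server (or any other backend) and
 * returns the generated text; return NULL to reject the request. */
typedef char * (*mcp_sampling_hook)(const char * request_json, void * user_data);

/* Called when the server reports that its prompts/tools/resources changed. */
typedef void (*mcp_changed_hook)(const char * notification_json, void * user_data);

mcp_client * mcp_client_connect  (const char * sse_url);
void         mcp_client_set_hooks(mcp_client * c,
                                  mcp_sampling_hook on_sampling,
                                  mcp_changed_hook  on_changed,
                                  void *            user_data);
char *       mcp_client_call_tool(mcp_client * c,
                                  const char * name,
                                  const char * arguments_json);
void         mcp_client_free     (mcp_client * c);
```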
SGTM, but I think it would still be useful for |
We can get there, but there will have to be some significant refactoring. For example, MCP servers can send sampling requests to the client, and those would occur in parallel. We would have to support that in llama-cli as well, to dispatch requests locally. So the capabilities ARE there, but is it in scope to make those changes to llama-cli?
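For reference, the server-initiated requests in question are MCP sampling/createMessage calls. A hedged sketch of what answering one might look like on the client side (run_local_completion is a placeholder for llama-cli's own sampler or a request to llama-server):

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// Placeholder: route to llama-server or llama-cli's own sampling loop.
static std::string run_local_completion(const json & messages) {
    (void) messages;
    return "(generated text)";
}

// Answer one server-initiated "sampling/createMessage" request. Several of
// these could arrive concurrently from different MCP servers, which is why
// llama-cli would need parallel request handling.
json handle_sampling_request(const json & req) {
    const json & params = req.at("params");

    std::string text = run_local_completion(params.at("messages"));

    return {
        { "jsonrpc", "2.0" },
        { "id",      req.at("id") },
        { "result",  {
            { "role",    "assistant" },
            { "content", { { "type", "text" }, { "text", text } } },
            { "model",   "local" }
        }}
    };
}
```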
@brucepro I am going to keep pursuing parallel support to get full MCP functionality locally (over time; for now I think all we can do is basic tool-calls). I realize the whole notion of dispatching to llama-server via MCP is what you're working on in Python/TypeScript, so I don't want to duplicate effort there! But I would like a fully working MCP library in C/C++. Will get there eventually.
Great. About 80% done in the webui. Lots of little things to work out, such as getting the resources into additional context, checking props to make sure they're supported, and having a subprocess to run prompts. Tools work pretty well, although the agent loves the echo function too much.
This PR adds support for tool-calls using a `--tools` switch to llama-cli. It is currently ⚠ Experimental! ⚠
To test this, first build llama-cli using something like:
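For example (a guess at a typical curl-enabled invocation; the curl-backed HTTP client is assumed to be required for the MCP/SSE support):

```sh
# Hedged example, not the exact commands from this PR.
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release -t llama-cli
```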
Then run a Model Context Protocol (MCP) server:
In another terminal, launch llama-cli (remove the --single-turn switch to interact):
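For example (a guess at the invocation; --tools and --single-turn come from this PR, and the exact --tools argument format is assumed here to be a JSON tools array):

```sh
# Hedged example; adjust model path, prompt, and tool definitions.
./build/bin/llama-cli -m model.gguf --jinja --single-turn \
    --tools "$(cat tools.json)" \
    -p "Use the add tool to compute 3 + 4."
```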
Output:
And the MCP server output:
Tasks:
- Integrating toolcall support with llama-cli
  - `--tools` option to pass in a JSON tools array
  - `--tool-choice` option which defaults to "auto" (see this ref)
  - `--tool-parallel` switch for parallel tool-calls
  - `oaicompat_completion_params_parse` in utils.hpp into `common_chat_apply_template` (common.cpp)
  - `main.cpp` algorithm?
- Implement toolcall handlers for Model Context Protocol (MCP)
  - `common_chat_apply_template`: via a call to `common_chat_params_init`, the resulting prompt member of `common_chat_params` will contain the JSON-formatted tool-calls. This should be translated and dispatched to the registered handlers (if one was specified).