Add Local Model Support via Ollama Integration #2

Open
crmne opened this issue Mar 11, 2025 · 1 comment · May be fixed by #10

crmne (Owner) commented Mar 11, 2025

TL;DR: Add support for running fully local models via Ollama.

Background

While cloud models offer state-of-the-art capabilities, there are compelling reasons to run models locally:

  1. Privacy & Compliance: Keep sensitive data entirely on-premise
  2. Cost Control: Eliminate ongoing API costs for high-volume applications
  3. Latency: Remove network overhead for latency-sensitive applications
  4. Offline Operation: Run AI features without internet connectivity

Ollama provides an excellent way to run models like Llama, Mistral, and others locally with a simple API that's compatible with our existing architecture.

Proposed Solution

Add a new provider interface for Ollama that implements our existing abstractions:

# Configuration
RubyLLM.configure do |config|
  config.ollama_host = "http://localhost:11434" # Default
end

# Usage remains identical to cloud models
chat = RubyLLM.chat(model: 'llama2')
chat.ask("What's the capital of France?")

# Or with embeddings
RubyLLM.embed("Ruby is a programmer's best friend", model: 'nomic-embed-text')

Technical Details

For those looking to help implement this, you'll need to:

  1. Create a new provider module in lib/ruby_llm/providers/ollama/
  2. Implement the core provider interface methods (a rough sketch follows this list):
    • complete - For chat functionality
    • embed - For embeddings
    • api_base - Returns the Ollama API endpoint
    • capabilities - Define model capabilities
  3. Handle the payload formatting differences between Ollama and OpenAI/Claude
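
Here's a rough sketch of what the provider module could look like, assuming Faraday for HTTP and Ollama's native /api/chat and /api/embeddings endpoints. Everything beyond the four method names listed above (the config accessor, helper names, return shapes) is illustrative, not RubyLLM's actual internal interface:

# lib/ruby_llm/providers/ollama/provider.rb -- illustrative sketch only
require "faraday"
require "json"

module RubyLLM
  module Providers
    module Ollama
      module_function

      # Ollama listens on localhost:11434 by default.
      def api_base
        RubyLLM.config.ollama_host || "http://localhost:11434"
      end

      def capabilities
        { chat: true, embeddings: true, streaming: true }
      end

      # Chat completion via Ollama's native /api/chat endpoint.
      # Returns just the response text here for brevity; the real provider
      # would build whatever message object the rest of the gem expects.
      def complete(messages, model:)
        response = connection.post("/api/chat") do |req|
          req.body = {
            model: model,
            messages: messages.map { |m| { role: m[:role], content: m[:content] } },
            stream: false
          }.to_json
        end
        JSON.parse(response.body).dig("message", "content")
      end

      # Embeddings via /api/embeddings.
      def embed(text, model:)
        response = connection.post("/api/embeddings") do |req|
          req.body = { model: model, prompt: text }.to_json
        end
        JSON.parse(response.body)["embedding"]
      end

      def connection
        Faraday.new(url: api_base, headers: { "Content-Type" => "application/json" })
      end
    end
  end
end

Worth noting: Ollama also exposes an OpenAI-compatible API under /v1 on the same port, so an alternative design is to reuse the existing OpenAI payload formatting and only swap the base URL, which would shrink step 3 considerably.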

The PR should include:

  • Provider implementation
  • Configuration option for Ollama host
  • Tests that can be run against a local Ollama instance (see the example sketch after this list)
  • Documentation updates
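
For the tests, something like the following could work as an opt-in local integration spec. This is a sketch only: it assumes RSpec, a running Ollama daemon with the llama2 and nomic-embed-text models pulled, and return-value accessors (.content, .vectors) that are assumptions here rather than confirmed API.

# spec/ruby_llm/providers/ollama_spec.rb -- illustrative sketch only
require "spec_helper"
require "net/http"

RSpec.describe "Ollama provider" do
  def ollama_running?
    Net::HTTP.get(URI("http://localhost:11434/api/tags"))
    true
  rescue StandardError
    false
  end

  before do
    skip "Ollama is not running locally" unless ollama_running?
  end

  it "answers a simple chat prompt" do
    chat = RubyLLM.chat(model: "llama2")
    # Assertion kept loose to avoid flakiness on free-form model output.
    expect(chat.ask("What's the capital of France?").content).to be_a(String)
  end

  it "returns an embedding vector" do
    embedding = RubyLLM.embed("Ruby is a programmer's best friend", model: "nomic-embed-text")
    expect(embedding.vectors).to all(be_a(Float))
  end
end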

Benefits

  • Cost efficiency: Eliminate API costs for many use cases
  • Privacy: Keep sensitive data local
  • Flexibility: Mix and match local and cloud models in the same codebase
  • Performance: Reduce latency for response-time sensitive applications

@Mizokuiam commented

Hey @crmne, this is a fantastic proposal! Local model support via Ollama would be a huge win for ruby_llm users. The benefits you've outlined – privacy, cost control, latency reduction, and offline capabilities – are all spot-on.

I really like the proposed configuration and usage pattern. Keeping the API consistent with existing cloud models minimizes the learning curve for users. The example snippet is clear and concise:

# Configuration
RubyLLM.configure do |config|
  config.ollama_host = "http://localhost:11434" # Default
end

# Usage remains identical to cloud models
chat = RubyLLM.chat(model: 'llama2', provider: :ollama) # Explicitly set provider
chat.ask("What's the capital of France?")

# Or with embeddings
RubyLLM.embed("Ruby is a programmer's best friend", model: 'nomic-embed-text', provider: :ollama) # Explicitly set provider

One minor suggestion: Adding an explicit provider: :ollama argument to chat and embed calls would provide more clarity and control, especially when using a mixed local/cloud setup. It also future-proofs the codebase for potential naming conflicts if cloud providers ever introduce similarly named models.

Regarding the technical details, your outline is solid. The separation into complete and embed methods makes sense, and handling payload formatting within the provider module is the right approach. Ensuring comprehensive tests against a local Ollama instance is crucial.

One thing to consider during implementation is error handling. Ollama might return different error codes and messages compared to cloud providers. The provider should gracefully handle these differences and translate them into consistent ruby_llm exceptions. We should also consider how to handle cases where the Ollama server is unavailable or returns unexpected responses.
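
To make that concrete, error translation inside the provider could look something like this (a sketch only: the exception class names are assumptions about ruby_llm's hierarchy, though Ollama error bodies are generally of the form {"error": "..."}):

# Illustrative sketch: mapping Ollama failures onto consistent exceptions
def parse_error(response)
  body = begin
    JSON.parse(response.body)
  rescue JSON::ParserError
    {}
  end
  message = body["error"] || "Ollama returned HTTP #{response.status}"

  case response.status
  when 404 then RubyLLM::ModelNotFoundError.new(message)  # model not pulled locally
  when 400 then RubyLLM::BadRequestError.new(message)
  when 500..599 then RubyLLM::ServerError.new(message)
  else RubyLLM::Error.new(message)
  end
end

# Connection refused usually means the daemon isn't running at ollama_host,
# which deserves a clear, actionable message rather than a raw socket error.
def handle_connection_error(error)
  raise RubyLLM::Error, "Could not reach Ollama at #{RubyLLM.config.ollama_host}: #{error.message}"
end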

crmne marked this as a duplicate of #24 on Mar 17, 2025