Replies: 1 comment
-
Hi @khasinski and thank you for the interest in RubyLLM! RubyLLM is designed to be a client for LLMs, not a model host. Adding model serving capabilities would completely change the performance profile: from being IO-bound to CPU/GPU/memory-bound. That's a fundamentally different library with different concerns. I'd recommend keeping your ONNX Runtime implementation separate and, if you want, building a provider for RubyLLM that speaks to your server over HTTP.
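For context, here is a minimal sketch of the server side of that suggestion: an ONNX embedding model exposed behind an OpenAI-style `/v1/embeddings` endpoint using Sinatra plus the `onnxruntime` and `tokenizers` gems. The model file, tokenizer repo, input/output names, and mean pooling are all assumptions about your export, not anything RubyLLM ships.

```ruby
# Hypothetical sketch: wrap an ONNX embedding model behind an OpenAI-style
# HTTP endpoint so a RubyLLM provider (or any HTTP client) can call it.
require "sinatra"
require "json"
require "onnxruntime"
require "tokenizers"

MODEL     = OnnxRuntime::Model.new("model.onnx")  # assumed local export
TOKENIZER = Tokenizers.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")  # assumed tokenizer

post "/v1/embeddings" do
  payload = JSON.parse(request.body.read)
  texts   = Array(payload["input"])

  data = texts.each_with_index.map do |text, i|
    ids  = TOKENIZER.encode(text).ids
    mask = Array.new(ids.length, 1)
    feed = {
      "input_ids"      => [ids],
      "attention_mask" => [mask],
      "token_type_ids" => [Array.new(ids.length, 0)], # drop if the export has no such input
    }
    out = MODEL.predict(feed)
    # Assumes the export names its output "last_hidden_state"; mean-pool over tokens.
    hidden = out["last_hidden_state"][0]
    pooled = hidden.transpose.map { |dim| dim.sum / dim.length.to_f }
    { "object" => "embedding", "index" => i, "embedding" => pooled }
  end

  content_type :json
  { "object" => "list", "data" => data, "model" => payload["model"] }.to_json
end
```

A RubyLLM provider would then just POST text to this endpoint and return the vectors, which keeps RubyLLM itself purely IO-bound as described above.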
-
Hey, I'm currently using ONNX models that I run with ONNX Runtime for faster embeddings. I'm planning to extend this a bit to support token-generating models (something with an API similar to ONNX Runtime GenAI). If you add docs on writing providers, I'd be happy to write a wrapper that translates between the two formats so I could use your DSL to interact with those models.
Config would probably be just a link to a Hugging Face repo, and calling those models wouldn't need any HTTP, just regular function calls; a rough sketch of that path is below.
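To make the "config is just a link to a HF repo" idea concrete, here is a rough in-process sketch using the `onnxruntime` gem. The repo name, file layout under `resolve/main/onnx/`, and input/output names are assumptions about a particular export.

```ruby
# Hypothetical sketch: "config" is just a Hugging Face repo; the ONNX export is
# fetched once, then the model is called as a regular in-process function.
require "open-uri"
require "onnxruntime"

HF_REPO    = "sentence-transformers/all-MiniLM-L6-v2"   # assumed repo
MODEL_URL  = "https://huggingface.co/#{HF_REPO}/resolve/main/onnx/model.onnx"
MODEL_PATH = "model.onnx"

# Download the export once; later runs reuse the local file.
unless File.exist?(MODEL_PATH)
  URI.open(MODEL_URL) { |remote| File.binwrite(MODEL_PATH, remote.read) }
end

model = OnnxRuntime::Model.new(MODEL_PATH)

# The session reports which inputs and outputs the export expects (names vary by export).
p model.inputs.map { |i| i[:name] }   # e.g. ["input_ids", "attention_mask", "token_type_ids"]
p model.outputs.map { |o| o[:name] }  # e.g. ["last_hidden_state"]
```

A wrapper around this is just plain Ruby objects, so a token-generating (GenAI-style) model would presumably follow the same shape: load the session once, then call predict in a loop over generated tokens.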