Investigate shareable workspaces #1067

Open
lukehinds opened this issue Feb 15, 2025 · 10 comments

@lukehinds
Contributor

Investigate a simple shareable workspace configuration, where a user could export a workspace, capturing prompts, provider config, and muxing rules. This could be saved as a file that someone else could then import.
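For illustration, here is a minimal sketch of a round-trip export/import in Python, assuming a workspace boils down to a name, a system prompt, and a list of mux rules. The class and function names (`WorkspaceExport`, `MuxRule`, `export_workspace`, `import_workspace`) and the field layout are hypothetical, and provider config is left out for simplicity.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class MuxRule:
    # Hypothetical rule shape; field names are illustrative only.
    provider_name: str
    model: str
    matcher_type: str
    matcher: str | None = None


@dataclass
class WorkspaceExport:
    name: str
    system_prompt: str = ""
    muxing_rules: list[MuxRule] = field(default_factory=list)


def export_workspace(ws: WorkspaceExport, path: Path) -> None:
    """Serialize a workspace (including nested rules) to a shareable JSON file."""
    path.write_text(json.dumps(asdict(ws), indent=2))


def import_workspace(path: Path) -> WorkspaceExport:
    """Rebuild a workspace from a previously exported JSON file."""
    data = json.loads(path.read_text())
    rules = [MuxRule(**rule) for rule in data.get("muxing_rules", [])]
    return WorkspaceExport(
        name=data["name"],
        system_prompt=data.get("system_prompt", ""),
        muxing_rules=rules,
    )
```

Because `asdict` recurses into nested dataclasses, the exported file is plain JSON that could be hand-edited or checked into a repo before being imported elsewhere.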

@JAORMX
Contributor

JAORMX commented Feb 15, 2025

I could look at this. Provider config is not workspace-scoped, but I'm sure we could make it work

@peppescg
Contributor

I think this will impact the UI dashboard too, because I would expect a button in the dashboard to import/export the config (keep in mind that muxing is currently only available in the UI).

@JAORMX
Contributor

JAORMX commented Feb 17, 2025

We agreed someone else will work on this. I can tag along and help out if needed.

@JAORMX
Contributor

JAORMX commented Feb 17, 2025

I'll pair up with @alex-mcgovern

@alex-mcgovern
Contributor

Actually picking this up now

@JAORMX
Contributor

JAORMX commented Feb 19, 2025

#1086 added the basic configuration. It's hooked to the workspace creation REST API endpoint, but currently unused. I can see us parsing a JSON file with that content too as an alternative.

@alex-mcgovern
Contributor

alex-mcgovern commented Feb 19, 2025

I think a north star for this should be a local JSON "CodeGate config". This would allow:

  • hand editing in editor-of-choice
  • checking in to version control (either per-repo or a .dotfiles repo)
  • sharing/distributing within a team

I'm imagining this could be stored in a number of places, e.g. $HOME/.config/codegate, in the currently open repository, etc. but that decision seems further down the road.
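To make the "number of places" idea concrete, a lookup could check candidate locations in priority order and use the first file it finds. This is only a sketch: the file names `.codegate.json` and `config.json` and the repo-before-home precedence are assumptions, since the thread leaves the storage decision open.

```python
from pathlib import Path
from typing import Optional


def find_codegate_config(repo_dir: Path, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file, checking the repo before the user's home.

    Both file names below are hypothetical placeholders.
    """
    home = home if home is not None else Path.home()
    candidates = [
        repo_dir / ".codegate.json",                    # per-repo config (assumed name)
        home / ".config" / "codegate" / "config.json",  # user-level config (assumed name)
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no config found; fall back to whatever the API has stored
```

Checking the repo first lets a team's checked-in config override a personal dotfiles default; the opposite order would be just as easy if that turns out to be preferable.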

After speaking with Ozz, I think the best small increment to deliver initially is a change to the REST API to update/read config. I'm going to think about the API design so that we don't have to rework it many times on the way to the desired behaviour.

A future step might be to use a Docker bind mount to enable the local JSON config idea, in addition to editing it via the CRUD API.

Here is what I imagine a local JSON config could look like in the future:

{
    // ... other global config
    "workspaces": {
        "default": {
            "system_prompt": "",
            "muxing_rules": [
                {
                    "provider_name": "provider1",
                    "provider_id": "provider1_id",
                    "model": "model1",
                    "matcher_type": "catch_all",
                    "matcher": null
                },
                {
                    "provider_name": "provider2",
                    "provider_id": "provider2_id",
                    "model": "model2",
                    "matcher_type": "filename_match",
                    "matcher": "*.py"
                }
            ]
            // ... other workspace config
        },
        "foo": {
            "system_prompt": "",
            "muxing_rules": [
                {
                    "provider_name": "provider3",
                    "provider_id": "provider3_id",
                    "model": "model3",
                    "matcher_type": "request_type_match",
                    "matcher": "chat"
                }
            ]
            // ... other workspace config
        }
    }
}
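Since the sketch above uses `//` comments, plain `json.loads` would reject it. A minimal loader for this shape might drop whole-line comments and then sanity-check each rule's `matcher_type` against the three values that appear in the sample. The function name `load_codegate_config` is hypothetical, and the comment stripping is deliberately naive (it ignores trailing comments); a real implementation would want a proper JSONC parser.

```python
import json

# Matcher types taken from the example config above; the real set may differ.
KNOWN_MATCHER_TYPES = {"catch_all", "filename_match", "request_type_match"}


def load_codegate_config(text: str) -> dict:
    """Parse the sketched config format, tolerating whole-line // comments."""
    # Naive: only removes lines whose first non-blank characters are "//".
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    config = json.loads("\n".join(lines))
    for ws_name, ws in config.get("workspaces", {}).items():
        for rule in ws.get("muxing_rules", []):
            matcher_type = rule.get("matcher_type")
            if matcher_type not in KNOWN_MATCHER_TYPES:
                raise ValueError(
                    f"workspace {ws_name!r}: unknown matcher_type {matcher_type!r}"
                )
    return config
```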

@alex-mcgovern
Contributor

Update for posterity: the initial work on this is progressing. It took some time to figure out a good approach for integration testing, but that is in place now too. I'll hammer out a few more test cases and add the GET endpoint for shareable config, then it will be ready.

@alex-mcgovern
Contributor

I have made some progress with endpoints to create/update/read a workspace, although the data that we are exporting/uploading contains UUIDs that are specific to a user's instance, so this isn't viable as-is.

Specifically, the instance-specific data is in the mux rules, which reference providers by provider_id.

curl -s "http://localhost:8989/api/v1/workspaces/default/muxes" | jq

[
  {
    "provider_name": null,
    "provider_id": "945a33df-74d9-4aa8-8f12-43a44f515d23", 
    "model": "x-ai/grok-beta",
    "matcher_type": "filename_match",
    "matcher": "*.py"
  },
  {
    "provider_name": null,
    "provider_id": "0653607e-bf47-42d8-9a9d-86cdca3cf19f",
    "model": "deepseek-r1:1.5b",
    "matcher_type": "catch_all",
    "matcher": ""
  }
]
curl -X PUT http://localhost:8989/api/v1/workspaces/default/muxes \
  -H "Content-Type: application/json" \
  -d '[
        {
          "provider_name": null,
          "provider_id": "945a33df-74d9-4aa8-8f12-43a44f515d23",
          "model": "anthropic/claude-3-opus:beta",
          "matcher_type": "filename_match",
          "matcher": "*.py"
        },
        {
          "provider_name": null,
          "provider_id": "0653607e-bf47-42d8-9a9d-86cdca3cf19f",
          "model": "deepseek-r1:1.5b",
          "matcher_type": "catch_all",
          "matcher": ""
        }
     ]'
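One quick way to see the portability problem in payloads like the ones above is to scan each rule for values that parse as UUIDs, since those only mean something on the instance that generated them. This is just an illustrative helper (`instance_specific_fields` is a hypothetical name), not part of the CodeGate API.

```python
import uuid


def instance_specific_fields(rules: list) -> list:
    """Return (rule_index, field_name) pairs whose string values parse as UUIDs."""
    found = []
    for i, rule in enumerate(rules):
        for key, value in rule.items():
            if not isinstance(value, str):
                continue
            try:
                uuid.UUID(value)  # raises ValueError for non-UUID strings
            except ValueError:
                continue
            found.append((i, key))
    return found
```

Run against the GET output above, only the two `provider_id` values are flagged, which matches the observation that the mux rules are the non-portable part.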

Idea 1: refactor muxing to remove the instance-specific provider_id from a rule

This seems like a lot of work upfront, and may be closing some doors to future
requirements that I'm not aware of yet.

However, I think it would also simplify things for this use case, and for
file-based configuration.

As part of this idea, I think it would be wise to enforce a unique constraint on provider_type in the
provider_endpoints table. This probably involves a tricky, opinionated migration
that may discard some user configuration, although given we're at a very early stage of product
development, I think this is acceptable.
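To gauge what such a migration would involve, here is a rough sketch against SQLite: keep one row per provider_type, delete the rest, then add a unique index. The table columns (`id`, `name`, `provider_type`) and the "keep the oldest row" policy are assumptions for illustration; a real migration would go through the project's migration tooling and would also have to deal with mux rules pointing at the dropped providers.

```python
import sqlite3


def enforce_unique_provider_type(conn: sqlite3.Connection) -> list:
    """Keep one provider_endpoints row per provider_type, then add a UNIQUE index.

    Returns the IDs of the discarded rows (the user configuration that gets lost).
    """
    # Keep the oldest row (lowest rowid) for each provider_type; everything else goes.
    dropped = [
        row[0]
        for row in conn.execute(
            """
            SELECT id FROM provider_endpoints
            WHERE rowid NOT IN (
                SELECT MIN(rowid) FROM provider_endpoints GROUP BY provider_type
            )
            ORDER BY rowid
            """
        )
    ]
    conn.executemany("DELETE FROM provider_endpoints WHERE id = ?", [(i,) for i in dropped])
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_provider_endpoints_type "
        "ON provider_endpoints (provider_type)"
    )
    conn.commit()
    return dropped
```

Returning the dropped IDs makes the "may discard some user configuration" part explicit, so it could at least be logged or surfaced to the user.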

All told, this would allow creating/updating a workspace to look like this:

curl -X POST http://localhost:8989/api/v1/workspaces \
  -H "Content-Type: application/json" \
  -d '{
          "name": "my-workspace",
          "config": {
                      "custom_instructions": "Respond in prose",
                      "muxing_rules": [
                          {
-                             "provider_name": null,
-                             "provider_id": "945a33df-74d9-4aa8-8f12-43a44f515d23",
+                             "provider_type": "openrouter",
                              "matcher": "*.js",
                              "matcher_type": "filename_match"
                          }
                      ]
                  }
      }'
And exporting a workspace would then produce a compatible JSON blob like this:
curl -s "http://localhost:8989/api/v1/workspaces/default"
{
  "name": "default",
  "config": {
      "custom_instructions": "foo",
      "muxing_rules": [
        {
-           "provider_name": null,
-           "provider_id": "945a33df-74d9-4aa8-8f12-43a44f515d23",
+           "provider_type": "openrouter",
            "model": "x-ai/grok-beta",
            "matcher_type": "filename_match",
            "matcher": "*.py"
        },
        {
-           "provider_name": null,
-           "provider_id": "0653607e-bf47-42d8-9a9d-86cdca3cf19f",
+           "provider_type": "ollama",
            "model": "deepseek-r1:1.5b",
            "matcher_type": "catch_all",
            "matcher": ""
        }
      ]
  }
}
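Under Idea 1, export and import reduce to swapping `provider_id` for `provider_type` and back. A sketch of that translation, assuming the provider lookups are available as plain dicts keyed by ID and by type (hypothetical shapes) and that provider_type is unique per instance, as proposed above:

```python
def export_mux_rules(rules: list, providers_by_id: dict) -> list:
    """Replace instance-specific provider IDs with portable provider types."""
    exported = []
    for rule in rules:
        provider = providers_by_id[rule["provider_id"]]  # KeyError if the ID is unknown
        portable = {k: v for k, v in rule.items() if k not in ("provider_id", "provider_name")}
        portable["provider_type"] = provider["provider_type"]
        exported.append(portable)
    return exported


def import_mux_rules(rules: list, providers_by_type: dict) -> list:
    """Resolve portable provider types back to this instance's provider IDs."""
    imported = []
    for rule in rules:
        # Only unambiguous because provider_type is assumed to be unique.
        provider = providers_by_type[rule["provider_type"]]
        local = {k: v for k, v in rule.items() if k != "provider_type"}
        local["provider_id"] = provider["id"]
        imported.append(local)
    return imported
```

The round trip is lossless only while each provider_type maps to exactly one provider, which is why the unique constraint is part of this idea.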

Idea 2: Fallback logic for finding provider by ID / name / type

There is another approach, which is simply to make the provider_id and
provider_type optional in a MuxRule, and update all CRUD operations to use
fallback logic that attempts to find a provider by type or name if the ID is not
present. My concern is that this might not produce deterministic results
if there are multiple providers of the same type.

An example implementation might look like this:

class ProviderNotFoundError(Exception):
    pass


class AmbiguousProviderError(Exception):
    pass


def get_provider(provider_id=None, provider_type=None):
    # find_provider_by_id / find_providers_by_type are assumed DB lookup helpers.
    if provider_id:
        provider = find_provider_by_id(provider_id)
        if provider:
            return provider
    if provider_type:
        providers = find_providers_by_type(provider_type)
        if len(providers) == 1:
            return providers[0]
        if len(providers) > 1:
            raise AmbiguousProviderError(f"Multiple providers found for type: {provider_type}")
    raise ProviderNotFoundError("Provider not found")


# Example usage in a CRUD operation
def create_mux_rule(data):
    # get_provider raises if nothing matches, so no extra None check is needed here.
    provider = get_provider(
        provider_id=data.get("provider_id"),
        provider_type=data.get("provider_type"),
    )
    # Continue with creating the mux rule using the provider
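A self-contained variant of the same fallback, runnable without a database, might look like the following. It swaps the lookup helpers for an in-memory list of providers and also tries `provider_name` before `provider_type`, since the paragraph above mentions both; the `Provider` shape and the id → name → type order are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


class ProviderLookupError(LookupError):
    """Raised when a provider cannot be resolved unambiguously."""


@dataclass(frozen=True)
class Provider:
    id: str
    name: str
    provider_type: str


def resolve_provider(
    providers: List[Provider],
    provider_id: Optional[str] = None,
    provider_name: Optional[str] = None,
    provider_type: Optional[str] = None,
) -> Provider:
    """Resolve by ID, then by name, then by type; refuse ambiguous matches."""
    if provider_id:
        for p in providers:
            if p.id == provider_id:
                return p
    # An unknown ID falls through to the portable hints, as in get_provider above.
    for attr, value in (("name", provider_name), ("provider_type", provider_type)):
        if not value:
            continue
        matches = [p for p in providers if getattr(p, attr) == value]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise ProviderLookupError(f"multiple providers with {attr}={value!r}")
    raise ProviderLookupError("provider not found")
```

Raising on ambiguity rather than picking the first match is what keeps this deterministic when several providers share a type.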

@JAORMX
Contributor

JAORMX commented Mar 6, 2025

For the record, we had an out-of-band discussion about this and went with option two, with the caveat that the parameters will only be checked on create/update.
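One reading of "only checked on create/update" is that the portable hints are resolved once at write time and only the concrete provider_id is persisted, so reads never re-run the lookup. A small in-memory sketch of that behaviour follows; `MuxRuleStore` and the provider dict shape are hypothetical stand-ins for the real persistence layer.

```python
from typing import Dict, List


class MuxRuleStore:
    """In-memory stand-in for mux rule persistence (hypothetical)."""

    def __init__(self, providers: List[Dict[str, str]]):
        # Each provider is assumed to look like {"id": ..., "provider_type": ...}.
        self.providers = providers
        self._rules: Dict[str, List[dict]] = {}

    def _resolve(self, rule: dict) -> str:
        """Map a rule's provider hints to a concrete provider ID."""
        known_ids = {p["id"] for p in self.providers}
        if rule.get("provider_id") in known_ids:
            return rule["provider_id"]
        matches = [p["id"] for p in self.providers if p["provider_type"] == rule.get("provider_type")]
        if len(matches) != 1:
            raise ValueError(f"cannot resolve provider for rule {rule!r}")
        return matches[0]

    def set_rules(self, workspace: str, rules: List[dict]) -> None:
        """Create/update: resolve each rule once and persist only the concrete ID."""
        stored = []
        for rule in rules:
            row = {k: v for k, v in rule.items() if k not in ("provider_name", "provider_type")}
            row["provider_id"] = self._resolve(rule)
            stored.append(row)
        self._rules[workspace] = stored  # replaced only if every rule resolved

    def get_rules(self, workspace: str) -> List[dict]:
        """Read: no resolution happens here; stored IDs are returned unchanged."""
        return [dict(row) for row in self._rules.get(workspace, [])]
```

With this split, adding a second provider of the same type later does not change existing rules; it only makes new type-based writes fail until the user disambiguates.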

Projects
None yet
Development

No branches or pull requests

5 participants