Skip to content

Consul service tokens derived from WI fail binding if service name is not all lowercase #25704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jinnatar opened this issue Apr 17, 2025 · 5 comments

Comments

@jinnatar
Copy link

Nomad version

Nomad v1.9.7
BuildDate 2025-03-11T09:07:15Z
Revision f869597+CHANGES
(attempting to move to v1.10.0 but blocked by this issue)

Operating system and Environment details

Debian 12, native deb install of CE nomad & consul.

Issue

Following the WI for Consul tutorial I can successfully start jobs using the created example service type binding. However, if I change the service name to contain non-lowercase characters the binding does not allow write access to the capitalized service. Ergo, I'm inferring that somewhere along the way either Nomad or Consul implicitly lowercases the ${value.nomad_service} value.

Reproduction steps

  1. Follow the Consul ACL integration tutorial: https://developer.hashicorp.com/nomad/tutorials/integrate-consul/consul-acl
  2. Submit a working job as per the job file listed further down.
  3. Modify the service.name value to Identity-demo-dev. This causes the following plan diff for the Identity block, ServiceName correctly does not get lowercased here:
+/- Name:         "consul-service_identity-demo-dev-http" => "consul-service_Identity-demo-dev-http"
+/- ServiceName:  "identity-demo-dev" => "Identity-demo-dev"
  1. and purge & re-run the job.

Expected Result

Service is registered.

Actual Result

As per Consul logs:

[ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/register from=127.0.0.1:42364 error="Permission denied: token with AccessorID '0bc001cb-7579-a334-c525-50519011f082' lacks permission 'service:write' on \"Identity-demo-dev\"""

If the job was purged before running the changed version the deployment fails due to service registration failing. If a purge is not done and it's an update, the service registration still fails but the task remains "healthy" and all seems fine from Nomad, but it sure ain't actually there in Consul.

Job file (if appropriate)

job "identity_demo" {
  group "httpd_group" {
    count = 1

    network {
      port "http" {}
    }

    service {
      name     = "identity-demo-dev"
      port     = "http"
      identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    }

    task "httpd_task" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "httpd"
        args    = ["-f", "-p", "${NOMAD_PORT_http}"]
        ports   = ["http"]
      }

      identity {
        name = "consul_default"
        aud  = ["consul.io"]
        ttl  = "1h"
      }
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@jinnatar
Copy link
Author

For the sake of completeness, here's the binding rule:

  AuthMethod:   nomad-workloads
   Description:  
   BindType:     service
   BindName:     ${value.nomad_service}
   Selector:     "nomad_service" in value

@jinnatar
Copy link
Author

jinnatar commented Apr 18, 2025

This appears to be the Consul side issue: hashicorp/consul#20373.

What I'm seeing in summary:

  • As per support capital letters in AclBinding consul#20374 Consul views non-lowercase service names as bad and they want to make that impossible. However, they have not done that, just silently excluded them from working with new features by doing forced lowercasing on bindings but not on service registration.
  • Neither side documents casing restrictions on service name for ACL bindings, but to be fair the service name is documented to be limited to [a-z0-9\-]. However, nothing has enforced or hinted at it until this issue.
  • Both sides happily accept uppercase in service names and correctly register services as written, but seemingly long-term Consul does not want to do that anymore in the name of DNS safety.
  • Nomad has already dropped the token based service registration path.

Ergo, anyone relying on service names supporting capitalization (for example for using them as user-facing text strings) is now unable to upgrade to Nomad 1.10.0 as per best practices with WI service bound Consul ACLs. The workaround is to make a much much more lenient binding that allows write access to any service. This is obviously a bad idea, but probably(?) not any worse than the old approach where the registration was done anyway with wide permissions.

Assuming that what Consul wants is sane and will stick, Nomad should:

  • Reject a non-lowercased service name in job planning. While the docs do say only valid chars are: [a-z0-9\-], at least Nomad 1.9.7 does not reject service names outside of that and even worse, everything "just works".
  • If not rejecting them outright because other integrations may allow for them, they could be auto-lowercased coupled with an emitted warning that Consul no longer supports cased service names.
  • Or if neither of those are deemed acceptable, at least amend the tutorial (https://developer.hashicorp.com/nomad/tutorials/integrate-consul/consul-acl) to contain severe warnings that this will not work for non-lowercase services.

@tgross
Copy link
Member

tgross commented Apr 18, 2025

Thanks for hunting that down @jinnatar. I suspect this isn't going to be able to be changed on the Consul side anytime soon.

Reject a non-lowercased service name in job planning. While the docs do say only valid chars are: [a-z0-9-], at least Nomad 1.9.7 does not reject service names outside of that and even worse, everything "just works".

Validation not matching the docs is definitely bad behavior on Nomad's part. From the perspective of backwards compatibility with existing services, this is going to be a little painful to resolve -- we need to make sure we're not causing outages during upgrades. I'll mark this for roadmapping.

@jinnatar
Copy link
Author

jinnatar commented Apr 18, 2025

Yeah, all paths out of this seem painful. Off the top of my head:

  1. Rejecting in planning isn't publicly service affecting, but forces a change of practices without prior warning, which will cause grumbles.
  2. Lowercasing for compatibility with warnings may end up being public service affecting, perhaps not as a crash but as a visible change of labels if the service name is used to preserve a human readable label (say OpenLDAP, impossible to transform to from openldap so a label needs to be used to preserve a human readable name instead.)
  3. Emitting a warning in planning might be seen by folks that do it interactively, but who knows what business case would never see them unless that's already an established pattern.
  4. Log warnings are unlikely to be seen by many, "why go look at logs when everything works"

I'd say an escalating ratchet of a quickly implemented planning WARN moving over to an eventual planning reject would give the most heads up but actually close the issue at some point.

@tgross
Copy link
Member

tgross commented Apr 18, 2025

Agreed that logs are not the right approach here. Fortunately we have a mechanism in place already to emit warnings on job submission (which includes job plan), although it's not used much today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs Roadmapping
Development

No branches or pull requests

2 participants