Kubelet w/ external cloud provider needs run with explicit provider id #72
@mattlqx I kinda ran into the same issue and it took me ages to figure out that the AWS tag on my nodes was wrong. The node has to have a tag that looks like kubernetes.io/cluster/<cluster-name> set to owned. The cloud controller checks whether the returned instance is a correct cluster member there: https://github.com/kubernetes/legacy-cloud-providers/blob/master/aws/aws.go#L4581. I hope it might help you!
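(For anyone landing here later: a minimal sketch of applying such a tag with the AWS CLI. The cluster name and instance ID below are placeholders, not values from this thread.)

```bash
# Tag an instance so the cloud controller recognizes it as a cluster member.
# "mycluster" and the instance ID are placeholders; adjust for your cluster.
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/mycluster,Value=owned
```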
@Erouan50 Thanks. Yeah, that is a requirement also and something I have configured, with the legacy KubernetesCluster tag name.
I ran into this issue as well, but only when the instance was not tagged with the cluster tag.
Yeah, instances were tagged with KubernetesCluster from before, when running the in-tree cloud provider. Didn't make a change there.
@mattlqx I think one of the other requirements is that your Kubernetes node name matches the private DNS name of the instance.
That they do. Which is another unfortunate pain point, but they're all named with their EC2 private DNS names.
I agree, I opened an issue for that problem here: #63. FWIW, the "error fetching node by provider ID" log line is expected when the provider ID isn't set; the fallback call to look up the instance by node name is what's supposed to find the instance instead.
That seemed to be the case. Everything worked for me after just specifying the provider ID, and I haven't messed with it since because it's working.
/assign @andrewsykim
Just a note in passing: the ProviderID used on EKS nodes is aws:///$az/$instanceid, i.e. with an empty host part. I expect that's not relevant, as it might just have been a typo, and I'm also assuming pretty heavily that EKS isn't using a different codebase to this cloud provider. I thought I'd mention it in case that flags a relevant investigation direction for someone. Edit: It won't matter; having a quick look at the code, the URL parsing there accepts either form.
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
The same scenario happened to me as well. In my case the provider ID has to be passed dynamically, since the worker nodes are part of an ASG; that can only be done by fetching the AWS instance metadata and passing the provider ID at bootstrap time. Are there plans to have the aws-cloud-controller-manager set the provider ID automatically when a new node is detected?
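(For reference, a hedged sketch of what such a bootstrap step could look like — not from this thread. It assumes IMDSv2 and an install where kubelet reads KUBELET_EXTRA_ARGS from a kubeadm-style drop-in file; both are assumptions.)

```bash
# Hypothetical bootstrap snippet: build the provider ID from instance metadata
# (IMDSv2) and hand it to kubelet via KUBELET_EXTRA_ARGS.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/instance-id)
echo "KUBELET_EXTRA_ARGS=--cloud-provider=external --provider-id=aws://${AZ}/${INSTANCE_ID}" \
  > /etc/default/kubelet
```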
It should set up the provider ID automatically, assuming the node name matches the instance's private DNS name. The ASG should have the kubernetes.io/cluster/<cluster-name> tag set to propagate to instances at launch, so new nodes come up tagged correctly. If the above things are correct, and the provider ID is not being automatically detected for you, then that's a bug. This bug, in fact.
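(A couple of hedged sanity checks for the above; the instance ID, cluster name and ASG name are placeholders, and this assumes the AWS CLI and kubectl are available.)

```bash
# 1. Node names should equal the EC2 private DNS names.
kubectl get nodes -o name

# 2. The instance should carry the cluster tag.
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=i-0123456789abcdef0" \
            "Name=key,Values=kubernetes.io/cluster/mycluster"

# 3. The ASG should propagate that tag to new instances.
aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=my-node-group"
```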
You are correct, the Provider ID is configured by the Cloud Controller Manager after the node is bootstrapped, by looking the instance up via the node name, which matches the private DNS name. This means the node name should match the private DNS name. Not sure how this is logically connected.
The reason the EC2 Private DNS name has to match the Node name is that it's the only thing that actually connects a given Node object to a given EC2 instance before the Provider ID is assigned. The AWS Cloud Provider is not currently set up to support Node names different from the EC2 Private DNS name; see #63 for that feature. Hopefully it will be covered early in the AWS Cloud Provider v2 work, see #125.
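(To make that connection concrete: the node-name fallback is roughly equivalent to the lookup below. This is an illustration only, not the provider's actual code path; the DNS name is a placeholder.)

```bash
# Find the EC2 instance whose private DNS name equals the Kubernetes node name.
aws ec2 describe-instances \
  --filters "Name=private-dns-name,Values=ip-10-0-0-1.us-west-2.compute.internal" \
  --query 'Reservations[].Instances[].InstanceId'
```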
Gotcha, I think that should be the required approach, because of the custom hostname case.
See also this discussion here: #131 (comment). Basically, we will be unable to provide any assurance around the node identity if neither the instance ID nor the PrivateDNSName is the node name. Is that still something people will want to do?
If we restrict this to the node name, it will definitely be a bottleneck or restriction, so the flexibility of having a custom node name in certain situations, like using VPC DNS, won't be supported. I think this is a good use-case for refactoring the node-identification functionality.
It seems reasonable that if DNS is being used to provide more-custom instance host names, and those names can be made visible to EC2, they would be usefully consumed as Node names to determine the Provider ID. I think that's less than support for 'arbitrary' names: per the linked discussion, it needs to be something visible to the EC2 API. I'm not totally clear what VPC-DNS is in this context, but I'm not personally aware of any method for it to expose different DNS names into EC2's view of the instance. I am hoping that the AWS Cloud Provider v2 work can support using the provider ID for everything except automatic provider-id detection (decoupling from the private DNS name matching we see in the in-tree version), which will mean that custom DNS hostnames can be supported by passing the provider ID explicitly.
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/remove-lifecycle rotten
How do I get the cluster ID? I'm using a self-managed k3s cluster on AWS EC2s with the tag "Key: kubernetes.io/cluster/default" & "Value: owned", where "default" is the cluster name, and it doesn't work!
It looks like the code has moved; the original comment was referring to https://github.com/kubernetes/legacy-cloud-providers/blob/326b090685d0dc77c6295e46ce21ca33fa7ea776/aws/aws.go#L4581-L4584, but as of k8s 1.20 it's still the same check, just moved a little. So the tag you've described on your EC2 instance looks correct. Since you're running self-managed, what version of cloud-provider-aws are you running? You'd probably have to share the config and logs from it, as there's a whole bunch of ways to get the symptom you're seeing, even if the EC2 tag is correct.
@TBBle I'm using the 0.0.2 helm chart, installed like below:
helm repo add aws-cloud-controller-manager https://kubernetes.github.io/cloud-provider-aws
Okay, so that will be for Kubernetes 1.20, and you're running the AWS v1 cloud provider with log level 2 (by default), so the relevant log lines should be visible. To diagnose this you'll need logs from the daemonset pods running on your master nodes. Specifically, the first thing to find is the line "AWS cloud filtering on ClusterID: XXX"
and make sure "XXX" here is the clusterID you're using in the tag. You might also want to check exactly what image is being run on those Pods, as the Helm chart tagged in v1.20.0-alpha.0 shows the 1.19.0 image, but I don't know if that was true in the version published to https://kubernetes.github.io/cloud-provider-aws; it (and the README) were possibly updated after tagging.
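(If it helps, something along these lines should surface both pieces of information. The daemonset name and namespace — aws-cloud-controller-manager in kube-system — are assumed defaults and may differ in your install.)

```bash
# Pull the clusterID line and the image tag from the running daemonset.
kubectl -n kube-system logs daemonset/aws-cloud-controller-manager | grep "filtering on ClusterID"
kubectl -n kube-system get daemonset aws-cloud-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```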
Well, using tag "kubernetes.io/cluster/default = owned", where "default" is the default cluster name of k3s. Getting the results below.
Logs for "AWS cloud filtering on ClusterID":
Logs for aws pod upon startup:
Getting below result with no nodes attached.
I've noticed below in logs:
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules.
You can mark this issue as fresh with /remove-lifecycle stale, close it with /close, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules.
You can mark this issue as fresh with /remove-lifecycle rotten, close it with /close, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules.
You can reopen this issue with /reopen, mark it as fresh with /remove-lifecycle rotten, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue in response to the /close above.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This happened to me as well & I'm not really sure how to resolve it in my fresh k8s cluster. After reading the above I can confirm the points discussed.
I get the following when I run the aws-cloud-controller-manager:
E0315 13:22:27.190762 1 node_controller.go:277] Error getting instance metadata for node addresses: error fetching node by provider ID: Invalid format for AWS instance (), and error by node name: could not look up instance ID for node "ip-<x>-<x>-<x>-<x>.<region>.compute.internal": node has no providerID
I0315 13:22:27.190830 1 node_controller.go:263] Update 1 nodes status took 158.029µs.
I'm not really sure what's causing this.
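(In case it's useful, a quick check of whether the node ever received a provider ID; the node name below is a placeholder.)

```bash
# An empty result means the cloud controller never set spec.providerID,
# which matches the "node has no providerID" error above.
kubectl get node ip-10-0-0-1.us-west-2.compute.internal \
  -o jsonpath='{.spec.providerID}'
```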
I encountered this issue and was able to resolve it by adding the node.cloudprovider.kubernetes.io/uninitialized taint to the node.
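(A hedged sketch of that workaround, in case it helps others; the node name is a placeholder. The cloud node controller initializes nodes that carry this taint and removes it once initialization succeeds.)

```bash
# Re-add the cloud-provider "uninitialized" taint so the controller re-runs
# node initialization (it should remove the taint again on success).
kubectl taint nodes ip-10-0-0-1.us-west-2.compute.internal \
  node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
```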
What happened:
Following the readme to switch to this external cloud provider, either it's not mentioned or I'm misconfigured, but nodes do not auto-detect their ProviderID, which leads the aws-cloud-controller-manager to not be able to find the instance in AWS. When I manually add the --provider-id flag to kubelet in the proper aws://$az/$instanceid format, the aws-cloud-controller-manager is able to find the node.

What you expected to happen:
The ProviderID to be automatically detected. I'm not sure this is a correct expectation when using --cloud-provider=external.

How to reproduce it (as minimally and precisely as possible):
- Follow the readme, running components with --cloud-provider=external.
- Run kubelet with --cloud-provider=external and no --provider-id.
- Observe log lines such as https://gist.github.com/mattlqx/a2f6bc5a198fb11f77cbd9adaac46ece

Environment:
- Kubernetes version (use kubectl version):
- Kernel (e.g. uname -a): 4.15.0-1056-aws

/kind bug
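(For completeness, a minimal sketch of the manual workaround described in this report, assuming an install where kubelet reads KUBELET_EXTRA_ARGS from a kubeadm-style drop-in file; the availability zone and instance ID are placeholders.)

```bash
# /etc/default/kubelet: pass the provider ID explicitly in aws://$az/$instanceid form.
KUBELET_EXTRA_ARGS="--cloud-provider=external --provider-id=aws://us-west-2a/i-0123456789abcdef0"
```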