Kubelet w/ external cloud provider needs to run with explicit provider id #72

Closed · mattlqx opened this issue Jan 9, 2020 · 35 comments
Labels: kind/bug, lifecycle/rotten

mattlqx (Contributor) commented Jan 9, 2020

What happened:

Following the readme to switch to this external cloud provider, nodes do not auto-detect their ProviderID (either this isn't covered in the readme or I'm misconfigured), which means the aws-cloud-controller-manager is unable to find the instance in AWS. When I manually add the --provider-id flag to kubelet in the proper aws://$az/$instanceid format, the aws-cloud-controller-manager is able to find the node.
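
For reference, the workaround on the node looks something like this. A minimal sketch, assuming the standard EC2 instance metadata service; --cloud-provider and --provider-id are existing kubelet flags, the script itself is illustrative:

# Derive the provider ID from the EC2 instance metadata service,
# then start kubelet with it set explicitly.
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
kubelet --cloud-provider=external \
  --provider-id="aws://${AZ}/${INSTANCE_ID}"   # plus your existing kubelet flags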

What you expected to happen:

The ProviderID to be automatically detected. I'm not sure this is a correct expectation when using --cloud-provider=external.

How to reproduce it (as minimally and precisely as possible):

  1. Start aws-cloud-controller-manager and set all daemons to --cloud-provider=external.
  2. Start Kubelet with --cloud-provider=external and no --provider-id.

Observe log lines such as https://gist.github.com/mattlqx/a2f6bc5a198fb11f77cbd9adaac46ece

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.4", GitCommit:"224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba", GitTreeState:"clean", BuildDate:"2019-12-11T12:47:40Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.4", GitCommit:"224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba", GitTreeState:"clean", BuildDate:"2019-12-11T12:37:43Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
  • Kernel (e.g. uname -a): 4.15.0-1056-aws
  • Install tools:
  • Others:

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 9, 2020
Erouan50 commented Jan 28, 2020

@mattlqx I ran into much the same issue, and it took me ages to figure out that the AWS tag on my nodes was wrong. The node has to have a tag that looks like kubernetes.io/cluster/<cluster_id>: whatever, or the instance is not included. The cluster_id should be the name of your cluster.

The cloud controller checks if the returned instance is a correct member there: https://github.com/kubernetes/legacy-cloud-providers/blob/master/aws/aws.go#L4581.

I hope it might help you!
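
For reference, adding that tag with the AWS CLI looks something like the sketch below; the instance ID and the mycluster name are placeholders:

aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/mycluster,Value=owned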

mattlqx (Contributor, Author) commented Feb 20, 2020

@Erouan50 Thanks. Yeah, that is also a requirement, and something I have configured with the legacy tag name KubernetesCluster. https://github.com/kubernetes/legacy-cloud-providers/blob/2dec4f8f5a22cc425713b33496c5fc5892dd4573/aws/tags.go#L41

andrewsykim (Member) commented

I ran into this issue as well but only when the instance was not tagged with KubernetesCluster. @mattlqx was this the case for you even when the instance had the cluster tag?

mattlqx (Contributor, Author) commented Apr 30, 2020

Yeah, instances were already tagged with KubernetesCluster from before, when running the in-tree cloud-controller-manager. I didn't make a change there.

andrewsykim (Member) commented

@mattlqx I think one of the other requirements is that your Kubernetes node name matches the private DNS name of the instance.
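
A sketch of what that means in practice, assuming the standard EC2 metadata endpoint; --hostname-override is an existing kubelet flag:

# local-hostname is the EC2 private DNS name,
# e.g. ip-172-30-12-164.us-west-2.compute.internal
NODE_NAME=$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)
kubelet --cloud-provider=external --hostname-override="${NODE_NAME}"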

mattlqx (Contributor, Author) commented May 1, 2020

That they do. It's another unfortunate pain point, but they're all named like ip-172-30-12-164.us-west-2.compute.internal, etc.

andrewsykim (Member) commented

I agree; I opened an issue for that problem: #63.

fwiw the log line:

aws.go:4593] Unable to convert node name "ip-172-30-0-123.us-west-2.compute.internal" to aws instanceID, fall back to findInstanceByNodeName: node has no providerID

is expected when the provider ID isn't set. The fallback call to findInstanceByNodeName should still pass in this case though. So the node never gets registered until you pass --provider-id to kubelet?

mattlqx (Contributor, Author) commented May 1, 2020

That seemed to be the case. Everything worked for me after just specifying the provider ID, and I haven't messed with it since because it's working.

nckturner (Contributor) commented

/assign @andrewsykim

TBBle commented Jun 15, 2020

Just a note in passing: the ProviderID used on EKS nodes is aws:///$az/$instanceid, i.e. one more slash than the aws://$az/$instanceid in the issue report.

I expect that's not relevant, as it might just have been a typo, and I'm also assuming pretty heavily that EKS isn't using a different codebase from this cloud provider. I thought I'd mention it in case it flags a relevant investigation direction for someone.

Edit: It won't matter. Having a quick look at the code, the URL aws://$az/$instanceid will work by accident. The host ($az) is never checked, and the code takes the bare instance ID as if it were aws:///$instanceid, which is valid since it only needs to extract the instance ID anyway.

fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 13, 2020
jeswinkninan (Contributor) commented

The same scenario happened to me as well. In my case the provider ID has to be passed dynamically, since the worker nodes are part of an ASG, and that can only be achieved by fetching the AWS metadata and passing the provider ID at boot. Are there plans to have aws-cloud-controller-manager set the provider ID automatically when a new node is detected?

TBBle commented Oct 6, 2020

It should set up the provider ID automatically, assuming the kubernetes.io/cluster/<cluster_id>: whatever tag (or legacy KubernetesCluster tag) is present on the EC2 instance, and the private DNS name in EC2 matches the nodename passed to kubelet.

The ASG should have the kubernetes.io/cluster/<cluster_id>: whatever tag and be propagating it to its EC2 instances, and the private DNS name is up to the script that runs kubelet to get right.
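
As a sketch with the AWS CLI (my-asg and my-cluster are placeholders), tagging the ASG so the tag propagates to the instances it launches:

aws autoscaling create-or-update-tags \
  --tags "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=kubernetes.io/cluster/my-cluster,Value=owned,PropagateAtLaunch=true"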

If the above things are correct, and the provider ID is not being automatically detected for you, then that's a bug.

This bug, in fact.

jeswinkninan (Contributor) commented Oct 6, 2020

You are correct: the provider ID is configured by the Cloud Controller Manager after the node is bootstrapped, keyed on the node name matching the private DNS name. That means the node name has to match the private DNS name; I'm not sure how the two are logically connected.

TBBle commented Oct 7, 2020

The reason the EC2 Private DNS name has to match the Node name is that it's the only thing that actually connects a given Node object to a given EC2 instance before the Provider ID is assigned.

The AWS Cloud Provider is not currently set up to support Node names different from the EC2 Private DNS name; see #63 for that feature. Hopefully it will be covered early in the AWS Cloud Provider v2 work, see #125.

jeswinkninan (Contributor) commented

Gotcha. I think that support is going to be needed for the custom hostname case.

randomvariable (Member) commented

See also the discussion in #131 (comment).

Basically, we will be unable to provide any assurance around node identity if neither the instance ID nor the PrivateDNSName is the node name. Is that still something people will want to do?

jeswinkninan (Contributor) commented Oct 7, 2020

If we restrict this to the node name, it will definitely be a limitation: the flexibility of having a custom node name in certain situations, like using VPC DNS, won't be supported. I think this is a good use case for refactoring the node identification functionality.

TBBle commented Oct 8, 2020

It seems reasonable that if DNS is being used to provide more-custom instance host names, and those names can be made visible to EC2, they could usefully be consumed as Node names to determine the Provider ID.

I think that's less than support for 'arbitrary' names: per the linked discussion, it needs to be something visible to ec2:DescribeInstances or there's no secure way to tie the kubelet back to an EC2 instance.

I'm not totally clear what VPC-DNS is in this context, but I'm not personally aware of any method for it to expose different DNS names into ec2:DescribeInstances.

I am hoping that the AWS Cloud Provider v2 work can support using the provider ID for everything except automatic provider-ID detection (decoupling from the private DNS name matching we see in the in-tree version), which would mean custom DNS hostnames can be supported using the --provider-id and --hostname-override parameters to Kubelet. --provider-id would be secure, since it contains the instance ID, which can be matched against ec2:DescribeInstances as proposed for the next step of #131. This would resolve kubernetes/kubernetes#54482.
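
As a sketch of what that would look like from the node side (the host name and instance details are placeholders; both flags already exist in kubelet today):

# Custom node name plus an explicit provider ID, so the cloud provider
# can identify the instance without a private-DNS-name match.
kubelet --cloud-provider=external \
  --hostname-override="node-a.internal.example.com" \
  --provider-id="aws:///us-west-2a/i-0123456789abcdef0"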

fejta-bot commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Nov 7, 2020
frittentheke commented

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label Nov 25, 2020
fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Feb 23, 2021
fejta-bot commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 25, 2021
ayberk (Contributor) commented Mar 25, 2021

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label Mar 25, 2021
nurhanhasan commented May 31, 2021

> @mattlqx I ran into much the same issue, and it took me ages to figure out that the AWS tag on my nodes was wrong. The node has to have a tag that looks like kubernetes.io/cluster/<cluster_id>: whatever, or the instance is not included. The cluster_id should be the name of your cluster.
>
> The cloud controller checks if the returned instance is a correct member there: https://github.com/kubernetes/legacy-cloud-providers/blob/master/aws/aws.go#L4581.
>
> I hope it might help you!

How do I get the cluster ID?

I'm using a self-managed k3s cluster on AWS EC2 instances with the tag "Key: kubernetes.io/cluster/default" & "Value: owned", where default is the cluster name, and it doesn't work!

TBBle commented May 31, 2021

It looks like the code has moved; the original comment was referring to https://github.com/kubernetes/legacy-cloud-providers/blob/326b090685d0dc77c6295e46ce21ca33fa7ea776/aws/aws.go#L4581-L4584, but as of k8s 1.20 it's still the same, just moved a little.

So the tag you've described on your EC2 instance here looks correct.

Since you're running self-managed, what version of cloud-provider-aws are you running? You'd probably have to share the config and logs from it, as there's a whole bunch of ways to get the symptom you're seeing, even if the EC2 tag is correct.

nurhanhasan commented

@TBBle I'm using the 0.0.2 Helm chart, like below:

helm repo add aws-cloud-controller-manager https://kubernetes.github.io/cloud-provider-aws
helm repo update
helm install aws-cloud-controller-manager aws-cloud-controller-manager/aws-cloud-controller-manager --version 0.0.2

TBBle commented Jun 2, 2021

Okay, so that will be for kubernetes 1.20, and you're running the AWS v1 cloud provider with log level 2 (by default), so the describeInstances function is still unchanged.

To diagnose this you'll need logs from the daemonset pods running on your master nodes. Specifically, the first thing to find is the line

AWS cloud filtering on ClusterID: XXX

and make sure "XXX" here is the clusterID you're using in the tag.

You might also want to check exactly which image is being run in those Pods: the Helm chart tagged in v1.20.0-alpha.0 shows the 1.19.0 image, but I don't know if that was true in the version published to https://kubernetes.github.io/cloud-provider-aws; it (and the README) may have been updated after tagging.

nurhanhasan commented

Well, I'm using the tag "kubernetes.io/cluster/default = owned", where "default" is the default cluster name of k3s.

I'm getting the results below.

Logs for "AWS cloud filtering on ClusterID":

ubuntu@ip-10-0-2-137:~$ kubectl logs aws-cloud-controller-manager-q68vw -n kube-system | grep "AWS cloud filtering on ClusterID:"
I0623 21:53:50.266641       1 tags.go:79] AWS cloud filtering on ClusterID: default

Logs from the AWS pod on startup:

ubuntu@ip-10-0-2-137:~$ kubectl logs aws-cloud-controller-manager-q68vw -n kube-system | grep "failed"
I0623 22:53:59.061989       1 flags.go:59] FLAG: --add-dir-header="false"
I0623 22:53:59.062080       1 flags.go:59] FLAG: --address="0.0.0.0"
I0623 22:53:59.062089       1 flags.go:59] FLAG: --allocate-node-cidrs="false"
I0623 22:53:59.062097       1 flags.go:59] FLAG: --allow-untagged-cloud="false"
I0623 22:53:59.062102       1 flags.go:59] FLAG: --alsologtostderr="false"
I0623 22:53:59.062107       1 flags.go:59] FLAG: --authentication-kubeconfig=""
I0623 22:53:59.062114       1 flags.go:59] FLAG: --authentication-skip-lookup="false"
I0623 22:53:59.062119       1 flags.go:59] FLAG: --authentication-token-webhook-cache-ttl="10s"
I0623 22:53:59.062126       1 flags.go:59] FLAG: --authentication-tolerate-lookup-failure="false"
I0623 22:53:59.062131       1 flags.go:59] FLAG: --authorization-always-allow-paths="[/healthz]"
I0623 22:53:59.062145       1 flags.go:59] FLAG: --authorization-kubeconfig=""
I0623 22:53:59.062150       1 flags.go:59] FLAG: --authorization-webhook-cache-authorized-ttl="10s"
I0623 22:53:59.062155       1 flags.go:59] FLAG: --authorization-webhook-cache-unauthorized-ttl="10s"
I0623 22:53:59.062160       1 flags.go:59] FLAG: --bind-address="0.0.0.0"
I0623 22:53:59.062165       1 flags.go:59] FLAG: --cert-dir=""
I0623 22:53:59.062206       1 flags.go:59] FLAG: --cidr-allocator-type="RangeAllocator"
I0623 22:53:59.062213       1 flags.go:59] FLAG: --client-ca-file=""
I0623 22:53:59.062217       1 flags.go:59] FLAG: --cloud-config=""
I0623 22:53:59.062222       1 flags.go:59] FLAG: --cloud-provider="aws"
I0623 22:53:59.062227       1 flags.go:59] FLAG: --cluster-cidr=""
I0623 22:53:59.062231       1 flags.go:59] FLAG: --cluster-name="kubernetes"
I0623 22:53:59.062236       1 flags.go:59] FLAG: --concurrent-service-syncs="1"
I0623 22:53:59.062243       1 flags.go:59] FLAG: --configure-cloud-routes="true"
I0623 22:53:59.062247       1 flags.go:59] FLAG: --contention-profiling="false"
I0623 22:53:59.062252       1 flags.go:59] FLAG: --controller-start-interval="0s"
I0623 22:53:59.062257       1 flags.go:59] FLAG: --controllers="[*]"
I0623 22:53:59.062264       1 flags.go:59] FLAG: --external-cloud-volume-plugin=""
I0623 22:53:59.062269       1 flags.go:59] FLAG: --feature-gates=""
I0623 22:53:59.062277       1 flags.go:59] FLAG: --help="false"
I0623 22:53:59.062282       1 flags.go:59] FLAG: --http2-max-streams-per-connection="0"
I0623 22:53:59.062289       1 flags.go:59] FLAG: --kube-api-burst="30"
I0623 22:53:59.062294       1 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0623 22:53:59.062300       1 flags.go:59] FLAG: --kube-api-qps="20"
I0623 22:53:59.062307       1 flags.go:59] FLAG: --kubeconfig=""
I0623 22:53:59.062312       1 flags.go:59] FLAG: --leader-elect="true"
I0623 22:53:59.062317       1 flags.go:59] FLAG: --leader-elect-lease-duration="15s"
I0623 22:53:59.062322       1 flags.go:59] FLAG: --leader-elect-renew-deadline="10s"
I0623 22:53:59.062326       1 flags.go:59] FLAG: --leader-elect-resource-lock="leases"
I0623 22:53:59.062331       1 flags.go:59] FLAG: --leader-elect-resource-name="cloud-controller-manager"
I0623 22:53:59.062336       1 flags.go:59] FLAG: --leader-elect-resource-namespace="kube-system"
I0623 22:53:59.062344       1 flags.go:59] FLAG: --leader-elect-retry-period="2s"
I0623 22:53:59.062349       1 flags.go:59] FLAG: --log-backtrace-at=":0"
I0623 22:53:59.062358       1 flags.go:59] FLAG: --log-dir=""
I0623 22:53:59.062364       1 flags.go:59] FLAG: --log-file=""
I0623 22:53:59.062368       1 flags.go:59] FLAG: --log-file-max-size="1800"
I0623 22:53:59.062375       1 flags.go:59] FLAG: --log-flush-frequency="5s"
I0623 22:53:59.062379       1 flags.go:59] FLAG: --logtostderr="true"
I0623 22:53:59.062384       1 flags.go:59] FLAG: --master=""
I0623 22:53:59.062394       1 flags.go:59] FLAG: --min-resync-period="12h0m0s"
I0623 22:53:59.062400       1 flags.go:59] FLAG: --node-monitor-period="5s"
I0623 22:53:59.062405       1 flags.go:59] FLAG: --node-status-update-frequency="5m0s"
I0623 22:53:59.062410       1 flags.go:59] FLAG: --node-sync-period="0s"
I0623 22:53:59.062414       1 flags.go:59] FLAG: --one-output="false"
I0623 22:53:59.062419       1 flags.go:59] FLAG: --permit-port-sharing="false"
I0623 22:53:59.062425       1 flags.go:59] FLAG: --port="0"
I0623 22:53:59.062430       1 flags.go:59] FLAG: --profiling="true"
I0623 22:53:59.062434       1 flags.go:59] FLAG: --requestheader-allowed-names="[]"
I0623 22:53:59.062460       1 flags.go:59] FLAG: --requestheader-client-ca-file=""
I0623 22:53:59.062465       1 flags.go:59] FLAG: --requestheader-extra-headers-prefix="[x-remote-extra-]"
I0623 22:53:59.062472       1 flags.go:59] FLAG: --requestheader-group-headers="[x-remote-group]"
I0623 22:53:59.062481       1 flags.go:59] FLAG: --requestheader-username-headers="[x-remote-user]"
I0623 22:53:59.062488       1 flags.go:59] FLAG: --route-reconciliation-period="10s"
I0623 22:53:59.062493       1 flags.go:59] FLAG: --secure-port="10258"
I0623 22:53:59.062498       1 flags.go:59] FLAG: --skip-headers="false"
I0623 22:53:59.062504       1 flags.go:59] FLAG: --skip-log-headers="false"
I0623 22:53:59.062509       1 flags.go:59] FLAG: --stderrthreshold="2"
I0623 22:53:59.062513       1 flags.go:59] FLAG: --tls-cert-file=""
I0623 22:53:59.062518       1 flags.go:59] FLAG: --tls-cipher-suites="[]"
I0623 22:53:59.062534       1 flags.go:59] FLAG: --tls-min-version=""
I0623 22:53:59.062540       1 flags.go:59] FLAG: --tls-private-key-file=""
I0623 22:53:59.062545       1 flags.go:59] FLAG: --tls-sni-cert-key="[]"
I0623 22:53:59.062558       1 flags.go:59] FLAG: --use-service-account-credentials="false"
I0623 22:53:59.062563       1 flags.go:59] FLAG: --v="2"
I0623 22:53:59.062606       1 flags.go:59] FLAG: --version="false"
I0623 22:53:59.062614       1 flags.go:59] FLAG: --vmodule=""
I0623 22:53:59.446301       1 serving.go:331] Generated self-signed cert in-memory
I0623 22:54:00.376252       1 requestheader_controller.go:244] Loaded a new request header values for RequestHeaderAuthRequestController
W0623 22:54:00.377601       1 client_config.go:614] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0623 22:54:00.380567       1 aws.go:1251] Building AWS cloudprovider
I0623 22:54:00.380661       1 aws.go:1211] Zone not specified in configuration file; querying AWS metadata service
I0623 22:54:06.914287       1 tags.go:79] AWS cloud filtering on ClusterID: default
I0623 22:54:06.915744       1 aws.go:802] Setting up informers for Cloud
I0623 22:54:06.915813       1 controllermanager.go:127] Version: v0.0.0-master+$Format:%h$
I0623 22:54:06.954334       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0623 22:54:06.954355       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0623 22:54:06.954445       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0623 22:54:06.954454       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0623 22:54:06.954497       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0623 22:54:06.954503       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0623 22:54:06.954670       1 reflector.go:219] Starting reflector *v1.ConfigMap (12h0m0s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0623 22:54:06.955392       1 reflector.go:219] Starting reflector *v1.ConfigMap (12h0m0s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0623 22:54:06.955775       1 reflector.go:219] Starting reflector *v1.ConfigMap (12h0m0s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0623 22:54:06.956689       1 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1624488839" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1624488839" (2021-06-23 21:53:59 +0000 UTC to 2022-06-23 21:53:59 +0000 UTC (now=2021-06-23 22:54:06.956661543 +0000 UTC))
I0623 22:54:06.957125       1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1624488840" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1624488839" (2021-06-23 21:53:59 +0000 UTC to 2022-06-23 21:53:59 +0000 UTC (now=2021-06-23 22:54:06.957109074 +0000 UTC))
I0623 22:54:06.957178       1 secure_serving.go:197] Serving securely on [::]:10258
I0623 22:54:06.957212       1 leaderelection.go:243] attempting to acquire leader lease kube-system/cloud-controller-manager...
I0623 22:54:06.957258       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0623 22:54:07.017418       1 leaderelection.go:253] successfully acquired lease kube-system/cloud-controller-manager
I0623 22:54:07.017973       1 event.go:291] "Event occurred" object="kube-system/cloud-controller-manager" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="aws-cloud-controller-manager-ffzb6_f067a183-6f78-496b-9d80-ab65efe2e8ac became leader"
I0623 22:54:07.054517       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0623 22:54:07.054588       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0623 22:54:07.054650       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0623 22:54:07.055374       1 tlsconfig.go:178] loaded client CA [0/"client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"]: "k3s-client-ca@1624488783" [] issuer="<self>" (2021-06-23 22:53:03 +0000 UTC to 2031-06-21 22:53:03 +0000 UTC (now=2021-06-23 22:54:07.055318292 +0000 UTC))
I0623 22:54:07.055916       1 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1624488839" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1624488839" (2021-06-23 21:53:59 +0000 UTC to 2022-06-23 21:53:59 +0000 UTC (now=2021-06-23 22:54:07.05589157 +0000 UTC))
I0623 22:54:07.056712       1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1624488840" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1624488839" (2021-06-23 21:53:59 +0000 UTC to 2022-06-23 21:53:59 +0000 UTC (now=2021-06-23 22:54:07.056576804 +0000 UTC))
I0623 22:54:07.057072       1 tlsconfig.go:178] loaded client CA [0/"client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"]: "k3s-client-ca@1624488783" [] issuer="<self>" (2021-06-23 22:53:03 +0000 UTC to 2031-06-21 22:53:03 +0000 UTC (now=2021-06-23 22:54:07.057029077 +0000 UTC))
I0623 22:54:07.057108       1 tlsconfig.go:178] loaded client CA [1/"client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"]: "k3s-request-header-ca@1624488783" [] issuer="<self>" (2021-06-23 22:53:03 +0000 UTC to 2031-06-21 22:53:03 +0000 UTC (now=2021-06-23 22:54:07.057090484 +0000 UTC))
I0623 22:54:07.057573       1 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1624488839" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1624488839" (2021-06-23 21:53:59 +0000 UTC to 2022-06-23 21:53:59 +0000 UTC (now=2021-06-23 22:54:07.057557427 +0000 UTC))
I0623 22:54:07.058054       1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1624488840" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1624488839" (2021-06-23 21:53:59 +0000 UTC to 2022-06-23 21:53:59 +0000 UTC (now=2021-06-23 22:54:07.058013019 +0000 UTC))
I0623 22:54:07.816349       1 controllermanager.go:228] Starting "cloud-node"
I0623 22:54:07.817320       1 node_controller.go:115] Sending events to api server.
I0623 22:54:07.817405       1 controllermanager.go:238] Started "cloud-node"
I0623 22:54:07.817415       1 controllermanager.go:228] Starting "cloud-node-lifecycle"
I0623 22:54:07.818315       1 node_lifecycle_controller.go:77] Sending events to api server
I0623 22:54:07.818349       1 controllermanager.go:238] Started "cloud-node-lifecycle"
I0623 22:54:07.818358       1 controllermanager.go:228] Starting "service"
I0623 22:54:07.819490       1 controllermanager.go:238] Started "service"
I0623 22:54:07.819502       1 controllermanager.go:228] Starting "route"
I0623 22:54:07.819508       1 core.go:108] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W0623 22:54:07.819515       1 controllermanager.go:235] Skipping "route"
I0623 22:54:07.819822       1 node_controller.go:154] Waiting for informer caches to sync
I0623 22:54:07.819910       1 controller.go:239] Starting service controller
I0623 22:54:07.819919       1 shared_informer.go:240] Waiting for caches to sync for service
I0623 22:54:07.831378       1 reflector.go:219] Starting reflector *v1.Service (30s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0623 22:54:07.831859       1 reflector.go:219] Starting reflector *v1.Node (12h46m28.88585414s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0623 22:54:07.835105       1 controller.go:708] Detected change in list of current cluster nodes. New node set: map[ip-10-0-8-166.ec2.internal:{}]
I0623 22:54:07.835183       1 controller.go:716] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I0623 22:54:07.919963       1 shared_informer.go:247] Caches are synced for service 
I0623 22:54:07.920161       1 controller.go:368] Ensuring load balancer for service ingress-nginx/my-ingress-ingress-nginx-controller
I0623 22:54:07.920238       1 controller.go:853] Adding finalizer to service ingress-nginx/my-ingress-ingress-nginx-controller
I0623 22:54:07.922990       1 event.go:291] "Event occurred" object="ingress-nginx/my-ingress-ingress-nginx-controller" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0623 22:54:07.974991       1 aws.go:3788] EnsureLoadBalancer(kubernetes, ingress-nginx, my-ingress-ingress-nginx-controller, us-east-1, , [{http TCP <nil> 80 {1 0 http} 32709} {https TCP <nil> 443 {1 0 https} 31452}], map[meta.helm.sh/release-name:my-ingress meta.helm.sh/release-namespace:ingress-nginx])
W0623 22:54:07.975072       1 instances.go:115] node "ip-10-0-8-166.ec2.internal" did not have ProviderID set
I0623 22:54:08.458003       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0fdb3f166ec64326c"
I0623 22:54:08.458056       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-02c064f4e42731165"
I0623 22:54:08.458068       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0f093fc357613a71a"
I0623 22:54:08.458077       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0b14dc8ed2ff25e73"
I0623 22:54:08.458087       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0676b004c2b506017"
I0623 22:54:08.458097       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0ebe27d3835a4d6c8"
I0623 22:54:08.458107       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-032422557e3d95087"
I0623 22:54:08.458142       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0c86283f5df56d229"
I0623 22:54:08.458152       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-002f28b066d26cd33"
I0623 22:54:08.458164       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0d322a09fab677cb8"
I0623 22:54:08.458174       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0e2c04280b7b7d765"
I0623 22:54:08.458183       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-07fd7be12a8ba76cc"
I0623 22:54:08.995991       1 aws.go:3104] Existing security group ingress: sg-09d1b9cc654b01bea []
I0623 22:54:08.996052       1 aws.go:3135] Adding security group ingress: sg-09d1b9cc654b01bea [{
  FromPort: 80,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0"
    }],
  ToPort: 80
} {
  FromPort: 443,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0"
    }],
  ToPort: 443
} {
  FromPort: 3,
  IpProtocol: "icmp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0"
    }],
  ToPort: 4
}]
I0623 22:54:09.253520       1 aws_loadbalancer.go:972] Creating load balancer for ingress-nginx/my-ingress-ingress-nginx-controller with name: ab70434061a0a451699ef7a3970f85a4
I0623 22:54:10.166963       1 aws_loadbalancer.go:1175] Updating load-balancer attributes for "ab70434061a0a451699ef7a3970f85a4"
I0623 22:54:10.467785       1 aws.go:4173] Loadbalancer ab70434061a0a451699ef7a3970f85a4 (ingress-nginx/my-ingress-ingress-nginx-controller) has DNS name ab70434061a0a451699ef7a3970f85a4-1267624674.us-east-1.elb.amazonaws.com
I0623 22:54:10.467883       1 controller.go:894] Patching status for service ingress-nginx/my-ingress-ingress-nginx-controller
I0623 22:54:10.472436       1 event.go:291] "Event occurred" object="ingress-nginx/my-ingress-ingress-nginx-controller" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
I0623 22:54:24.026442       1 controller.go:368] Ensuring load balancer for service jenkins/my-jenkins
I0623 22:54:24.026492       1 controller.go:853] Adding finalizer to service jenkins/my-jenkins
I0623 22:54:24.027401       1 event.go:291] "Event occurred" object="jenkins/my-jenkins" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0623 22:54:24.046870       1 aws.go:3788] EnsureLoadBalancer(kubernetes, jenkins, my-jenkins, us-east-1, , [{http TCP <nil> 8888 {0 8080 } 31065}], map[meta.helm.sh/release-name:my-jenkins meta.helm.sh/release-namespace:jenkins])
W0623 22:54:24.046926       1 instances.go:115] node "ip-10-0-8-166.ec2.internal" did not have ProviderID set
I0623 22:54:24.318939       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0fdb3f166ec64326c"
I0623 22:54:24.318970       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-02c064f4e42731165"
I0623 22:54:24.318980       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0f093fc357613a71a"
I0623 22:54:24.318988       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0b14dc8ed2ff25e73"
I0623 22:54:24.318997       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0676b004c2b506017"
I0623 22:54:24.319006       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0ebe27d3835a4d6c8"
I0623 22:54:24.319016       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-032422557e3d95087"
I0623 22:54:24.319024       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0c86283f5df56d229"
I0623 22:54:24.319032       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-002f28b066d26cd33"
I0623 22:54:24.319041       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0d322a09fab677cb8"
I0623 22:54:24.319050       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0e2c04280b7b7d765"
I0623 22:54:24.319058       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-07fd7be12a8ba76cc"
I0623 22:54:24.870109       1 aws.go:3104] Existing security group ingress: sg-090d9e5648fe6b23e []
I0623 22:54:24.870166       1 aws.go:3135] Adding security group ingress: sg-090d9e5648fe6b23e [{
  FromPort: 8888,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0"
    }],
  ToPort: 8888
} {
  FromPort: 3,
  IpProtocol: "icmp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0"
    }],
  ToPort: 4
}]
I0623 22:54:25.065322       1 aws_loadbalancer.go:972] Creating load balancer for jenkins/my-jenkins with name: accd09597114a4e858db10c0cf749fb8
I0623 22:54:26.047058       1 aws_loadbalancer.go:1175] Updating load-balancer attributes for "accd09597114a4e858db10c0cf749fb8"
I0623 22:54:26.317212       1 aws.go:4173] Loadbalancer accd09597114a4e858db10c0cf749fb8 (jenkins/my-jenkins) has DNS name accd09597114a4e858db10c0cf749fb8-407145243.us-east-1.elb.amazonaws.com
I0623 22:54:26.317279       1 controller.go:894] Patching status for service jenkins/my-jenkins
I0623 22:54:26.318098       1 event.go:291] "Event occurred" object="jenkins/my-jenkins" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"

I'm getting the result below, with no nodes attached to the load balancer:

This page isn’t working ... didn’t send any data.
ERR_EMPTY_RESPONSE

I've noticed the following in the logs:

W0623 22:54:07.975072 1 instances.go:115] node "ip-10-0-8-166.ec2.internal" did not have ProviderID set
but kubectl describe node ip-10-0-4-96 shows

ProviderID: aws:///us-east-1b/i-0ea33d3ca163e576e

k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 21, 2021
k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 22, 2021
k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot (Contributor) commented

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

is-it-ayush commented

This happened to me too, and I'm not really sure how to resolve it on my fresh k8s cluster. After reading the above I can confirm:

  • The clusterID in the log line 'AWS cloud filtering on ClusterID: <cluster_id>' is the same as my cluster name and the tag kubernetes.io/cluster/<cluster_id>. For me, <cluster_id> = kubernetes.
  • Hostname is also set to ip-<x>-<x>-<x>-<x>.<region>.compute.internal.

I get the following when I run kubectl logs aws-cloud-controller-manager-r4sv2 -n kube-system:

E0315 13:22:27.190762       1 node_controller.go:277] Error getting instance metadata for node addresses: error fetching node by provider ID: Invalid format for AWS instance (), and error by node name: could not look up instance ID for node "ip-<x>-<x>-<x>-<x>.<region>.compute.internal": node has no providerID
I0315 13:22:27.190830       1 node_controller.go:263] Update 1 nodes status took 158.029µs.

I'm not really sure what's causing this.

tppolkow commented

I encountered this issue and was able to resolve it by adding the taint node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule to the affected nodes. This gets the node initialized by the external cloud provider, which sets the provider ID, and that resolved the issue for me; see the command below.
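
For anyone else, the command is just (the node name is a placeholder):

kubectl taint nodes <node-name> node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule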
