Nodes fail to join cluster and CNI plugin not being added #1972
Comments
We are experiencing this too for the first time today.
I've found a workaround that is a bit tedious and not really ideal. First comment out the coredns add-on and the managed node group, apply, then add them back and apply again.
@mghantous we're facing a similar issue, and indeed commenting out the CoreDNS add-on and managed node group allowed the cluster to finally provision correctly. Adding back the CoreDNS add-on and node group still failed for us though. Are you also provisioning a private (no NATs) cluster? If so, are there any considerations for the private endpoints maybe?
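For reference, a rough sketch of that two-pass workaround (illustrative only; the input names follow the v18 module usage shown later in this thread, and the VPC/subnet values are placeholders). First apply with CoreDNS and the managed node group commented out, then uncomment both and apply again:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name    = "example"
  cluster_version = "1.21"
  vpc_id          = "vpc-0123456789abcdef0"        # placeholder
  subnet_ids      = ["subnet-aaaa", "subnet-bbbb"] # placeholders

  cluster_addons = {
    kube-proxy = {}
    vpc-cni    = {}
    # coredns  = {}   # pass 1: commented out, uncomment for pass 2
  }

  # Pass 1: node group commented out so the control plane and add-ons finish
  # provisioning first; uncomment for pass 2 and apply again.
  # eks_managed_node_groups = {
  #   default = {
  #     min_size     = 2
  #     max_size     = 3
  #     desired_size = 2
  #   }
  # }
}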
We are provisioning private subnets, so I am not sure why it failed for you. Maybe check in the AWS console to see if you have any new error messages under Node Conditions or any of the Add-ons? I am guessing it is probably not the "cni config uninitialized" error message I was seeing. (Screenshots: navigate to Node conditions; navigate to Add-ons.)
We're provisioning to private subnets too, and what's worse is that even if I manually add a managed node group to the provisioned cluster with a manually added vpc-cni, the nodes won't show up in the node group (though they are now healthy). There's something missing I'm not picking up here.
what do you mean by
By default, all EKS clusters are provisioned with CoreDNS, VPC CNI, and kube-proxy pods in order to bootstrap the cluster properly. Even if you do not enable any addons, these services are scheduled to run once nodes are provisioned. Reference: aws/containers-roadmap#923
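In module terms (v18 input names; just a fragment for illustration, not a complete configuration), the cluster_addons block is therefore optional and only takes over management of components EKS bootstraps anyway:

  # Optional: CoreDNS, kube-proxy and the VPC CNI run on the cluster without
  # this block; declaring them here only lets Terraform manage their add-on
  # versions and configuration.
  cluster_addons = {
    coredns    = {}
    kube-proxy = {}
    vpc-cni = {
      # OVERWRITE resolves conflicts with the self-managed defaults when the
      # EKS add-on takes over the component.
      resolve_conflicts = "OVERWRITE"
    }
  }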
After the cluster is created, nodes are supposed to be added but they aren't; they get an unhealthy status and won't join the node group. Then, when checking the add-ons, kube-proxy and coredns are there but vpc-cni is not. Manually adding the vpc-cni and re-running terraform fails saying the vpc-cni add-on is already there (Error: error creating EKS Add-On (pixlee-staging-eks:vpc-cni): ResourceInUseException: Addon already exists.).
have you tried deploying |
Is it possible that this is something that broke with https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html |
I meant to mention, it seems you can repro this outside of Terraform, just in the AWS console, by deleting the vpc-cni add-on if you have one and then trying to add a node group.
Just did this now, only used the 'complete' node group (removed the others), a few optional portions commented out but pretty much all else 'as is'. Same issue:
Same thing, cluster is created, node group is created but instances won't join the cluster. Also NO add-ons were added to the cluster.
Same problem here with a public+private cluster v1.21 without this TF module: the nodes won't join. In fact the Docker images are not pulled for an unknown reason (DNS resolution check ok, VPC endpoint to ECR ok, cluster endpoint ok, role ok). But with a private-only cluster, the nodes have joined.
Following this (albeit it feels bad doing sequential steps in templates) worked for me, thanks! Steps: comment out the coredns add-on and the managed node group, apply, then uncomment both and apply again.
This is still an issue, did anyone find a workaround besides the one shown above?
I am not able to reproduce with the examples we have here in this project. The information I am seeing is mixed and sporadic - hard to piece together a reproduction. Also, the screenshot above by @anbotero shows an issue in the region, which is not related to the module.
Here, maybe this code can help you reproduce. It creates the instance group and the EKS cluster, but the instance group does not join EKS.
@farrukh90 that's v17 though - we're on v18
@bryantbiggs if you delete the vpc-cni add-on, shouldn't the nodes still be able to join?
if you delete the vpc-cni, nodes won't join because the pod networking is gone and pods/nodes won't be able to connect with the control plane. You need a network plugin running for nodes to register with the control plane.
This is the code I used, very little changed from the example/managed_nodegroup (only the complete part, as I don't need/use bottlerocket or containerd or the custom_ami):

module "eks" {
create = true
source = "terraform-aws-modules/eks/aws"
cluster_name = local.cluster_name
cluster_version = var.cluster_version
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
vpc_id = local.vpc_id
subnet_ids = local.private_subnets
tags = var.tags
# IPV4
cluster_ip_family = "ipv4"
cluster_addons = {
coredns = {
resolve_conflicts = "OVERWRITE"
}
kube-proxy = {}
vpc-cni = {
resolve_conflicts = "OVERWRITE"
service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
}
}
cluster_encryption_config = [{
provider_key_arn = aws_kms_key.eks.arn
resources = ["secrets"]
}]
# # Extend cluster security group rules
cluster_security_group_additional_rules = {
egress_nodes_ephemeral_ports_tcp = {
description = "To node 1025-65535"
protocol = "tcp"
from_port = 1025
to_port = 65535
type = "egress"
source_node_security_group = true
}
}
# Extend node-to-node security group rules
node_security_group_additional_rules = {
ingress_self_all = {
description = "Node to node all ports/protocols"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
self = true
}
egress_all = {
description = "Node all egress"
protocol = "-1"
from_port = 0
to_port = 0
type = "egress"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
}
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
disk_size = 50
instance_types = var.ondemand_instance_types
# We are using the IRSA created below for permissions
# However, we have to deploy with the policy attached FIRST (when creating a fresh cluster)
# and then turn this off after the cluster/node group is created. Without this initial policy,
# the VPC CNI fails to assign IPs and nodes cannot join the cluster
# See https://github.com/aws/containers-roadmap/issues/1666 for more context
iam_role_attach_cni_policy = true
}
eks_managed_node_groups = {
# Complete
complete = {
name = "complete-eks-mng"
use_name_prefix = true
subnet_ids = local.private_subnets
min_size = var.ondemand_min_instances
max_size = var.ondemand_max_instances
desired_size = var.ondemand_min_instances
ami_id = data.aws_ami.eks_default.image_id
capacity_type = "ON_DEMAND"
force_update_version = true
labels = local.k8s_labels
update_config = {
max_unavailable_percentage = 50 # or set `max_unavailable`
}
description = "EKS managed node group example launch template"
ebs_optimized = true
vpc_security_group_ids = [aws_security_group.additional.id]
disable_api_termination = false
enable_monitoring = true
create_iam_role = true
iam_role_name = "${local.cluster_name}-managed-node-group-complete-example"
iam_role_use_name_prefix = false
iam_role_description = "${local.cluster_name}-EKS managed node group complete example role"
iam_role_additional_policies = [
"arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
]
create_security_group = true
security_group_name = "${local.cluster_name}-eks-managed-node-group-complete-example"
security_group_use_name_prefix = false
security_group_description = "${local.cluster_name}-EKS managed node group complete example security group"
security_group_rules = {
phoneOut = {
description = "Hello CloudFlare"
protocol = "udp"
from_port = 53
to_port = 53
type = "egress"
cidr_blocks = ["1.1.1.1/32"]
}
phoneHome = {
description = "Hello cluster"
protocol = "udp"
from_port = 53
to_port = 53
type = "egress"
source_cluster_security_group = true # bit of reflection lookup
}
}
tags = local.node_tags
}
}
}
resource "aws_iam_role_policy_attachment" "additional" {
for_each = module.eks.eks_managed_node_groups
policy_arn = aws_iam_policy.node_additional.arn
role = each.value.iam_role_name
}
resource "aws_iam_policy" "node_additional" {
name = "${local.cluster_name}-additional"
description = "Example usage of node additional policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"ec2:Describe*",
]
Effect = "Allow"
Resource = "*"
},
]
})
tags = var.tags
}
locals {
kubeconfig = yamlencode({
apiVersion = "v1"
kind = "Config"
current-context = "terraform"
clusters = [{
name = module.eks.cluster_id
cluster = {
certificate-authority-data = module.eks.cluster_certificate_authority_data
server = module.eks.cluster_endpoint
}
}]
contexts = [{
name = "terraform"
context = {
cluster = module.eks.cluster_id
user = "terraform"
}
}]
users = [{
name = "terraform"
user = {
token = data.aws_eks_cluster_auth.cluster.token
}
}]
})
}
resource "null_resource" "patch" {
triggers = {
kubeconfig = base64encode(local.kubeconfig)
cmd_patch = "kubectl patch configmap/aws-auth --patch \"${module.eks.aws_auth_configmap_yaml}\" -n kube-system --kubeconfig <(echo $KUBECONFIG | base64 --decode)"
}
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
environment = {
KUBECONFIG = self.triggers.kubeconfig
}
command = self.triggers.cmd_patch
}
}
module "vpc_cni_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 4.12"
role_name_prefix = "VPC-CNI-IRSA"
attach_vpc_cni_policy = true
vpc_cni_enable_ipv6 = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:aws-node"]
}
}
tags = var.tags
}
resource "aws_security_group" "remote_access" {
name_prefix = "${local.cluster_name}-remote-access"
description = "Allow remote SSH access"
vpc_id = local.vpc_id
ingress {
description = "SSH access"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["10.0.0.0/8"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
tags = var.tags
}
resource "aws_kms_key" "eks" {
description = "EKS Secret Encryption Key"
deletion_window_in_days = 7
enable_key_rotation = true
tags = var.tags
}
resource "aws_kms_key" "ebs" {
description = "Customer managed key to encrypt EKS managed node group volumes"
deletion_window_in_days = 7
policy = data.aws_iam_policy_document.ebs.json
}
# This policy is required for the KMS key used for EKS root volumes, so the cluster is allowed to enc/dec/attach encrypted EBS volumes
data "aws_iam_policy_document" "ebs" {
# Copy of default KMS policy that lets you manage it
statement {
sid = "Enable IAM User Permissions"
actions = ["kms:*"]
resources = ["*"]
principals {
type = "AWS"
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
}
}
# Required for EKS
statement {
sid = "Allow service-linked role use of the CMK"
actions = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
resources = ["*"]
principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", # required for the ASG to manage encrypted volumes for nodes
module.eks.cluster_iam_role_arn, # required for the cluster / persistentvolume-controller to create encrypted PVCs
]
}
}
statement {
sid = "Allow attachment of persistent resources"
actions = ["kms:CreateGrant"]
resources = ["*"]
principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", # required for the ASG to manage encrypted volumes for nodes
module.eks.cluster_iam_role_arn, # required for the cluster / persistentvolume-controller to create encrypted PVCs
]
}
condition {
test = "Bool"
variable = "kms:GrantIsForAWSResource"
values = ["true"]
}
}
}
resource "aws_security_group" "additional" {
name_prefix = "${local.cluster_name}-additional"
vpc_id = local.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
]
}
tags = var.tags
}

This created the cluster but failed to have any add-ons. Commenting out the eks_managed_node_groups portion plus coredns made vpc-cni and kube-proxy get deployed, but running again with eks_managed_node_groups and coredns created the node group and the instances still failed to join the cluster. I was using the 1.18 version (deployed this yesterday).
So how do I ensure terraform will apply the vpc-cni add-on before creating the node group?
I don't follow. By default on every EKS cluster, the VPC CNI is provisioned automatically; the add-on is only for taking control over its configuration/management.
So I think that is my problem. For some reason it is not running without the add-on.
After trying this for over 2 weeks, I can say that for this issue the vpc-cni is NOT being created before the node_groups are created. But even after the vpc-cni gets created somehow (either manually or by running terraform without the node group, with just vpc-cni and kube-proxy), the nodes are not joining the cluster.
I can't repro because I don't know what the variables are.
I'm sorry - I really don't follow what you are saying here. It would be REALLY helpful to have a reproduction that can be deployed as-is.
I'm using an existing VPC (basically pulled from data "terraform_remote_state"), one that already has a working cluster. Can I give you the code without variables, except for the vpc/subnets portion?
Not really, because VPC networking can have a big effect on this and not knowing how your VPC is set up won't help much. You can take one of the examples - copy+paste it somewhere, modify it to match your setup, deploy it and ensure you are seeing the same issue, then paste it here.
Sorry @bryantbiggs I am trying to understand these two statements
Ok. It is not working like that for me for some reason, but it makes sense that it is supposed to work like that.
That sounds contradictory, because you are saying the addon is only for "taking control over its configuration/management". So if I delete it, shouldn't nodes still work? Or once it's added, can I no longer go back to deleting it? Maybe it is because I do not have a CNI policy attached to the node group role? Only the vpc-cni addon service account.
Apologies, but I am going to have to refer back to here: #1972 (comment)
@evenme did you deploy that config and verify it reproduces the issue?
You are specifying an AMI which means you now need to provide the bootstrap user data. You can:
- remove the ami_id and let the EKS managed node group pick the AMI from the ami_type, or
- keep the custom ami_id and supply the bootstrap user data yourself.
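A rough sketch of those two options as node group definitions (fragment only; enable_bootstrap_user_data is my assumption for the v18 module input that renders the bootstrap script for a custom AMI, so verify it against the module docs; the data.aws_ami.eks_default reference comes from the configuration above):

  eks_managed_node_groups = {
    # Option 1: no ami_id - EKS picks the AMI from ami_type and injects the
    # bootstrap user data itself.
    default = {
      ami_type     = "AL2_x86_64"
      min_size     = 2
      max_size     = 3
      desired_size = 2
    }

    # Option 2: custom ami_id - EKS no longer merges its bootstrap user data,
    # so the module has to render it.
    custom_ami = {
      ami_id                     = data.aws_ami.eks_default.image_id
      enable_bootstrap_user_data = true
      min_size                   = 2
      max_size                   = 3
      desired_size               = 2
    }
  }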
Doesn't the data filter get the right AMI, like in the example? At least the deployed code does create the instances with the right AMI; they just hang there alone without joining the cluster.
The examples are for demonstrating all the different ways you can use the module as well as for testing changes. Unless you need to use a specific AMI, you don't need to tell EKS managed node groups which specific AMI to use. Instead, you can specify the AMI type, which will pull the proper AMI based on the type selected: https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#AmazonEKS-Type-Nodegroup-amiType https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-custom-ami
As @bryantbiggs asked, I just deployed this code:

locals {
cluster_name = "staging2-eks"
tags = {
terraform = "true"
environment = "stag2"
usage = "eks"
}
k8s_labels = {
environment = "stag2"
region = "us-east-1"
}
node_tags = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/staging2-eks" = "owned"
}
}
data "aws_availability_zones" "available" {}
data "aws_caller_identity" "current" {}
data "aws_eks_cluster" "cluster" {
name = module.eks.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks.cluster_id
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 3.1"
name = "${local.cluster_name}-vpc"
cidr = "10.124.0.0/16"
private_subnets = [
"10.124.0.0/19",
"10.124.32.0/19",
"10.124.64.0/19"
## 10.124.96.0/19 as spare
]
public_subnets = [
"10.124.128.0/19",
"10.124.160.0/19",
"10.124.192.0/19"
## 10.124.224.0/19 as spare
]
azs = data.aws_availability_zones.available.names
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
module "eks" {
create = true
source = "terraform-aws-modules/eks/aws"
cluster_name = local.cluster_name
cluster_version = "1.18"
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
tags = local.tags
# IPV4
cluster_ip_family = "ipv4"
cluster_addons = {
coredns = {
resolve_conflicts = "OVERWRITE"
}
kube-proxy = {}
vpc-cni = {
resolve_conflicts = "OVERWRITE"
service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
}
}
cluster_encryption_config = [{
provider_key_arn = aws_kms_key.eks.arn
resources = ["secrets"]
}]
# # Extend cluster security group rules
cluster_security_group_additional_rules = {
egress_nodes_ephemeral_ports_tcp = {
description = "To node 1025-65535"
protocol = "tcp"
from_port = 1025
to_port = 65535
type = "egress"
source_node_security_group = true
}
}
# Extend node-to-node security group rules
node_security_group_additional_rules = {
ingress_self_all = {
description = "Node to node all ports/protocols"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
self = true
}
egress_all = {
description = "Node all egress"
protocol = "-1"
from_port = 0
to_port = 0
type = "egress"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
}
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
disk_size = 50
instance_types = ["t3.2xlarge", "t3.large", "t3.medium"]
# We are using the IRSA created below for permissions
# However, we have to deploy with the policy attached FIRST (when creating a fresh cluster)
# and then turn this off after the cluster/node group is created. Without this initial policy,
# the VPC CNI fails to assign IPs and nodes cannot join the cluster
# See https://github.com/aws/containers-roadmap/issues/1666 for more context
iam_role_attach_cni_policy = true
}
eks_managed_node_groups = {
# Complete
complete = {
name = "complete-eks-mng"
use_name_prefix = true
subnet_ids = module.vpc.private_subnets
min_size = 2
max_size = 5
desired_size = 2
capacity_type = "ON_DEMAND"
force_update_version = true
labels = local.k8s_labels
update_config = {
max_unavailable_percentage = 50 # or set `max_unavailable`
}
description = "EKS managed node group example launch template"
ebs_optimized = true
vpc_security_group_ids = [aws_security_group.additional.id]
disable_api_termination = false
enable_monitoring = true
create_iam_role = true
iam_role_name = "${local.cluster_name}-managed-node-group-complete-example"
iam_role_use_name_prefix = false
iam_role_description = "${local.cluster_name} EKS managed node group complete example role"
# iam_role_tags = {
# Purpose = "Protector of the kubelet"
# }
iam_role_additional_policies = [
"arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
]
create_security_group = true
security_group_name = "${local.cluster_name}-managed-node-group-complete-example"
security_group_use_name_prefix = false
security_group_description = "${local.cluster_name} EKS managed node group complete example security group"
security_group_rules = {
phoneOut = {
description = "Hello CloudFlare"
protocol = "udp"
from_port = 53
to_port = 53
type = "egress"
cidr_blocks = ["1.1.1.1/32"]
}
phoneHome = {
description = "Hello cluster"
protocol = "udp"
from_port = 53
to_port = 53
type = "egress"
source_cluster_security_group = true # bit of reflection lookup
}
}
# security_group_tags = {
# Purpose = "Protector of the kubelet"
# }
# remote_access = {
# ec2_ssh_key = local.key_name
# source_security_group_ids = [aws_security_group.remote_access.id]
# }
tags = local.node_tags
}
}
}
resource "aws_iam_role_policy_attachment" "additional" {
for_each = module.eks.eks_managed_node_groups
policy_arn = aws_iam_policy.node_additional.arn
role = each.value.iam_role_name
}
resource "aws_iam_policy" "node_additional" {
name = "${local.cluster_name}-additional"
description = "Example usage of node additional policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"ec2:Describe*",
]
Effect = "Allow"
Resource = "*"
},
]
})
tags = local.tags
}
locals {
kubeconfig = yamlencode({
apiVersion = "v1"
kind = "Config"
current-context = "terraform"
clusters = [{
name = module.eks.cluster_id
cluster = {
certificate-authority-data = module.eks.cluster_certificate_authority_data
server = module.eks.cluster_endpoint
}
}]
contexts = [{
name = "terraform"
context = {
cluster = module.eks.cluster_id
user = "terraform"
}
}]
users = [{
name = "terraform"
user = {
token = data.aws_eks_cluster_auth.cluster.token
}
}]
})
}
resource "null_resource" "patch" {
triggers = {
kubeconfig = base64encode(local.kubeconfig)
cmd_patch = "kubectl patch configmap/aws-auth --patch \"${module.eks.aws_auth_configmap_yaml}\" -n kube-system --kubeconfig <(echo $KUBECONFIG | base64 --decode)"
}
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
environment = {
KUBECONFIG = self.triggers.kubeconfig
}
command = self.triggers.cmd_patch
}
}
module "vpc_cni_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 4.12"
role_name_prefix = "VPC-CNI-IRSA"
attach_vpc_cni_policy = true
vpc_cni_enable_ipv6 = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:aws-node"]
}
}
tags = local.tags
}
resource "aws_security_group" "remote_access" {
name_prefix = "${local.cluster_name}-remote-access"
description = "Allow remote SSH access"
vpc_id = module.vpc.vpc_id
ingress {
description = "SSH access"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["10.0.0.0/8"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
tags = local.tags
}
resource "aws_kms_key" "eks" {
description = "EKS Secret Encryption Key"
deletion_window_in_days = 7
enable_key_rotation = true
tags = local.tags
}
resource "aws_kms_key" "ebs" {
description = "Customer managed key to encrypt EKS managed node group volumes"
deletion_window_in_days = 7
policy = data.aws_iam_policy_document.ebs.json
}
# This policy is required for the KMS key used for EKS root volumes, so the cluster is allowed to enc/dec/attach encrypted EBS volumes
data "aws_iam_policy_document" "ebs" {
# Copy of default KMS policy that lets you manage it
statement {
sid = "Enable IAM User Permissions"
actions = ["kms:*"]
resources = ["*"]
principals {
type = "AWS"
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
}
}
# Required for EKS
statement {
sid = "Allow service-linked role use of the CMK"
actions = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
resources = ["*"]
principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", # required for the ASG to manage encrypted volumes for nodes
module.eks.cluster_iam_role_arn, # required for the cluster / persistentvolume-controller to create encrypted PVCs
]
}
}
statement {
sid = "Allow attachment of persistent resources"
actions = ["kms:CreateGrant"]
resources = ["*"]
principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", # required for the ASG to manage encrypted volumes for nodes
module.eks.cluster_iam_role_arn, # required for the cluster / persistentvolume-controller to create encrypted PVCs
]
}
condition {
test = "Bool"
variable = "kms:GrantIsForAWSResource"
values = ["true"]
}
}
}
resource "aws_security_group" "additional" {
name_prefix = "${local.cluster_name}-additional"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
]
}
tags = local.tags
}
After 24 min, we got the failure:
The cluster got created, and the node group as well as the node instances got created; however, the node instances did not join the cluster (but they're listed in the EC2 instances when filtering for cluster-name=staging2-eks). The add-ons were not deployed (none of them).
Closing out for now - please see above and #1910 (comment)
Sorry to resurrect a dead issue, but shouldn't it be possible to create the cluster without setting iam_role_attach_cni_policy = true? It looks like the VPC CNI add-on installation depends on the node groups being created; is this required? I'm not quite sure how the nodes can join the cluster in the first place if the VPC CNI add-on isn't installed. Is it a case that for initial provisioning the node has to have the VPC CNI policies attached and then you can migrate to IRSA for the add-on?
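For what it's worth, the comments embedded in the configurations above (and the aws/containers-roadmap#1666 link they cite) describe exactly that pattern. A fragment-only sketch using the same input names, treating the attach-then-IRSA flow as the assumption being discussed rather than a verified fix:

  cluster_addons = {
    vpc-cni = {
      resolve_conflicts        = "OVERWRITE"
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn # IRSA role for kube-system:aws-node
    }
  }

  eks_managed_node_group_defaults = {
    # First apply: leave this true so the VPC CNI can assign IPs (and nodes can
    # register) before the IRSA-backed role is usable. Flip it to false once
    # the cluster and node group exist and the add-on is using the IRSA role.
    iam_role_attach_cni_policy = true
  }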
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Description
Using the example/managed_node_group as the base, I'm creating a 2-node 1.18 cluster with 1 managed node group only; however, the node group is created without the CNI plugin and the nodes are created but won't join the cluster due to:
The nodes are left as NotReady and the terraform error is this:
Versions
Module version [Required]: v18
Terraform version:
❯ terraform providers -version
Terraform v1.1.5
on darwin_amd64
Reproduction Code [Required]
Steps to reproduce the behavior:
Using terraform cloud to plan and run it.
Expected behavior
Nodegroup created with coredns, kube-proxy and vpc-cni add-ons active + nodes joining the cluster successfully.
Actual behavior
No vpc-cni plugin added, nodes with NodeCreationFailure | Unhealthy nodes in the kubernetes cluster.