
eks_managed_node_groups with irsa enabled fails #1894

Closed
OHaimanov opened this issue Feb 22, 2022 · 6 comments · Fixed by #1915

Comments

@OHaimanov

Description

Hi team, I tried to create an EKS cluster based on the example with IRSA enabled, but ran into an issue where aws-node doesn't start.

Maybe it is a bug, or maybe I did something wrong or missed a step; could you please take a look?

Versions

  • Terraform:
    1.15
  • Provider(s):
    "hashicorp/aws" >= 4.1.0
    "gavinbunney/kubectl" >= 1.13.1
  • Module:
    terraform-aws-modules/eks/aws
    version = "18.7.2"

Reproduction

Steps to reproduce the behavior:

Code Snippet to Reproduce

locals {
  oidc_url = replace(module.eks_cluster.cluster_oidc_issuer_url, "https://", "")
}

module "eks_cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.7.2"

  cluster_name    = var.rancher_cluster_name
  cluster_version = "1.21"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts        = "OVERWRITE"
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
    }
  }

  # Extend cluster security group rules
  cluster_security_group_additional_rules = {
    egress_nodes_ephemeral_ports_tcp = {
      description                = "To node 1025-65535"
      protocol                   = "tcp"
      from_port                  = 1025
      to_port                    = 65535
      type                       = "egress"
      source_node_security_group = true
    }
  }

  # Extend node-to-node security group rules
  node_security_group_additional_rules = {
    ingress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }
    egress_all = {
      description      = "Node all egress"
      protocol         = "-1"
      from_port        = 0
      to_port          = 0
      type             = "egress"
      cidr_blocks      = ["0.0.0.0/0"]
    }
  }

  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    disk_size      = 50
    instance_types = var.asg_instance_types

    # We are using the IRSA created below for permissions
    iam_role_attach_cni_policy = false
  }
  eks_managed_node_groups = {
    default_node_group = {
      create_launch_template = false
      launch_template_name   = ""
    }
  }
}
module "vpc_cni_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "4.13.0"

  role_name             = join("-", [var.iac_environment_tag, var.rancher_cluster_name, "vpc-cni"])
  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv4   = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks_cluster.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }

  tags = {
    Name = "vpc-cni"
  }
}

Expected behavior

All cluster nodes up and running

Actual behavior

Cluster nodes stuck in NotReady state

Type | Reason | Age | From | Message
-- | -- | -- | -- | --
Warning | Unhealthy | 24 minutes | kubelet | Readiness probe failed: {"level":"info","ts":"2022-02-22T14:43:46.341Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning | Unhealthy | 23 minutes | kubelet | Readiness probe failed: {"level":"info","ts":"2022-02-22T14:43:56.337Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning | Unhealthy | 23 minutes | kubelet | Readiness probe failed: {"level":"info","ts":"2022-02-22T14:44:06.341Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning | Unhealthy | 23 minutes | kubelet | Liveness probe failed: {"level":"info","ts":"2022-02-22T14:44:14.728Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning | Unhealthy | 23 minutes | kubelet | Readiness probe failed: {"level":"info","ts":"2022-02-22T14:44:16.336Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning | Unhealthy | 23 minutes | kubelet | (combined from similar events): Readiness probe failed: {"level":"info","ts":"2022-02-22T14:58:06.333Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Normal | Killing | 23 minutes | kubelet | Container aws-node failed liveness probe, will be restarted
Normal | Pulled | 23 minutes | kubelet | Container image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.10.1-eksbuild.1" already present on machine
Warning | BackOff | 13 minutes | kubelet | Back-off restarting failed container

axkng commented Feb 23, 2022

I can confirm this.
Having the same problem at the moment.
If you set iam_role_attach_cni_policy = true for the managed nodes it works.
I only did that for testing, as I want to stick to best practices.
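For reference, a minimal sketch of that temporary change against the eks_managed_node_group_defaults block in the reproduction config above (same variables as the original snippet; the flag controls whether the module attaches the AmazonEKS_CNI_Policy to the node IAM role):

  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    disk_size      = 50
    instance_types = var.asg_instance_types

    # Temporary workaround: attach the CNI policy to the node IAM role so aws-node
    # can start even though the vpc-cni addon has not picked up the IRSA role yet.
    # Flip back to false once the addon is confirmed to use the IRSA role.
    iam_role_attach_cni_policy = true
  }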


OHaimanov commented Feb 23, 2022

> I can confirm this. Having the same problem at the moment. If you set iam_role_attach_cni_policy = true for the managed nodes it works. I only did that for testing, as I want to stick to best practices.

Yes, it looks like the addon's role ARN attachment isn't applied during the module run. If you initially create the cluster with iam_role_attach_cni_policy = true and then update the addon to use the separate IAM role and remove the policy afterwards, everything works fine, but not the way the example has it.
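A sketch of that two-step sequence as module-argument fragments, reusing the names from the reproduction config above (shown per apply, not as one valid block):

  # Apply 1: create the cluster with the CNI policy attached to the node role
  # so the nodes can join while the addon/IRSA wiring does not exist yet.
  eks_managed_node_group_defaults = {
    iam_role_attach_cni_policy = true
  }

  # Apply 2: once the cluster and its OIDC provider exist, point the vpc-cni
  # addon at the IRSA role and detach the policy from the node role again.
  cluster_addons = {
    vpc-cni = {
      resolve_conflicts        = "OVERWRITE"
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
    }
  }

  eks_managed_node_group_defaults = {
    iam_role_attach_cni_policy = false
  }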


MadsRC commented Mar 2, 2022

I just spent a day debugging why my nodes wouldn't attach to new clusters. Turns out I was running into this exact issue.

Setting iam_role_attach_cni_policy = true for the initial creation did the trick, and iam_role_attach_cni_policy = false was then applied afterwards...

Not pretty, but it works

@bryantbiggs (Member)

Yes, this was unfortunate to discover as well. I have updated the eks-managed-node-group example and added some notes in another PR, #1915.

I've also added this scenario to the container roadmap proposal I submitted: aws/containers-roadmap#1666

@antonbabenko (Member)

This issue has been resolved in version 18.8.0 🎉
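Assuming the same module call as in the reproduction snippet, picking up the fix should only require bumping the module version:

  module "eks_cluster" {
    source  = "terraform-aws-modules/eks/aws"
    version = "18.8.0" # contains the fix referenced by #1915

    # ...rest of the arguments unchanged from the snippet above
  }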

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2022