v17 completely broken for managed node group updates #1495

Closed
fitchtech opened this issue Jul 21, 2021 · 6 comments

@fitchtech

fitchtech commented Jul 21, 2021

This behavior started in v17; I believe removing the random pet names had unintended consequences. With a managed node group that uses a launch template, you cannot update the launch template version or user data without hitting the error below. Updating the launch template version should roll the existing node group in place. This works correctly in v16.
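
For reference, here is a minimal sketch of the kind of configuration involved (illustrative, not my exact code; the node_groups keys are as I understand the v17 module inputs, and custom_ami_id is a hypothetical variable):

```hcl
# Hypothetical variable for the custom worker AMI (hence ami_type = "CUSTOM").
variable "custom_ami_id" {
  type = string
}

# Externally managed launch template passed to the managed node group.
resource "aws_launch_template" "eks" {
  name_prefix = "staging-eks-"
  image_id    = var.custom_ami_id
  user_data   = base64encode(file("${path.module}/userdata.sh"))

  lifecycle {
    create_before_destroy = true
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 17.1"

  cluster_name = "staging-eks"
  # ... other cluster inputs omitted ...

  node_groups = {
    nodes = {
      desired_capacity        = 3
      max_capacity            = 11
      min_capacity            = 3
      launch_template_id      = aws_launch_template.eks.id
      # Bumping the template (default_version 4 -> 5) should update the
      # existing node group in place; under v17 it plans a replacement.
      launch_template_version = aws_launch_template.eks.default_version
    }
  }
}
```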

Plan:

  # aws_launch_template.eks will be updated in-place
  ~ resource "aws_launch_template" "eks" {
        arn                     = "Omitted"
      ~ default_version         = 4 -> (known after apply)
...

  # module.eks.module.node_groups.aws_eks_node_group.workers["nodes"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      ~ ami_type               = "CUSTOM" -> (known after apply) # forces replacement
      ~ arn                    = "Omitted" -> (known after apply)
      ~ capacity_type          = "ON_DEMAND" -> (known after apply) # forces replacement
        cluster_name           = "staging-eks"
      ~ disk_size              = 0 -> (known after apply)
      + force_update_version   = (known after apply)
 
Omitted

      + version                = (known after apply)
      ~ launch_template {
            id      = "lt-0d2bdc5a516a0fbc4"
          ~ name    = "staging-eks-2021072019573211870000000f" -> (known after apply)
          ~ version = "4" -> (known after apply)
        }
      ~ scaling_config {
          ~ desired_size = 5 -> 3
            max_size     = 11
            min_size     = 3
        }
    }
Plan: 1 to add, 2 to change, 1 to destroy.

Error:

aws_launch_template.eks: Modifying... [id=lt-0d2bdc5a516a0fbc4]
aws_launch_template.eks: Modifications complete after 0s [id=lt-0d2bdc5a516a0fbc4]
module.eks.module.node_groups.aws_eks_node_group.workers["nodes"]: Creating...
Error: error creating EKS Node Group (spireon-staging-eks:spireon-staging-eks-nodeGroup1): ResourceInUseException: NodeGroup already exists with name spireon-staging-eks-nodeGroup1 and cluster name spireon-staging-eks
{
  RespMetadata: {
    StatusCode: 409,
    RequestID: "a439f607-a3b4-4146-a7b9-832b74953da6"
  },
  ClusterName: "spireon-staging-eks",
  Message_: "NodeGroup already exists with name nodeGroup1 and cluster name staging-eks",
  NodegroupName: "spireon-staging-eks-nodeGroup1"
}
@fitchtech
Author

fitchtech commented Jul 22, 2021

This version is completely broken; I cannot even change the instance type of a node group without it failing. This node group was created with v17.1, and all I did was try to change the instance type of that group. Any subsequent update to a node group provisioned with this module causes a failure.

Plan:

module.eks.module.node_groups.aws_eks_node_group.workers["nodes"]: Refreshing state... [id=spireon-staging-eks:spireon-staging-eks-nodeGroup1]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+/- create replacement and then destroy
 <= read (data resources)
Terraform will perform the following actions:
  # module.eks.data.http.wait_for_cluster[0] will be read during apply
  # (config refers to values not yet known)
 <= data "http" "wait_for_cluster"  {
        body             = "ok"
        ca_certificate   = <<~EOT
            -----BEGIN CERTIFICATE-----
            -----END CERTIFICATE-----
        EOT
...
    }
  # module.eks.module.node_groups.aws_eks_node_group.workers["nodes"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      ~ ami_type               = "CUSTOM" -> (known after apply)
      ~ arn                    = "...3" -> (known after apply)
      ~ capacity_type          = "ON_DEMAND" -> (known after apply)
        cluster_name           = "spireon-staging-eks"
      ~ disk_size              = 0 -> (known after apply)
      ~ id                     = "spireon-staging-eks:spireon-staging-eks-nodeGroup1" -> (known after apply)
      ~ instance_types         = [ # forces replacement
          - "t3.medium",
          + "m5n.xlarge",
        ]
        labels                 = {
            "environment" = "staging"
        }
        node_group_name        = "spireon-staging-eks-nodeGroup1"
      + node_group_name_prefix = (known after apply)
        node_role_arn          = "..."
      ~ release_version        = "ami-0fa4f049547ac6cb6" -> (known after apply)
      ~ resources              = [
          - {
              - autoscaling_groups              = [
                  - {
                      - name = "..."
                    },
                ]
              - remote_access_security_group_id = ""
            },
        ] -> (known after apply)
      ~ status                 = "ACTIVE" -> (known after apply)
        subnet_ids             = [
            "subnet-0adca40b7abad6e73",
            "subnet-0f5087e9a5f9f4192",
            "subnet-0fa34e75934dded7e",
        ]
        tags                   = {
            "Environment" = "staging"
            "Name"        = "spireon-staging-eks"
            "Namespace"   = "spireon"
        }
        tags_all               = {
            "Environment" = "staging"
            "Name"        = "spireon-staging-eks"
            "Namespace"   = "spireon"
        }
      + version                = (known after apply)
      ~ launch_template {
            id      = "lt-0d2bdc5a516a0fbc4"
          ~ name    = "spireon-staging-eks-2021072019573211870000000f" -> (known after apply)
            version = "5"
        }
      ~ scaling_config {
          ~ desired_size = 4 -> 3
            max_size     = 11
            min_size     = 3
        }
    }

Apply:

module.eks.module.node_groups.aws_eks_node_group.workers["nodes"]: Creating...
Error: error creating EKS Node Group (spireon-staging-eks:spireon-staging-eks-nodeGroup1): ResourceInUseException: NodeGroup already exists with name spireon-staging-eks-nodeGroup1 and cluster name spireon-staging-eks
{
  RespMetadata: {
    StatusCode: 409,
    RequestID: "99e4dda9-8539-4d9b-a722-a04bdc96be56"
  },
  ClusterName: "spireon-staging-eks",
  Message_: "NodeGroup already exists with name spireon-staging-eks-nodeGroup1 and cluster name spireon-staging-eks",
  NodegroupName: "spireon-staging-eks-nodeGroup1"
}
  on .terraform/modules/eks/modules/node_groups/node_groups.tf line 1, in resource "aws_eks_node_group" "workers":

@jjhidalgar
Contributor

For some reason I don't experience these issues; I'm not sure what you are doing differently.

Also, it could be worth trying to set set_instance_types_on_lt = true, as in the sketch below.

Otherwise, changing instance_types will trigger a recreate (as you can see in the AWS console, you cannot change the instance type when it is set on the node group instead of on the launch template).
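
In the v17 node_groups input that would look something like this (illustrative values; keys as I remember the module inputs):

```hcl
node_groups = {
  nodes = {
    desired_capacity = 3
    max_capacity     = 11
    min_capacity     = 3

    # Let the module create the launch template and set the instance
    # types on it; changing instance_types then only bumps the template
    # version instead of forcing node group replacement.
    create_launch_template   = true
    set_instance_types_on_lt = true
    instance_types           = ["m5n.xlarge"]
  }
}
```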

@daroga0002
Contributor

First of all, can you edit your comments and apply proper formatting? As posted, your outputs are not readable (read here for how to do it).

@antonbabenko
Member

@daroga0002 I have updated the formatting.

I think this is a duplicate of another issue you've just replied to. Could you take a look again?

@daroga0002
Contributor

Yup, this is exactly the same as #1525.

@fitchtech node groups cannot change instance types dynamically (it is an AWS managed node group limitation; the instance type cannot be edited in place). So to change instance types you must add a new node group, relocate the workload, and then remove the legacy one, as in the sketch below.
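
Something like this (sketch; the nodes-v2 key is just an example name):

```hcl
node_groups = {
  # Legacy group: keep it until the workload is drained off its nodes,
  # then delete this entry in a follow-up apply.
  nodes = {
    instance_types   = ["t3.medium"]
    desired_capacity = 3
    max_capacity     = 11
    min_capacity     = 3
  }

  # New group with the target instance type.
  nodes-v2 = {
    instance_types   = ["m5n.xlarge"]
    desired_capacity = 3
    max_capacity     = 11
    min_capacity     = 3
  }
}
```

After the new group is up, cordon and drain the old nodes, then remove the legacy entry and apply again.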

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 19, 2022