v17 completely broken for managed node group updates #1495

Closed
fitchtech opened this issue Jul 21, 2021 · 6 comments

@fitchtech

fitchtech commented Jul 21, 2021

This behavior started in v17; I believe removing the random pet names had unintended consequences. With a managed node group that uses a launch template, you cannot update the launch template version or user data without hitting the error below. Updating the launch template version should roll the existing node group in place. This works correctly in v16.
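
For reference, here is a minimal sketch of the kind of configuration involved (illustrative, not my exact code; the node_groups keys are as I understand the v17 module inputs, and custom_ami_id is a hypothetical variable):

```hcl
# Hypothetical variable for the custom worker AMI (hence ami_type = "CUSTOM").
variable "custom_ami_id" {
  type = string
}

# Externally managed launch template passed to the managed node group.
resource "aws_launch_template" "eks" {
  name_prefix = "staging-eks-"
  image_id    = var.custom_ami_id
  user_data   = base64encode(file("${path.module}/userdata.sh"))

  lifecycle {
    create_before_destroy = true
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 17.1"

  cluster_name = "staging-eks"
  # ... other cluster inputs omitted ...

  node_groups = {
    nodes = {
      desired_capacity        = 3
      max_capacity            = 11
      min_capacity            = 3
      launch_template_id      = aws_launch_template.eks.id
      # Bumping the template (default_version 4 -> 5) should update the
      # existing node group in place; under v17 it plans a replacement.
      launch_template_version = aws_launch_template.eks.default_version
    }
  }
}
```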

Plan:

  # aws_launch_template.eks will be updated in-place
  ~ resource "aws_launch_template" "eks" {
        arn                     = "Omitted"
      ~ default_version         = 4 -> (known after apply)
...

  # module.eks.module.node_groups.aws_eks_node_group.workers["nodes"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      ~ ami_type               = "CUSTOM" -> (known after apply) # forces replacement
      ~ arn                    = "Omitted" -> (known after apply)
      ~ capacity_type          = "ON_DEMAND" -> (known after apply) # forces replacement
        cluster_name           = "staging-eks"
      ~ disk_size              = 0 -> (known after apply)
      + force_update_version   = (known after apply)
 
Omitted

      + version                = (known after apply)
      ~ launch_template {
            id      = "lt-0d2bdc5a516a0fbc4"
          ~ name    = "staging-eks-2021072019573211870000000f" -> (known after apply)
          ~ version = "4" -> (known after apply)
        }
      ~ scaling_config {
          ~ desired_size = 5 -> 3
            max_size     = 11
            min_size     = 3
        }
    }
Plan: 1 to add, 2 to change, 1 to destroy.

Error:

aws_launch_template.eks: Modifying... [id=lt-0d2bdc5a516a0fbc4]
aws_launch_template.eks: Modifications complete after 0s [id=lt-0d2bdc5a516a0fbc4]
module.eks.module.node_groups.aws_eks_node_group.workers["nodes"]: Creating...
Error: error creating EKS Node Group (spireon-staging-eks:spireon-staging-eks-nodeGroup1): ResourceInUseException: NodeGroup already exists with name spireon-staging-eks-nodeGroup1 and cluster name spireon-staging-eks
{
  RespMetadata: {
    StatusCode: 409,
    RequestID: "a439f607-a3b4-4146-a7b9-832b74953da6"
  },
  ClusterName: "spireon-staging-eks",
  Message_: "NodeGroup already exists with name nodeGroup1 and cluster name staging-eks",
  NodegroupName: "spireon-staging-eks-nodeGroup1"
}
@fitchtech
Author

fitchtech commented Jul 22, 2021

This version is completely broken; I cannot even change the instance type of a node group without it failing. This node group was created with v17.1, and all I did was try to change the instance type of that group. Any subsequent update to a node group provisioned with this module causes a failure.

Plan:

module.eks.module.node_groups.aws_eks_node_group.workers["nodes"]: Refreshing state... [id=spireon-staging-eks:spireon-staging-eks-nodeGroup1]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+/- create replacement and then destroy
 <= read (data resources)
Terraform will perform the following actions:
  # module.eks.data.http.wait_for_cluster[0] will be read during apply
  # (config refers to values not yet known)
 <= data "http" "wait_for_cluster"  {
        body             = "ok"
        ca_certificate   = <<~EOT
            -----BEGIN CERTIFICATE-----
            -----END CERTIFICATE-----
        EOT
...
    }
  # module.eks.module.node_groups.aws_eks_node_group.workers["nodes"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      ~ ami_type               = "CUSTOM" -> (known after apply)
      ~ arn                    = "...3" -> (known after apply)
      ~ capacity_type          = "ON_DEMAND" -> (known after apply)
        cluster_name           = "spireon-staging-eks"
      ~ disk_size              = 0 -> (known after apply)
      ~ id                     = "spireon-staging-eks:spireon-staging-eks-nodeGroup1" -> (known after apply)
      ~ instance_types         = [ # forces replacement
          - "t3.medium",
          + "m5n.xlarge",
        ]
        labels                 = {
            "environment" = "staging"
        }
        node_group_name        = "spireon-staging-eks-nodeGroup1"
      + node_group_name_prefix = (known after apply)
        node_role_arn          = "..."
      ~ release_version        = "ami-0fa4f049547ac6cb6" -> (known after apply)
      ~ resources              = [
          - {
              - autoscaling_groups              = [
                  - {
                      - name = "..."
                    },
                ]
              - remote_access_security_group_id = ""
            },
        ] -> (known after apply)
      ~ status                 = "ACTIVE" -> (known after apply)
        subnet_ids             = [
            "subnet-0adca40b7abad6e73",
            "subnet-0f5087e9a5f9f4192",
            "subnet-0fa34e75934dded7e",
        ]
        tags                   = {
            "Environment" = "staging"
            "Name"        = "spireon-staging-eks"
            "Namespace"   = "spireon"
        }
        tags_all               = {
            "Environment" = "staging"
            "Name"        = "spireon-staging-eks"
            "Namespace"   = "spireon"
        }
      + version                = (known after apply)
      ~ launch_template {
            id      = "lt-0d2bdc5a516a0fbc4"
          ~ name    = "spireon-staging-eks-2021072019573211870000000f" -> (known after apply)
            version = "5"
        }
      ~ scaling_config {
          ~ desired_size = 4 -> 3
            max_size     = 11
            min_size     = 3
        }
    }

Apply:

module.eks.module.node_groups.aws_eks_node_group.workers["nodes"]: Creating...
Error: error creating EKS Node Group (spireon-staging-eks:spireon-staging-eks-nodeGroup1): ResourceInUseException: NodeGroup already exists with name spireon-staging-eks-nodeGroup1 and cluster name spireon-staging-eks
{
  RespMetadata: {
    StatusCode: 409,
    RequestID: "99e4dda9-8539-4d9b-a722-a04bdc96be56"
  },
  ClusterName: "spireon-staging-eks",
  Message_: "NodeGroup already exists with name spireon-staging-eks-nodeGroup1 and cluster name spireon-staging-eks",
  NodegroupName: "spireon-staging-eks-nodeGroup1"
}
  on .terraform/modules/eks/modules/node_groups/node_groups.tf line 1, in resource "aws_eks_node_group" "workers":

@jjhidalgar
Contributor

For some reason I don't experience these issues; I'm not sure what you are doing differently.

Also, it could be worth trying to set set_instance_types_on_lt = true, as in the sketch below.

Otherwise, changing instance_types will trigger a recreate (as you can see in the AWS console, you cannot change the instance type when it is set on the node group instead of on the launch template).
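
In the v17 node_groups input that would look something like this (illustrative values; keys as I remember the module inputs):

```hcl
node_groups = {
  nodes = {
    desired_capacity = 3
    max_capacity     = 11
    min_capacity     = 3

    # Let the module create the launch template and set the instance
    # types on it; changing instance_types then only bumps the template
    # version instead of forcing node group replacement.
    create_launch_template   = true
    set_instance_types_on_lt = true
    instance_types           = ["m5n.xlarge"]
  }
}
```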

@daroga0002
Contributor

First of all, can you edit your comments and apply proper formatting? As posted, your outputs are not readable (read here for how to do it).

@antonbabenko
Member

@daroga0002 I have updated the formatting.

I think this is a duplicate of another issue you've just replied to. Could you take a look again?

@daroga0002
Contributor

Yup, this is exactly the same as #1525.

@fitchtech node groups cannot change instance types dynamically (it is an AWS managed node group limitation; the instance type cannot be edited in place). So to change instance types you must add a new node group, relocate the workload, and then remove the legacy one, as in the sketch below.
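
Something like this (sketch; the nodes-v2 key is just an example name):

```hcl
node_groups = {
  # Legacy group: keep it until the workload is drained off its nodes,
  # then delete this entry in a follow-up apply.
  nodes = {
    instance_types   = ["t3.medium"]
    desired_capacity = 3
    max_capacity     = 11
    min_capacity     = 3
  }

  # New group with the target instance type.
  nodes-v2 = {
    instance_types   = ["m5n.xlarge"]
    desired_capacity = 3
    max_capacity     = 11
    min_capacity     = 3
  }
}
```

After the new group is up, cordon and drain the old nodes, then remove the legacy entry and apply again.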

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 19, 2022