
[BUG] Lifecycle tags in node_group prevents nodegroup from being changed #1525

Closed
timothyclarke opened this issue Aug 11, 2021 · 15 comments

@timothyclarke

Description

Changing instance types in node groups requires a manual workaround because the lifecycle settings are incorrect.

If I try to switch an instance type, e.g. t3.small -> t3.medium, or even just add more types, I cannot apply without performing a targeted delete or simply removing the node group.
The issue is that https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/node_groups.tf#L84 attempts to create an identically named replacement node group before removing the old one.
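For context, the pattern in question looks roughly like this (a minimal sketch, not the module's exact code; resource and variable names are illustrative):

```hcl
# Sketch of the problematic pattern: a fixed node group name combined
# with create_before_destroy. When a change forces replacement,
# Terraform tries to create the new group before destroying the old
# one, and the fixed name collides with the still-existing group.
resource "aws_eks_node_group" "example" {
  cluster_name    = var.cluster_name
  node_group_name = "change-me"       # fixed name -> collision on replace
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  instance_types = ["t3.medium"]      # changing this forces replacement

  scaling_config {
    desired_size = 1
    max_size     = 2
    min_size     = 1
  }

  lifecycle {
    create_before_destroy = true      # new group must exist alongside old
  }
}
```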

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is hopefully the best practice you are following!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

All of these steps are performed by our CI server

Versions

  • Terraform: 15.4
  • Provider(s):
  • Module:

Reproduction

  • Create cluster with node group
  • Change the node group definition
  • Attempt to apply the changes

Code Snippet to Reproduce

```hcl
node_groups = [
  {
    name             = "change-me"
    capacity_type    = "SPOT"
    instance_types   = ["t3.medium"]
    max_capacity     = "2"
    min_capacity     = "1"
    desired_capacity = "1"
    disk_size        = 20
    additional_tags = {
      project_cost_type = "Shared"
    }
    k8s_labels = {
      "example.com/nodetype" = "changeme"
    }
  }
]
```

Replace instance_types = ["t3.medium"] with instance_types = ["t3a.medium"] after the EKS cluster has been created and attempt to re-apply (note: it plans fine).

Expected behavior

Terraform should apply the change without requiring terraform destroy -target .... or similar

Actual behavior

```
error creating EKS Node Group (example:change-me): ResourceInUseException: NodeGroup already exists with name change-me and cluster name example
{
  RespMetadata: {
    StatusCode: 409,
    RequestID: "< removed uuid>"
  },
  ClusterName: "example",
  Message_: "NodeGroup already exists with name change-me and cluster name example",
  NodegroupName: "change-me"
}
```
@selfieblue

+1, we are running into the same issue.

@daroga0002
Contributor

Nope, this is not a bug in the module. This is how managed node groups work, and the limitation comes from AWS.
AWS node groups cannot change instance types after creation. The common workaround is to create a new node group, relocate the workload, and then remove the legacy node group.

@timothyclarke
Author

timothyclarke commented Aug 26, 2021

@daroga0002 Really? Under the hood EKS is just implementing EC2 Auto Scaling groups, and you can change their instance types. The only thing about them you cannot easily change is the primary instance type (the first one in the list). If you're changing the 2nd or later instance there is no problem. We're effectively doing https://docs.aws.amazon.com/cli/latest/reference/autoscaling/update-auto-scaling-group.html after all.

A note on common sense: you're suggesting 2 Terraform applies and a few kubectl commands that will have an impact on any StatefulSet (ReplicaSets are assumed to be stateless, so no impact there).

  • 1st TF run creates a new node group
  • kubectl cordon
  • kubectl scale up / delete
  • kubectl scale down and delete
  • 2nd TF run to remove old node group

Not exactly friendly, especially when I can use either the AWS CLI or the web console to change the ASG.

@daroga0002
Contributor

daroga0002 commented Aug 26, 2021

Maybe not friendly, but this is an AWS limitation in node groups. Node groups are an AWS-managed solution with its own limitations; if you want to make this more dynamic you can always use worker groups (where you create your own self-managed Auto Scaling groups).

If you don't believe me, I encourage you to check node groups in the EKS cluster UI:
[screenshot: EKS console, node group details]

Where you can modify only these:
[screenshot: EKS console, editable node group settings]
[screenshot: EKS console, editable node group settings (continued)]

You can also check https://github.com/aws/containers-roadmap; maybe this will change (or you can open a feature request there if one does not already exist).

@timothyclarke
Author

timothyclarke commented Aug 26, 2021

@daroga0002 your first screenshot has cut off the pertinent part in the Details box: the Auto Scaling group name. That takes you to the EC2 ASG, where you can edit it as I was describing.

@daroga0002
Contributor

daroga0002 commented Aug 26, 2021

Yup, but this is not supported by AWS and is a hack. Node groups are created using the resource https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group, which uses the EKS API endpoint https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateNodegroupConfig.html

To clarify further: the node group API endpoint creates the Auto Scaling group and other EC2 resources under the hood, so when Terraform uses node groups it never even touches the EC2 API (which is responsible for Auto Scaling groups).

@damienleger

damienleger commented Sep 7, 2021

I don't understand why this bug is closed. The whole argument that this is normal behaviour and a limitation of the AWS API doesn't make sense to me.

The node group name must be unique. Because of create_before_destroy = true, the module tries to create a node group with the same name before deleting the old one; it fails, and that's it. Without create_before_destroy = true it would work. There is no AWS limitation here!

I dug through the history. In the first version of the file there was create_before_destroy = true, but the node group name was unique because it was generated with a random_pet suffix to avoid name collisions:

node_group_name = join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id])

In PR #1372 the random_pet suffix was removed, so the node group name is no longer unique on re-creation, and since create_before_destroy = true was not removed, we now have this name-collision bug. Once again: a regression, not an AWS limitation.

Btw from https://www.terraform.io/docs/language/meta-arguments/lifecycle.html#create_before_destroy

This is an opt-in behavior because many remote object types have unique name requirements or other constraints that must be accommodated for both a new and an old object to exist concurrently. Some resource types offer special options to append a random suffix onto each object name to avoid collisions, for example. Terraform CLI cannot automatically activate such features, so you must understand the constraints for each resource type before using create_before_destroy with it.
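The earlier random-suffix pattern described above can be sketched like this (illustrative names, not the module's exact code):

```hcl
# A random suffix makes each generation of the node group uniquely
# named, so create_before_destroy can bring up the replacement while
# the old group still exists.
resource "random_pet" "node_group" {
  # Re-key the pet whenever a replacement-forcing attribute changes,
  # so the replacement node group gets a fresh name.
  keepers = {
    instance_types = join("|", var.instance_types)
  }
}

resource "aws_eks_node_group" "this" {
  cluster_name    = var.cluster_name
  node_group_name = join("-", [var.cluster_name, "workers", random_pet.node_group.id])
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids
  instance_types  = var.instance_types

  scaling_config {
    desired_size = 1
    max_size     = 2
    min_size     = 1
  }

  lifecycle {
    create_before_destroy = true  # safe now: names never collide
  }
}
```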

@daroga0002
Contributor

let me analyze this again

@daroga0002 daroga0002 reopened this Sep 7, 2021
@damienleger

Digging a bit more, I found this: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/upgrades.md#upgrade-module-to-v1700-for-managed-node-groups

After the first apply, we recommend you create a new node group and let the module use the node_group_name_prefix (by removing the name argument) to generate names and avoid collisions during node group re-creation if needed, because the lifecycle is create_before_destroy = true.

I haven't tried it (I will in my next upgrade), but I think there might be no name collision when using name_prefix instead of name:

node_groups_names = { for k, v in local.node_groups_expanded : k => lookup(

If my understanding is correct, when using name_prefix the node group names are suffixed with a generated unique ID to avoid name collisions.
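Assuming the aws_eks_node_group resource's node_group_name_prefix argument (names here are illustrative, not the module's exact code), the prefix-based approach looks roughly like:

```hcl
# With a name prefix, the provider appends a unique suffix on each
# create, so the replacement group never collides with the old one
# and create_before_destroy works as intended.
resource "aws_eks_node_group" "this" {
  cluster_name           = var.cluster_name
  node_group_name_prefix = "${var.cluster_name}-workers-"
  node_role_arn          = var.node_role_arn
  subnet_ids             = var.subnet_ids
  instance_types         = var.instance_types

  scaling_config {
    desired_size = 1
    max_size     = 2
    min_size     = 1
  }

  lifecycle {
    create_before_destroy = true
  }
}
```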

@daroga0002
Contributor

I checked, and in general the lifecycle setting is there to avoid downtime for workloads when you modify node groups that use name_prefix, which is the default behaviour when you don't pass the name input into the node_group definition.

Unfortunately, lifecycle meta-arguments cannot be conditional in Terraform, which means we must choose which of two issues is acceptable:

  1. the issue described in this thread
  2. causing downtime for the workload running on a node group on every change, even when name_prefix is used

From my side I think we must accept issue 1, as issue 2 would mean this module does not allow rolling updates at all.
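The "cannot be conditional" point refers to the fact that lifecycle meta-arguments must be literal values known at parse time. A sketch of the invalid attempt (illustrative variable name):

```hcl
variable "use_cbd" {
  type    = bool
  default = true
}

resource "aws_eks_node_group" "this" {
  cluster_name    = var.cluster_name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 1
    max_size     = 2
    min_size     = 1
  }

  lifecycle {
    # Invalid: lifecycle arguments cannot reference variables or
    # expressions, so the module cannot toggle this per node group.
    # create_before_destroy = var.use_cbd
    create_before_destroy = true
  }
}
```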

@daroga0002
Contributor

daroga0002 commented Sep 7, 2021

@damienleger thank you for putting attention on this 🥇

I will work on a PR later which will add the limitations of using a fixed name in node groups to the docs.

@daroga0002 daroga0002 self-assigned this Sep 7, 2021
@damienleger

damienleger commented Sep 7, 2021

Thank you @daroga0002 for the quick look. Yes, I agree with you that using name_prefix is the safest solution. So, if I'm correct, we can use (1) name, (2) name_prefix, or neither, in which case it (3) falls back to the name_prefix logic. If I may, could you also document the difference between (2) and (3), please? From the tf code, I must admit it's not clear to me 😕

@stale

stale bot commented Oct 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 7, 2021
@stale

stale bot commented Oct 14, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Oct 14, 2021
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 17, 2022