desired capacity update does not work for node groups #835
Comments
It was caused by #691; it works in
@karlderkaefer
I'm encountering this issue in v11.1.0 as well. I see the significance of #691; however, the present issue prevents node group resizing, as the author points out.
TL;DR: you should be using the cluster-autoscaler. If not, you need to make the change manually.
Consider the case where autoscaling is not desired but I still want to resize my node group, and we do not wish to resize manually through the console, for the usual reasons. I don't believe this scenario is uncommon. Also, to be clear, I don't believe desired should be modified by this module by default, as this could cause confusion and undesirable consequences. I am not arguing against #691; however, there should be a way to override this behaviour.
Sure, but the problem is that there is no way to have optional
@max-rocket-internet - Did you mean configuring the cluster-autoscaler within this module? If so, how? I don't see it in the examples. Also, I provisioned my node group with the following value
At what point does the capacity go beyond 4? I tried to stand up a t2.micro instance and to scale pods beyond the t2.micro's capacity. My cluster does not scale up to have more nodes. PS: I am using v12.2.0 of this module.
Did anyone here get the nodes to scale properly? I cannot get it to work no matter what I do.
@dmanchikalapudi The cluster-autoscaler is not connected to, nor can it be configured through, this module.
The desired_capacity value is ignored by the module. You have to modify it by hand through the console.
Thanks for the response @kuritonasu. Doing it by hand pretty much negates the idea behind "managed" node groups. There is no point in defining the min/max node counts either; it is just an illusion of autoscaling. My need is simple: when my ReplicaSets scale out to more pods than the nodes can run, I need the nodes to scale to accommodate them (assuming there is hardware capacity underneath and it is within the max node count). How do I go about making that happen via Terraform?
Correct ✅
Perhaps this doc might help you to see what is "managed" and what is not, specifically this image:
I wouldn't say it's an illusion; it's just not a "turn-key" thing. ASGs have been around for years and work very well when configured correctly 🙂
This is how typical autoscaling works in k8s, but this module is only for the AWS resources. The cluster-autoscaler runs in your cluster and is not supported by us or this module in any way; it's a completely separate thing. But there is some doc here that might help you: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md
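The gist of that doc is that you run the cluster-autoscaler yourself and let it discover the node groups' auto scaling groups via tags. Below is a minimal sketch of a self-managed worker group carrying those discovery tags, assuming the worker_groups interface of the module versions discussed in this thread; the group name, sizes, and the cluster_name variable are made up for illustration.

worker_groups = [
  {
    name                 = "autoscaled-workers"   # made-up name
    instance_type        = "m5.large"
    asg_min_size         = 1
    asg_desired_capacity = 1
    asg_max_size         = 6
    tags = [
      {
        key                 = "k8s.io/cluster-autoscaler/enabled"
        value               = "true"
        propagate_at_launch = false   # the autoscaler reads tags off the ASG, not the instances
      },
      {
        key                 = "k8s.io/cluster-autoscaler/${var.cluster_name}"   # assumes a cluster_name variable in your config
        value               = "owned"
        propagate_at_launch = false
      },
    ]
  },
]

The cluster-autoscaler deployment itself (IAM permissions, Helm chart or manifests) still has to be installed in the cluster separately; this module only creates the AWS side.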
@max-rocket-internet I'd say that "managed" in the broader sense, together with Terraform, also means scaling the worker nodes by setting the desired size, as I'm also managing the VPC configuration with Terraform (and everything in between, actually :) ). So IMHO, IF setting the desired size is possible through the API, it SHOULD be supported by this resource.
The practical use case I have for this is that if I set a managed node group to desired 3, max 6, min 3, the cluster-autoscaler will respect this. There isn't a technical reason why we can't change the min_size, nor should it be dismissed as "not a feature". So, some concrete examples, since this has been a bit of a noisy thread. Here's an example initial definition of a scaling config as passed through:
Then update this nodegroup's minimum to:
You'll get an error like:
Then running a
So this means that from a Terraform perspective. There are ways to work around this, such as getting the ASG id from the module and modifying it in Terraform as part of the workflow, but that's a hack at best.
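A hypothetical reconstruction of the sequence described above, with made-up names and numbers:

# Hypothetical initial scaling config, passed through the module's node_groups input
node_groups = {
  example = {                  # made-up node group key
    desired_capacity = 3
    min_capacity     = 3
    max_capacity     = 6
    instance_type    = "m5.large"
  }
}

# Later, raise only the minimum:
#   min_capacity = 4
# Because the module ignores changes to desired_size, the desired value stored in AWS
# may still be lower (whatever the autoscaler last set it to), so the update is rejected
# with an InvalidParameterException of the form quoted in the issue body below:
# "Minimum capacity X can't be greater than desired size Y".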
One hacky workaround that I have found works: you can specify a different instance size, which will force a totally new node group to be created, which will then respect your (new, "initial") desired_capacity. I agree with many of the other thread comments: it really feels odd that desired_capacity is not actually mutable by Terraform. That said, I do not have a clear picture of what the AWS interface is like - I'm sure it's easier said than done!
I hacked it for now by using the value for the desired capacity in place of the minimum capacity. At least if that's not a problem for your design, it works.
worker_groups = [
  ...
]
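In other words, something along these lines (a sketch only: the key names follow the module's worker_groups interface of that era, and the name and sizes are made up):

worker_groups = [
  {
    name                 = "workers"   # made-up name
    instance_type        = "m5.large"
    asg_min_size         = 3           # set the minimum to the size you actually want running
    asg_desired_capacity = 3           # only honoured when the group is first created
    asg_max_size         = 6
  },
]

The trade-off is that nothing can ever scale the group below that minimum.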
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is still relevant and needs fixing/review. We shouldn't need to create a new node group to change the desired size.
This issue has been automatically closed because it has not had recent activity since being marked as stale.
This is still an issue for our team. Terraform should be able to handle this.
Please reopen this; this is a major issue.
Re-opening this to let us track this issue. But so far we don't have an ideal fix. Maybe hashicorp/terraform#24188 is worth something.
Looking for its resolution.
Same, this issue affects my use case as well.
It is also a problem for me.
Ditto. Need some kind of solution for this.
I fixed it by adding the following field... but is it right?
Before:
node_groups = {
  main = {
    desired_capacity = 1
    max_capacity     = 1
    min_capacity     = 1
    instance_type    = "t2.small"
    subnets          = module.vpc.private_subnets
  }
}
After:
node_groups = {
  main = {
    // desired_capacity = 1
    desired_size  = 1
    max_capacity  = 1
    min_capacity  = 1
    instance_type = "t2.small"
    subnets       = module.vpc.private_subnets
  }
}
I'm not sure how that would work? The lifecycle for the node_group ignores changes to
Thank you for replying. Hmmm, you're right.
Would it be possible to only have the desired capacity in the lifecycle rule if autoscaling is disabled?
We're also running into problems with this one.
The problem with desired_size = each.value["desired_capacity"] is that if your node group autoscaling has scaled out, then on a subsequent run of terraform apply it will set the desired size back to whatever is in your code. The problem is that desired_capacity is required when creating the node group. You can then comment out desired_capacity or change its value to what's currently in the group, but that's a pain, since if you need to recreate the node group you have to put desired_capacity back in. Additionally, most updates to existing node groups with the EKS module in general fail. They do not trigger an in-place update as they should; instead they trigger a replacement, which then fails because it says the node group name already exists. The only way to fix that is to create a new node group with a different name.
For me it was
which ignores the changes in desired_size, causing the issue, so I commented it out and it worked; uncomment it if you want the feature back. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group
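For context, the block in question inside the module's aws_eks_node_group resource looks roughly like this. This is a paraphrase, not the module's exact source: the resource name, variable names, and references are illustrative.

resource "aws_eks_node_group" "workers" {         # resource name is illustrative
  cluster_name  = aws_eks_cluster.this.name       # hypothetical references
  node_role_arn = aws_iam_role.workers.arn
  subnet_ids    = var.subnets

  scaling_config {
    desired_size = var.desired_capacity
    max_size     = var.max_capacity
    min_size     = var.min_capacity
  }

  lifecycle {
    create_before_destroy = true
    # This is the line people in this thread comment out so that changes to the
    # desired size are applied again instead of being ignored:
    ignore_changes = [scaling_config[0].desired_size]
  }
}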
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity since being marked as stale.
Still an issue.
Please reopen.
This is duplicated in #1568.
This is really tiresome, and while the docs have slightly improved with the FAQ change, I still feel like this module is useless for manual changes to EKS. I ended up doing it with the console. Because going from:
to
produces
If you do the change manually, and only to the
Proposal: can't we add something like
In the meantime, I did scale to 2 nodes and my tfstate is more or less correct (I have to swallow down the wrong ).
Still an issue, anything new on this?
This is not treated as an issue, as it is working as expected. This is a compromise which we implemented in this module, and there are no plans to change it (as changing it would impact, for example, autoscaling).
We could maybe create two separate resources. Example:
variable "use_autoscaling" {
  description = "Determines whether autoscaling will be used or not"
  type        = bool
  default     = true
}

resource "aws_eks_node_group" "this" {
  count = var.create && !var.use_autoscaling ? 1 : 0
  ...
  lifecycle {
    create_before_destroy = true
  }
  ...
}

resource "aws_eks_node_group" "this_autoscaling" {
  count = var.create && var.use_autoscaling ? 1 : 0
  ...
  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      scaling_config[0].desired_size,
    ]
  }
  ...
}

We should also update the variables and the outputs.tf file accordingly. I agree it is not the most elegant solution, redundant code and everything, but it's the only solution I can think of given that conditional expressions are still not supported within
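If a toggle along these lines were adopted, consumers could opt out of the ignore rule per cluster, e.g. (hypothetical: the use_autoscaling input does not exist in the module today):

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ... existing cluster configuration ...

  # Hypothetical flag from the proposal above: disable the ignore_changes on
  # desired_size so that Terraform manages the node group size directly.
  use_autoscaling = false
}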
This is still a major issue over two years after it was raised.
It is not a major issue; it is a design decision the module has taken. The majority of Kubernetes/EKS users utilize some form of autoscaling, and without variable support for
desired capacity update does not work for node groups
I'm submitting an issue where I have tried to update the min, max, and desired variables for node groups. Terraform does show min and max being changed; however, the desired capacity does not update.
What is the current behavior?
Terraform code:
Output from plan and apply:
Error: error updating EKS Node Group (ce-eks-sbx:ce-eks-sbx-eks_nodegroup-lenient-blowfish) config: InvalidParameterException: Minimum capacity 2 can't be greater than desired size 1
{
ClusterName: "test-eks-sbx",
Message_: "Minimum capacity 2 can't be greater than desired size 1",
NodegroupName: "ce-eks-sbx-eks_nodegroup-lenient-blowfish"
}
I have also tried updating the desired capacity through node_groups_defaults.
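That attempt would look roughly like this (a sketch; the values are illustrative):

node_groups_defaults = {
  desired_capacity = 2
  min_capacity     = 2
  max_capacity     = 4
}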
If this is a bug, how to reproduce? Please include a code sample if relevant.
Change the min, max, and desired capacity.
What's the expected behavior?
The new scaling configuration should take effect.
Are you able to fix this problem and submit a PR? Link here if you have already.
No
Environment details
Any other relevant info