Issue with the eks-asg-tags.tf #63

Closed
zestrells opened this issue Mar 10, 2022 · 10 comments

@zestrells

zestrells commented Mar 10, 2022

Hello @ArchiFleKs, it looks like I found another issue. When I provide multiple subnet_ids for a managed node group:

      subnet_ids              = [dependency.vpc.outputs.private_subnets[0], dependency.vpc.outputs.private_subnets[1], dependency.vpc.outputs.private_subnets[2]]

I get this error with the eks-asg-tags.tf.

│ Error: Invalid function argument
│ 
│   on eks-asg-tags.tf line 44, in resource "null_resource" "node_groups_asg_tags":
│   44:   "Value" : one(data.aws_autoscaling_group.node_groups[each.key].availability_zones),
│     ├────────────────
│     │ data.aws_autoscaling_group.node_groups is object with 4 attributes
│     │ each.key is "gpu"
│ 
│ Invalid value for "list" parameter: must be a list, set, or tuple value
│ with either zero or one elements.
@ArchiFleKs
Member

ArchiFleKs commented Mar 10, 2022

Can you post the full terragrunt.hcl? I have some clusters using subnet_ids without issues.

@zestrells
Author

zestrells commented Mar 10, 2022

This is the one that is causing issues now. Let me know if you need anything else 😄 and thank you for your help!

    "gpu" = {
      desired_size            = 1
      ami_type                = "AL2_x86_64_GPU"
      platform                = "linux"
      instance_types          = ["g4dn.2xlarge"]
      subnet_ids              = [dependency.vpc.outputs.private_subnets[0], dependency.vpc.outputs.private_subnets[1], dependency.vpc.outputs.private_subnets[2]]
      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        set -ex
        cat <<-EOF > /etc/profile.d/bootstrap.sh
        export CONTAINER_RUNTIME="containerd"
        export USE_MAX_PODS=false
        export KUBELET_EXTRA_ARGS="--max-pods=${run_cmd("/bin/sh", "-c", "../../../../../../../tools/max-pods-calculator.sh --instance-type g4dn.2xlarge --cni-version 1.10.2 --cni-prefix-delegation-enabled")}"
        EOF
        # Source extra environment variables in bootstrap script
        sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' /etc/eks/bootstrap.sh
        cd /tmp
        sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
        sudo systemctl enable amazon-ssm-agent
        sudo systemctl start amazon-ssm-agent
        EOT
      taints = [
        {
          key    = "gpuGroup"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      ]
      labels = {
        network = "private"
        size    = "g4dn.2xlarge"
      }
    }

@zestrells
Author

The same error also occurs on line 51:

│ Error: Invalid function argument
│ 
│   on eks-asg-tags.tf line 51, in resource "null_resource" "node_groups_asg_tags":
│   51:   "Value" : one(data.aws_autoscaling_group.node_groups[each.key].availability_zones),
│     ├────────────────
│     │ data.aws_autoscaling_group.node_groups is object with 4 attributes
│     │ each.key is "gpu"
│ 
│ Invalid value for "list" parameter: must be a list, set, or tuple value
│ with either zero or one elements.

@zestrells
Author

Is the issue here that I should be using it like this?
subnet_ids = dependency.vpc.outputs.private_subnets
vs
subnet_ids = [dependency.vpc.outputs.private_subnets[0], dependency.vpc.outputs.private_subnets[1], dependency.vpc.outputs.private_subnets[2]]

@ArchiFleKs
Member

OK, I see: the problem is with the one function. I always use a single subnet ID per managed node group, so with multiple subnets it fails because of one.

@ArchiFleKs
Member

The related config here (https://github.com/particuleio/teks/blob/main/terragrunt/snippets/eks-asg-tags/eks-asg-tags.tf#L40) is for EBS and volume scheduling hints when scaling to and/or from 0 with managed node groups. It can be removed if needed, but if you are planning to use block storage with EBS, I'd suggest you use one node group per availability zone.

Also, eks-asg-tags.tf will soon be replaced with Terraform native tags, which will make your use case easier to handle (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group_tag).
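
For illustration only, a minimal sketch of what a native zone tag could look like with the aws_autoscaling_group_tag resource. The variable names and the resource label are hypothetical, and the tag key follows the cluster-autoscaler node-template label convention, which is an assumption here rather than something taken from eks-asg-tags.tf:

```hcl
# Hypothetical inputs: the ASG backing a managed node group and its single AZ.
variable "asg_name" {
  type = string
}

variable "zone" {
  type = string
}

# Adds one tag to an existing Auto Scaling group without managing the group itself.
resource "aws_autoscaling_group_tag" "zone_hint" {
  autoscaling_group_name = var.asg_name

  tag {
    # Assumed key: scheduling hint read by cluster-autoscaler when scaling from 0.
    key                 = "k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone"
    value               = var.zone
    propagate_at_launch = false
  }
}
```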

@ArchiFleKs
Member

> Is the issue here that I should be using it like this?
> subnet_ids = dependency.vpc.outputs.private_subnets
> vs
> subnet_ids = [dependency.vpc.outputs.private_subnets[0], dependency.vpc.outputs.private_subnets[1], dependency.vpc.outputs.private_subnets[2]]

The one function expects a list with zero or one element, and here it finds three. It would be the same error with dependency.vpc.outputs.private_subnets.

This part should be removed for your use case to work.
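
For reference, a minimal sketch of how Terraform's built-in one function behaves. The local values below are hypothetical stand-ins for an ASG's availability_zones, and the guard at the end is just one possible workaround, not the project's fix:

```hcl
locals {
  # Hypothetical AZ lists standing in for an ASG's availability_zones.
  single_az = ["eu-west-1a"]
  multi_az  = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
  no_az     = []

  one_single = one(local.single_az) # "eu-west-1a"
  one_empty  = one(local.no_az)     # null

  # one(local.multi_az) would fail with "Invalid function argument",
  # because the list has more than one element.

  # Possible guard (an assumption): take the zone only when exactly one
  # is present, otherwise yield null.
  zone_or_null = length(local.multi_az) == 1 ? one(local.multi_az) : null
}
```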

@zestrells
Author

Awesome! I will go with the one subnet_id per node group and let cluster-autoscaler do its job with balancing! Thank you for the help again! 😊

@bogdando

bogdando commented Jul 15, 2022

The aws_autoscaling_group_tag resource for managing Terraform native tags has not yet been integrated into the terraform-aws-eks module for managed node groups, it seems. It relies on the module outputs module.eks.eks_managed_node_groups.node_group_labels(_taints), while the module's autoscaling_group_tags input is only for unmanaged node groups (and I doubt it will ever be accepted for EKS managed node groups). So the tagging snippet should stay for a while.

Regarding the way it picks availability_zones to apply as ASG tags, one() picks a first value. This may not be what the user expects when there are managed node groups each mapped to one AZ, like arm-a, arm-b, arm-c. I would expect their zone ASG tags NOT to all become eu-west-1a, for example, but to be based on the -a, -b, -c AZ suffixes instead.

I've attempted to fix that in my tEKS fork [0], [1] (also, I think it's the reference to data.aws_autoscaling_group.node_groups[each.key].availability_zones that fails there, while var.eks_managed_node_groups[each.key].availability_zones works for me).

This approach shows all subnets and all AZs for an ASG, but only the properly mapped one for the zone tags. I'm not certain whether that's correct, or whether there should be multiple zone tags, one for each AZ.

[0] https://github.com/bogdando/teks/commit/e3564dd53f2a6936c001a0601a861fd8aa2d8a77#diff-0016c55be15bfa68fbb22254752f8a7fe56b71490e678123877d583a5346779cR203
[1] https://github.com/bogdando/teks/commit/e3564dd53f2a6936c001a0601a861fd8aa2d8a77#diff-450049c636a646a657acdb75322f1eddf9c54e6adc17ff676b9b59141bc85bdaR63

@ArchiFleKs
Member

ArchiFleKs commented Jul 16, 2022

Hi, this commit should solve your issues and is more elegant: 63da5a3

It will work regardless of whether you use one ASG per AZ or ASGs spanning multiple AZs, and will add the proper tags or not accordingly.
