Launching a new EC2 instance. Status Reason: Could not launch Spot Instances. InvalidParameterValue - You cannot specify tags for elastic GPUs if there are no elastic GPUs being created by the request. Launching EC2 instance failed. #2355

Closed
1 task done
jurgen-weber-deltatre opened this issue Dec 19, 2022 · 4 comments · Fixed by #2360

Comments

@jurgen-weber-deltatre

Description

After taking the latest v19 patch and applying the tag specifications, all of our self-managed ASGs started failing to scale with the error:

"Launching a new EC2 instance. Status Reason: Could not launch Spot Instances. InvalidParameterValue - You cannot specify tags for elastic GPUs if there are no elastic GPUs being created by the request. Launching EC2 instance failed."

I assume it is related to the following PR in 19.2: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2352/files
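
As a rough illustration of what the error message describes, here is a minimal sketch using the plain `aws_launch_template` resource. It is not the module's actual code; the resource name, AMI ID, instance type, and tag values are placeholders chosen for the example. The assumption, taken from the error text itself, is that a tag specification for `elastic-gpu` is rejected at launch when the request creates no elastic GPUs.

```hcl
# Hypothetical, simplified launch template; all names and values are illustrative.
resource "aws_launch_template" "example" {
  name_prefix   = "example-"
  image_id      = "ami-00000000000000000" # placeholder AMI ID
  instance_type = "m5.large"

  # Tags for a resource type the launch actually creates.
  tag_specifications {
    resource_type = "instance"
    tags          = { Name = "example" }
  }

  # No elastic_gpu_specifications block is present, so (per the error message)
  # a tag specification like this one is what the launch request rejects.
  tag_specifications {
    resource_type = "elastic-gpu"
    tags          = { Name = "example" }
  }
}
```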

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:
    v19.3.1
  • Terraform version:
    Terraform v1.3.6
  • Provider version(s):
    "4.47.0"

Reproduction Code [Required]

Steps to reproduce the behavior:

We just upgraded to v19.3.1

Expected behavior

My ASGs can launch instances.

Actual behavior

My ASGs fail to launch instances with the error "Launching a new EC2 instance. Status Reason: Could not launch Spot Instances. InvalidParameterValue - You cannot specify tags for elastic GPUs if there are no elastic GPUs being created by the request. Launching EC2 instance failed."

Terminal Output Screenshot(s)

"Launching a new EC2 instance. Status Reason: Could not launch Spot Instances. InvalidParameterValue - You cannot specify tags for elastic GPUs if there are no elastic GPUs being created by the request. Launching EC2 instance failed."

Additional context

Here is a Terraform snippet of our self-managed node group config:

    for subnet in try(module.vpc.private_subnets, []): [
      for size, type in local.worker_groups_size_cpu_indexed: {

        name                            = "cpu-spot-${size}-${subnet}"
        use_name_prefix                 = false
        iam_role_use_name_prefix        = false

        ami_id                          = data.aws_ssm_parameter.bottlerocket_ami_id.value

        min_size                        = local.config_tier["asg_min_size_cpu"][var.config_tier]
        max_size                        = local.config_tier["asg_max_size_cpu"][var.config_tier]
        desired_size                    = local.config_tier["asg_min_size_cpu"][var.config_tier]

        block_device_mappings           = [
          {
            device_name                 = "/dev/xvda"
            ebs                         = {
              delete_on_termination     = true
              encrypted                 = true
              throughput                = 150
              volume_size               = 52
              volume_type               = "gp3"
            }
          },
          {
            device_name                 = "/dev/xvdb"
            ebs                         = {
              delete_on_termination     = true
              encrypted                 = true
              throughput                = 150
              volume_size               = 76
              volume_type               = "gp3"
            }
          }
        ]

        bootstrap_extra_args            = <<-EOT
        ${local.bottlerocket_userdata}
        [settings.kubernetes.node-labels]
        ingress = "allowed"
        "asg.deltatre.com/cluster-autoscaler"  = "true" # https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#auto-discovery-setup, https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
        "node.kubernetes.io/lifecycle"  = "normal"
        "node.kubernetes.io/type"       = "cpu"
        EOT

        ebs_optimized                   = true
        enabled_metrics                 = [
          "GroupMinSize",
          "GroupMaxSize",
          "GroupDesiredCapacity",
          "GroupInServiceInstances",
          "GroupPendingInstances",
          "GroupStandbyInstances",
          "GroupTerminatingInstances",
          "GroupTotalInstances"
        ]

        initial_lifecycle_hooks         = [
          {
            name                        = "node-termination-handler"
            default_result              = "CONTINUE"
            heartbeat_timeout           = "300"
            lifecycle_transition        = "autoscaling:EC2_INSTANCE_TERMINATING"
          }
        ]

        instance_refresh                = {
          preferences                   = {
            min_healthy_percentage      = var.config_tier == "prod" ? 90 : 66
          }
          strategy                      = "Rolling"
        }

        instance_type                   = element(type, 0)

        mixed_instances_policy          = {
          instances_distribution        = {
            # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#spot_allocation_strategy
            spot_allocation_strategy    = var.config_tier == "prod" ? "capacity-optimized" : "lowest-price"
            spot_instance_pools         = var.config_tier == "prod" ? 0 : 20
          }
          override                      = flatten([
            for k, v in type: {
              instance_type             = v
            }
          ])
        }
        platform                        = "bottlerocket"
        pre_bootstrap_user_data         = local.pre_bootstrap_user_data
        protect_from_scale_in           = false
        use_mixed_instances_policy      = true
        subnet_ids                      = [
          subnet
        ]
        suspended_processes             = [
          "AZRebalance"
        ]
        autoscaling_group_tags          = { # applied only to the autoscaling group; not propagated to instances
          "k8s.io/cluster-autoscaler/enabled" : true,
          "k8s.io/cluster-autoscaler/${local.cluster_name}" : true,
          "k8s.io/cluster-autoscaler/node-template/label/asg.deltatre.com/cluster-autoscaler" : true # https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#auto-discovery-setup, https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
        }
        tags                            = { # these get propagated to the instances launched by the ASG
          "aws-node-termination-handler/managed" = "true"
        }
        vpc_security_group_ids          = [
          module.vpc.default_security_group_id
        ]
      } if local.enable_eks
    ]
  ])

@demigoldberg

Creating a new cluster fails with the same error:
Failed: Could not launch Spot Instances. InvalidParameterValue - You cannot specify tags for elastic GPUs if there are no elastic GPUs being created by the request

@antonbabenko
Member

This issue has been resolved in version 19.4.0 🎉
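
For anyone arriving here with the same error, a minimal sketch of picking up the fix, assuming the module is consumed from the Terraform Registry and that any 19.x release at or above 19.4.0 is acceptable. The module label `eks` is a placeholder, and the rest of the configuration is elided.

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.4" # allows 19.4.0 and later 19.x releases, excludes 20.0

  # ... existing cluster and self_managed_node_groups configuration unchanged ...
}
```

After changing the constraint, `terraform init -upgrade` fetches the newer module version before the next plan.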

@jurgen-weber-deltatre
Author

Thank you for the quick resolution

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 19, 2023