Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Windows platform worker group does not find an AMI #1411

Conversation

brettcave
Copy link

@brettcave brettcave commented May 28, 2021

Fixes #1410 - windows platform in workers_group_default results in a lookup for a windows node in the linux map. A windows AMI should be refereced to the windows map, and the default map should look for Linux.

PR o'clock

Description

Set a windows platform on the worker groups fails, unable to set an AMI. This is due to a windows lookup from a linux map.

Checklist

…es not find an AMI, because the local will never resolve a windows AMI from the linux platform.
@brettcave brettcave changed the title Fixes #1410 - A Windows platform worker group does not find an AMI, b… fix: Windows platform worker group does not find an AMI May 28, 2021
@barryib
Copy link
Member

barryib commented May 28, 2021

I think you're changing the default behavior of the module. The default platform is Linux. To deploy windows box, you have set platform=windows in the corresponding worker group right ?

@barryib
Copy link
Member

barryib commented May 28, 2021

I think I understand your issue now. The PR breaks workloads where uses changed the default plateform value.

Am I correct ?

@brettcave
Copy link
Author

I think I understand your issue now. The PR breaks workloads where uses changed the default plateform value.

Am I correct ?

@barryib - yes. If defaults are used, or the worker group platform is set to "linux", it works. however, if this is overridden to "windows", then it fails, because a windows ami lookup never gives a result, due to looking for a windows AMI in a linux AMI map.

I tested locally: a linux deployment works, and overriding to windows results in:

aws_launch_configuration.workers[0]: Creation complete after 5s [id=REDACTED]

@brettcave
Copy link
Author

@barryib - an example of where this is breaking is that Windows platform EKS deployments with spotinst are not possible - https://github.com/spotinst/terraform-spotinst-ocean-eks. We reference this module which has some version pinning in it (~> 16.0) and our EKS deployments are blocked because of this until this fix is in.

This is the value that is passed through to the module:


variable "workers_group_defaults" {
  type = map

  default = {
    platform             = "windows"
    root_volume_type     = "gp2"
  }
}

@@ -16,7 +16,7 @@ locals {
worker_group_launch_template_count = length(var.worker_groups_launch_template)

worker_has_linux_ami = length([for x in concat(var.worker_groups, var.worker_groups_launch_template) : x if lookup(x, "platform", "linux") == "linux"]) > 0
worker_has_windows_ami = length([for x in concat(var.worker_groups, var.worker_groups_launch_template) : x if lookup(x, "platform", "linux") == "windows"]) > 0
worker_has_windows_ami = length([for x in concat(var.worker_groups, var.worker_groups_launch_template) : x if lookup(x, "platform", "windows") == "windows"]) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we do this, we'll always lookup for windows and linux AMI if someone doesn’t explicitly set platform on worker nodes. Because both worker_has_linux_ami and worker_has_windows_ami will be true.

I don't know how to solve this without introducing cycle. Maybe it worth to add new variables to explicitly ask for lookup for plateform specific AMI. Something like worker_search_ami_linux and worker_search_ami_windows. Both will be set to true by default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barryib - ah right.

What about changing the default return on the look from lookup(x, "platform", "windows") == "windows" to lookup(x,"platform",local.workers_group_defaults["platform"]) ?

Or instead of using the concat + lookup, maybe the worker_has_windows_ami could be simplified to worker_has_windows_ami = local.workers_group_defaults["platform"] == "windows"? I haven't looked through the module enough to understand the logic yet, i may be overlooking the module's ability to provide worker groups on multiple platforms - can you confirm if that's the case, or if it is mutually exclusive? (i.e. only eks is configured with only windows or linux worker groups)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that before, but I went into terraform cycle error.

What do you think about #1413 ? Can we close your PR ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through how local.workers_group_defaults is used / referenced in other places, I think this approach might work. The update would then look like:

worker_has_linux_ami = local.workers_group_defaults["platform"] == "linux"
worker_has_windows_ami = local.workers_group_defaults["platform"] == "windows"

This is defaulted to "linux" by https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/local.tf#L75 and would only look up 1 or the other depending on default vs a var override being provided.

I'll look to start testing this approach, call out anything i'm overlooking here. If this is the case, it might not even be necessary to have this as a local here, it might be a bit simpler to just use this condition in the count = local.workers_group_defaults["platform"] == "windows" here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/data.tf#L34

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barryib - ok - i see the cycle errors you mentioned. Sure, happy to close this 1 in favour of #1413

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I'll close this. Can you please review #1413 ?

@barryib
Copy link
Member

barryib commented May 28, 2021

Replaced by #1413

@barryib barryib closed this May 28, 2021
@brettcave brettcave deleted the 1410-windows-platform-group-fix branch May 28, 2021 20:12
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows platform will always fail, because "windows" != "linux"
2 participants