Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement: Use time_sleep instead of local-exec #1253

Closed
wants to merge 2 commits into from
Closed

improvement: Use time_sleep instead of local-exec #1253

wants to merge 2 commits into from

Conversation

shoekstra
Copy link

@shoekstra shoekstra commented Feb 23, 2021

PR o'clock

Description

Using time_sleep is a more platform agnostic way to wait for something than using a local-exec. As the
previous way had a 5m timeout, the same is implemented using time_sleep.

With this change we're able to provision EKS clusters and manage the aws_auth config map while still using the hashicorp/tfc-agent docker image.

Checklist

@daroga0002
Copy link
Contributor

Issue which I see here is that AWS doesn't guarantee Control planes availability. This means this can take 2 minutes, but can take also 15 minutes.

In your PR you assuming that this will take max 5 minutes, in result this will cause that module can fail even if everything is going as should. From other side increasing time_sleep to 10 minutes can cause non required delay in module execution.

Just to be clear I dont like this local-exec hack but I dont see currently any other solution to solve it in user friendly way without causing more issues.

@shoekstra
Copy link
Author

Hi @daroga0002 !

Issue which I see here is that AWS doesn't guarantee Control planes availability. This means this can take 2 minutes, but can take also 15 minutes.

In your PR you assuming that this will take max 5 minutes, in result this will cause that module can fail even if everything is going as should. From other side increasing time_sleep to 10 minutes can cause non required delay in module execution.

I can understand the hesitation, it's certainly not the ideal situation... Re max 5 minutes, this is already the case when you look at var.wait_for_cluster_cmd, so no change there. The only perk to the existing method is if it's ready within 5 minutes the creation of the rest of the resources happens sooner.

Just to be clear I dont like this local-exec hack but I dont see currently any other solution to solve it in user friendly way without causing more issues.

I can appreciate you can't take my word for it, but I've deployed a few clusters and they've regularly been ready when the 5m timer is reached...

Due to using Terraform Cloud to bootstrap our EKS clusters, we're confined to using the hashicorp/tfc-agent image (I assume it's this anyway) which doesn't have the wget or curl binaries installed. If the concern is the 5 minute timer, I'd be more in favour with increasing this to a higher number (or making it configurable) and doing this "natively" - waiting a few more minutes to stand up an EKS cluster wouldn't be the end of the world.

Look forward to your feedback!

Stephen

@daroga0002
Copy link
Contributor

Yup, I understand your arguments. So I think they are reasonable so from my side 👍

Some technical think is that you must add Semantric pull Request prefix into PR subject choosing one from:

types:
- feat
- fix
- improvement
- docs
- refactor
- test
- ci
- chore

I think it should be:
improvement: Use time_sleep instead of local-exec

@shoekstra shoekstra changed the title Use time_sleep instead of local-exec improvement : Use time_sleep instead of local-exec Feb 26, 2021
@shoekstra shoekstra changed the title improvement : Use time_sleep instead of local-exec improvement: Use time_sleep instead of local-exec Feb 26, 2021
@shoekstra
Copy link
Author

Great!

I've updated the title as requested, will look at the docs lint job later.

@daroga0002
Copy link
Contributor

I am adding this link https://aws.amazon.com/about-aws/whats-new/2021/03/amazon-eks-reduces-cluster-creation-time-40-percent/ as this is also good news in this context.

@barryib can you review this?

Using time_sleep is a more platform agnostic way to wait for something than using a local-exec. As the
previous way had a 5m timeout, the same is implemented using time_sleep.

Signed-off-by: Stephen Hoekstra <[email protected]>
Signed-off-by: Stephen Hoekstra <[email protected]>
@shoekstra
Copy link
Author

Hey @daroga0002, I've rebased this PR and regenerated the docs using pre-commit run -a.

Is there anything still needed to get this merged and released?

@daroga0002
Copy link
Contributor

Is there anything still needed to get this merged and released?

For me it looks good, but I am not maintainer of this repo so I cannot approve :(

@barryib

@shoekstra
Copy link
Author

Hi @max-rocket-internet @barryib!

Are either of you able to take a look?

Thanks in advance.

@barryib
Copy link
Member

barryib commented May 4, 2021

Hello @shoekstra Thanks for opening this PR and @daroga0002 for all your reviews (really appreciated).

I truly understand your concern here. The first time we introduced this check with curl (or wget as fallback), we knew it was not great solution. But as @daroga0002 mentioned it, the EKS availability is quite random and waiting 5mn, will slow down every EKS cluster creation by 5mn.

There is a issue on AWS side to track this aws/containers-roadmap#654
And here is a try to resolve this in the AWS provider hashicorp/terraform-provider-aws#11426

Since we now publish our own provider with terraform registry, I had in mind to publish the http provider with the support of http timeout hashicorp/terraform-provider-http#71. This fork will be used only for this timeout feature until we go a real solution.

@barryib
Copy link
Member

barryib commented May 5, 2021

FIW, I started this #1339 for tests.

@barryib
Copy link
Member

barryib commented May 17, 2021

@shoekstra thank you for your work. I'm closing your PR, since we merged #1339. I assume that PR addressed your issue. Feel free to reopen it if needed.

Thanks @daroga0002 for your help.

@barryib barryib closed this May 17, 2021
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants