Add ASG lifecycle management Lambda function #392
What a nice feature! I need this too. I deployed a new runner into my development environment and noticed the following:
There is a "build" directory, but it contains the "zip" file rather than "zip.tmp". It seems the file has been moved.
Ok, got it running with
|
I expect one additional parameter only: |
Thanks for the review, @kayman-mk! I'll work on fixes and adjustments from your feedback. The Python requirement comes from the use of an external module (https://registry.terraform.io/modules/terraform-aws-modules/lambda/) for the Lambda function and is mentioned as a prerequisite when using this feature in the documentation updates in this pull request. However, I think we can get rid of this and handle this simple use case in this module. I'll do some refactoring for this. On the issue of it not actually doing anything - I'm curious if you can check the CloudWatch event rule to see if it triggered an invocation and if it failed? There's also a CloudWatch log stream for the function itself. On the worker instance, do you see the |
Nope, I do not see the tag. I guess it's the missing comma in the |
I've made adjustments from @kayman-mk's review, including removing the use of an external module and thus eliminating the 'python/pip' dependencies.
@joshbeard will check asap, please can you rebase?
@npalm rebased
@joshbeard thanks for your contribution. Made some small remarks, please can you have a look. I also need to do a bit more testing.
@npalm Thank you so much for the thorough review and consideration! I will resolve the issues you pointed out, rebase, and push within the next couple of days.
Couple of days = couple of weeks ;) I've been away for the past couple of weeks.
@npalm @kayman-mk I've finally gotten back around to this and have made updates based on your most recent feedback in commit a290efa. I noticed the I'll rebase this after follow-up reviews and potential tweaks. |
Great work, added to my backlog.
Will check over the weekend.
@joshbeard @npalm I did a quick check in my environment.
What I noticed:
|
@kayman-mk I will also do a check. I would suggest making the Lambda memory as well as the timeout configurable, with reasonable defaults.
@npalm I've made the lambda memory size and the timeout configurable: https://github.com/npalm/terraform-aws-gitlab-runner/pull/392/files#diff-05b5a57c136b6ff596500bcbfdcff145ef6cddea2a0e86d184d9daa9a65a288e
Rebased against "develop"
Really looking forward to having this merged, because this was one of the "issues" I documented before we could look into using this module at work. Do you have any idea when this can be merged @npalm? (Thank you for the awesome work!)
@GertVil I will try to check the PR tonight. Can you provide short directions on how to test the lambda?
I'm afraid that @joshbeard will probably be in a better position to provide you with that information, since he created all this and probably has a scenario that he has used before.
@joshbeard great job! works like a charm! Only one small change request, see inline. Can you also run a rebase please.
@npalm Adjustments made and rebased against 'develop'
Sorry, missed the update. Will merge the PR tomorrow. If you have time for a rebase, please. Otherwise I will fix it locally!
This introduces an Auto Scaling Group instance termination lifecycle hook using Lambda and related resources. The Lambda function is a Python script that is triggered when the persistent runner instance in the ASG is terminated. The function receives the instance ID of the "parent" runner and queries for spawned instances that it launched to terminate. Additionally, it will check for other "orphaned" instances that have a `gitlab-runner-parent-id` tag that doesn't match an existing instance. This resolves the issue where spawned instances could be orphaned when their parent runner is terminated. This feature is disabled by default. The user data script is updated to provide the 'parent' instance ID as a tag named 'gitlab-runner-parent-id' on spawned instances. A new sub-module is provided called "terminate-workers". It is optional to use this feature, and the input variable `asg_terminate_lifecycle_hook_create` can be toggled `true` or `false` for this behavior.
Rebased again. The README changed a bit since my last rebase, as I'm sure you're aware. I added the resources to the "resources" section, but that may get overwritten with the current pre-commit config.
@joshbeard thx, checking. Just a quick tip for next time: for reviewing, it helps if you only rebase and don't squash. Now I have to go over the full PR again. No worries, we'll get this merged!
@joshbeard All good, thanks!!!
I'm sorry about that. Thanks for the review.
Great! Thanks for your work on this, the review, feedback, and including this in the project!
🎉 This PR is included in version 4.40.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
Description
This introduces an Auto Scaling Group instance termination lifecycle
hook using Lambda and related resources. The Lambda function is a Python
script that is triggered when the persistent runner instance in the ASG
is terminated. The function receives the instance ID of the "parent"
runner and queries for spawned instances that it launched to terminate.
Additionally, it will check for other "orphaned" instances that have a
`gitlab-runner-parent-id` tag that doesn't match an existing instance.
This resolves the issue where spawned instances could be orphaned when
their parent runner is terminated.
This feature is disabled by default.
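As a rough illustration of the trigger side described above, the sketch below shows how a handler might pull the terminating "parent" instance ID out of an Auto Scaling lifecycle event. The event layout follows the format AWS uses for instance-terminate lifecycle actions; the function name `parent_instance_id` and all sample values are hypothetical, and the function in this PR may read the event differently.

```python
from typing import Optional

# Lifecycle transition value sent when an ASG instance is being terminated.
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def parent_instance_id(event: dict) -> Optional[str]:
    """Return the terminating instance ID, or None for any other event.

    Hypothetical helper: not taken from the PR's Lambda source.
    """
    detail = event.get("detail") or {}
    if detail.get("LifecycleTransition") != TERMINATING:
        return None
    return detail.get("EC2InstanceId")


# Sample event with made-up names and IDs, for illustration only.
sample_event = {
    "source": "aws.autoscaling",
    "detail-type": "EC2 Instance-terminate Lifecycle Action",
    "detail": {
        "AutoScalingGroupName": "example-runner-asg",
        "LifecycleHookName": "example-terminate-hook",
        "LifecycleTransition": TERMINATING,
        "EC2InstanceId": "i-0123456789abcdef0",
    },
}

print(parent_instance_id(sample_event))
```

Returning `None` for anything that is not a termination transition keeps the handler from acting on unrelated events that happen to reach it.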
The user data script is updated to provide the 'parent' instance ID as a
tag named 'gitlab-runner-parent-id' on spawned instances.
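To make the role of this tag concrete, here is a minimal sketch of the selection logic the description outlines: terminate workers whose `gitlab-runner-parent-id` tag names the terminating runner, plus "orphans" whose tag names an instance that no longer exists. The simplified instance records (`id` and `tags` keys), the function name, and the sample IDs are assumptions for illustration; a real function would build this data from the EC2 API, and the PR's implementation may differ.

```python
from typing import Dict, Iterable, List, Set

PARENT_TAG = "gitlab-runner-parent-id"


def workers_to_terminate(
    parent_id: str, instances: Iterable[Dict], live_ids: Set[str]
) -> List[str]:
    """Pick worker instance IDs to terminate (hypothetical helper).

    A worker is selected if its parent tag matches the terminating
    runner, or if the tag refers to an instance that is not in live_ids.
    """
    selected = []
    for inst in instances:
        tag = inst.get("tags", {}).get(PARENT_TAG)
        if tag is None:
            continue  # untagged: not a spawned worker, leave it alone
        if tag == parent_id or tag not in live_ids:
            selected.append(inst["id"])
    return selected


# Made-up inventory for illustration only.
instances = [
    {"id": "i-worker-a", "tags": {PARENT_TAG: "i-parent-1"}},  # child of terminating runner
    {"id": "i-worker-b", "tags": {PARENT_TAG: "i-parent-2"}},  # parent still running
    {"id": "i-worker-c", "tags": {PARENT_TAG: "i-gone"}},      # orphan
    {"id": "i-parent-2", "tags": {}},                          # a runner, no parent tag
]
live_ids = {"i-parent-1", "i-parent-2", "i-worker-a", "i-worker-b", "i-worker-c"}

print(workers_to_terminate("i-parent-1", instances, live_ids))
```

Keeping the selection as a pure function over plain records makes it easy to test without AWS credentials; the API calls to list and terminate instances can wrap around it.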
A new sub-module is provided called "terminate-workers". It is optional
to use this feature, and the input variable
`asg_terminate_lifecycle_hook_create` can be toggled `true` or `false`
for this behavior.
This partially addresses concerns discussed in issue #214
Migrations required
NO
Verification
I've tested existing standard configuration with this addition disabled and enabled with expected results.
On a default setup without enabling this feature, there are 2 changes - the user data is updated to provide the `gitlab-runner` and `gitlab-runner-parent` tags to spawned worker instances.
When enabled with its default configuration, there are 10 resource additions and the two changes mentioned previously.
Documentation
We use pre-commit to update the Terraform inputs and outputs in the documentation via terraform-docs. Ensure you have installed those components.
Documentation was updated, tflint and pre-commit hooks ran.