Add support for resource-specific resync periods and default drift remediation period #106

a-hilaly · 2022-12-22T04:08:46Z

Part of: aws-controllers-k8s/community#1367

This patch introduces the ability to specify resource-specific resync periods in
the drift remediation configuration, as well as a default drift remediation period
in the controller configuration. The resync period for each reconciler is
determined by trying to retrieve it from the following sources, in this order:

1. A resource-specific period specified in the drift remediation configuration.
2. A resource-specific requeue on success period specified by the resource manager
   factory.
3. The default drift remediation period specified in the controller configuration.
4. The default resync period defined in the ACK runtime package.

This allows users to customize the drift remediation behavior for different
resources as needed, while still providing a fallback option for resources that do
not have a specific period specified.
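
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: the `resyncSources` type, its field names, and `resolveResyncPeriod` are hypothetical, and the sketch assumes that a value of 0 means "not set". The 10 hour package default is the figure quoted later in the review discussion.

```go
package main

import "time"

// defaultResyncPeriod stands in for the ACK runtime package default
// (10 hours, per the review discussion on this PR).
const defaultResyncPeriod = 10 * time.Hour

// resyncSources is a hypothetical container for the three configurable inputs.
type resyncSources struct {
	// perResourceSeconds holds resource-specific periods from the drift
	// remediation configuration, keyed by resource name.
	perResourceSeconds map[string]int
	// requeueOnSuccessSeconds is the period reported by the resource
	// manager factory; 0 is treated as "not set" in this sketch.
	requeueOnSuccessSeconds int
	// controllerDefaultSeconds is the default drift remediation period from
	// the controller configuration; 0 is treated as "not set".
	controllerDefaultSeconds int
}

// resolveResyncPeriod walks the sources in priority order and returns the
// first one that is set, falling back to the package default.
func resolveResyncPeriod(resource string, src resyncSources) time.Duration {
	if s, ok := src.perResourceSeconds[resource]; ok && s > 0 {
		return time.Duration(s) * time.Second
	}
	if src.requeueOnSuccessSeconds > 0 {
		return time.Duration(src.requeueOnSuccessSeconds) * time.Second
	}
	if src.controllerDefaultSeconds > 0 {
		return time.Duration(src.controllerDefaultSeconds) * time.Second
	}
	return defaultResyncPeriod
}
```

With only a controller default of 3600 seconds configured, a resource without its own entry would resolve to one hour in this sketch, and a resource with a 300 second override would resolve to five minutes.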

Signed-off-by: Amine Hilaly [email protected]

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

a-hilaly · 2022-12-22T04:16:49Z

/hold

jaypipes

Just some naming suggestions inline... otherwise I like it!

pkg/config/config.go

jljaco

This is great work! A few comments, mostly on the contents of the inline documentation.

pkg/config/config.go

pkg/runtime/reconciler.go

pkg/config/config_test.go

RedbackThomson

Please see my comments regarding the duration = 0 edge case. Otherwise nits

pkg/config/config.go

RedbackThomson · 2023-01-18T20:15:14Z

pkg/config/config.go

+	if err != nil {
+		return "", 0, fmt.Errorf("invalid value in flag argument: %v", err)
+	}
+	if resyncSeconds < 0 {


I think resyncSeconds <= 0 should be the check. If there are 0 seconds of reconciliation, it'll be constantly reconciled with no exponential backoff. Maybe we can suggest at least 1 second of wait?

I just checked here: if a user provides 0 seconds, the controller will use the package default value, which is 10 hours.
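
For readers following the validation discussion, here is a small sketch of how a single flag argument could be parsed and checked. It is an illustration only: the function name `parseResourceResync` and the `resource=seconds` argument format are assumptions, not taken from the diff. It mirrors the behavior discussed in this thread: negative values are rejected, while 0 is accepted and left for the caller to treat as "use the fallback".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseResourceResync splits one "resource=seconds" argument into its parts.
// Both the name and the argument format are hypothetical.
func parseResourceResync(arg string) (string, int, error) {
	name, value, found := strings.Cut(arg, "=")
	if !found || name == "" {
		return "", 0, fmt.Errorf("invalid flag argument %q: expected resource=seconds", arg)
	}
	resyncSeconds, err := strconv.Atoi(value)
	if err != nil {
		return "", 0, fmt.Errorf("invalid value in flag argument: %v", err)
	}
	// Negative periods are rejected. Zero is allowed here and is expected
	// to fall through to a default further down the lookup chain.
	if resyncSeconds < 0 {
		return "", 0, fmt.Errorf("invalid resync period %d: must not be negative", resyncSeconds)
	}
	return name, resyncSeconds, nil
}
```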

pkg/runtime/reconciler.go

pkg/config/config.go

…mediation period This commit introduces the ability to specify resource-specific resync periods in the drift remediation configuration, as well as a default drift remediation period in the controller configuration. The resync period for each reconciler is determined by trying to retrieve it from the following sources, in this order: 1. A resource-specific period specified in the drift remediation configuration. 2. A resource-specific requeue on success period specified by the resource manager factory. 3. The default drift remediation period specified in the controller configuration. 4. The default resync period defined in the ACK runtime package. This allows users to customize the drift remediation behavior for different resources as needed, while still providing a fallback option for resources that do not have a specific period specified. Signed-off-by: Amine Hilaly <[email protected]>

a-hilaly · 2023-01-27T12:28:14Z

/unhold

ack-bot · 2023-01-27T14:37:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: A-Hilaly, jljaco

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [A-Hilaly,jljaco]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

surajkota

this is a great feature

surajkota · 2023-02-02T01:33:32Z

pkg/runtime/reconciler.go

+// getResyncPeriod returns the period of the recurring reconciler process which ensures the desired
+// state of custom resources is maintained.
+// It attempts to retrieve the duration from the following sources, in this order:
+// 1. A resource-specific reconciliation resync period specified in the reconciliation resync


why resync configuration map?

can we use annotation on the resource like other existing features e.g.

https://aws-controllers-k8s.github.io/community/docs/user-docs/multi-region-resource-management/

https://aws-controllers-k8s.github.io/community/docs/user-docs/deletion-policy/

I'm not a super fan of resource-level drift remediation control - but happy to hear what other folks think about it.

@surajkota for both of those annotations, there is a corresponding controller CLI flag:

runtime/pkg/config/config.go

Lines 111 to 115 in b98d322

	flag.StringVar(
		&cfg.Region, flagAWSRegion,
		envutil.WithDefault(envVarAWSRegion, ""),
		"The AWS Region in which the service controller will create its resources",
	)

runtime/pkg/config/config.go

Lines 151 to 154 in b98d322

	flag.Var(
		&cfg.DeletionPolicy, flagDeletionPolicy,
		"The default deletion policy for all resources managed by the controller",
	)

The CLI flags serve as defaults if the annotation is not present.
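
The "flag as default, annotation as override" pattern mentioned here can be pictured with a short sketch. The helper name `effectiveSetting` and the annotation key used in the usage note are made up for illustration and are not the runtime's actual API.

```go
package main

// effectiveSetting returns the value of the given annotation when it is
// present and non-empty, and otherwise the controller-wide flag default.
// The function is hypothetical and only illustrates the precedence.
func effectiveSetting(annotations map[string]string, key, flagDefault string) string {
	if v, ok := annotations[key]; ok && v != "" {
		return v
	}
	return flagDefault
}
```

For example, `effectiveSetting(map[string]string{"example.com/region": "eu-west-1"}, "example.com/region", "us-west-2")` picks the annotation value, while a resource without the annotation gets the flag default.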

Ack on the controller flag; my question is about which source should be highest in the order of precedence.

~~Annotation or ConfigMap, both give the user control over the resync period. Since we don't have a common ConfigMap or a CRD to define all controller configurations, my preference is to keep the experience consistent and not introduce another place.~~

Hi folks, Nick helped me clarify that the resync configuration map != ConfigMap, so we can resolve this comment with one suggestion: reword the code comment to say "resource resync CLI flag" or something along those lines.

I do think even a CLI flag is not a great option, because changing it requires restarting the controller or reinstalling the Helm chart, but that's a discussion for another time. This is good to get started.

see my comment on No 3 here. We should drop it

RedbackThomson · 2023-02-03T04:14:12Z

pkg/config/config.go

@@ -152,6 +159,19 @@ func (cfg *Config) BindFlags() {
 		&cfg.DeletionPolicy, flagDeletionPolicy,
 		"The default deletion policy for all resources managed by the controller",
 	)
+	flag.IntVar(
+		&cfg.ReconcileDefaultResyncSeconds, flagReconcileDefaultResyncSeconds,
+		60,


Woah woah wait we don't want every resource to try and reconcile itself every 60 seconds. That's way too often. I was thinking on the order of every 6 or even 10 hours.

Agree, for the following reasons:

1. We risk causing throttling errors as the number of managed resources increases.
2. No. 1 can have side effects on inter-service communication. For example, for services that use Autoscaling, the SageMaker hosting service makes calls to the autoscaling service; if throttling happens, the SageMaker service will get throttled, which impacts scale in/out of the fleet serving customer traffic.
3. Not all resources need this. For example, SageMaker training jobs are one-time jobs, and SageMaker models do not support updates, so the only way to introduce drift is to delete and recreate the entire resource outside of ACK, which reconciling still cannot resolve.

My vote is to drop this flag completely and let the service teams use requeue_on_success to configure or let the customer override based on their preference. I strongly suggest starting conservative here

I think we can still support this default resync period and also have a setting that ignores this for one-time style resources like TrainingJob. Call that a future feature request. Managing resync periods for every resource individually would require users to update their configuration every time we add a new resource - this is essentially just a shortcut for all of them.

I actually believe I don't want service teams to set requeue_on_success at all. It's ultimately more important that a customer can set their expectations for requeuing resources, since they know their drift conditions better than any ACK contributor will. Sure some resources should be shorter than 10 hours, but whether it's every 5 hours or every 10 minutes, it'll depend on everyone's specific deployment context.
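
To put rough numbers on the throttling concern in this thread, the sketch below estimates how many reconcile passes per hour a controller would run for a given number of resources and resync period. It is back-of-the-envelope arithmetic only: the function name is hypothetical, and it assumes every resource shares one period and that each pass costs at least one AWS API call.

```go
package main

// reconcilesPerHour estimates the number of periodic reconcile passes per
// hour for `resources` objects that all share one resync period. A period
// of 0 or less is treated as "no periodic resync" in this toy model.
func reconcilesPerHour(resources, periodSeconds int) float64 {
	if periodSeconds <= 0 {
		return 0
	}
	return float64(resources) * 3600.0 / float64(periodSeconds)
}
```

Under these assumptions, 1000 resources at a 60 second period come to 60000 passes per hour, while the same 1000 resources at a 10 hour period (36000 seconds) come to 100 passes per hour.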

surajkota · 2023-02-03T04:42:05Z

pkg/config/config.go

@@ -152,6 +159,19 @@ func (cfg *Config) BindFlags() {
 		&cfg.DeletionPolicy, flagDeletionPolicy,
 		"The default deletion policy for all resources managed by the controller",
 	)
+	flag.IntVar(
+		&cfg.ReconcileDefaultResyncSeconds, flagReconcileDefaultResyncSeconds,
+		60,


Agree, for the following reasons:

1. We risk causing throttling errors as the number of managed resources increases.
2. No. 1 can have side effects on inter-service communication. For example, for services that use Autoscaling, the SageMaker hosting service makes calls to the autoscaling service; if throttling happens, the SageMaker service will get throttled, which impacts scale in/out of the fleet serving customer traffic.
3. Not all resources need this. For example, SageMaker training jobs are one-time jobs, and SageMaker models do not support updates, so the only way to introduce drift is to delete and recreate the entire resource outside of ACK, which reconciling still cannot resolve.

My vote is to drop this flag completely and let the service teams use requeue_on_success to configure or let the customer override based on their preference. I strongly suggest starting conservative here

surajkota · 2023-02-03T04:47:25Z

pkg/runtime/reconciler.go

+// getResyncPeriod returns the period of the recurring reconciler process which ensures the desired
+// state of custom resources is maintained.
+// It attempts to retrieve the duration from the following sources, in this order:
+// 1. A resource-specific reconciliation resync period specified in the reconciliation resync


Hi folks, Nick helped me clarify that the resync configuration map != ConfigMap, so we can resolve this comment with one suggestion: reword the code comment to say "resource resync CLI flag" or something along those lines.

I do think even a CLI flag is not a great option, because changing it requires restarting the controller or reinstalling the Helm chart, but that's a discussion for another time. This is good to get started.

see my comment on No 3 here. We should drop it

Signed-off-by: Amine Hilaly <[email protected]>

…on comment Signed-off-by: Amine Hilaly <[email protected]>

…ft remediation period Issue: aws-controllers-k8s/community#1367 Follow-up of aws-controllers-k8s/runtime#106 This patch completes the implementation of support for resource-specific resync periods and a default drift remediation period made in runtime repository. The resync period for each reconciler is determined by trying to retrieve it from the following sources, in this order: 1. A resource-specific period specified in the drift remediation configuration. 2. A resource-specific requeue on success period specified by the resource manager factory. 3. The default drift remediation period specified in the controller configuration. 4. The default resync period defined in the ACK runtime package. This allows users to customize the drift remediation behavior for different resources as needed, while still providing a fallback option for resources that do not have a specific period specified. Signed-off-by: Amine Hilaly <[email protected]>

RedbackThomson

One last change then I'm happy to ship this

RedbackThomson · 2023-02-03T19:39:39Z

pkg/config/config.go

+		&cfg.ReconcileDefaultResyncSeconds, flagReconcileDefaultResyncSeconds,
+		0,
+		"The default duration, in seconds, to wait before resyncing desired state of custom resources. "+
+			"This value is used if no resource-specific override has been specified. Default is 60 seconds.",


Please update this description for the new default

RedbackThomson · 2023-02-03T19:49:57Z

pkg/config/config.go

@@ -163,7 +163,7 @@ func (cfg *Config) BindFlags() {
 		&cfg.ReconcileDefaultResyncSeconds, flagReconcileDefaultResyncSeconds,
 		0,
 		"The default duration, in seconds, to wait before resyncing desired state of custom resources. "+
-			"This value is used if no resource-specific override has been specified. Default is 60 seconds.",
+			"This value is used if no resource-specific override has been specified. Default is 0 seconds.",


Well... the default is the 10 hour fallback. 0 seconds makes it sound like it's constantly reconciling without any pause
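
One way to make the flag description say what a zero default really means is sketched below. This uses the standard library `flag` package rather than whatever flag library the runtime binds against, and the `config` type, the `parseArgs` helper, and the exact description wording are illustrative assumptions. Only the flag name `reconcile-default-resync-seconds` and the 10 hour fallback come from this PR.

```go
package main

import "flag"

// config is a hypothetical, trimmed-down stand-in for the controller config.
type config struct {
	ReconcileDefaultResyncSeconds int
}

// parseArgs binds the flag on a private FlagSet and parses args.
// A value of 0 is kept as-is so later code can treat it as "not set".
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.IntVar(
		&cfg.ReconcileDefaultResyncSeconds, "reconcile-default-resync-seconds",
		0,
		"The default duration, in seconds, to wait before resyncing desired state of custom resources. "+
			"This value is used if no resource-specific override has been specified. "+
			"If unset or 0, the runtime's 10 hour fallback applies.",
	)
	err := fs.Parse(args)
	return cfg, err
}
```

Spelling out the fallback in the help text avoids the reading that 0 means "reconcile continuously".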

surajkota

Lgtm with this comment addressed
Thanks for the changes Amine

#106 (comment)

Signed-off-by: Amine Hilaly <[email protected]>

jaypipes

Let's get this shipped, please. :)

RedbackThomson · 2023-02-06T19:58:42Z

Thanks for the last minute changes @a-hilaly

/lgtm

ack-prow · 2023-02-06T19:58:50Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: A-Hilaly, jaypipes, jljaco, RedbackThomson, surajkota

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [A-Hilaly,RedbackThomson,jljaco]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…ft remediation period (#386) Issue: aws-controllers-k8s/community#1367 Follow-up of aws-controllers-k8s/runtime#106 This patch completes the implementation of support for resource-specific resync periods and a default drift remediation period made in runtime repository. The resync period for each reconciler is determined by trying to retrieve it from the following sources, in this order: 1. A resource-specific period specified in the drift remediation configuration. 2. A resource-specific requeue on success period specified by the resource manager factory. 3. The default drift remediation period specified in the controller configuration. 4. The default resync period defined in the ACK runtime package. This allows users to customize the drift remediation behavior for different resources as needed, while still providing a fallback option for resources that do not have a specific period specified. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

As discussed in aws-controllers-k8s#106 this patch brings resource name validation to the arguments passed to `--reconcile-resource-resync-seconds`. It also slightly changes the previously implemented `ParseReconcileResourceResyncSeconds` to avoid unnecessary validation ops. Signed-off-by: Amine Hilaly <[email protected]>

[fixes aws-controllers-k8s/community#1647] As discussed in #106 this patch brings resource name validation to the arguments passed to `--reconcile-resource-resync-seconds`. It also slightly changes the previously implemented `ParseReconcileResourceResyncSeconds` to avoid unnecessary validation ops. Signed-off-by: Amine Hilaly <[email protected]> By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
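
The resource name validation described in this follow-up could look something like the sketch below. It is not the code from that patch: `validateResourceNames` is a hypothetical helper, and exact, case-sensitive matching against a list of known names is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// validateResourceNames reports an error listing every requested name that
// is not among the resources the controller knows about. Matching is exact
// and case-sensitive in this sketch.
func validateResourceNames(names, known []string) error {
	valid := make(map[string]bool, len(known))
	for _, k := range known {
		valid[k] = true
	}
	var unknown []string
	for _, n := range names {
		if !valid[n] {
			unknown = append(unknown, n)
		}
	}
	if len(unknown) > 0 {
		// Sort so the error message is stable regardless of input order.
		sort.Strings(unknown)
		return fmt.Errorf("unknown resource name(s): %s", strings.Join(unknown, ", "))
	}
	return nil
}
```

Failing fast on a misspelled resource name at startup means a typo in the flag is reported instead of silently leaving that resource on the default period.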

ack-bot requested review from jaypipes and RedbackThomson December 22, 2022 04:08

ack-bot added the approved label Dec 22, 2022

ack-bot added the do-not-merge/hold label Dec 22, 2022

a-hilaly mentioned this pull request Dec 22, 2022

Complete support for resource-specific resync periods and default drift remediation period aws-controllers-k8s/code-generator#386

Merged

jaypipes reviewed Dec 22, 2022

jljaco reviewed Dec 28, 2022

jljaco reviewed Dec 28, 2022

a-hilaly force-pushed the drift-remediation-flag branch 4 times, most recently from 633110f to ac6bc0f Compare January 5, 2023 00:37

a-hilaly mentioned this pull request Jan 5, 2023

Semantics for destructive operations design proposal aws-controllers-k8s/community#1148

Merged

azpaulp reviewed Jan 5, 2023

azpaulp reviewed Jan 5, 2023

RedbackThomson reviewed Jan 18, 2023

jaypipes requested changes Jan 20, 2023

a-hilaly force-pushed the drift-remediation-flag branch from ac6bc0f to 4663973 Compare January 25, 2023 18:20

a-hilaly force-pushed the drift-remediation-flag branch from 4663973 to 4cac789 Compare January 25, 2023 18:24

a-hilaly mentioned this pull request Jan 27, 2023

Resource validation for --reconcile-resource-resync-seconds arguments aws-controllers-k8s/community#1647

Closed

a-hilaly requested review from jaypipes, jljaco, RedbackThomson and azpaulp and removed request for jaypipes January 27, 2023 12:27

surajkota requested changes Feb 2, 2023

RedbackThomson reviewed Feb 3, 2023

surajkota requested changes Feb 3, 2023

a-hilaly added 2 commits February 3, 2023 18:14

Default reconcile-default-resync-seconds to 0

6656fa4

Signed-off-by: Amine Hilaly <[email protected]>

Clarify more the source of the resyncPeriod in getResyncPeriod functi…

e21c6cd

…on comment Signed-off-by: Amine Hilaly <[email protected]>

RedbackThomson suggested changes Feb 3, 2023

RedbackThomson reviewed Feb 3, 2023

surajkota approved these changes Feb 3, 2023

Change flag description default to 10 hours

27b87db

Signed-off-by: Amine Hilaly <[email protected]>

a-hilaly force-pushed the drift-remediation-flag branch from 21ae777 to 27b87db Compare February 6, 2023 15:58

jaypipes approved these changes Feb 6, 2023

ack-prow bot assigned RedbackThomson Feb 6, 2023

ack-prow bot added the lgtm label Feb 6, 2023

ack-prow bot merged commit 3b2bc34 into aws-controllers-k8s:main Feb 6, 2023

a-hilaly mentioned this pull request Mar 3, 2023

Validate resource names for drift remediation flags #117

Merged
