-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly handle CARM errors and requeues #140
Properly handle CARM errors and requeues #140
Conversation
Prior to this patch we introduced better mechanisms to handle Role ARN retrievals from the CARM cache. Which improved ACK runtime scaling capabilities and addressed possible race condition scenarios. However in the process we missed two things: - Setting an ACK.ResourceSynced condition stating that the resource isn't synced, yet. - Returning the right runtime error that will cause the reconciller to requeue every 15seconds. Both of the points stemmed from the fact that we're not "yet" in the reconcile function that is wrapped by a proper error handler (that triggers requeues, and resets/sets resource conditions). In this special case, we need to manually inject the condition and return a controller-runtime error that will trigger a requeue after 15seconds. While this is a good fix, we're planning on refactoring a lot of the runtime logic to make easier to read, maintain and more importantly expose reusable component that will help avoid falling into such traps. Signed-off-by: Amine Hilaly <[email protected]>
@@ -124,6 +124,7 @@ func (r *adoptionReconciler) reconcile(ctx context.Context, req ctrlrt.Request) | |||
// is not available. | |||
roleARN, err = r.getRoleARN(acctID) | |||
if err != nil { | |||
ackrtlog.InfoAdoptedResource(r.log, res, fmt.Sprintf("Unable to start adoption reconcilliation %s: %v", acctID, err)) | |||
// r.getRoleARN errors are not terminal, we should requeue. | |||
return requeue.NeededAfter(err, roleARNNotAvailableRequeueDelay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requeue works fine, because r.reconcile
is wrapped by handleReconcileErrors
/test all |
/retest |
1 similar comment
/retest |
/test all |
/retest |
2 similar comments
/retest |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: a-hilaly, yuxiang-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Propagating two main patches to all the controllers: - Properly handle CARM errors and requeues by @a-hilaly in aws-controllers-k8s/runtime#140 - Introduce flags for fine-tuning maximum concurrent reconciles per resource by @a-hilaly in aws-controllers-k8s/runtime#141 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Prior to this patch we introduced new mechanisms to handle Role ARN retrievals from the CARM cache. Which improved ACK runtime scaling capabilities and addressed possible race condition scenarios. However in the process we missed two things: - Setting an `ACK.ResourceSynced` condition stating that the resource isn't synced, yet. - Returning the **correct** runtime error that will cause the reconciller to requeue every 15seconds. Both of the problems stemmed from the fact that we're not "yet" in the reconcile function (`rm.Sync`) that is wrapped by a proper error handler (that triggers requeues, and resets/sets resource conditions). In this special case, we need to manually inject the condition and return a controller-runtime error that will trigger a requeue after 15seconds. While this is a "fair" fix, we're planning on refactoring a lot of the runtime logic to make easier to read, maintain and more importantly expose reusable component that will help avoid falling into such traps. Signed-off-by: Amine Hilaly <[email protected]> By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Prior to this patch we introduced new mechanisms to handle Role ARN
retrievals from the CARM cache. Which improved ACK runtime scaling
capabilities and addressed possible race condition scenarios. However in
the process we missed two things:
ACK.ResourceSynced
condition stating that the resourceisn't synced, yet.
requeue every 15seconds.
Both of the problems stemmed from the fact that we're not "yet" in the
reconcile function (
rm.Sync
) that is wrapped by a proper error handler(that triggers requeues, and resets/sets resource conditions). In this
special case, we need to manually inject the condition and return a
controller-runtime error that will trigger a requeue after 15seconds.
While this is a "fair" fix, we're planning on refactoring a lot of the
runtime logic to make easier to read, maintain and more importantly
expose reusable component that will help avoid falling into such traps.
Signed-off-by: Amine Hilaly [email protected]
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.