
Fix issue #661: Ignore Gateways not managed by Lattice Controller #709

Merged · 10 commits · Feb 11, 2025

Conversation

@u-kai u-kai commented Feb 10, 2025

What type of PR is this?

Bug fix and refactoring related to issue #661

Which issue does this PR fix:

#661 and maybe #669

What does this PR do / Why do we need it:

This PR fixes the logic for finding the Gateways associated with an XXXRoute resource: Gateways that are not controlled by the Lattice Controller are now ignored.

This prevents the controller from operating on Gateways managed by other controllers, addressing issue #661.
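The ownership check described above can be sketched as follows. The types and the controller-name constant are simplified stand-ins for illustration (the real controller uses the `sigs.k8s.io/gateway-api` types and resolves each Gateway's GatewayClass through the API server), not the repository's actual code:

```go
package main

import "fmt"

// Simplified stand-ins for the Gateway API types.
type GatewayClass struct {
	Name           string
	ControllerName string
}

type Gateway struct {
	Name             string
	GatewayClassName string
}

// Hypothetical constant standing in for the Lattice controller's registered name.
const latticeControllerName = "application-networking.k8s.aws/gateway-api-controller"

// filterManagedGateways keeps only Gateways whose GatewayClass is controlled
// by the Lattice controller, skipping Gateways whose class cannot be resolved.
func filterManagedGateways(gws []Gateway, classes map[string]GatewayClass) []Gateway {
	var managed []Gateway
	for _, gw := range gws {
		gc, ok := classes[gw.GatewayClassName]
		if !ok {
			continue // unknown class: not ours, ignore
		}
		if gc.ControllerName == latticeControllerName {
			managed = append(managed, gw)
		}
	}
	return managed
}

func main() {
	classes := map[string]GatewayClass{
		"amazon-vpc-lattice": {Name: "amazon-vpc-lattice", ControllerName: latticeControllerName},
		"other":              {Name: "other", ControllerName: "example.com/other-controller"},
	}
	gws := []Gateway{
		{Name: "lattice-gw", GatewayClassName: "amazon-vpc-lattice"},
		{Name: "unrelated-gw", GatewayClassName: "other"},
	}
	for _, gw := range filterManagedGateways(gws, classes) {
		fmt.Println(gw.Name) // only lattice-gw survives the filter
	}
}
```

Keying the decision off the GatewayClass's `controllerName` (rather than off the Gateway alone) follows the Gateway API's ownership model: a controller should only reconcile Gateways whose class it claims.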

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:
Fixes issue #661

Testing done on this change:

Added the following unit tests to verify that only resources controlled by the Lattice Controller are properly processed:

  • route_controller_test.go: Ensured that inserting an unrelated Gateway into the test setup does not affect the expected values.
  • model_build_lattice_service_test.go : Added a test case where an unrelated Gateway exists.
  • model_build_listener_test.go : Added a test case where an unrelated Gateway exists.

Automation added to e2e:

None

Will this PR introduce any new dependencies?:

None

Will this break upgrades or downgrades? Has updating a running cluster been tested?:

No, this change does not break upgrades or downgrades.
Unit tests and end-to-end (e2e) tests confirm that the system behaves as expected.

Does this PR introduce any user-facing change?:


Do all end-to-end tests successfully pass when running make e2e-test?:

Yes

SECONDARY_ACCOUNT_TEST_ROLE_ARN=arn:aws:iam:xxxxxxxxxxxx:role/TestRole REGION=ap-northeast-1  make e2e-test
...
------------------------------
[SynchronizedAfterSuite] 
/Users/kai/aws-application-networking-k8s/test/suites/integration/suite_test.go:72
{"level":"info","ts":"2025-02-10T12:17:47.307+0900","caller":"test/framework.go:266","msg":"Deleting objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner"}
{"level":"info","ts":"2025-02-10T12:17:47.348+0900","caller":"test/framework.go:285","msg":"Waiting for NotFound, objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner"}
[SynchronizedAfterSuite] PASSED [31.191 seconds]
------------------------------

Ran 69 of 69 Specs in 3072.983 seconds
SUCCESS! -- 69 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (3072.99s)
PASS
ok      github.com/aws/aws-application-networking-k8s/test/suites/integration   3073.570s

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rlymbur rlymbur left a comment


LGTM! 🚀 Thank you for addressing these issues.

@rlymbur rlymbur added this pull request to the merge queue Feb 10, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 10, 2025
u-kai commented Feb 10, 2025

@rlymbur
Thanks for the review!
After merging, I noticed that a test failed due to an error I haven't seen before. I'll investigate the issue.

u-kai commented Feb 11, 2025

@rlymbur

I reran the e2e test with the same seed (1739217549) as the failed run and was able to reproduce the failure.

Upon investigation, I found that a TargetGroup created in the previous test had not been fully deleted. As a result, the failing test might have referenced the TargetGroup from the previous test instead of the newly created TargetGroup.

The existing logic for finding a TargetGroup searches by the Kubernetes Service name and returns the first match immediately. As a minor fix, I updated it to return the most recently created TargetGroup instead.

After applying this change, I reran the test with the same seed, and it passed successfully.

=== RUN   TestIntegration
Running Suite: Integration - /Users/kai/aws-application-networking-k8s/test/suites/integration
==============================================================================================
Random Seed: 1739217549
...
Ran 69 of 69 Specs in 2776.035 seconds
SUCCESS! -- 69 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (2776.04s)
PASS
ok  	github.com/aws/aws-application-networking-k8s/test/suites/integration	2776.617s

Additionally, during this investigation, I discovered a missing update in the XXXRoute resource update logic. I have included this fix in a separate commit after the e2e test fix. Apologies for the additional request, but I would appreciate it if you could review it as well.

rlymbur commented Feb 11, 2025


Thank you for diving into this! It looks like there may be a new issue now when running make presubmit; specifically make test. Do you mind taking another look?

u-kai commented Feb 11, 2025

@rlymbur
Thank you for the review.

After investigating the error, I believe this is a flaky test that occasionally fails due to timing.

In the test, there is a sleep statement as shown below. When I reduced the sleep duration, I was able to reproduce the error.

time.Sleep(ivl * (time.Duration(nCycles) + 2)) // sleep enough cycles to terminate
assert.Equal(t, nCycles, n, fmt.Sprintf("should run only %d cycles", nCycles))

I considered increasing the sleep duration, but since the current value works fine in most cases, I have prioritized test speed and left it unchanged.

I apologize for the trouble, but could you run the presubmit test again?

rlymbur commented Feb 11, 2025


Thanks for looking at this again. The tests are passing now; I'll keep an eye on this flaky test so it can be corrected later. I've approved again to push to the merge queue.

@rlymbur rlymbur added this pull request to the merge queue Feb 11, 2025
Merged via the queue into aws:main with commit 00115ee Feb 11, 2025
2 checks passed