
[MAINT] - Address flakiness of Integration tests #2925

Open
viniciusdc opened this issue Jan 27, 2025 · 3 comments

@viniciusdc
Contributor

Context

With the recent adoption of the await workflow (a welcome change, since we previously had to run the kubectl wait commands manually ourselves), we are occasionally hitting odd issues with the image puller: in a couple of deployments the workflow appears to get stuck waiting on it. This looks like flaky behavior and requires further validation; we may need to increase the time limit or the number of retries.

[screenshot: the await step stuck waiting on the image puller]
source: https://github.com/nebari-dev/nebari/actions/runs/12994981631/job/36240642535?pr=2924
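For the timeout/retry angle, something like the sketch below is what I have in mind. This is only a sketch, not a drop-in change: the input names (`workloads`, `namespace`, `timeout`, `max-restarts`) and their semantics should be double-checked against the jupyterhub/action-k8s-await-workloads README, and the namespace and values here are placeholders.

```yaml
# Sketch: give the await step a larger wait budget so slow image pulls don't fail the run.
# Input names are assumed from the action's README; verify before using.
- name: Await k8s workloads
  uses: jupyterhub/action-k8s-await-workloads@v3
  with:
    workloads: ""      # empty string = wait for all workloads
    namespace: "dev"   # placeholder; use the namespace Nebari deploys into
    timeout: 600       # seconds; raise this if the image puller is legitimately slow
    max-restarts: 3    # tolerate a few restarts before declaring failure
```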

Also, during releases, we have a hard time running CI against version bumps: as is standard, the new images are not yet available at that point in the release workflow, so the deployment fails at the pod health-status check (namely for jupyterhub).

[screenshot: pod health-status check failing for jupyterhub]
source: https://github.com/nebari-dev/nebari/actions/runs/12952884533/job/36211476433?pr=2924

Value and/or benefit

Reliable, stable integration test runs.

Anything else?

No response

@marcelovilla
Member

@viniciusdc I think the first case is related to #2947. However, I agree our tests seem to be flaky and that needs to be addressed.

@viniciusdc
Contributor Author

viniciusdc commented Feb 19, 2025

I recently noticed there is another action you can run alongside jupyterhub/action-k8s-await-workloads@v3 that allows you to inspect the affected pods (we usually don't need it, since it generates too much output). For this specific error, it allowed me to find a problem with promtail, as seen below:

[screenshots showing the promtail pod errors]

This is a known issue when running Kind:
https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
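
For reference, a minimal sketch of re-adding that tweak as a workflow step, using the sysctl values suggested on the Kind known-issues page linked above (the step name and placement are illustrative):

```yaml
# Sketch: raise the inotify limits on the runner before creating the Kind cluster.
- name: Increase inotify limits for Kind
  run: |
    sudo sysctl fs.inotify.max_user_watches=524288
    sudo sysctl fs.inotify.max_user_instances=512
```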

I think I addressed this in the past, but the fix might have been removed with the recent update to Ubuntu 24.x (#2958).

Since this is a bit different and mostly associated with the above update, I will open a new issue:

  • Address fsnotify "too many open files" error on test-local-integration

