
[MAINT] - Address flakiness of Integration tests #2925

Open
viniciusdc opened this issue Jan 27, 2025 · 3 comments

@viniciusdc
Contributor

Context

With the recent adoption of the await workflow (a welcome change, since we previously had to run the kubectl wait commands manually ourselves), we are occasionally hitting odd issues with the image puller: in a couple of deployments the workflow appears to get stuck waiting on it. This looks like flaky behavior and requires further validation; we may need to increase the time limit or the number of retries.

[screenshot: the await step stuck waiting on the image puller]
source: https://github.com/nebari-dev/nebari/actions/runs/12994981631/job/36240642535?pr=2924
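For the timeout/retry angle, something like the sketch below is what I have in mind. This is only a sketch, not a drop-in change: the input names (`workloads`, `namespace`, `timeout`, `max-restarts`) and their semantics should be double-checked against the jupyterhub/action-k8s-await-workloads README, and the namespace and values here are placeholders.

```yaml
# Sketch: give the await step a larger wait budget so slow image pulls don't fail the run.
# Input names are assumed from the action's README; verify before using.
- name: Await k8s workloads
  uses: jupyterhub/action-k8s-await-workloads@v3
  with:
    workloads: ""      # empty string = wait for all workloads
    namespace: "dev"   # placeholder; use the namespace Nebari deploys into
    timeout: 600       # seconds; raise this if the image puller is legitimately slow
    max-restarts: 3    # tolerate a few restarts before declaring failure
```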

Also, during releases, we have a hard time running CI against version bumps: as is standard, the new images are not yet available at that point in the release workflow, so the deployment fails at the pod health-status check (namely for jupyterhub).

[screenshot: pod health-status check failing for jupyterhub]
source: https://github.com/nebari-dev/nebari/actions/runs/12952884533/job/36211476433?pr=2924

Value and/or benefit

Reliable, stable integration test runs.

Anything else?

No response

@marcelovilla
Member

@viniciusdc I think the first case is related to #2947. However, I agree our tests seem to be flaky and that needs to be addressed.

@viniciusdc
Contributor Author

viniciusdc commented Feb 19, 2025

I recently noticed there is another action you can run alongside jupyterhub/action-k8s-await-workloads@v3 that allows you to inspect the affected pods (we usually don't need it, since it generates too much output). For this specific error, it allowed me to find a problem with promtail, as seen below:

[screenshots showing the promtail pod errors]

This is a known issue when running Kind:
https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
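
For reference, a minimal sketch of re-adding that tweak as a workflow step, using the sysctl values suggested on the Kind known-issues page linked above (the step name and placement are illustrative):

```yaml
# Sketch: raise the inotify limits on the runner before creating the Kind cluster.
- name: Increase inotify limits for Kind
  run: |
    sudo sysctl fs.inotify.max_user_watches=524288
    sudo sysctl fs.inotify.max_user_instances=512
```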

I think I addressed this in the past, but the fix might have been removed with the recent update to Ubuntu 24.x (#2958).

Since this is a bit different and mostly associated with the above update, I will open a new issue:

  • Address fsnotify "too many open files" error on test-local-integration

