exec: error: timed out waiting for the condition #20

Open
barrettj12 opened this issue Sep 7, 2023 · 12 comments

@barrettj12
Contributor

Our workflow:

        uses: balchua/microk8s-actions@1e8e626239c2befe7cd5d258c96ae152a7259c74
        with:
          channel: "1.25-strict/stable"
          addons: '["dns", "hostpath-storage"]'

Logs:

Waiting for hostpath-storage to be ready 
exec sudo microk8s kubectl rollout status deployment/hostpath-provisioner -n kube-system --timeout=90s { silent: true }
Error: exec: error: timed out waiting for the condition

See the full run here.

@balchua
Owner

balchua commented Sep 7, 2023

@barrettj12 thanks for logging the issue. I haven't tried hostpath with strict confinement. Let me try that out. Will get back to you.

@balchua
Owner

balchua commented Sep 9, 2023

@barrettj12 I couldn't reproduce the error you are getting; I tried it with several MicroK8s versions. See the jobs here.

But as a precaution, I increased the timeout to 120s, hoping that it will alleviate this scenario. Do you mind trying the new release? https://github.com/balchua/microk8s-actions/releases/tag/v0.4.1

Thanks,

@barrettj12
Contributor Author

@balchua I'll try that, thanks. How about making that timeout configurable, so users can find their own value that works?

@balchua
Owner

balchua commented Sep 11, 2023

Thanks. I was thinking of that while working on this issue.
Do you have an idea of how to present this knob to the user?
I'd appreciate your thoughts.

@barrettj12
Contributor Author

Probably just an input to the action would be fine:

        uses: balchua/microk8s-actions@1e8e626239c2befe7cd5d258c96ae152a7259c74
        with:
          channel: "1.25-strict/stable"
          addons: '["dns", "hostpath-storage"]'
          timeout: 120s
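For reference, a GitHub Actions action declares an input like this in its `action.yml` metadata file. A minimal sketch, assuming the input is named `timeout` as proposed above and defaults to the 120s introduced in v0.4.1; neither the name nor the default comes from a released version of the action:

```yaml
# action.yml (hypothetical excerpt): a user-configurable readiness timeout
inputs:
  timeout:
    description: "How long to wait for addons to become ready (passed to kubectl rollout status --timeout)"
    required: false
    default: "120s"
```

If the action is built on the `@actions/core` toolkit, its code could read the value with `core.getInput('timeout')` and pass it to `--timeout` instead of a hard-coded duration.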

@balchua
Owner

balchua commented Sep 11, 2023

I was wondering whether the action should even check the readiness of the addon.
Perhaps we can leave that to the user.

@SimonRichardson

As @barrettj12 has pointed out, this is happening a lot in the Juju project. Is there a way to expose why it's not passing in a given time, just from the test output? It's clear that there is a possible unresolved issue that we're not exposing.
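One way to expose more context in the test output is to dump the provisioner's state whenever the job fails. A minimal sketch of an extra workflow step. The step name is illustrative, and the commands are standard `kubectl` queries run through `microk8s`:

```yaml
- name: Dump hostpath-provisioner state on failure
  # Runs only when an earlier step in the job has failed
  if: failure()
  run: |
    sudo microk8s kubectl get pods -n kube-system -o wide
    sudo microk8s kubectl describe deployment/hostpath-provisioner -n kube-system
    sudo microk8s kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp
```

The pod list and events usually show whether the provisioner is stuck pulling its image, waiting to be scheduled, or crash-looping, which a bare "timed out waiting for the condition" does not reveal.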

@balchua
Owner

balchua commented Sep 11, 2023

The existing code waits for the hostpath provisioner to be ready and fails when it times out.
I think it shouldn't fail the build when it's not ready. However, that may give the user the impression that everything is OK when it's not, which is why I added the check.
I guess it is doing more harm than good.
So what I'll do is continue waiting for a specified amount of time (120s for now), but not fail the build if it's still not ready.
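Until that change lands, a workflow can get similar behaviour on its own side. A minimal sketch, assuming `hostpath-storage` is dropped from the action's `addons` list (so the action no longer waits on it) and enabled in a separate step. The step names are illustrative; `continue-on-error: true` lets the job proceed even if the rollout check times out:

```yaml
steps:
  - uses: balchua/microk8s-actions@1e8e626239c2befe7cd5d258c96ae152a7259c74
    with:
      channel: "1.25-strict/stable"
      # hostpath-storage removed here so the action does not wait on it
      addons: '["dns"]'
  # Enable the storage addon outside the action
  - name: Enable hostpath-storage
    run: sudo microk8s enable hostpath-storage
  # Wait for the provisioner, but do not fail the job if it is still not ready
  - name: Wait for hostpath-provisioner
    continue-on-error: true
    run: sudo microk8s kubectl rollout status deployment/hostpath-provisioner -n kube-system --timeout=120s
```

This has the trade-off described above: a green job no longer guarantees that storage is available, so a later step that needs a PersistentVolume may fail with a less obvious error.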

@barrettj12
Contributor Author

I think 120s will still be too short in some cases. We are now running self-hosted runners which may have surfaced this issue.

If things are ready before the timeout, the command will still exit early, right? In which case, it should be safe to set the timeout to something large like 10 minutes - as it shouldn't take this long ever.

@balchua
Owner

balchua commented Sep 13, 2023

> If things are ready before the timeout, the command will still exit early, right? In which case, it should be safe to set the timeout to something large like 10 minutes - as it shouldn't take this long ever.

Yes, it will exit early.
Thanks for the feedback! I will probably implement it this way.
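Because `kubectl rollout status` returns as soon as the deployment has finished rolling out, a generous `--timeout` costs nothing on a healthy runner. A sketch of a user-side step along these lines. The 10m value comes from the suggestion above. The step-level `timeout-minutes` cap is an added assumption, included to stop a stuck step from running far longer than intended:

```yaml
- name: Wait for hostpath-provisioner
  # Hard cap enforced by GitHub Actions on this step (hypothetical value)
  timeout-minutes: 15
  # kubectl exits as soon as the rollout completes; 10m is only the upper bound
  run: sudo microk8s kubectl rollout status deployment/hostpath-provisioner -n kube-system --timeout=10m
```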

@barrettj12
Contributor Author

@balchua I just tested v0.4.1, unfortunately it doesn't solve our issue. See some failed runs here:
https://github.com/juju/juju/actions/runs/6171575601/job/16750097840?pr=16242
https://github.com/juju/juju/actions/runs/6171575585/job/16750097890?pr=16242

The first one in particular is really strange: it ends with what looks like a stack trace.

@balchua
Owner

balchua commented Sep 13, 2023

You are right. The first one took 2h before it threw that exception.
I've never come across such an error, or one that took that long.
The second one took 12m before it finally gave up.
Both errors are strange. Could it be that there's something wrong with your runner host?

jujubot added a commit to juju/juju that referenced this issue Sep 19, 2023
#16242

We are seeing intermittent failures on tests involving microk8s, due to issue balchua/microk8s-actions#20, where the action times out waiting for storage to become available.

Update to the latest version, where the timeout has been increased to 15 minutes; hopefully this will alleviate the issue somewhat.