Jobs requeueing on new OS #2

heatherkellyucl · 2015-09-04T12:11:18Z

Jobs which complete successfully or end on an error are going back into the queue to be rerun rather than being removed from it. This is a known issue and we are working on it.

heatherkellyucl · 2015-09-07T08:13:50Z

This was fixed on Friday afternoon and jobs will complete properly now.

LukeSudberyUCL · 2015-09-07T10:33:36Z

To summarise, here's what the problems were:

A new part of the epilog was introduced, designed to clear any locks on /dev/ipath which jobs left behind. This worked, but unfortunately it introduced an error if there were no locks to clear in the first place.
Extra debugging and logging was added to try to find and record the above problem, and any future ones like it.
Once the original problem was resolved, the debugging (which retained the job's state as 'active' on nodes where it failed) remained, and certain jobs kept reusing this data, reporting that they had failed when in fact they no longer should have.

So the first part was fixed on Friday, and the last step was resolved this morning (Monday).
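
As a rough illustration of the first point only (this is not the actual epilog code, which isn't shown in this thread), here is a minimal Python sketch of a cleanup step that treats "no locks found" as a normal outcome rather than an error. The function names, the `*.lock` pattern and the directory argument are all hypothetical, and it assumes the scheduler treats a non-zero epilog exit status as a failure.

```python
import glob
import os


def clear_stale_locks(lock_dir, pattern="*.lock"):
    """Remove leftover lock files and return how many were removed.

    Hypothetical helper: the directory and pattern are placeholders,
    not the real layout under /dev/ipath.
    """
    removed = 0
    # glob returns an empty list when nothing matches, so the loop
    # simply does not run; that case must not be reported as an error.
    for path in glob.glob(os.path.join(lock_dir, pattern)):
        try:
            os.remove(path)
            removed += 1
        except FileNotFoundError:
            # Something else cleaned it up first; nothing left to do.
            pass
    return removed


def epilog_exit_status(lock_dir):
    """Return 0 whenever the cleanup ran, whether or not locks existed."""
    clear_stale_locks(lock_dir)
    return 0
```

The point of the sketch is just that the "nothing to clear" case and the "cleared something" case both end with the same success status, so a clean node doesn't cause the job to be flagged as failed.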


heatherkellyucl added the bug label Sep 4, 2015

heatherkellyucl self-assigned this Sep 4, 2015

heatherkellyucl closed this as completed Sep 7, 2015

owainkenwayucl added a commit that referenced this issue Nov 18, 2019

Attempt #2 to resolve ld bug.

4fb296d

JuliaLang/julia#31555

hillary-b mentioned this issue Dec 14, 2020

Install Request: OpenFoam 7 [IN03817417] #307

Closed
