Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignored OSErrors in test output on Cygwin #25107

Closed
embray opened this issue Apr 6, 2018 · 24 comments
Closed

Ignored OSErrors in test output on Cygwin #25107

embray opened this issue Apr 6, 2018 · 24 comments

Comments

@embray
Copy link
Contributor

embray commented Apr 6, 2018

When running the test suite on Cygwin, whether a test passes or fails the test log is full (for each test module of output like):

sage -t --long src/sage/interfaces/gap.py
Exception OSError: (2, 'No such file or directory', '/tmp/tmpwVkVtd') in <bound method _TemporaryFileWrapper.__del__ of <closed file '<fdopen>', mode 'w+b' at 0x6f8032f1e40>> ignored
    [224 tests, 49.25 s]

in particular this of course causes tests of the doctest runner itself to fail, and it's also making all the patchbot runs fail.

Component: porting: Cygwin

Author: Erik Bray

Branch/Commit: 85576e9

Reviewer: Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/25107

@embray embray added this to the sage-8.2 milestone Apr 6, 2018
@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

comment:1

I suspect this probably started with #24343 somehow.

@jdemeyer
Copy link
Contributor

jdemeyer commented Apr 6, 2018

comment:2

Obvious question: is it a Sage or Python bug? I don't have a high opinion of Python's tempfile module, it had caused problems in the past.

@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

comment:3

It's kind of a Python bug. I've actually been bitten by this issue before, but before #24343 it was more random and somewhat uncommon. The changes in #24343 (e.g. explicitly closing files more often) made the problem guaranteed to occur.

@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

Author: Erik Bray

@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

Commit: 75720fd

@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

comment:4

This should resolve the issue for now--considering that this is a blocker AFAIC this workaround should be good enough for now. In the meantime I would like to continue to rethink this code and how it handles files.


New commits:

f0d297cUpdate this version check to support newer JRE versions (>=1.9)
db5ab23In the Jmol interface, support image outputs that contain spaces in their paths
75720fdFix overly aggressive deleting of output temp file on Cygwin

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 6, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

03d4f0fFix overly aggressive deleting of output temp file on Cygwin

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 6, 2018

Changed commit from 75720fd to 03d4f0f

@embray
Copy link
Contributor Author

embray commented Apr 6, 2018

comment:6

Oops, rebased on correct branch.

@vbraun
Copy link
Member

vbraun commented Apr 9, 2018

comment:7

Can you wrap it in a if-cygwin branch instead of making assumptions about which temp file implementation is used?

@embray
Copy link
Contributor Author

embray commented Apr 10, 2018

comment:8

It's not making assumptions--as written it's making a feature check that isn't necessarily platform specific, which is usually preferable.

@jdemeyer
Copy link
Contributor

comment:9

Do we really need to close the file there? Maybe a different and simpler solution would be to just remove that self.outtmpfile.close().

@embray
Copy link
Contributor Author

embray commented Apr 12, 2018

comment:10

I alluded to this in [#comment:3 3] but the file will be deleted when the object is closed anyways, whether that close is explicit, or comes sometime later when the object goes out of scope and its __del__ is called. Explicitly closing ensures things happen predictably, and not doing so is not a solution to the actual problem.

Anyways, that was added for avoiding ResourceWarnings on Python 3.

@jdemeyer
Copy link
Contributor

comment:11

OK, I think I understand the reasons now, but it wasn't obvious from reading the code (including the comments). So I agree with the fix, but the comment should be improved. I'll think about that tomorrow.

@jdemeyer
Copy link
Contributor

comment:12

The real problem is that the file is closed twice: once in the master process and once in the child. It can only be deleted once.

@embray
Copy link
Contributor Author

embray commented Apr 13, 2018

comment:13

Replying to @jdemeyer:

The real problem is that the file is closed twice: once in the master process and once in the child. It can only be deleted once.

Right, that's whole point. I more or less wrote that in the comment but I guess it should have been clearer...

@embray
Copy link
Contributor Author

embray commented Apr 13, 2018

comment:14

Perhaps to give slightly more detail as well, regarding the usage/implementation of the tempfile module (which I agree with you is a bit crummy):

In Sage here we are using the TemporaryFile factory (which despite the CamelCase is actually function). This basically uses mkstemp to create a unique filename, then returns a file-like object around the file descriptor and immediately calls os.unlink on the filename (effectively deleting the file as soon as it's no longer needed.

However, there is also NamedTemporaryFile which does not immediately unlink the name, so that the file can have a name that can be used in other processes, for example. In this case it's up to the user to delete the file, although it has an auto-delete functionality delete=True enabled by default in fact, that deletes the file after it is closed in __del__. Unfortunately the implementation is a bit buggy, I think, since there's an unhandled exception if the file was already deleted when it gets to this point :(

The problem arises here because the tempfile quietly replaces TemporaryFile with NamedTemporaryFile on non-POSIX systems and on Cygwin, since the ability to delete a file while it has an open file descriptor is not guaranteed on other platforms. This is actually annoying for a couple reasons:

  1. NamedTemporaryFile and TemporaryFile have different APIs, and this switch is done rather implicitly. You also get the aforementioned buggy behavior of NamedTemporaryFile when it comes to double-deletes.

  2. At least on Cygwin this is unnecessary. It might have been a concession to an older version of Cygwin (this hack seems to have been in place for a while). But it's not necessary at all on more modern versions of Cygwin which do allow unlinking open files.

So the problem is that on Cygwin the TemporaryFile gets replaced with a NamedTemporaryFile with delete=True (which can be detected through presence of the delete attribute, though an explicit isinstance check could also be possible). And, as Jeroen wrote, this means that because that NamedTemporaryFile is instantiated before forking, it exists in two processes, and whichever one happens to close the file first deletes it. Then the process that closes the file second produces this annoying unhandled exception in __del__ message.

This has actually been a problem for a long time, and I've seen it show up in the tests before rather randomly. But it was relatively uncommon since most of the time the file was closed first in the parent process and we didn't see the message from the child process.

@embray
Copy link
Contributor Author

embray commented Apr 13, 2018

comment:15

Also, while I'm calling the behavior of NamedTemporaryFile "buggy" with respect to double deletions, it may in fact be intentional, as this could point toward a programming error of some kind (which I believe is in fact the case here).

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 13, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

85576e9add a more detailed comment

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 13, 2018

Changed commit from 03d4f0f to 85576e9

@embray
Copy link
Contributor Author

embray commented Apr 13, 2018

comment:17

I improved (I think) the comment and also linked to my comment above for more detail. I don't think it's such a deep problem though (albeit annoying)...


New commits:

85576e9add a more detailed comment

@jdemeyer
Copy link
Contributor

Reviewer: Jeroen Demeyer

@vbraun
Copy link
Member

vbraun commented Apr 14, 2018

Changed branch from u/embray/cygwin/temp-file-wrapper-del to 85576e9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants