make doc-html sometimes hangs after having finished #10163

Closed
jdemeyer opened this issue Oct 23, 2010 · 48 comments

@jdemeyer
Contributor

The following problem is not reproducible on demand, but it has happened several times.
When building the Sage documentation using

make doc-html

it can happen that make simply hangs forever after the message

Build finished.  The built documents can be found in /mnt/usb1/scratch/jdemeyer/merger/sage-4.6.1.alpha0/devel/sage/doc/output/html/fr/a_tour_of_sage

Pressing CTRL-C at this point gives

make: *** [doc-html] Interrupt

Here is a ps snapshot of a situation where it happened:

$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args  # Irrelevant entries removed
  PID  PPID  SESS TT       TPGID COMMAND
30777     1 23411 pts/6     6956 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner
21614     1 21614 ?           -1 SCREEN
23411 21614 23411 pts/6     6956  \_ /bin/bash
 6956 23411 23411 pts/6     6956  |   \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
 6969  6956 23411 pts/6     6956  |       \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
 6970  6969 23411 pts/6     6956  |       |   \_ tee /home/release/sage-4.7.2.alpha4/logs/merger.log
30248  6956 23411 pts/6     6956  |       \_ make -j6 doc-html
30249 30248 23411 pts/6     6956  |           \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
30250 30249 23411 pts/6     6956  |           |   \_ tee /home/release/sage-4.7.2.alpha4/logs/dochtml.log
30747 30248 23411 pts/6     6956  |           \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html  2>&1" "tee -a dochtml.log"
30748 30747 23411 pts/6     6956  |               \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html  2>&1 tee -a dochtml.log
30750 30748 23411 pts/6     6956  |                   \_ tee -a dochtml.log

Note the orphaned sage-cleaner process with the same session ID as make doc-html.
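Spotting such an orphan can be done mechanically. Below is a minimal Python sketch that scans rows in the column order of the ps command above (PID, PPID, SESS first) and reports processes that were reparented to init (PPID 1) but still belong to a given session. The function name is hypothetical and not part of Sage.

```python
def orphans_in_session(ps_lines: list[str], session_id: int) -> list[int]:
    """Return PIDs reparented to init (PPID 1) that still belong to session_id.

    Assumes each row starts with the columns PID, PPID, SESS, as produced by
    `ps -o pid,ppid,sess,...`.
    """
    orphans = []
    for line in ps_lines:
        fields = line.split()
        # Skip the header row, blank lines and anything malformed.
        if len(fields) < 3 or not all(f.isdecimal() for f in fields[:3]):
            continue
        pid, ppid, sess = (int(f) for f in fields[:3])
        if ppid == 1 and sess == session_id:
            orphans.append(pid)
    return orphans
```

On Linux with procps, live rows could come from `subprocess.run(["ps", "-eo", "pid,ppid,sess,args"], capture_output=True, text=True).stdout.splitlines()`. The snapshot in comment:25 uses a different column order (pid,ppid,pgid,sess), so the slice would need adjusting for it.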

If it matters:

$ bash --version
GNU bash, version 4.1.0(1)-release (x86_64-unknown-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Apply attachment: 10163_sage_cleaner.patch

Component: build

Keywords: build Makefile sage-cleaner cleaner

Author: Jeroen Demeyer

Reviewer: Volker Braun

Merged: sage-4.7.2.alpha4

Issue created by migration from https://trac.sagemath.org/ticket/10163

@jdemeyer
Contributor Author

Changed keywords from none to build Makefile

@jdemeyer
Contributor Author

comment:2

This might be due to NFS (I know it happened on sage.math.washington.edu but I can't remember whether it ever happened on other machines).

@jdemeyer
Contributor Author

comment:3

I am getting more and more convinced that this is an NFS issue, so I propose closing this ticket as invalid.

@jdemeyer jdemeyer removed this from the sage-4.6.2 milestone Jan 19, 2011
@vbraun
Member

vbraun commented Mar 2, 2011

comment:4

Just because NFS triggers it doesn't necessarily mean that it's not a bug in Sage. I've never seen this issue, but then I never use high-latency filesystems. Can you run strace/gdb on the stuck process if you see it again? (Use -p to attach to an existing PID.)

@jdemeyer jdemeyer added this to the sage-4.8 milestone Oct 6, 2011
@jhpalmieri
Member

comment:6

Should there be some timeout setting in sage-cleaner? If it has to wait too long, exit but print a warning message that there may be some stray processes which should be killed by hand?

@vbraun
Member

vbraun commented Oct 6, 2011

comment:7

No, the sage-cleaner needs to stick around for as long as Sage is running. There might be a computation that, after a long while, runs a big @parallel loop, and if nobody is cleaning up, the filesystem will be very unhappy.

@jhpalmieri
Member

comment:8

Does sage-cleaner actually run when you do sage -docbuild ... (which is called by make doc-html)? It doesn't look like it to me. It seems that sage -docbuild all html creates a bunch of temporary directories in DOT_SAGE/temp/HOSTNAME, and those aren't deleted until the next time I run Sage, when sage-cleaner actually runs. So maybe the issue on this ticket isn't related to sage-cleaner.

@vbraun
Member

vbraun commented Oct 7, 2011

comment:12

Can you also add the content of ~/.sage/temp/HOSTNAME/ to the bug report? The sage cleaner lives on until it is empty...

@jdemeyer
Contributor Author

jdemeyer commented Oct 7, 2011

comment:14

It seems to have pulled itself through after a while. Don't know why...

@jdemeyer
Contributor Author

jdemeyer commented Oct 7, 2011

comment:15

Replying to @vbraun:

Can you also add the content of ~/.sage/temp/HOSTNAME/ to the bug report? The sage cleaner lives on until it is empty...

What do you mean by "empty"? Does this mean that, if I have 10 copies of Sage running, the Sage cleaner will only exit once all 10 copies of Sage have exited?

@vbraun
Member

vbraun commented Oct 7, 2011

comment:16

I think so. Its purpose is to delete the ~/.sage/temp/HOSTNAME/PID directory once the process with PID has ended. Also, note that if you start Sage multiple times only one sage-cleaner process is started. Really, make doc should not wait for the sage cleaner process to finish...
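A minimal Python sketch of the behaviour described here, assuming the layout ~/.sage/temp/HOSTNAME/PID and a simple polling design. The function names and the 10-second interval are hypothetical; this is not sage-cleaner's actual source.

```python
import os
import shutil
import time
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clean_once(temp_dir: Path) -> list[int]:
    """Delete every temp_dir/<PID> directory whose process has exited.

    Returns the PIDs whose directories were removed.
    """
    removed = []
    for entry in temp_dir.iterdir():
        if entry.is_dir() and entry.name.isdecimal():
            pid = int(entry.name)
            if pid > 0 and not pid_alive(pid):
                shutil.rmtree(entry, ignore_errors=True)
                removed.append(pid)
    return removed


def cleaner_loop(temp_dir: Path, poll_seconds: float = 10.0) -> None:
    """Clean repeatedly and return only once temp_dir is completely empty."""
    while True:
        clean_once(temp_dir)
        if not any(temp_dir.iterdir()):
            return  # nothing left to watch: the cleaner exits
        time.sleep(poll_seconds)
```

In this design any entry that never goes away (a directory for a PID that is still alive, or a non-PID file) keeps the loop running, which is why one long-running Sage session is enough to keep the cleaner around.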

@jdemeyer
Contributor Author

jdemeyer commented Oct 7, 2011

comment:17

Replying to @vbraun:

I think so. Its purpose is to delete the ~/.sage/temp/HOSTNAME/PID directory once the process with PID has ended. Also, note that if you start Sage multiple times only one sage-cleaner process is started. Really, make doc should not wait for the sage cleaner process to finish...

This is probably because of file descriptors being open.
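The suspected mechanism can be reproduced in miniature. In this Python sketch (an illustration under assumptions, not the Sage build code, which uses spkg/pipestatus and tee), the current process plays the finished docbuild writing into a pipe, and a sleeping child plays the long-lived sage-cleaner. The reader only sees end-of-file once every write end of the pipe is closed, so a helper that inherited the pipe as its stdout keeps the reader waiting.

```python
import os
import select
import subprocess
import sys


def reader_sees_eof(redirect_helper: bool, timeout: float = 1.0) -> bool:
    """Simulate `docbuild | tee`: return True if the pipe's reader gets EOF
    as soon as the main writer (this process) closes its write end."""
    r, w = os.pipe()
    # Hypothetical long-lived helper standing in for sage-cleaner.
    helper = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(5)"],
        # Either give the helper the pipe as its stdout, or /dev/null.
        stdout=subprocess.DEVNULL if redirect_helper else w,
        stderr=subprocess.DEVNULL,
    )
    os.close(w)  # the docbuild is finished and drops its own write end
    try:
        ready, _, _ = select.select([r], [], [], timeout)
        # A readable pipe whose read returns b"" means EOF: no writers left.
        return bool(ready) and os.read(r, 1) == b""
    finally:
        helper.kill()
        helper.wait()
        os.close(r)
```

With this sketch, reader_sees_eof(True) returns True almost immediately, while reader_sees_eof(False) returns False after the timeout: nobody waits for the helper as a process, yet the reader still waits on the file descriptor it inherited.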

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:25

sphinx-build does start sage-cleaner:

$ ps -u jdemeyer xfo pid,ppid,pgid,sess,tty,tpgid,stat,args
  PID  PPID  PGID  SESS TT       TPGID STAT COMMAND
 2811     1  2262 26064 pts/3    26064 T    python /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sage-cleaner
 2830 24409  2829 24409 pts/6     2829 S+    |   \_ grep -e pts/3 -e COMMAND
26064  6198 26064 26064 pts/3    26064 Ss+   \_ /bin/bash
 2262 26064  2262 26064 pts/3    26064 T     |   \_ make doc-html
 2781  2262  2262 26064 pts/3    26064 T     |       \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html  2>&1" "tee -a dochtml.log"
 2782  2781  2262 26064 pts/3    26064 T     |           \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html  2>&1 tee -a dochtml.log
 2783  2782  2262 26064 pts/3    26064 T     |               \_ bash ./sage -docbuild --no-pdf-links all html
 2791  2783  2262 26064 pts/3    26064 T     |               |   \_ bash /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sage-sage -docbuild --no-pdf-links all html
 2806  2791  2262 26064 pts/3    26064 T     |               |       \_ python /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/common/builder.py --no-pdf-links all html
 2807  2806  2262 26064 pts/3    26064 T     |               |           \_ /bin/sh -c sphinx-build -b html -d /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/doctrees/de/tutorial   -A hide_pdf_links=1 /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/de/tutorial /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/html/de/tutorial
 2808  2807  2262 26064 pts/3    26064 T     |               |               \_ python /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sphinx-build -b html -d /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/doctrees/de/tutorial -A hide_pdf_links=1 /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/de/tutorial /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/html/de/tutorial
 2784  2782  2262 26064 pts/3    26064 T     |               \_ tee -a dochtml.log

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:26

It is frustrating that this is hard to reproduce: make doc-html only sometimes runs sage-cleaner.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:27

This is sphinx-build:

#!/usr/bin/env python
# EASY-INSTALL-ENTRY-SCRIPT: 'Sphinx==1.0.4','console_scripts','sphinx-build'
__requires__ = 'Sphinx==1.0.4'
import sys
from pkg_resources import load_entry_point
import sage.all
sys.exit(
   load_entry_point('Sphinx==1.0.4', 'console_scripts', 'sphinx-build')()
)

Note the line "import sage.all". Perhaps this can sometimes (indirectly) run sage-cleaner?

@vbraun
Member

vbraun commented Oct 8, 2011

comment:28

I don't see how the import sage.all could start sage-cleaner. If you can reproduce the problem, how about you strace the docbuild process? Then it would be easy to find out which child process starts the cleaner.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:29

Replying to @vbraun:

If you can reproduce the problem

That's precisely the problem; it must be some weird race condition.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:30

I found a way to reproduce it: when $HOME/.sage does not exist, sage-cleaner is always started when doing make doc-html.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:31

It is indeed the line import sage.all which causes sage-cleaner to be run.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

Attachment: 10163_debug_sage_cleaner.patch.gz

STOP all processes when sage-cleaner is run, do not apply

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:32

GAP seems to be involved somehow: the last thing that happens before sage-cleaner is started is starting GAP.

@vbraun
Member

vbraun commented Oct 8, 2011

comment:33

But why does make hang until the cleaner exits? The sage-cleaner process is already orphaned, so nobody should be waiting for it. Did you strace the hanging make process? I would tend to think that it is a good thing that the sage-cleaner is started.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:34

This reproduces the problem:

$ ( export HOME=/tmp/sagehome; rm -rf $HOME; mkdir -p $HOME/.sage/temp/`hostname`/$$; make doc-html; )

What happens here is that we start from a clean $HOME and create a bogus directory in $HOME/.sage/temp/`hostname`, named after the PID of the interactive shell (which stays alive), to make sure sage-cleaner does not exit. Note that you can delete the last 3 lines (the sys.exit(...) call) from local/bin/sphinx-build to speed up testing.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

Attachment: 10163_sage_cleaner.patch.gz

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

Author: Jeroen Demeyer

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:36

The problem was dangling file descriptors. This patch fixes the problem for me.

@vbraun
Member

vbraun commented Oct 8, 2011

comment:37

Hmm, maybe sage-cleaner should close its stdout/stderr instead of hoping that all instances that call it do. But then that would make it annoying to debug the cleaner... In any case, the patch looks good to me.
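The alternative suggested here, having the daemon redirect its own standard streams, could look like the following minimal sketch. The helper name and the log_path parameter are hypothetical (log_path is one way to soften the debugging concern), and this is not necessarily what the merged patch does.

```python
import os
import sys


def detach_stdio(log_path: str = os.devnull) -> None:
    """Point fds 0, 1 and 2 away from whatever pipe or terminal was inherited.

    Afterwards this process no longer holds the write end of its parent's
    pipe, so a reader such as make or tee can see EOF. Pass a real file as
    log_path to keep the daemon's messages for debugging.
    """
    for stream in (sys.stdout, sys.stderr):
        if stream is not None:
            stream.flush()  # push buffered output out before switching fds
    in_fd = os.open(os.devnull, os.O_RDONLY)
    out_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.dup2(in_fd, 0)
    os.dup2(out_fd, 1)
    os.dup2(out_fd, 2)
    for fd in {in_fd, out_fd}:
        if fd > 2:
            os.close(fd)  # the dup2'd copies on 0, 1 and 2 stay open
```

The trade-off is the one mentioned above: with the default log_path, anything the cleaner prints after detaching is silently discarded.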

@vbraun
Member

vbraun commented Oct 8, 2011

Reviewer: Volker Braun

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

comment:38

Replying to @vbraun:

Hmm maybe sage-cleaner should close its stdout/stderr instead of hoping that all instances that call it do. But then that would make it annoying to debug the cleaner...

Exactly, that's what I was also thinking.

@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

Changed keywords from build Makefile to build Makefile sage-cleaner

@jdemeyer jdemeyer modified the milestones: sage-4.8, sage-4.7.2 Oct 8, 2011
@jdemeyer
Contributor Author

jdemeyer commented Oct 8, 2011

Changed keywords from build Makefile sage-cleaner to build Makefile sage-cleaner cleaner

@nexttime
Mannequin

nexttime mannequin commented Oct 8, 2011

comment:40

We had similar problems with (e.g.) ptestlong and (GAP?) orphans, IIRC: while sage -tp ... did terminate ("All tests passed!"), tee did not, because some file descriptor(s) were still open, so you didn't get the shell prompt back.

After killing the orphans, the shell prompt returned.

@jdemeyer
Copy link
Contributor Author

jdemeyer commented Oct 8, 2011

Merged: sage-4.7.2.alpha4
