Signal handling from user code #22883

Closed
antoine-levitt opened this issue Jul 20, 2017 · 10 comments

Comments

@antoine-levitt
Contributor

Haven't found an open issue on this, so here goes:

I often find myself wanting to interrupt a long computation. For example, I run a long optimization task with a large maximum iteration count, and it becomes apparent that it will converge very slowly, or oscillate and never end. At that point my only options are to come back the next day or to interrupt the computation with C-c, in which case the result is lost. What I would like is to put some code in the optimizer that listens for a user signal and, when one is raised, stops the computation at that point and returns the current state, so I can use it.

I don't know if it's common practice, but Emacs listens for SIGUSR2 to drop into the debugger. Can such functionality be made available to user code in Julia? It could then be used in packages (e.g. to break in Gallium, or to interrupt a computation in Optim).
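To make the request concrete, here is a minimal sketch of the pattern being asked for, written in C against POSIX `sigaction` rather than in Julia, since Julia exposes no such user-level API in this thread. The handler only sets a flag, and the loop polls it and returns its current state. The names `install_stop_handler` and `optimize`, and the Newton iteration standing in for an optimizer, are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler, polled by the loop. volatile sig_atomic_t is the
   type the C standard guarantees is safe to write from a signal handler. */
static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int sig)
{
    (void)sig;
    stop_requested = 1;   /* do nothing else here: handlers must stay trivial */
}

/* Install the SIGUSR2 handler. Returns 0 on success, -1 on failure. */
int install_stop_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Hypothetical optimizer: Newton iteration for sqrt(target).
   Stops early if SIGUSR2 has arrived and returns the current iterate.
   The flag is left set so the caller can tell why the loop ended;
   reset it before starting another run. */
double optimize(double target, long max_iter, long *iters_done)
{
    double x = target > 1.0 ? target : 1.0;
    long i;
    for (i = 0; i < max_iter && !stop_requested; i++) {
        x = 0.5 * (x + target / x);
    }
    if (iters_done != NULL) {
        *iters_done = i;
    }
    return x;
}
```

With this in place, sending `kill -USR2 <pid>` from another terminal ends the loop at the next iteration boundary and the partial result is returned instead of being lost.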

@yuyichao
Contributor

Dup of #14675. SIGINT is already catchable when exit_on_sigint is set to 0:

ccall(:jl_exit_on_sigint, Void, (Cint,), 0)

@antoine-levitt
Contributor Author

That's about catching SIGINT; what I want to do is catch SIGUSR2.

@yuyichao
Contributor

Not sure what you mean, but the linked issue isn't SIGINT-only, and

"interrupt the computation with C-c, in which case the result is lost. What I would like is to be able to put some code in the optimizer to listen to a user input and, if raised, stop the computation at that point and return the current state, so I can use it."

is doable with SIGINT.

@antoine-levitt
Contributor Author

The point is that SIGINT should be "stop this code now", while SIGUSR2 has more flexibility. You're right that it is covered in the linked issue though, sorry.

@yuyichao
Contributor

yuyichao commented Jul 20, 2017

SIGINT should be "stop this code now"

Call it a standard abuse if you like, but I think there is a lot of code out there that treats SIGINT as "do clean up (including saving data) and exit". If you really want to stop it now, there's SIGTERM and SIGKILL.

@sbromberger
Contributor

OK, so I'm still not sure how to catch a signal (whether SIGINT or something else). Could someone please provide instructions?

What I want to do is to catch a signal to print out debugging information (sort of like how dd handles SIGINFO) within a long-running function.
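For reference, the `dd`-style behaviour described here can be sketched in plain C as follows. This is an illustration of the general technique, not a Julia API: the handler sets a flag, and the long-running loop prints a status line when it sees the flag and then carries on. The sketch uses SIGINFO where the headers expose it and falls back to SIGUSR1 otherwise; `install_report_handler` and `long_sum` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Use SIGINFO where the headers expose it, otherwise SIGUSR1. */
#ifdef SIGINFO
#define REPORT_SIGNAL SIGINFO
#else
#define REPORT_SIGNAL SIGUSR1
#endif

static volatile sig_atomic_t report_requested = 0;

static void on_report_signal(int sig)
{
    (void)sig;
    report_requested = 1;   /* printing happens in the loop, not in the handler */
}

/* Install the reporting handler. Returns 0 on success, -1 on failure. */
int install_report_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_report_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(REPORT_SIGNAL, &sa, NULL);
}

/* Hypothetical long-running job: sums 1..n. Whenever a report has been
   requested it prints progress to stderr and keeps going. The number of
   reports printed is stored in *reports when that pointer is non-NULL. */
unsigned long long long_sum(unsigned long long n, int *reports)
{
    unsigned long long total = 0;
    int printed = 0;
    for (unsigned long long i = 1; i <= n; i++) {
        total += i;
        if (report_requested) {
            report_requested = 0;
            fprintf(stderr, "progress: %llu of %llu, partial sum %llu\n",
                    i, n, total);
            printed++;
        }
    }
    if (reports != NULL) {
        *reports = printed;
    }
    return total;
}
```

Keeping the `fprintf` call out of the handler matters because stdio functions are not async-signal-safe; the handler itself only touches the flag.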

@iblislin
Member

iblislin commented Aug 13, 2017

@sbromberger

└─[iblis@abeing]% g df  | cat
diff --git a/src/signals-unix.c b/src/signals-unix.c
index 2b69f43d6a..eb54d5a051 100644
--- a/src/signals-unix.c
+++ b/src/signals-unix.c
@@ -640,6 +640,12 @@ static void *signal_listener(void *arg)
                 jl_exit_thread0(128 + sig);
             }
         }
+#ifdef SIGINFO
+        if (SIGINFO == sig)
+        {
+            jl_safe_printf("I'm fine.\n");
+        }
+#endif  /* ifdef SIGINFO */
     }
     return NULL;
 }

julia> while true
       end                          

load: 0.79  cmd: julia 64121 [running] 4.82r 1.13u 0.24s 10% 193020k     

signal (29): Information request    
while loading no file, in expression starting on line 0                  
anonymous at ./<missing>:0          
jl_call_fptr_internal at /usr/home/iblis/git/julia/src/./julia_internal.h:366 [inlined]                                                            
jl_call_method_internal at /usr/home/iblis/git/julia/src/./julia_internal.h:385 [inlined]                                                          
jl_toplevel_eval_flex at /usr/home/iblis/git/julia/src/toplevel.c:620    
jl_toplevel_eval_in at /usr/home/iblis/git/julia/src/builtins.c:505      
eval at ./repl/REPL.jl:3            
eval_user_input at ./repl/REPL.jl:69                                     
macro expansion at ./repl/REPL.jl:100 [inlined]                          
#1 at ./event.jl:73                 
jl_apply at /usr/home/iblis/git/julia/src/./julia.h:1447 [inlined]       
start_task at /usr/home/iblis/git/julia/src/task.c:268                   
unknown function (ip: 0xffffffffffffffff)                                
unknown function (ip: 0xffffffffffffffff)                                
Allocations: 337516 (Pool: 337422; Big: 94); GC: 0                       
I'm fine.                           
^C^C^C^C^C^CWARNING: Force throwing a SIGINT                             
^C^C^C^C^C^C^C^C^C^C^CERROR: ^C^C^C^CInterruptException:                 
Stacktrace:                         
 [1] anonymous at ./<missing>:0

happy hacking :)

@sbromberger
Contributor

@iblis17 Thanks. I'd really like to see signal handling built into Julia, though. Especially for HPC work, there is a need for checkpointing / graceful shutdown when job resource limits expire, before the system forcibly terminates the job. Ideally, I'd like to be able to say in my code, "If you get a SIGUSR2 (for example), write the current output to disk and stop work". This would allow me to configure my HPC job to send SIGUSR2 60 seconds before the system kills it.
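The checkpoint-then-stop behaviour described above could look roughly like this in C, where the POSIX calls involved are well established. It is only a sketch of the pattern, not something Julia provides: `install_checkpoint_handler`, `run_job`, the checkpoint file layout, and the series being summed are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t checkpoint_requested = 0;

static void on_checkpoint_signal(int sig)
{
    (void)sig;
    checkpoint_requested = 1;   /* file I/O is done by the job loop, not here */
}

/* Install the SIGUSR2 handler. Returns 0 on success, -1 on failure. */
int install_checkpoint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_checkpoint_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Write "<steps completed> <accumulated value>" to path.
   Returns 0 on success, -1 on failure. */
static int write_checkpoint(const char *path, long step, double value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    int ok = fprintf(f, "%ld %.17g\n", step, value) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/* Hypothetical job: accumulates 1/k^2 for k = 1..steps.
   If SIGUSR2 has arrived, it writes a checkpoint and stops.
   Returns the number of steps completed, or -1 if the checkpoint
   could not be written. The accumulated value goes to *result. */
long run_job(long steps, const char *checkpoint_path, double *result)
{
    double acc = 0.0;
    long k;
    for (k = 1; k <= steps; k++) {
        if (checkpoint_requested) {
            if (write_checkpoint(checkpoint_path, k - 1, acc) != 0) {
                return -1;
            }
            break;
        }
        acc += 1.0 / ((double)k * (double)k);
    }
    *result = acc;
    return k - 1;
}
```

Many batch schedulers can be configured to deliver a warning signal some time before the hard kill, which is the setup described above. A production version would probably write to a temporary file and rename it into place, so that a kill arriving mid-write cannot leave a truncated checkpoint.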

@StefanKarpinski
Member

It would be great to have reliable, simple signal handling in Julia to the extent that systems allow it. That's a pretty big piece of design and implementation work, however, so it will likely have to be a post-1.0 feature. Fortunately, there is no issue with adding such an API in 1.1 or 1.2.

@iblislin
Member

iblislin commented Aug 25, 2017 via email
