Signal Handling (UNDER CONSTRUCTION)
You can now ask the batch environment to send a signal to your application at the walltime limit. Use the -K flag, followed by an integer number of seconds, to request that SIGTERM be sent to your job and that the job be allowed to run for that many seconds beyond the walltime limit. Because the default action for SIGTERM is to terminate the process, every process in your workflow should trap and handle the signal. To start with, the batch script itself needs to trap it so that the shell keeps running your job script after SIGTERM arrives. aprun will deliver SIGTERM to your application, and if you choose to handle it there, you have the opportunity to run any code or routine you specify.
Combining signal handling with job preemption (#PBS -l flags=preemptee) or with a flexible wallclock start (#PBS -l minwclimit=[mintime] -l walltime=[maxtime]) is an efficient way to get a charge discount (see https://bluewaters.ncsa.illinois.edu/manage-news/-/blogs/charge-factor-discounts-for-jobs-on-blue-waters) and to minimize data loss.
A typical job script traps the signal in the shell (bash) with the trap builtin, so that the script keeps running after SIGTERM is delivered.
Your application code may also be written to handle SIGTERM.
This is a C code example. For Fortran, we suggest wrapping the signal() C call in a C function named with a trailing underscore (_) and calling that C wrapper from Fortran.
Now, upon receipt of SIGTERM from the batch environment at the walltime limit, the application will run your mysig() function and can continue beyond the walltime limit for the number of seconds given with -K. Possible actions for mysig() include attempting to execute the application's checkpoint routine, closing open files, or similar cleanup activities. The sample code displays a message on stdout and continues with the main loop iterations.
- man 2 signal
- man 7 signal
- PThreads Programming: A POSIX Standard for Better Multiprocessing for information about customized signal handling with threads
- http://www.cae.tntech.edu/help/programming/mixed_languages for calling C from Fortran examples and additional information.