DDT offline debugging for batch jobs that crash or hang
Using DDT in offline mode is fairly straightforward. When the program stops, DDT generates a backtrace and shows the local variables for the current routine. Follow these steps in your job script to use DDT's offline mode with a program that crashes (as an alternative to Cray's ATP, Abnormal Termination Processing). You can also use this technique with a hanging job by sending it a SIGABRT or SIGSEGV via qsig or apkill.
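A minimal sketch of such a job script follows. It assumes a PBS/Torque batch system, an environment module named ddt, an executable ./a.out built with -g, and the DDT 4.x form of the -offline option, which takes the report file name as its argument. The resource request, module name, process count, and file names are placeholders. Newer DDT releases may expect a different form (for example --offline together with --output), so check the DDT user guide for the version installed on your system.

```shell
#!/bin/bash
# Hypothetical PBS job script sketch for DDT offline mode.
# Resource request, module name, process count and file names are assumptions.
#PBS -N ddt_offline
#PBS -l nodes=1:ppn=16
#PBS -l walltime=00:30:00

cd "$PBS_O_WORKDIR"

# Module name is an assumption; check "module avail" on your system.
module load ddt

# Build the program with -g beforehand so the backtrace shows source
# lines and local variables.
# Run under DDT without the GUI; the report is written when the program
# ends, crashes, or is stopped by a signal. A name ending in .html gives
# an HTML report.
ddt -offline crash_report.html -n 16 ./a.out
```

For a job that hangs, submit the same script and, once the job is stuck, send it a signal from a login node, for example qsig -s SIGABRT followed by the job ID. DDT should then record the backtrace and locals at the point where the signal arrived.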
Scenario 1: ddt -offline with an MPI CPU-only code
After the job finishes, copy the HTML report back to your local machine or view it on a login node with Firefox. The equivalent plain-text report is also shown:
Scenario 2: ddt -offline with an MPI CPU+GPU code
With DDT 4.2.2 or later, the -offline option supports GPU debugging. The following launch command was used in the batch script to trace a kernel invocation and a variable on the GPU. A snapshot of the resulting HTML output follows.
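As a rough, hypothetical illustration only (not a reproduction of the command used for the snapshot), a DDT 4.x style launch line that records a variable inside a CUDA kernel might look like the following. The -trace-at option and its LOCATION,VARIABLE syntax, the file kernel.cu, line 42, and the traced variable threadIdx.x are all assumptions to verify against the DDT user guide for your version.

```shell
#!/bin/bash
# Hypothetical offline GPU launch line (DDT 4.2.2 or later assumed).
# Build device code with nvcc -g -G so source lines and variables inside
# kernels are available to the debugger.
# kernel.cu:42 and threadIdx.x are placeholders for your own kernel
# location and the variable you want recorded when execution reaches it.
ddt -offline gpu_report.html -n 2 \
    -trace-at kernel.cu:42,threadIdx.x \
    ./a.out
```

Tracing a single variable at one kernel line keeps the report small; adding many variables or a frequently executed line can make the HTML output large.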