Parallel debugging
gdb4hpc is a GDB-based parallel debugger used to debug applications. It allows programmers to either launch an application or attach to an already-running application.
This page is not a GDB tutorial. It only explains how to launch your application with the debugger or attach the debugger to a running application. If you want to know more about GDB, see its manual. See also the page in the Cray documentation about debugging tools in the Cray Programming Environment.
Note
To debug your application efficiently, compile it with the debug flag (-g).
To have access to gdb4hpc, load the corresponding module in your environment.
Then, run the debugger.
Launching your application from gdb4hpc
gdb4hpc application launch fails
There is an ongoing issue with gdb4hpc that causes the launch to fail with the following error:
Failed to launch CTI app.
CTI error: cti_launchAppBarrier: mpiexec was not found in PATH. (tried SSH)
Please export the following environment variable as a temporary workaround:
You can launch your application from the debugger command line interface using the launch command.
dbg all> launch --launcher-args="<launch-args>" --args="<args>" --env="<name=value>" <handle> <application>
where launch-args are the arguments for the launcher (i.e. Slurm options). You can use this argument to specify the project to bill with --account=<project>. The parameters --args and --env allow you to pass arguments to your application and define environment variables for it. The handle is a debugger array variable that specifies the number of ranks in the application. For example, a handle of $a{16} will launch the application with 16 ranks.
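As a minimal sketch of how these pieces fit together, the shell function below prints a launch command for a given number of ranks, project and application. The function name build_launch_cmd and the project name my_project are hypothetical and are not part of gdb4hpc. The argument order follows the example session on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdb4hpc): prints a launch command that
# can be pasted at the "dbg all>" prompt.
# Usage: build_launch_cmd <ranks> <project> <application>
build_launch_cmd() {
    ranks=$1
    project=$2
    app=$3
    # $a{N} is the handle: a debugger array variable holding N ranks.
    # The format string is single-quoted so the shell does not expand $a.
    printf 'launch $a{%s} --launcher-args="--account=%s" %s\n' \
        "$ranks" "$project" "$app"
}

# my_project is a placeholder; replace it with the project to bill.
build_launch_cmd 16 my_project ./myapp
```

The function only prints the command. It does not start the debugger or submit anything to Slurm.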
Example debug session
In the example debug session presented here, we launch an MPI hello world application with 16 ranks.
dbg all> launch $a{16} --launcher-args="--account=<project>" ./myapp
Starting application, please wait...
Creating MRNet communication network...
Waiting for debug servers to attach to MRNet communications network...
Timeout in 400 seconds. Please wait for the attach to complete.
Number of dbgsrvs connected: [1]; Timeout Counter: [0]
Number of dbgsrvs connected: [1]; Timeout Counter: [1]
Number of dbgsrvs connected: [16]; Timeout Counter: [0]
Finalizing setup...
Launch complete.
a{0..15}: Initial breakpoint, main at /home/olouant/mpi_hello.c:5
dbg all> list
a{0..15}: 5 MPI_Init(NULL, NULL);
a{0..15}: 6
a{0..15}: 7 int world_size;
a{0..15}: 8 MPI_Comm_size(MPI_COMM_WORLD, &world_size);
a{0..15}: 9
a{0..15}: 10 int world_rank;
a{0..15}: 11 MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
a{0..15}: 12
a{0..15}: 13 printf("Hello world from rank %d out of %d",
a{0..15}: 14 world_rank, world_size);
dbg all> break mpi_hello.c:12
a{0..15}: Breakpoint 1: file /home/olouant/mpi_hello.c, line 12.
dbg all> continue
a{0..15}: Breakpoint 1, main at /home/olouant/mpi_hello.c:12
dbg all> print world_rank
a{0}: 0
a{1}: 1
...
a{14}: 14
a{15}: 15
dbg all> print world_size
a{0..15}: 16
dbg all> quit
Shutting down debugger and killing application for 'a'.
Attach to an already running application
gdb4hpc can also attach to an already running application. This is done using the attach command.
The <jobstep> parameter will typically be <jobid>.0 if only one srun command is present in your job script. If that is not the case, you can list your job steps with sstat. The <jobid> can be determined via squeue. As an example, for job 123456 with a single srun command, the job step to pass to the attach command is 123456.0.
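The steps above can be sketched in shell as follows. The helper jobstep_id is hypothetical, and the job ID 123456 is the placeholder used on this page. The attach syntax printed at the end (a handle without a rank count, followed by the job step) is an assumption based on typical gdb4hpc usage, so check it against the gdb4hpc help on your system.

```shell
#!/bin/sh
# Hypothetical helper: builds a job step ID from a job ID and a step index.
# The step index defaults to 0, which matches a job script with one srun.
jobstep_id() {
    printf '%s.%s\n' "$1" "${2:-0}"
}

jobid=123456                      # placeholder; take the real value from squeue
jobstep=$(jobstep_id "$jobid")    # 123456.0

# On a Slurm system, the job and its steps can be listed with:
#   squeue --me
#   sstat --allsteps --jobs="$jobid" --format=JobID

# Assumed attach syntax, to be typed at the "dbg all>" prompt:
printf 'attach $a %s\n' "$jobstep"
```

If the job script contains several srun commands, pass the step index as the second argument, for example jobstep_id 123456 2.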