
MPI debugging

MPI programs are harder to debug than sequential programs because of their parallel nature. The MPI programming model follows either the single-program multiple-data (SPMD) paradigm or the multiple-program multiple-data (MPMD) paradigm. Debugging MPI programs is complicated by the following characteristics:

  • Several processes run in parallel in a coordinated way.
  • These processes may all execute the same binary (single program multiple data) or different binaries (multiple program multiple data).
  • These processes may run on different remote hosts, i.e., they may not run on the host from where the program execution was initiated (and from where the developer would interact with the program via a debugger).
  • MPI programs are started through a launch utility such as mpiexec or mpirun, i.e., there is an additional level of indirection between the developer and the processes.

To cope with these characteristics of the MPI programming model, specialized parallel debugging tools such as TotalView and DDT have been developed. These tools are commercial products, however, and not widely available.

As an alternative, one might use a traditional sequential debugger such as GDB. To simplify the setup, let’s assume that it is sufficient to run one of the many MPI processes under a debugger. Starting MPI processes under the multiple-program multiple-data paradigm follows the pattern

mpiexec [ global_options ]
        [ local_options1 ] <program1> [ <args1> ] :
        [ local_options2 ] <program2> [ <args2> ] :
        ... :
        [ local_optionsN ] <programN> [ <argsN> ]

One can utilize mpiexec to run a single process under the control of GDB by specifying gdb as one of the program arguments to mpiexec. The following example starts four instances of hello_world, one of them running under the control of GDB.

user@tron:~/$ mpiexec -n 1 gdb hello_world : -n 3 hello_world 
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hello_world...

Now one can interact with GDB as usual, e.g., by setting a breakpoint and starting the program:

(gdb) break main
Breakpoint 1 at 0xab96: file /home/user/hello_world.cc, line 6.
(gdb) run
Starting program: /home/user/hello_world 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at /home/user/hello_world.cc:6
6	int main() {
(gdb) 
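
The interactive steps above can also be handed to GDB on the command line. The following is a minimal sketch rather than a definitive recipe: the function name mpi_gdb and the DRY_RUN variable are made up for this illustration, while -ex and --args are standard GDB options. It assumes at least two processes, since the second segment would otherwise request zero instances.

```shell
# Hypothetical helper: start NPROCS instances of PROG, the first one under GDB.
# GDB sets a breakpoint at main and starts the program right away.
# Extra arguments are passed to every instance of PROG (hence --args).
# Set DRY_RUN to a non-empty value to print the command instead of running it.
mpi_gdb() {
    nprocs=$1
    prog=$2
    shift 2
    ${DRY_RUN:+echo} mpiexec \
        -n 1 gdb -ex 'break main' -ex run --args "$prog" "$@" : \
        -n "$((nprocs - 1))" "$prog" "$@"
}

# Example: mpi_gdb 4 ./hello_world
```

In dry-run mode the printed line loses the quotes around 'break main', so it is meant for a quick visual check and not for copying back into a shell.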

The text-based user interface of GDB is powerful, yet somewhat cumbersome, and it has a steep learning curve. Some developers might find it more convenient to use GDB from within an integrated development environment (IDE) or an advanced editor such as CLion or Visual Studio Code. For this debugging scenario, one can replace gdb in the example above with gdbserver, e.g.:

user@tron:~/$ mpiexec -n 1 gdbserver localhost:2345 hello_world : -n 3 hello_world 
Process /home/user/hello_world created; pid = 39171
Listening on port 2345
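
The same launch pattern can place gdbserver on a process other than the first one by reordering the segments. The sketch below assumes that the launcher assigns ranks in the order of the segments on the command line, which is the usual behavior but worth checking for the MPI implementation at hand. The function name mpi_gdbserver and the DRY_RUN variable are hypothetical names introduced for this illustration.

```shell
# Hypothetical helper: start NPROCS instances of PROG and run the instance
# with rank RANK (counted from 0) under gdbserver listening on PORT.
# Segments that would contain zero processes are left out.
# Set DRY_RUN to a non-empty value to print the command instead of running it.
mpi_gdbserver() {
    nprocs=$1
    rank=$2
    port=$3
    prog=$4
    after=$((nprocs - rank - 1))
    # Start with the segment for the debugged process ...
    set -- -n 1 gdbserver "localhost:$port" "$prog"
    # ... then add the plain processes before and after it.
    if [ "$rank" -gt 0 ]; then
        set -- -n "$rank" "$prog" : "$@"
    fi
    if [ "$after" -gt 0 ]; then
        set -- "$@" : -n "$after" "$prog"
    fi
    ${DRY_RUN:+echo} mpiexec "$@"
}

# Example: mpi_gdbserver 4 2 2345 ./hello_world
```

With rank 0 the helper produces the same command line as in the example above.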

The argument localhost:2345 indicates that gdbserver will accept connections from localhost on port 2345. (In this example, all processes run on the local host. When the MPI processes run in a distributed setup, the host name must be adjusted accordingly, i.e., the debugger has to connect to the host on which gdbserver was started.) Before connecting to the GDB server from an IDE, let’s test the connection with the GDB text interface:

user@tron:~/$ gdb hello_world 
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hello_world...
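
The session above only starts GDB; the connection itself is made with GDB's target remote command. As a minimal sketch, the command can be passed at start-up via -ex. The function name gdb_connect and the DRY_RUN variable are hypothetical names for this illustration, and the host and port must match the values given to gdbserver.

```shell
# Hypothetical helper: connect a local GDB to a gdbserver waiting on HOST:PORT.
# PROG should be the same binary that gdbserver started, so that GDB finds
# the matching symbols.
# Set DRY_RUN to a non-empty value to print the command instead of running it.
gdb_connect() {
    host=$1
    port=$2
    prog=$3
    ${DRY_RUN:+echo} gdb -ex "target remote $host:$port" "$prog"
}

# Example: gdb_connect localhost 2345 ./hello_world
```

Once connected, the process has already been created by gdbserver and is stopped, so one would typically set breakpoints and then type continue instead of run.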

Visual Studio Code and CLion support debugging by connecting to a running GDB server. See the CLion documentation or the documentation for the Visual Studio Code Native Debug plugin for details. Once everything has been set up as described in the IDE’s documentation, one can add a breakpoint and attach to the running GDB server as demonstrated in the following figures.

Figure: Debugging an MPI process in CLion via GDB attached to a GDB server.
Figure: Debugging an MPI process in Visual Studio Code via GDB attached to a GDB server.

The code shown in the two screenshots is based on MPL, a lightweight C++17 wrapper for MPI.

There are many more options for debugging MPI programs, see, for example:
