Jonathan Olmsted

OpenMPI + InfiniBand


This post has been edited and the updates are noted below.

I’ve been working on a periodic rebuild of the Beowulf cluster that we use in the star lab, and I stumbled upon a strange message while setting up the OpenMPI implementation of MPI. In case it matters (though I can’t imagine it will), the cluster is being developed on Fedora Core 16.

OpenMPI Warning Message

I kept running into a (to me) cryptic warning after getting MPI up and running. Whenever an MPI program would run, I’d get output like the following printed to the shell:

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[14865,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: someserver

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

where I’ve substituted someserver for the actual server name for the purposes of this post. Execution still works, but whatever was going on seemed, at the least, to be slowing things down. Moreover, I was fairly sure this output would cause constant confusion for users once the cluster went live.

UPDATE (Mon Oct 1 2012): It turns out this message isn’t printed in newer versions of OpenMPI (e.g. >=1.6.2) if the hardware is absent. Although I am constrained to using an older version, this is good to note. Thanks to Jeff Squyres for pointing this out.

It turns out that the “problem” and its “solution” are both very simple, but the information I found online wasn’t that good. This is just my humble attempt to improve the signal-to-noise ratio.

Problem

OpenMPI searches the hardware on the nodes for InfiniBand and, upon failing to find any, falls back to the standard network interfaces. Of course, this is why execution of the mpirun command still worked.
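
If you want to confirm that a node really has no InfiniBand hardware before silencing the check, one quick way is to ask libibverbs directly. This is just a sketch, assuming the InfiniBand diagnostic utilities are installed (e.g. the libibverbs-utils package on Fedora):

# List RDMA-capable devices; no entries (or the same "unable to get
# RDMA device list" complaint) means there is nothing for the openib
# module to find.
ibv_devices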

Solution

We can tell mpirun to never even check for InfiniBand hardware (and thereby prevent the unsuccessful search) by including -mca btl ^openib on the command line. Specifically, a command that was originally submitted as

mpirun -np 3 -hostfile ../mpihosts helloworld

should now be

mpirun -np 3 -mca btl ^openib -hostfile ../mpihosts helloworld
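
For what it’s worth, the caret tells OpenMPI to use every BTL except the ones listed, so the usual transports (TCP, shared memory, self) remain available. If you’d rather whitelist transports explicitly than exclude openib, something like the following should behave equivalently on a cluster of this vintage (the exact BTL names depend on the OpenMPI version):

# Explicitly allow only TCP, shared memory, and the self loopback BTL,
# so the openib module is never considered.
mpirun -np 3 -mca btl tcp,sm,self -hostfile ../mpihosts helloworld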

That is all there is to it.

UPDATE (Mon Oct 1 2012): Initially, I couldn’t find mention of a config file for making this change (in my case) site-wide. Unsurprisingly, that was a result of me not finding it, not of the functionality being absent. Again, thanks to Jeff Squyres for pointing me here, which covers this topic. This is now the approach used for this project.
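
For the curious, OpenMPI reads MCA parameters from a plain-text file, so the same setting can live there instead of on every command line. A minimal sketch, assuming a default installation layout (the exact path depends on how OpenMPI was installed on your system):

# $prefix/etc/openmpi-mca-params.conf  (site-wide)
# or ~/.openmpi/mca-params.conf        (per user)
#
# Exclude the openib BTL so OpenMPI never probes for InfiniBand hardware.
btl = ^openib

With that line in place, the original mpirun command works unchanged and the warning disappears.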

Caveat

This warning can still pop up if you use other software that wraps OpenMPI and doesn’t have this flag built in (which is unlikely). One example is the R package Rmpi. The package testing that occurs at install time in R includes running an MPI job. Because that job isn’t run with the flag that disables the search for InfiniBand hardware, you will see the warning printed to the shell. However, normal use of Rmpi isn’t affected, since users run their R scripts from within the MPI framework and can include the flag themselves (see the sketch below).
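
For instance, a hypothetical Rmpi driver script (here called analysis.R, started as a single MPI process that then spawns its own workers) could be launched with the flag included:

# analysis.R is a placeholder name; the flag works the same way for any
# R script run under mpirun.
mpirun -np 1 -mca btl ^openib R --no-save -f analysis.R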

UPDATE (Mon Oct 1 2012): See the preceding update. This message won’t pop up after using the site-wide config.