NAME
mpirun - Run MPI programs on LAM nodes.
SYNTAX
mpirun [-fhvO] [-c <#> | -np <#>] [-D | -wd <dir>] [-ger |
-nger] [-c2c | -lamd] [-nsigs] [-nw | -w] [-nx]
[-pty] [-s <node>] [-t | -toff | -ton] [-x
VAR1[=VALUE1][,VAR2[=VALUE2],...]] [<nodes>]
<program> [-- <args>]
mpirun [-fhvO] [-D | -wd <dir>] [-ger | -nger] [-lamd |
-c2c] [-nsigs] [-nw | -w] [-nx] [-pty] [-t | -toff
| -ton] [-x VAR1[=VALUE1][,VAR2[=VALUE2],...]]
<schema>
OPTIONS
There are two forms of the mpirun command -- one for programs
(i.e., SPMD-style applications), and one for application schemas
(see appschema(5)). Both forms of mpirun use the following
options by default: -c2c -nger -w. These may each be overridden
by their counterpart options, described below.
Additionally, mpirun will send the name of the directory where
it was invoked on the local node to each of the remote nodes,
and attempt to change to that directory. See the "Current
Working Directory" section, below.
-c <#> Synonym for -np (see below).
-c2c Use "client to client" (c2c) mode for MPI communication
in the user program. This mode can significantly speed up
some applications, as messages will be passed directly from
the source rank to the destination rank; the LAM daemons will
not be used as third-party message passing agents. However,
this disables monitoring and debugging capabilities; see
MPI(7). This option is mutually exclusive with -lamd.
-D Use the executable program location as the current working
directory for created processes. The current working directory
of the created processes will be set before the user's program
is invoked. This option is mutually exclusive with -wd.
-f Do not configure standard I/O file descriptors -
use defaults.
-h Print useful information on this command.
-ger Enable GER (Guaranteed Envelope Resources). See
MPI(7) for a description of GER. This option is
mutually exclusive with -nger.
-lamd Use the LAM "daemon mode" for MPI communication.
See -c2c (above) and MPI(7) for a description of
the "daemon mode" communication.
-nger Disable GER (Guaranteed Envelope Resources).
This option is mutually exclusive with -ger.
-nsigs Do not have LAM catch signals.
-np <#> Run this many copies of the program on the given
nodes. This option indicates that the specified file is an
executable program and not an application schema. If no nodes
are specified, all LAM nodes are considered for scheduling;
LAM will schedule the programs in a round-robin fashion,
"wrapping around" (and scheduling multiple copies on a single
node) if necessary.
-nw Do not wait for all processes to complete before exiting
mpirun. This option is mutually exclusive with -w.
-nx Do not automatically export LAM_MPI_* environment
variables to the remote nodes.
-O Multicomputer is homogeneous. Do no data conversion when
passing messages.
-pty Enable pseudo-tty support. Among other things, this
enables line-buffered output (which is probably what you
want). The only reason that this feature is not enabled by
default is because it is so new and has not been extensively
tested yet.
-s <node> Load the program from this node. This option is
not valid on the command line if an application
schema is specified.
-t, -ton Enable execution trace generation for all processes.
Trace generation will proceed with no further action. These
options are mutually exclusive with -toff.
-toff Enable execution trace generation for all processes.
Trace generation will begin after processes collectively call
MPIL_Trace_on(2). This option is mutually exclusive with -t
and -ton.
-w Wait for all applications to exit before mpirun
exits.
-wd <dir> Change to the directory <dir> before the user's
program executes. Note that if the -wd option appears both on
the command line and in an application schema, the schema will
take precedence over the command line. This option is mutually
exclusive with -D.
-x Export the specified environment variables to the remote
nodes before executing the program. Existing environment
variables can be specified (see the Examples section, below),
or new variable names specified with corresponding values. The
parser for the -x option is not very sophisticated; it does
not even understand quoted values. Users are advised to set
variables in the environment, and then use -x to export (not
define) them.
-- <args> Pass these runtime arguments to every new process.
This must always be the last argument to mpirun. This option
is not valid on the command line if an application schema is
specified.
DESCRIPTION
One invocation of mpirun starts an MPI application running
under LAM. If the application is simply SPMD, the application
can be specified on the mpirun command line. If the
application is MIMD, comprising multiple programs, an
application schema is required in a separate file. See
appschema(5) for a description of the application schema
syntax, but it essentially contains multiple mpirun command
lines, less the command name itself. The ability to specify
different options for different instantiations of a program is
another reason to use an application schema.
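For example, a schema that runs one master process and three
slave processes might contain lines like the following (the
node identifiers and program names here are illustrative
only):

     n0 master
     n1-3 slave

Each line is an mpirun command line, less the command name
itself.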
Application Schema or Executable Program?
To distinguish the two different forms, mpirun looks on
the command line for <nodes> or the -c option. If neither
is specified, then the file named on the command line is
assumed to be an application schema. If either one or
both are specified, then the file is assumed to be an
executable program. If <nodes> and -c both are specified, then
copies of the program are started on the specified nodes
according to an internal LAM scheduling policy. Specifying
just one node effectively forces LAM to run all copies of the
program in one place. If -c is given, but not <nodes>, then
all LAM nodes are used. If <nodes> is given, but not -c, then
one copy of the program is run on each specified node.
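For example, each of the following treats prog1 as an
executable program (the node range and process count are
illustrative):

     % mpirun -np 4 prog1
     % mpirun n0-3 prog1

whereas "mpirun myapp" (with neither <nodes> nor -c) treats
myapp as an application schema.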
Program Transfer
By default, LAM searches for executable programs on the target
node where a particular instantiation will run. If the file
system is not shared, the target nodes are homogeneous, and
the program is frequently recompiled, it can be convenient to
have LAM transfer the program from a source node (usually the
local node) to each target node. The -s option specifies this
behavior and identifies the single source node.
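For example, the following loads prog1 from the local node
(assumed here to be n0) and transfers it to three other nodes
(the node identifiers are illustrative):

     % mpirun n1-3 -s n0 prog1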
Locating Files
LAM looks for an executable program by searching the
directories in the user's PATH environment variable as defined
on the source node(s). This behavior is consistent with
logging into the source node and executing the program from
the shell. On remote nodes, the "." path is the home
directory.
LAM looks for an application schema in three directories: the
local directory, the value of the LAMAPPLDIR environment
variable, and LAMHOME/boot, where LAMHOME is the LAM
installation directory.
Standard I/O
LAM directs UNIX standard input to /dev/null on all remote
nodes. On the local node that invoked mpirun, standard
input is inherited from mpirun. The default is what used
to be the -w option to prevent conflicting access to the
terminal.
LAM directs UNIX standard output and error to the LAM daemon
on all remote nodes. LAM ships all captured output/error to
the node that invoked mpirun and prints it on the standard
output/error of mpirun. Local processes inherit the standard
output/error of mpirun and transfer to it directly.
Thus it is possible to redirect standard I/O for LAM
applications by using the typical shell redirection procedure
on mpirun.
% mpirun N my_app < my_input > my_output
The -f option avoids all the setup required to support
standard I/O described above. Remote processes are completely
directed to /dev/null and local processes inherit file
descriptors from lamboot(1).
Pseudo-tty support
The -pty option enables pseudo-tty support for process output.
This allows, among other things, for line-buffered output from
remote nodes (which is probably what you want). The only
reason that this feature is not enabled by default is because
it has not been thoroughly tested on a variety of different
Unixes. Users are encouraged to use -pty and report any
problems back to the LAM Team.
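For example, to run prog1 on all nodes with pseudo-tty support
(and hence line-buffered output):

     % mpirun -pty N prog1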
Current Working Directory
The default behavior of mpirun has changed with respect to
the directory that processes will be started in.
The -wd option to mpirun allows the user to change to an
arbitrary directory before their program is invoked. It
can also be used in application schema files to specify
working directories on specific nodes and/or for specific
applications.
If the -wd option appears both in a schema file and on the
command line, the schema file directory will override the
command line value.
The -D option will change the current working directory to the
directory where the executable resides. It cannot be used in
application schema files. -wd is mutually exclusive with -D.
If neither -wd nor -D is specified, the local node will send
the name of the directory in which mpirun was invoked to each
of the remote nodes. The remote nodes will then try to change
to that directory. If they fail (e.g., if the directory does
not exist on that node), they will start from the user's home
directory.
All directory changing occurs before the user's program is
invoked; it does not wait until MPI_INIT is called.
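For example, the following starts prog1 on all nodes with
/workstuff/output as the working directory (the directory name
is illustrative):

     % mpirun -wd /workstuff/output N prog1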
Process Environment
Processes in the MPI application inherit their environment
from the LAM daemon upon the node on which they are running.
The environment of a LAM daemon is fixed upon booting of the
LAM with lamboot(1) and is inherited from the user's shell.
On the origin node this will be the shell from which
lamboot(1) was invoked and on remote nodes this will be the
shell started by rsh(1). When running dynamically linked
applications which require the LD_LIBRARY_PATH environment
variable to be set, care must be taken to ensure that it is
correctly set when booting the LAM.
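For example, with a C shell, LD_LIBRARY_PATH might be set
before booting the LAM (the library path and boot schema file
name are illustrative):

     % setenv LD_LIBRARY_PATH /opt/mylibs/lib
     % lamboot myhosts
     % mpirun N prog1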
Exported Environment Variables
All environment variables that are named in the form LAM_MPI_*
will automatically be exported to new processes on the local
and remote nodes. This exporting may be inhibited with the -nx
option.
While the syntax of the -x option allows the definition of new
variables, note that the parser for this option is currently
not very sophisticated - it does not even understand quoted
values. Users are advised to set variables in the environment
and use -x to export them; not to define them.
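For example, the following exports the existing DISPLAY
variable and defines a new variable for the new processes (the
variable name SEED and its value are illustrative):

     % mpirun -x DISPLAY,SEED=42 N prog1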
Trace Generation
Two switches control trace generation from processes running
under LAM and both must be in the on position for traces to
actually be generated. The first switch is controlled by
mpirun and the second switch is initially set by mpirun but
can be toggled at runtime with MPIL_Trace_on(2) and
MPIL_Trace_off(2). The -t (-ton is equivalent) and -toff
options all turn on the first switch. Otherwise the first
switch is off and calls to MPIL_Trace_on(2) in the application
program are ineffective. The -t option also turns on the
second switch. The -toff option turns off the second switch.
See MPIL_Trace_on(2) and lamtrace(1) for more details.
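For example, to enable tracing but defer trace generation
until the application itself calls MPIL_Trace_on(2):

     % mpirun -toff N prog1

The trace data can later be gathered with lamtrace(1).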
MPI Data Conversion
LAM's MPI library converts MPI messages from local
representation to LAM representation upon sending them and
then back to local representation upon receiving them. In the
case of a LAM consisting of a homogeneous network of machines
where the local representation differs from the LAM
representation this can result in unnecessary conversions.
The -O switch can be used to indicate that the LAM is
homogeneous and turn off data conversion.
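For example, on a homogeneous multicomputer:

     % mpirun -O N prog1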
Direct MPI Communication
For much improved performance but much decreased
observability, the -c2c option directs LAM's MPI library to
use the most direct underlying mechanism to communicate with
other processes, rather than use the network message-passing
of the LAM daemon. Unreceived messages will be buffered in the
destination process instead of the LAM daemon. MPI process and
message monitoring commands and tools will be much less
effective, usually reporting running processes and empty
message queues. Signal delivery with doom(1) is unaffected.
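Since -c2c is the default, the following might be used to
regain full monitoring and debugging capability, at the cost
of performance:

     % mpirun -lamd N prog1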
Guaranteed Envelope Resources
By default, LAM will guarantee a minimum amount of message
envelope buffering to each MPI process pair and will impede or
report an error to a process that attempts to overflow this
system resource. This robustness and debugging feature is
implemented in a machine specific manner when direct
communication (-c2c) is used. For normal LAM communication via
the LAM daemon, a protocol is used. The -nger option disables
GER and the measures taken to support it; see MPI(7) for
details.
EXAMPLES
mpirun N prog1
Load and execute prog1 on all nodes. Search for the
executable file on each node.
mpirun -c 8 prog1
Run 8 copies of prog1 wherever LAM wants to run them.
mpirun n8-10 -v -nw -s n3 prog1 -- -q
Load and execute prog1 on nodes 8, 9, and 10. Search for
prog1 on node 3 and transfer it to the three target nodes.
Report as each process is created. Give "-q" as a command
line to each new process. Do not wait for the processes to
complete before exiting mpirun.
mpirun -v myapp
Parse the application schema, myapp, and start all
processes specified in it. Report as each process is
created.
mpirun N N -pty -wd /workstuff/output -x DISPLAY
run_app.csh
Run the application "run_app.csh" (assumedly a C shell
script) twice on each node in the system (ideal for
2-way SMPs). Also enable pseudo-tty support, change
directory to /workstuff/output, and export the DISPLAY
variable to the new processes (perhaps the shell
script will invoke an X application such as xv to display
output).
mpirun -np 5 -D `pwd`/my_application
This is a common usage of mpirun in environments where a
filesystem is shared between all nodes in the multicomputer;
using the shell-escaped "pwd" command specifies the full name
of the executable to run. This prevents the need for putting
the directory in the path; the remote nodes will have an
absolute filename to execute (and will change directory to it
upon invocation).
DIAGNOSTICS
mpirun: Exec format error
A non-ASCII character was detected in the application
schema. This is usually a command line usage error
where mpirun is expecting an application schema and an
executable file was given.
mpirun: syntax error in application schema, line XXX
The application schema cannot be parsed because of a usage
or syntax error on the given line in the file.
<filename>: No such file or directory
This error can occur in two cases. Either the named file
cannot be located or it has been found but the user does not
have sufficient permissions to execute the program or read
the application schema.
SEE ALSO
mpimsg(1), mpitask(1), lamexec(1), lamtrace(1),
MPIL_Trace_on(2), loadgo(1)