lamtrace - Unload LAM trace data.
SYNTAX
lamtrace [-hkvR] [-mpi] [-l <listno>] [-f <#secs>]
[<filename>] [<nodes>] [<processes>]
OPTIONS
-h Print useful information on this command.
-k Copy and do not remove trace data.
-v Be verbose.
-R Delete all trace data from the specified
nodes.
-l <listno> Unload only from the given list number.
-mpi Unload trace data for an MPI application.
-f <#secs> Signal target processes to flush trace data
to the daemon. Then wait <#secs> before
unloading.
<filename> Place trace data into this file (default:
def.lamtr).
DESCRIPTION
The -t option of mpirun(1) and loadgo(1) allows the
application to generate execution traces. These traces are
first stored in a buffer within each application process.
When the buffer is full or when the application
terminates, the runtime buffer is flushed to the trace daemon
(a structural component within the LAM daemon). The trace
daemon collects data only up to a pre-compiled limit.
Beyond this limit, the oldest traces are discarded
in favor of the newer traces.
After an application has finished, the record of its
execution is stored in the trace daemons of each node that
was running the application. The lamtrace command can be
used to retrieve these traces and store them in one file
for display by a performance visualization tool, such as
xmpi(1). If the application was started by xmpi(1),
lamtrace is not normally needed as the equivalent
functionality is invoked with a button.
Incomplete trace data can be unloaded while the application
is running. The output file must not exist prior to
invoking lamtrace. This is a good situation to use the -k
option, which preserves the trace daemon's contents after
unloading. Each subsequent unload will then get all of the
run's trace data that has reached the daemon so far; traces
still held in a process's internal buffer are not included.
A standard LAM signal,
LAM_SIGTRACE (see doom(1)), causes trace enabled processes
to flush the internal trace buffer to the daemon. The -f
option tells lamtrace to send this signal to all target
processes before unloading trace data. A race condition
develops between the target process storing trace data to
the daemon and the unloading procedure. lamtrace does not
resolve this race itself: the user must supply a delay
after -f that is long enough for the flush to complete.
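The snapshot procedure described above can be sketched as a small
shell helper. This is only an illustration under stated assumptions:
the function names next_snapshot and snapshot, the ".N.lamtr" naming
scheme and the LAMTRACE override variable are hypothetical and not
part of LAM. Only the -k and -f options and the argument order come
from the SYNTAX section.

```shell
# LAMTRACE is a hypothetical override hook (e.g. LAMTRACE=echo for a
# dry run); it defaults to the real command.
: "${LAMTRACE:=lamtrace}"

# next_snapshot <prefix>
# Print the first name of the form <prefix>.N.lamtr that does not
# exist yet, since the output file must not exist before lamtrace runs.
next_snapshot() {
    prefix=$1
    i=0
    while [ -e "${prefix}.${i}.lamtr" ]; do
        i=$((i + 1))
    done
    printf '%s\n' "${prefix}.${i}.lamtr"
}

# snapshot <prefix> <secs> [<nodes>] [<processes>]
# Ask the target processes to flush, wait <secs>, then copy the trace
# data into a fresh file while keeping it in the daemons (-k).
snapshot() {
    prefix=$1 secs=$2
    shift 2
    out=$(next_snapshot "$prefix")
    "$LAMTRACE" -k -f "$secs" "$out" "$@"
}
```

For instance, "snapshot run1 5 n30" would unload into run1.0.lamtr
the first time and run1.1.lamtr the next. The 5 second delay is an
arbitrary choice; a suitable value depends on how long the flush
takes on the system at hand.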
Trace data are organized by node, process identifier and
list number. A process can store traces on any node,
although the local node is the obvious, least intrusive
choice. The process can identify itself in any meaningful
way (getpid(2) is a good idea). The list number is also
chosen by the process. These values may be set by an
instrumented library, such as libmpi(3), or directly by the
application with lam_rtrstore(2). Unloading is as flexible
as storing: the -l option selects the
list number, and standard LAM command line mnemonics
select nodes and processes.
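As a rough sketch of how the three selectors combine on the command
line, the helper below assembles a lamtrace invocation from a list
number, a node and a process identifier. The function name unload_one
and the LAMTRACE override variable are hypothetical. The "n<node>"
and "p<pid>" mnemonics follow the forms used in the EXAMPLES section.

```shell
# LAMTRACE is a hypothetical override hook (e.g. LAMTRACE=echo for a
# dry run); it defaults to the real command.
: "${LAMTRACE:=lamtrace}"

# unload_one <file> <listno> <node> <pid>
# Unload a single list of a single process on a single node into
# <file>, using the argument order given in SYNTAX.
unload_one() {
    file=$1 list=$2 node=$3 pid=$4
    "$LAMTRACE" -l "$list" "$file" "n$node" "p$pid"
}
```

Calling "unload_one out.lamtr 5 30 21367" runs
"lamtrace -l 5 out.lamtr n30 p21367".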
Dropping old traces when a pre-compiled volume limit is
reached only happens for positive list numbers. Traces in
negatively numbered lists will be collected until the
underlying system runs out of memory. Do not use negative
list numbers for high volume trace data.
If no process selection is given on the command line,
trace data will be unloaded for all processes on each
specified node.
LAM, its trace daemon and lamtrace are all unaware of the
format and meaning of traces.
The -R option does not unload trace data. It causes the
target trace daemons to free the memory occupied by trace
data in the given list. If all lists are specified (no -l
option), the trace daemon is effectively reset to its
state after initiating LAM.
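The two uses of -R described above, freeing one list or resetting
the trace daemon, might be wrapped as follows. This is a sketch only:
the function names clear_list and clear_all and the LAMTRACE override
variable are hypothetical, and the node arguments are passed through
unchanged.

```shell
# LAMTRACE is a hypothetical override hook (e.g. LAMTRACE=echo for a
# dry run); it defaults to the real command.
: "${LAMTRACE:=lamtrace}"

# clear_list <listno> <nodes>...
# Free the memory held by one list on the given nodes.
clear_list() {
    list=$1
    shift
    "$LAMTRACE" -R -l "$list" "$@"
}

# clear_all <nodes>...
# With no -l option all lists are freed, which effectively resets
# the trace daemons on the given nodes.
clear_all() {
    "$LAMTRACE" -R "$@"
}
```

For example, "clear_list 5 n30" runs "lamtrace -R -l 5 n30". Neither
call writes an output file, because -R does not unload trace data.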
Unloading MPI Trace Data
A special capability, selected by the -mpi option, exists
to search for and unload only the trace data generated by
an MPI application. For this purpose, lamtrace is aware
of the particular reserved list numbers that libmpi(3)
uses to store traces. It begins by searching all specified
nodes and processes (the whole LAM multicomputer, if
nothing is specified) for a special trace generated by process
rank 0 in MPI_COMM_WORLD of an MPI application. This
special trace contains the node and process identifiers of
all processes in that MPI_COMM_WORLD communicator.
lamtrace then unloads the trace data of those processes.
If multiple world communicators exist within LAM's trace
daemons, the first one found is used. Multiple worlds may
be present due to multiple concurrent applications, trace
data from a previous run not removed (either with lamtrace
or lamclean(1)), or an application that spawns processes.
A particular world communicator can be located by
providing precise node and process location to lamtrace.
The -mpi option is not compatible with the -l option.
EXAMPLES
lamtrace -v -mpi mytraces
Unload trace data into the file "mytraces" from the
first MPI application found in a search of the entire
LAM multicomputer. Report on important steps as they
are done.
lamtrace n30 -l 5 p21367
Unload trace data from list 5 of process ID 21367 on
node 30. Operate silently.
lamtrace -mpi n30 p21367
Unload trace data from the MPI application world group
whose process rank 0 has PID 21367 and is/was running
on node 30.
BUGS
Since trace data can be unloaded during an application's
execution, there should be a way to incrementally append
to an output file. This is a bit tricky with -mpi, but it
can be done.
FILES
def.lamtr default output file
SEE ALSO
mpirun(1), loadgo(1), lam_rtrstore(2), lamclean(1),
libmpi(3), xmpi(1)