man mpirun
Terminology
DEFINITION OF ‘SLOT’
A slot is an allocation unit for a process. The number of slots on a node indicates how many processes can potentially execute on that node. By default, Open MPI will allow one process per slot.
Slots are not hardware resources.
DEFINITION OF ‘PROCESSOR ELEMENT’
By default, Open MPI defines that a “processing element” is a processor core.
What happens when mpirun runs
% mpirun [ -np X ] [ --hostfile <filename> ] <program>
This will run X copies of <program> in your current run-time environment, scheduling (by default) in a round-robin fashion by CPU slot. If running under a supported resource manager, Open MPI's mpirun will usually use the corresponding resource manager process starter automatically. rsh or ssh, by contrast, require the use of a hostfile; without one, all X copies run on the localhost. mpirun will send the name of the directory where it was invoked on the local node to each of the remote nodes, and attempt to change to that directory.
<args>
Pass these run-time arguments to every new process. These must always be the last arguments to mpirun.
Note that as of the start of the v1.8 release, mpirun will launch a daemon onto each host in the allocation (for example, every node specified with --host) at the very beginning of execution, regardless of whether or not application processes will eventually be mapped to execute there.
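To make the synopsis concrete, here is a minimal sketch that writes a hostfile and assembles a launch line with the program arguments placed last. The host names `node01`/`node02`, the file name `my_hostfile`, and `./my_app arg1 arg2` are placeholders invented for illustration; the command is only printed, not executed.

```shell
# Write a two-node hostfile; node01/node02 are placeholder host names.
cat > my_hostfile <<'EOF'
node01 slots=4
node02 slots=4
EOF

# Sum the slots= values so -np matches the slots the hostfile offers.
total_slots=$(awk -F'slots=' 'NF > 1 { sum += $2 } END { print sum }' my_hostfile)

# Assemble the launch line; ./my_app and its arguments are placeholders.
# The application's own arguments come last, after the program name.
launch_cmd="mpirun -np $total_slots --hostfile my_hostfile ./my_app arg1 arg2"
echo "$launch_cmd"
```

On a machine with Open MPI installed and reachable hosts, the printed line could be run as-is after substituting a real program.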
Mapping and binding
Open MPI employs a three-phase procedure for assigning process locations and ranks:
mapping Assigns a default location to each process
ranking Assigns an MPI_COMM_WORLD rank value to each process
binding Constrains each process to run on specific processors
Note: the location assigned to the process (that is, the mapping) is independent of where it will be bound - the assignment is used solely as input to the binding algorithm.
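The mapping phase can be pictured with a toy model. The sketch below is not Open MPI's algorithm; it only contrasts two placement orders, filling each node's slots first versus cycling across nodes, and assumes ranks are assigned in the same order as the mapping. The function names and the `nodeN` labels are hypothetical.

```shell
# map_by_slot NP NODES SLOTS_PER_NODE
# Toy model: fill every slot on a node before moving to the next node.
map_by_slot() {
    rank=0
    while [ "$rank" -lt "$1" ]; do
        node=$(( (rank / $3) % $2 ))
        echo "rank $rank -> node$node"
        rank=$((rank + 1))
    done
}

# map_by_node NP NODES
# Toy model: place consecutive ranks on consecutive nodes, wrapping around.
map_by_node() {
    rank=0
    while [ "$rank" -lt "$1" ]; do
        node=$(( rank % $2 ))
        echo "rank $rank -> node$node"
        rank=$((rank + 1))
    done
}

echo "by slot:"
map_by_slot 4 2 2
echo "by node:"
map_by_node 4 2
```

With `--display-map` on a real run, the actual placement can be compared against this picture.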
process bind
Please note that mpirun automatically binds processes as of the start of the v1.8 series. Three binding patterns are used in the absence of any further directives:
Bind to core: when the number of processes is <= 2
Bind to socket: when the number of processes is > 2
Bind to none: when oversubscribed
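The three default rules can be written down as a small decision function. This is a sketch of the rules as listed above, not code from Open MPI; `default_binding` is a hypothetical helper, and treating "oversubscribed" as "more processes than available slots" is a simplifying assumption.

```shell
# default_binding NP SLOTS
# Prints the binding level the rules above would select when no
# binding directive is given. SLOTS is the number of available slots.
default_binding() {
    np=$1
    slots=$2
    if [ "$np" -gt "$slots" ]; then
        echo none      # assumption: oversubscribed means np > slots
    elif [ "$np" -le 2 ]; then
        echo core
    else
        echo socket
    fi
}

default_binding 2 8
default_binding 6 8
default_binding 16 8
```

Running with `--report-bindings` shows what mpirun actually chose, which is the reliable check on a given installation.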
If your application uses threads, then you probably want to ensure that you are either not bound at all (by specifying --bind-to none), or bound to multiple cores using an appropriate binding level or specific number of processing elements per application process.
(That is, when each process is multithreaded, either specify **--bind-to none**, which leaves the process not bound (or bound to all available processors), or specify how many processor cores each process gets. For example, --map-by node:PE=n maps processes across the nodes, one per node when the process count equals the node count, and binds each process to n processor cores.)
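For a threaded application, the two choices look like this on the command line. The sketch only builds and prints the commands; `./threaded_app`, the process count, and the thread count are placeholders, and `--report-bindings` is added so the resulting binding can be inspected.

```shell
nprocs=4             # placeholder: one process per node on a 4-node allocation
threads_per_proc=8   # placeholder: threads (and cores) wanted per process

# Option 1: leave processes unbound so their threads can use any core.
unbound_cmd="mpirun -np $nprocs --bind-to none --report-bindings ./threaded_app"

# Option 2: map by node and bind each process to threads_per_proc cores.
bound_cmd="mpirun -np $nprocs --map-by node:PE=$threads_per_proc --report-bindings ./threaded_app"

echo "$unbound_cmd"
echo "$bound_cmd"
```

Which option performs better depends on the application; the second keeps each process's threads on a fixed set of cores, while the first leaves placement to the operating system scheduler.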
Related options
Run specification:
--bind-to
Bind processes to the specified object, defaults to core. Supported options include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, cpu-list, and none.
--map-by
Map processes to the specified object, defaults to socket. Supported options include slot, hwthread, core, L1cache, L2cache, L3cache, socket, numa, board, node, sequential, distance, and ppr. Any object can include modifiers by adding a : and any combination of PE=n (bind n processing elements to each proc), SPAN (load balance the processes across the allocation), OVERSUBSCRIBE (allow more processes on a node than processing elements), and NOOVERSUBSCRIBE. This includes PPR, where the pattern would be terminated by another colon to separate it from the modifiers.
> For example, --map-by node:PE=n
> load balances the processes across the available nodes and binds each process to n processing elements.
--use-hwthread-cpus
With this option, a **processing element** is a hardware thread rather than a physical core.
Monitoring aids:
- -report-bindings, --report-bindings
Report any bindings for launched processes.
option
Run specification
<program>
The program executable. This is identified as the first non-recognized argument to mpirun.
The following options specify the number of processes to launch. Note that none of the options imply a particular binding policy.
- -c, -n, --n, -np <#>
Run this many copies of the program on the given nodes.
- There are many other parameters; consult man mpirun when you need them.
Standard I/O control
- -output-filename, --output-filename
Redirect the stdout, stderr, and stddiag of all processes to a process-unique version of the specified filename. Any directories in the filename will automatically be created. Each output file will consist of filename.id, where the id will be the process's rank in MPI_COMM_WORLD, left-filled with zeros for correct ordering in listings. A relative path value will be converted to an absolute path based on the cwd where mpirun is executed. Note that this will not work on environments where the file system on compute nodes differs from that where mpirun is executed.
- -tag-output, --tag-output
Tag each line of output to stdout, stderr, and stddiag with [jobid, MCW_rank]<stdxxx>, indicating the process jobid and MPI_COMM_WORLD rank of the process that generated the output, and the channel which generated it.
- -timestamp-output, --timestamp-output
Timestamp each line of output to stdout, stderr, and stddiag.
- -stdin, --stdin
The MPI_COMM_WORLD rank of the process that is to receive stdin. The default is to forward stdin to MPI_COMM_WORLD rank 0, but this option can be used to forward stdin to any process. It is also acceptable to specify none, indicating that no processes are to receive stdin.
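Once output is tagged with --tag-output, per-rank lines can be separated with ordinary text tools. The sketch below works on a hand-written sample in the `[jobid,MCW_rank]<channel>:` shape; the sample lines are illustrative and were not produced by a real run, and `rank_lines` is a hypothetical helper.

```shell
# Hand-written sample in the tagged shape; not real mpirun output.
tagged_sample='[1,0]<stdout>:hello from rank 0
[1,1]<stdout>:hello from rank 1
[1,1]<stderr>:warning from rank 1
[1,0]<stdout>:done'

# rank_lines RANK
# Keep only the lines tagged with the given MPI_COMM_WORLD rank and
# strip the leading tag, leaving the original message text.
rank_lines() {
    printf '%s\n' "$tagged_sample" | sed -n "s/^\[[0-9]*,${1}]<[a-z]*>://p"
}

rank_lines 1
```

In practice the sample variable would be replaced by a pipe from the real command, for example `mpirun --tag-output ... | sed -n ...`; the exact tag layout should be checked against the output of the installed version.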
Monitoring aids
-display-map, --display-map
Display a table showing the mapped location of each process prior to launch.
-display-allocation, --display-allocation
Display the detected resource allocation.
-report-pid, --report-pid <channel>
Print out mpirun's PID during startup. The channel must be either a '-' to indicate that the pid is to be output to stdout, a '+' to indicate that the pid is to be output to stderr, or a filename to which the pid is to be written.
-show-progress, --show-progress
Output a brief periodic report on launch progress.
Debugging
-debug, --debug
Invoke the user-level debugger indicated by the orte_base_user_debugger MCA parameter.
-debugger, --debugger
Sequence of debuggers to search for when --debug is used (i.e. a synonym for the orte_base_user_debugger MCA parameter).
For example:
mpirun -debug -debugger gdb ./myprogram
Here -debugger gdb specifies which debugger may be used.
bind process
See the options listed in the process bind section above.
mca
- -mca, --mca
Send arguments to various MCA modules. See the “MCA” section, below.