man mpirun

Terminology

DEFINITION OF ‘SLOT’

A slot is an allocation unit for a process. The number of slots on a node indicates how many processes can potentially execute on that node. By default, Open MPI will allow one process per slot.

Slots are not hardware resources
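
To make the idea of slots as a bookkeeping unit concrete, here is a minimal sketch that counts the slots declared in a hostfile. It assumes the common `hostname slots=N` hostfile syntax; `parse_hostfile` and the host names are hypothetical, and treating a host without `slots=` as one slot is an assumption of this sketch (depending on the version, Open MPI may detect the core count instead).

```python
def parse_hostfile(text):
    """Return {hostname: slots} from hostfile-style text."""
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        count = 1  # assumption of this sketch: no "slots=" means one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                count = int(field[len("slots="):])
        slots[host] = count
    return slots


# Hypothetical hostfile contents, for illustration only.
example = """
# hypothetical hosts
node1 slots=4
node2 slots=2
node3
"""
slots = parse_hostfile(example)
print(slots)
print("total slots:", sum(slots.values()))
```

The total is the number of processes that can be launched without oversubscribing, regardless of how many cores the nodes really have.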

DEFINITION OF ‘PROCESSOR ELEMENT’

By default, Open MPI defines that a “processing element” is a processor core.

What happens when mpirun runs

% mpirun [ -np X ] [ --hostfile <filename> ] <program>

  • This runs X copies of <program> in your current run-time environment, scheduling them (by default) round-robin by CPU slot. Under a supported resource manager, Open MPI’s mpirun will usually use that resource manager’s process starter automatically. Otherwise it uses rsh or ssh, which require a hostfile; without one, all X copies run on the localhost.

  • mpirun will send the name of the directory where it was invoked on the local node to each of the remote nodes, and attempt to change to that directory.

  • Arguments after the program name are passed as run-time arguments to every new process. These must always be the last arguments to mpirun.

  • Note that as of the start of the v1.8 release, mpirun will launch a daemon onto each host in the allocation (for example, every node specified with --host) at the very beginning of execution, regardless of whether or not application processes will eventually be mapped to execute there.
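
The default "round-robin by CPU slot" scheduling can be pictured with a toy model: fill every slot on the first host, then move on to the next. This is only a sketch of the idea, not Open MPI's implementation; `map_by_slot` and the host names are hypothetical, and raising an error when slots run out is a simplification.

```python
def map_by_slot(slots_per_host, num_procs):
    """Place processes host by host, filling each host's slots first."""
    placement = []
    for host, slots in slots_per_host.items():
        for _ in range(slots):
            if len(placement) == num_procs:
                return placement
            placement.append(host)
    if len(placement) < num_procs:
        # Simplification: the real launcher has oversubscription rules.
        raise ValueError("not enough slots for %d processes" % num_procs)
    return placement


# Hypothetical allocation: two hosts with two slots each.
allocation = {"node1": 2, "node2": 2}
for index, host in enumerate(map_by_slot(allocation, 3)):
    print("process", index, "->", host)
```

With three processes and two slots per host, the first two processes land on node1 and the third on node2.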

Mapping and binding

Open MPI employs a three-phase procedure for assigning process locations and ranks:

  • mapping: Assigns a default location to each process

  • ranking: Assigns an MPI_COMM_WORLD rank value to each process

  • binding: Constrains each process to run on specific processors

    Note: the location assigned to the process (that is, the mapping) is independent of where it will be bound - the assignment is used solely as input to the binding algorithm.
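
The three phases can be sketched as a small pipeline in which each phase only consumes the output of the previous one. This is a toy model under stated assumptions: mapping is round-robin by node, ranks follow mapping order, and binding hands out the next core on each host. `launch_plan`, the host names, and `cores_per_host` are hypothetical.

```python
from collections import defaultdict


def launch_plan(hosts, num_procs, cores_per_host):
    """Return a list of (rank, host, core) tuples for a toy launch."""
    # Phase 1, mapping: round-robin over hosts (a "by node" policy).
    mapped = [hosts[i % len(hosts)] for i in range(num_procs)]

    # Phase 2, ranking: number the processes in mapping order.
    ranked = list(enumerate(mapped))

    # Phase 3, binding: give each process the next core on its host.
    # The mapped host is only an input here, as the note above says.
    next_core = defaultdict(int)
    plan = []
    for rank, host in ranked:
        core = next_core[host] % cores_per_host  # wraps if oversubscribed
        next_core[host] += 1
        plan.append((rank, host, core))
    return plan


for rank, host, core in launch_plan(["node1", "node2"], 4, 2):
    print("rank", rank, "on", host, "bound to core", core)
```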

process bind

Please note that mpirun automatically binds processes as of the start of the v1.8 series. Three binding patterns are used in the absence of any further directives:

  • Bind to core: when the number of processes is <= 2

  • Bind to socket: when the number of processes is > 2

  • Bind to none: when oversubscribed
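
The three default rules above fit in a few lines. The function name `default_binding` is hypothetical; the sketch only restates the rules listed here and does not query a real Open MPI installation.

```python
def default_binding(num_procs, oversubscribed=False):
    """Return the binding target mpirun picks when no directive is given."""
    if oversubscribed:
        return "none"    # oversubscribed: do not bind at all
    if num_procs <= 2:
        return "core"    # one or two processes: bind to core
    return "socket"      # more than two processes: bind to socket


for count in (1, 2, 3, 16):
    print(count, "processes ->", default_binding(count))
print("oversubscribed ->", default_binding(64, oversubscribed=True))
```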

If your application uses threads, then you probably want to ensure that you are either not bound at all (by specifying --bind-to none), or bound to multiple cores using an appropriate binding level or specific number of processing elements per application process.

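(In other words, when each process is multithreaded, either specify **--bind-to none**, so that the process is not bound (or is bound to all available processors), or specify how many processor cores each process gets. For example, --map-by node:PE=n places processes round-robin across the nodes and binds each process to n processor cores.)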
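
For a hybrid MPI plus threads job, the command line could be assembled as in the sketch below. It assumes the threads come from OpenMP, so it also exports OMP_NUM_THREADS through mpirun's -x option; that variable, the helper name `hybrid_command`, and the program path are assumptions that do not come from the notes above.

```python
import shlex


def hybrid_command(program, num_procs, threads_per_proc):
    """Build an mpirun argument list giving each process several cores."""
    return [
        "mpirun",
        "-np", str(num_procs),
        # Bind every process to as many processing elements as it has threads.
        "--map-by", "node:PE=%d" % threads_per_proc,
        # Assumption: the application is threaded with OpenMP.
        "-x", "OMP_NUM_THREADS=%d" % threads_per_proc,
        program,
    ]


cmd = hybrid_command("./myprogram", 4, 8)
print(shlex.join(cmd))
# To launch for real, one could pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list keeps the arguments separate, so no shell quoting is needed if it is later handed to `subprocess.run`.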

Related options

Run-time specification:

  • --bind-to
    Bind processes to the specified object, defaults to core. Supported options include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, cpu-list, and none.

  • --map-by
    Map processes to the specified object, defaults to socket. Supported options include slot, hwthread,
    core, L1cache, L2cache, L3cache, socket, numa, board, node, sequential, distance, and ppr.
    Any object can include modifiers by adding a : and any combination of PE=n (bind n processing
    elements to each proc), SPAN (load balance the processes across the allocation), OVERSUBSCRIBE
    (allow more processes on a node than processing elements), and NOOVERSUBSCRIBE.
    This includes PPR, where the pattern would be terminated by another colon to separate it
    from the modifiers.

    > For example, --map-by node:PE=n
    > load balances the processes across the available nodes and binds each process to n processing elements.

  • --use-hwthread-cpus
    With this option, a **processing element** is a hardware thread rather than a physical core.

Monitoring aids:

  • -report-bindings, --report-bindings
    Report any bindings for launched processes.
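
The --map-by syntax described above, an object followed by a colon and modifiers, with PPR carrying its own colon-separated pattern, can be split apart as in this sketch. The object names are the ones listed above; `split_map_by` is hypothetical, and the modifier text is returned as-is because these notes do not specify how several modifiers are separated.

```python
MAP_OBJECTS = {
    "slot", "hwthread", "core", "l1cache", "l2cache", "l3cache", "socket",
    "numa", "board", "node", "sequential", "distance", "ppr",
}


def split_map_by(spec):
    """Split a --map-by value into (pattern, modifiers)."""
    fields = spec.split(":")
    if fields[0].lower() not in MAP_OBJECTS:
        raise ValueError("unknown mapping object: %r" % fields[0])
    if fields[0].lower() == "ppr":
        # The PPR pattern is ppr:N:object; modifiers follow another colon.
        if len(fields) < 3:
            raise ValueError("ppr needs a count and an object: %r" % spec)
        head, tail = fields[:3], fields[3:]
    else:
        head, tail = fields[:1], fields[1:]
    return ":".join(head), ":".join(tail)


for spec in ("socket", "node:PE=4", "ppr:2:socket:PE=4"):
    print(spec, "->", split_map_by(spec))
```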

Options

Run-time specification

  • <program>: The program executable. This is identified as the first non-recognized argument to mpirun.

The following options specify the number of processes to launch. Note that none of these options implies a particular binding policy.

  • -c, -n, --n, -np <#>
    Run this many copies of the program on the given nodes.
  • There are many other options; consult man mpirun when you need them.

Standard I/O control

  • -output-filename, --output-filename <filename>
    Redirect the stdout, stderr, and stddiag of all processes to a process-unique version of
    the specified filename. Any directories in the filename will automatically be created.
    Each output file will consist of filename.id, where the id will be the process’s rank in
    MPI_COMM_WORLD, left-filled with zeros for correct ordering in listings. A relative path
    value will be converted to an absolute path based on the cwd where mpirun is executed. Note
    that this will not work on environments where the file system on compute nodes differs from
    that where mpirun is executed.
  • -tag-output, --tag-output
    Tag each line of output to stdout, stderr, and stddiag with [jobid, MCW_rank] indicating
    the process jobid and MPI_COMM_WORLD rank of the process that generated the output,
    and the channel which generated it.
  • -timestamp-output, --timestamp-output
    Timestamp each line of output to stdout, stderr, and stddiag.
  • -stdin, --stdin
    The MPI_COMM_WORLD rank of the process that is to receive stdin. The default is to forward
    stdin to MPI_COMM_WORLD rank 0, but this option can be used to forward stdin to any
    process. It is also acceptable to specify none, indicating that no processes are to receive
    stdin.
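
Output produced with --tag-output is easy to post-process, for instance to group lines per rank. The sketch below assumes each tagged line looks like `[jobid,rank]<channel>:text`; that exact layout is an assumption here and may differ between Open MPI versions, and `parse_tagged` and the sample lines are hypothetical.

```python
import re

# Assumed line layout: [jobid,rank]<channel>:text
TAGGED = re.compile(r"^\[(\d+),(\d+)\]<(\w+)>:(.*)$")


def parse_tagged(line):
    """Return (jobid, rank, channel, text), or None for an untagged line."""
    match = TAGGED.match(line)
    if match is None:
        return None
    jobid, rank, channel, text = match.groups()
    return int(jobid), int(rank), channel, text


# Hypothetical sample output, for illustration only.
sample = [
    "[1,0]<stdout>:hello from rank 0",
    "[1,1]<stderr>:warning from rank 1",
    "an untagged line",
]
for line in sample:
    print(parse_tagged(line))
```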

Monitoring aids

  • -display-map, --display-map
    Display a table showing the mapped location of each process prior to launch.

  • -display-allocation, --display-allocation
    Display the detected resource allocation.

  • -report-pid, --report-pid <channel>
    Print out mpirun’s PID during startup. The channel must be either a ‘-’ to indicate that
    the pid is to be output to stdout, a ‘+’ to indicate that the pid is to be output to
    stderr, or a filename to which the pid is to be written.

  • -show-progress, --show-progress
    Output a brief periodic report on launch progress.

Debugging

  • -debug, --debug
    Invoke the user-level debugger indicated by the orte_base_user_debugger MCA parameter.

  • -debugger, --debugger
    Sequence of debuggers to search for when --debug is used (i.e. a synonym for the
    orte_base_user_debugger MCA parameter).

For example:

mpirun -debug -debugger gdb ./myprogram

Here -debugger gdb names the debugger that may be used.

bind process

See the “process bind” section above.

mca

  • -mca, --mca
    Send arguments to various MCA modules. See the “MCA” section of the mpirun man page.
