concepts
MPI uses the notion of process rather than processor. Program copies are mapped to processors by the MPI runtime.
Each process has its own rank(用于在程序中indentify这是哪个process), the total number of processes in the world, and the ability to communicate between them either with point-to-point (send/receive) communication, or by collective communication among the group
Derived data types 定义传递数据的数据类型
Many MPI functions require that you specify the type of data which is sent between processes. This is because MPI aims to support heterogeneous environments where types might be represented differently on the different nodes[16] (for example they might be running different CPU architectures that have different endianness), in which case MPI implementations can perform data conversion.[16] Since the C language does not allow a type itself to be passed as a parameter, MPI predefines the constants MPI_INT
, MPI_CHAR
, MPI_DOUBLE
to correspond with int
, char
, double
, etc.
Here is an example in C that passes arrays of int
s from all processes to one. The one receiving process is called the “root” process, and it can be any designated process but normally it will be process 0. All the processes ask to send their arrays to the root with MPI_Gather
, which is equivalent to having each process (including the root itself) call MPI_Send
and the root make the corresponding number of ordered MPI_Recv
calls to assemble all of these arrays into a larger one:[17]
int send_array[100];
int root = 0; /* or whatever */
int num_procs, *recv_array;
MPI_Comm_size(comm, &num_procs);
recv_array = malloc(num_procs * sizeof(send_array));
MPI_Gather(send_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
recv_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
root, comm);
However, you may instead wish to send data as one block as opposed to 100 int
s. To do this define a “contiguous block” derived data type:
MPI_Datatype newtype;
MPI_Type_contiguous(100, MPI_INT, &newtype);
MPI_Type_commit(&newtype);
MPI_Gather(array, 1, newtype, receive_array, 1, newtype, root, comm);
For passing a class or a data structure, MPI_Type_create_struct
creates an MPI derived data type from MPI_predefined
data types, as follows:
int MPI_Type_create_struct(int count,
int *blocklen,
MPI_Aint *disp,
MPI_Datatype *type,
MPI_Datatype *newtype)
where:
count
is a number of blocks, and specifies the length (in elements) of the arraysblocklen
,disp
, andtype
.blocklen
contains numbers of elements in each block,disp
contains byte displacements of each block,type
contains types of element in each block.newtype
(an output) contains the new derived type created by this function
The disp
(displacements) array is needed for data structure alignment, since the compiler may pad the variables in a class or data structure. The safest way to find the distance between different fields is by obtaining their addresses in memory. This is done with MPI_Get_address
, which is normally the same as C’s &
operator but that might not be true when dealing with memory segmentation.[18]
Passing a data structure as one block is significantly faster than passing one item at a time, especially if the operation is to be repeated. This is because fixed-size blocks do not require serialization during transfer.[19]
Given the following data structures:
struct A {
int f;
short p;
};
struct B {
struct A a;
int pp, vp;
};
Here’s the C code for building an MPI-derived data type:
static const int blocklen[] = {1, 1, 1, 1};
static const MPI_Aint disp[] = {
offsetof(struct B, a) + offsetof(struct A, f),
offsetof(struct B, a) + offsetof(struct A, p),
offsetof(struct B, pp),
offsetof(struct B, vp)
};
static MPI_Datatype type[] = {MPI_INT, MPI_SHORT, MPI_INT, MPI_INT};
MPI_Datatype newtype;
MPI_Type_create_struct(sizeof(type) / sizeof(*type), blocklen, disp, type, &newtype);
MPI_Type_commit(&newtype);
example c code
In this example, we send a “hello” message to each processor, manipulate it trivially, return the results to the main process, and print the messages.
/*
"Hello World" MPI Test Program
*/
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>
int main(int argc, char **argv)
{
char buf[256];
int my_rank, num_procs;
/* Initialize the infrastructure necessary for communication */
MPI_Init(&argc, &argv);
/* Identify this process */
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* Find out how many total processes are active */
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
/* Until this point, all programs have been doing exactly the same.
Here, we check the rank to distinguish the roles of the programs */
if (my_rank == 0) {
int other_rank;
printf("We have %i processes.\n", num_procs);
/* Send messages to all other processes */
for (other_rank = 1; other_rank < num_procs; other_rank++)
{
sprintf(buf, "Hello %i!", other_rank);
MPI_Send(buf, 256, MPI_CHAR, other_rank,
0, MPI_COMM_WORLD);
}
/* Receive messages from all other processes */
for (other_rank = 1; other_rank < num_procs; other_rank++)
{
MPI_Recv(buf, 256, MPI_CHAR, other_rank,
0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("%s\n", buf);
}
} else {
/* Receive message from process #0 */
MPI_Recv(buf, 256, MPI_CHAR, 0,
0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
assert(memcmp(buf, "Hello ", 6) == 0);
/* Send message to process #0 */
sprintf(buf, "Process %i reporting for duty.", my_rank);
MPI_Send(buf, 256, MPI_CHAR, 0,
0, MPI_COMM_WORLD);
}
/* Tear down the communication infrastructure */
MPI_Finalize();
return 0;
}
When run with 4 processes, it should produce the following output:[47]
$ mpicc example.c && mpiexec -n 4 ./a.out
We have 4 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Here, mpiexec
is a command used to execute the example program with 4 processes, each of which is an independent instance of the program at run time and assigned ranks (i.e. numeric IDs) 0, 1, 2, and 3. The name mpiexec
is recommended by the MPI standard, although some implementations provide a similar command under the name mpirun
. The MPI_COMM_WORLD
is the communicator that consists of all the processes.