Parallel rank - problem overview
When each process has a single number stored in its local memory, it can be useful to know where that number falls with respect to the entire set of numbers held across all processes.
For example, a user might be benchmarking the processors in an MPI cluster and want to know how each processor's speed ranks relative to the others. This information could then be used for scheduling tasks and so on.
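As a concrete example, suppose four processes hold the numbers 0.7, 0.2, 0.9, and 0.4 (on processes 0 through 3, respectively). Ordering the numbers from smallest to largest assigns rank 2 to process 0, rank 0 to process 1, rank 3 to process 2, and rank 1 to process 3.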
Parallel rank API definition
Our function needs to take a number on each process and return its associated rank. The prototype for the rank function looks like this:
int TMPI_Rank(
    void *send_data,
    void *recv_data,
    MPI_Datatype datatype,
    MPI_Comm comm)
TMPI_Rank takes a send_data buffer that contains one number of type datatype. The recv_data buffer receives exactly one integer on each process, containing the rank value for send_data. The comm variable is the communicator in which the ranking takes place.
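For example, a process holding a float could obtain that number's rank like this (using MPI_COMM_WORLD; the full program at the end of this lesson does exactly this):

float my_num = 0.7f;  // this process's local number
int my_rank;          // receives the rank of my_num among all processes
TMPI_Rank(&my_num, &my_rank, MPI_FLOAT, MPI_COMM_WORLD);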
Note - The MPI standard explicitly states that users should not name their own functions MPI_<something>, to avoid confusing user functions with functions in the MPI standard itself. Thus, we prefix functions in these tutorials with T.
Solving the parallel rank problem
The easiest way to solve this problem is to gather the numbers to the root process, sort them to determine their ranks, and then scatter the ranks back to the requesting processes.
In the example code (tmpi_rank.c), the gather_numbers_to_root function is responsible for gathering all of the numbers to the root process. Every process calls this function; the void *number argument is the number that the process sends to the root.
// Gathers numbers for TMPI_Rank to process zero. Allocates space for
// the MPI datatype and returns a void * buffer to process 0.
// It returns NULL to all other processes.
void *gather_numbers_to_root(void *number, MPI_Datatype datatype,
                             MPI_Comm comm) {
  int comm_rank, comm_size;
  MPI_Comm_rank(comm, &comm_rank);
  MPI_Comm_size(comm, &comm_size);

  // Allocate an array on the root process of a size depending
  // on the MPI datatype being used. Initialize to NULL so that
  // non-root processes return NULL as documented.
  int datatype_size;
  MPI_Type_size(datatype, &datatype_size);
  void *gathered_numbers = NULL;
  if (comm_rank == 0) {
    gathered_numbers = malloc(datatype_size * comm_size);
  }

  // Gather all of the numbers on the root process
  MPI_Gather(number, 1, datatype, gathered_numbers, 1,
             datatype, 0, comm);

  return gathered_numbers;
}
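Note that the receive-side arguments of MPI_Gather (the buffer, count, and datatype) are only significant on the root process, so it is safe for gathered_numbers to remain NULL on all other processes.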
Sorting numbers and maintaining ownership
Sorting the gathered numbers is only half of the problem; we must also keep track of which process owns each number. We do this by attaching the owning process to each number with the following struct:
// Holds the communicator rank of a process along with the
// corresponding number. This struct is used for sorting
// the values and keeping the owning process information
// intact.
typedef struct {
  int comm_rank;
  union {
    float f;
    int i;
  } number;
} CommRankNumber;
The CommRankNumber struct holds the number we are going to sort (remember that it can be a float or an int, so we use a union) along with the communicator rank of the process that owns the number.
// This function sorts the gathered numbers on the root process and
// returns an array of ranks ordered by each process's rank in its
// communicator. Note - this function is only executed on the root
// process.
int *get_ranks(void *gathered_numbers, int gathered_number_count,
               MPI_Datatype datatype) {
  int datatype_size;
  MPI_Type_size(datatype, &datatype_size);

  // Convert the gathered number array to an array of CommRankNumbers.
  // This allows us to sort the numbers and also keep the information
  // of the processes that own the numbers intact.
  CommRankNumber *comm_rank_numbers = malloc(
      gathered_number_count * sizeof(CommRankNumber));
  int i;
  for (i = 0; i < gathered_number_count; i++) {
    comm_rank_numbers[i].comm_rank = i;
    memcpy(&(comm_rank_numbers[i].number),
           (char *)gathered_numbers + (i * datatype_size),
           datatype_size);
  }

  // Sort the comm rank numbers based on the datatype
  if (datatype == MPI_FLOAT) {
    qsort(comm_rank_numbers, gathered_number_count,
          sizeof(CommRankNumber), &compare_float_comm_rank_number);
  } else {
    qsort(comm_rank_numbers, gathered_number_count,
          sizeof(CommRankNumber), &compare_int_comm_rank_number);
  }

  // Now that the comm_rank_numbers are sorted, make an array of rank
  // values for each process. The ith element of this array contains
  // the rank value for the number sent by process i.
  int *ranks = (int *)malloc(sizeof(int) * gathered_number_count);
  for (i = 0; i < gathered_number_count; i++) {
    ranks[comm_rank_numbers[i].comm_rank] = i;
  }

  // Clean up and return the rank array
  free(comm_rank_numbers);
  return ranks;
}
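The two comparison functions passed to qsort are part of the full example code; minimal versions consistent with the CommRankNumber struct above might look like this (a sketch, not necessarily verbatim from tmpi_rank.c):

// Compares two CommRankNumbers by their float values (ascending).
int compare_float_comm_rank_number(const void *a, const void *b) {
  const CommRankNumber *x = (const CommRankNumber *)a;
  const CommRankNumber *y = (const CommRankNumber *)b;
  if (x->number.f < y->number.f) return -1;
  if (x->number.f > y->number.f) return 1;
  return 0;
}

// Compares two CommRankNumbers by their int values (ascending).
int compare_int_comm_rank_number(const void *a, const void *b) {
  const CommRankNumber *x = (const CommRankNumber *)a;
  const CommRankNumber *y = (const CommRankNumber *)b;
  if (x->number.i < y->number.i) return -1;
  if (x->number.i > y->number.i) return 1;
  return 0;
}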
After the numbers are sorted, we must create an array of ranks in the proper order so that they can be scattered back to the requesting processes.
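Concretely, if process 1 sent the smallest number, then after sorting comm_rank_numbers[0].comm_rank is 1, so ranks[1] is set to 0; scattering the ranks array then delivers rank 0 back to process 1.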
Putting it all together
Note that MPI_Scatter and MPI_Gather are both collective calls: every process in the communicator must make the call before the operation can complete on any process. This also means non-root processes can safely pass NULL for arguments (such as the ranks array below) that are only significant at the root.
// Gets the rank of send_data, which is of type datatype. The rank
// is returned in recv_data and is of type MPI_INT.
int TMPI_Rank(void *send_data, void *recv_data, MPI_Datatype datatype,
              MPI_Comm comm) {
  // Check base cases first - Only support MPI_INT and MPI_FLOAT for
  // this function.
  if (datatype != MPI_INT && datatype != MPI_FLOAT) {
    return MPI_ERR_TYPE;
  }

  int comm_size, comm_rank;
  MPI_Comm_size(comm, &comm_size);
  MPI_Comm_rank(comm, &comm_rank);

  // To calculate the rank, we must gather the numbers to one
  // process, sort the numbers, and then scatter the resulting rank
  // values. Start by gathering the numbers on process 0 of comm.
  void *gathered_numbers = gather_numbers_to_root(send_data, datatype,
                                                  comm);

  // Get the ranks of each process (only computed on the root)
  int *ranks = NULL;
  if (comm_rank == 0) {
    ranks = get_ranks(gathered_numbers, comm_size, datatype);
  }

  // Scatter the rank results
  MPI_Scatter(ranks, 1, MPI_INT, recv_data, 1, MPI_INT, 0, comm);

  // Do clean up
  if (comm_rank == 0) {
    free(gathered_numbers);
    free(ranks);
  }
  return MPI_SUCCESS;
}
Finally, every process calls TMPI_Rank from the main program:
// Headers needed by the full program
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char** argv) {
  MPI_Init(NULL, NULL);

  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Seed the random number generator differently on each process.
  // Adding the rank (rather than multiplying by it) keeps the seed
  // nonzero on process 0.
  srand(time(NULL) + world_rank);
  float rand_num = rand() / (float)RAND_MAX;
  int rank;
  TMPI_Rank(&rand_num, &rank, MPI_FLOAT, MPI_COMM_WORLD);
  printf("Rank for %f on process %d - %d\n", rand_num, world_rank, rank);

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
}
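To try the program, compile it with an MPI compiler wrapper and launch several processes; assuming the source file is named tmpi_rank.c as above:

mpicc -o tmpi_rank tmpi_rank.c
mpirun -n 4 ./tmpi_rank

Each process prints its random number together with that number's rank among all four processes.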