Overview

Each node on Gadi has multiple cores spread across two CPU sockets, each with two NUMA nodes. Hyperthreading is enabled on Gadi nodes, so each physical core corresponds to two virtual cores. The following diagram represents a single Cascade Lake node:


These Cascade Lake nodes are available through the express, normal, copyq, hugemem, megamem and gpuvolta queues. The Broadwell (ex-Raijin) and Skylake (ex-Raijin) nodes on Gadi have a different layout; details are available in our queue structure guide.

It is possible to use a hybrid programming model spanning both MPI and OpenMP paradigms to program codes for use on Gadi. 

Using OpenMP compiler directives, one can program for all the cores sharing physical memory on a single node. Combined with MPI, computation can then be distributed across multiple Gadi nodes. For more information on how to use OpenMP, please refer to the training material at http://www.openmp.org/

This page demonstrates how to distribute MPI processes across one or more nodes using the mpirun command with both Open MPI and Intel MPI, and describes some of the available distribution options.

The following examples are tested with openmpi/4.0.2 and intel-mpi/2021.1.1 unless otherwise specified. Older versions of MPI may have different default values (placement by socket, core) and different syntax.

Each example given below displays the MPI process bindings (through use of --report-bindings when Open MPI is used, or by setting the environment variable I_MPI_DEBUG=5 when Intel MPI is used) per socket, core and hardware thread. Open MPI prints the bindings in the job's error file and Intel MPI prints the bindings in the job's output file.
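
Once a job has finished, the Open MPI binding report can be summarised rather than read line by line. The following is a minimal sketch, not an NCI-provided tool: the function name ranks_per_socket is hypothetical, and it assumes report lines in the "bound to socket N" format shown in the examples on this page.

```shell
#!/bin/bash
# Sketch: count how many MPI ranks Open MPI bound to each socket.
# Reads --report-bindings lines from the given files, or from stdin if none are given.
ranks_per_socket() {
    grep -oh 'bound to socket [0-9][0-9]*' "$@" \
        | awk '{ count[$4]++ } END { for (s in count) print "socket " s ": " count[s] " ranks" }' \
        | sort
}

# Two abbreviated report lines (host name and core maps shortened for illustration):
ranks_per_socket <<'EOF'
[node:1] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
[node:1] MCW rank 1 bound to socket 1[core 24[hwt 0]]: [./.][B/.]
EOF
```

For a real job, the function would be pointed at the job's error file, for example ranks_per_socket my_job.e12345, where the file name is a placeholder.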

Intel MPI only reports process bindings for an MPI application that calls both MPI_Init() and MPI_Finalize(). Otherwise, no process binding report is generated when I_MPI_DEBUG is set to 4 or 5.

Running a job with Intel MPI requires setting the environment variable I_MPI_HYDRA_BRANCH_COUNT to the number of nodes, as follows:

# In bash
$ export I_MPI_HYDRA_BRANCH_COUNT=$(($PBS_NCPUS / $PBS_NCI_NCPUS_PER_NODE))
 
# In csh/tcsh
$ @ intel_num_nodes = ( $PBS_NCPUS / $PBS_NCI_NCPUS_PER_NODE )
$ setenv I_MPI_HYDRA_BRANCH_COUNT $intel_num_nodes
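
As a worked example of the division above: a job with PBS_NCPUS=96 on nodes with PBS_NCI_NCPUS_PER_NODE=48 spans two nodes. The bash sketch below wraps the same arithmetic in a hypothetical helper, node_count, which also reports an error when either variable is missing. That check is an added assumption for illustration, not an NCI requirement.

```shell
#!/bin/bash
# Sketch: derive the node count from the two PBS variables used above.
node_count() {
    if [ -z "${PBS_NCPUS:-}" ] || [ -z "${PBS_NCI_NCPUS_PER_NODE:-}" ]; then
        echo "node_count: PBS_NCPUS and PBS_NCI_NCPUS_PER_NODE must be set" >&2
        return 1
    fi
    # Integer division: total CPUs in the job over CPUs per node.
    echo $(( PBS_NCPUS / PBS_NCI_NCPUS_PER_NODE ))
}

# Illustrative values for a two-node Cascade Lake job; inside a job these are already set.
PBS_NCPUS=96 PBS_NCI_NCPUS_PER_NODE=48 node_count   # prints 2
```

Inside a job script the result could then be exported with export I_MPI_HYDRA_BRANCH_COUNT="$(node_count)".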

Uniform MPI: Distributing 24 MPI processes across one Gadi Cascade Lake node


A Cascade Lake node has 48 physical cores. The examples in this section use only the first hardware thread of each core (hwt 0) and do not use hyperthreads.

24 MPI processes across 48 CPUs 

By default, Open MPI will place each of the tasks sequentially on the cores available:

$ mpirun -np 24 --report-bindings ./my_program

[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 20 bound to socket 0[core 20[hwt 0]]: [././././././././././././././././././././B/././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 21 bound to socket 0[core 21[hwt 0]]: [./././././././././././././././././././././B/./.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 22 bound to socket 0[core 22[hwt 0]]: [././././././././././././././././././././././B/.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1187.gadi.nci.org.au:645683] MCW rank 23 bound to socket 0[core 23[hwt 0]]: [./././././././././././././././././././././././B][./././././././././././././././././././././././.]

The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node (hwt 0)

Socket     NUMA nodes       Cores             Processes
Socket 0   Numa 0, Numa 1   core0 - core23    proc0 - proc23 (one per core)
Socket 1   Numa 2, Numa 3   core24 - core47   none

Intel MPI behaves differently, and will attempt to separate MPI processes as much as possible. When 24 tasks are launched on a 48 core node, each task will be bound to two cores:

# In bash
$ export I_MPI_DEBUG=5
$ mpirun -np 24 ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ mpirun -np 24 ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 572196 gadi-cpu-clx-2372.gadi.nci.org.au {0,1}
[0] MPI startup(): 1 572197 gadi-cpu-clx-2372.gadi.nci.org.au {2,3}
[0] MPI startup(): 2 572198 gadi-cpu-clx-2372.gadi.nci.org.au {7,8}
[0] MPI startup(): 3 572199 gadi-cpu-clx-2372.gadi.nci.org.au {9,13}
[0] MPI startup(): 4 572200 gadi-cpu-clx-2372.gadi.nci.org.au {14,15}
[0] MPI startup(): 5 572201 gadi-cpu-clx-2372.gadi.nci.org.au {19,20}
[0] MPI startup(): 6 572202 gadi-cpu-clx-2372.gadi.nci.org.au {4,5}
[0] MPI startup(): 7 572203 gadi-cpu-clx-2372.gadi.nci.org.au {6,10}
[0] MPI startup(): 8 572204 gadi-cpu-clx-2372.gadi.nci.org.au {11,12}
[0] MPI startup(): 9 572205 gadi-cpu-clx-2372.gadi.nci.org.au {16,17}
[0] MPI startup(): 10 572206 gadi-cpu-clx-2372.gadi.nci.org.au {18,21}
[0] MPI startup(): 11 572207 gadi-cpu-clx-2372.gadi.nci.org.au {22,23}
[0] MPI startup(): 12 572208 gadi-cpu-clx-2372.gadi.nci.org.au {24,25}
[0] MPI startup(): 13 572209 gadi-cpu-clx-2372.gadi.nci.org.au {26,27}
[0] MPI startup(): 14 572210 gadi-cpu-clx-2372.gadi.nci.org.au {31,32}
[0] MPI startup(): 15 572211 gadi-cpu-clx-2372.gadi.nci.org.au {36,37}
[0] MPI startup(): 16 572212 gadi-cpu-clx-2372.gadi.nci.org.au {38,42}
[0] MPI startup(): 17 572213 gadi-cpu-clx-2372.gadi.nci.org.au {43,44}
[0] MPI startup(): 18 572214 gadi-cpu-clx-2372.gadi.nci.org.au {28,29}
[0] MPI startup(): 19 572215 gadi-cpu-clx-2372.gadi.nci.org.au {30,33}
[0] MPI startup(): 20 572216 gadi-cpu-clx-2372.gadi.nci.org.au {34,35}
[0] MPI startup(): 21 572217 gadi-cpu-clx-2372.gadi.nci.org.au {39,40}
[0] MPI startup(): 22 572218 gadi-cpu-clx-2372.gadi.nci.org.au {41,45}
[0] MPI startup(): 23 572219 gadi-cpu-clx-2372.gadi.nci.org.au {46,47}

The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node (hwt 0)

Socket     NUMA nodes       Cores             Processes
Socket 0   Numa 0, Numa 1   core0 - core23    proc0 - proc11 (two cores per process)
Socket 1   Numa 2, Numa 3   core24 - core47   proc12 - proc23 (two cores per process)

Note that this does not necessarily mean the application can make effective use of multiple cores, only that the resources of two cores are available to each process. Intel MPI prefers processor binding options to be set through environment variables (in fact, if any I_MPI_PIN_* variable is set, the -binding command line option is ignored). The following binds all 24 MPI tasks to the first CPU socket on the node (i.e. it emulates Open MPI's default behaviour):

# In bash
$ export I_MPI_DEBUG=5
$ export I_MPI_PIN_MAP=core
$ mpirun -np 24 ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv I_MPI_PIN_MAP core
$ mpirun -np 24 ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 572413 gadi-cpu-clx-2372.gadi.nci.org.au 0
[0] MPI startup(): 1 572414 gadi-cpu-clx-2372.gadi.nci.org.au 1
[0] MPI startup(): 2 572415 gadi-cpu-clx-2372.gadi.nci.org.au 2
[0] MPI startup(): 3 572416 gadi-cpu-clx-2372.gadi.nci.org.au 3
[0] MPI startup(): 4 572417 gadi-cpu-clx-2372.gadi.nci.org.au 4
[0] MPI startup(): 5 572418 gadi-cpu-clx-2372.gadi.nci.org.au 5
[0] MPI startup(): 6 572419 gadi-cpu-clx-2372.gadi.nci.org.au 6
[0] MPI startup(): 7 572420 gadi-cpu-clx-2372.gadi.nci.org.au 7
[0] MPI startup(): 8 572421 gadi-cpu-clx-2372.gadi.nci.org.au 8
[0] MPI startup(): 9 572422 gadi-cpu-clx-2372.gadi.nci.org.au 9
[0] MPI startup(): 10 572423 gadi-cpu-clx-2372.gadi.nci.org.au 10
[0] MPI startup(): 11 572424 gadi-cpu-clx-2372.gadi.nci.org.au 11
[0] MPI startup(): 12 572425 gadi-cpu-clx-2372.gadi.nci.org.au 12
[0] MPI startup(): 13 572426 gadi-cpu-clx-2372.gadi.nci.org.au 13
[0] MPI startup(): 14 572427 gadi-cpu-clx-2372.gadi.nci.org.au 14
[0] MPI startup(): 15 572428 gadi-cpu-clx-2372.gadi.nci.org.au 15
[0] MPI startup(): 16 572429 gadi-cpu-clx-2372.gadi.nci.org.au 16
[0] MPI startup(): 17 572430 gadi-cpu-clx-2372.gadi.nci.org.au 17
[0] MPI startup(): 18 572431 gadi-cpu-clx-2372.gadi.nci.org.au 18
[0] MPI startup(): 19 572432 gadi-cpu-clx-2372.gadi.nci.org.au 19
[0] MPI startup(): 20 572433 gadi-cpu-clx-2372.gadi.nci.org.au 20
[0] MPI startup(): 21 572434 gadi-cpu-clx-2372.gadi.nci.org.au 21
[0] MPI startup(): 22 572435 gadi-cpu-clx-2372.gadi.nci.org.au 22
[0] MPI startup(): 23 572436 gadi-cpu-clx-2372.gadi.nci.org.au 23

The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node (hwt 0)

Socket     NUMA nodes       Cores             Processes
Socket 0   Numa 0, Numa 1   core0 - core23    proc0 - proc23 (one per core)
Socket 1   Numa 2, Numa 3   core24 - core47   none

If you wish to run multiple MPI applications simultaneously in the same PBS job, the process bindings will need to be set such that each MPI application will run on separate cores. The following example places 24 tasks entirely on the second CPU of a node using Open MPI:

# --cpu-set currently works only with openmpi/2.1.6.
# In our tests it fails with the other available versions (3.0.4, 3.1.4,
# 4.0.1, 4.0.2, 4.0.3, 4.0.4 and 4.0.5), which report the following error:
#
# Conflicting directives for mapping policy are causing the policy
# to be redefined:
#
#   New policy:   RANK_FILE
#   Prior policy:  BYCORE
#
# Please check that only one policy is defined.
 
$ mpirun -np 24 --cpu-set 24-47 --report-bindings ./my_program  # On openmpi/2.1.6

[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 0 bound to socket 1[core 24[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][B./../../../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 1 bound to socket 1[core 25[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../B./../../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 2 bound to socket 1[core 26[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../B./../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 3 bound to socket 1[core 27[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../B./../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 4 bound to socket 1[core 28[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../B./../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 5 bound to socket 1[core 29[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../B./../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 6 bound to socket 1[core 30[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../B./../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 7 bound to socket 1[core 31[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../B./../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 8 bound to socket 1[core 32[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../B./../../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 9 bound to socket 1[core 33[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../B./../../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 10 bound to socket 1[core 34[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../B./../../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 11 bound to socket 1[core 35[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../B./../../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 12 bound to socket 1[core 36[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../B./../../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 13 bound to socket 1[core 37[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../B./../../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 14 bound to socket 1[core 38[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../B./../../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 15 bound to socket 1[core 39[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../B./../../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 16 bound to socket 1[core 40[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../B./../../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 17 bound to socket 1[core 41[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../B./../../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 18 bound to socket 1[core 42[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../B./../../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 19 bound to socket 1[core 43[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../B./../../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 20 bound to socket 1[core 44[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../B./../../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 21 bound to socket 1[core 45[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../B./../..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 22 bound to socket 1[core 46[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../B./..]
[gadi-cpu-clx-2124.gadi.nci.org.au:2822122] MCW rank 23 bound to socket 1[core 47[hwt 0]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../B.]

And using Intel MPI:

# In bash
$ export I_MPI_DEBUG=5
$ export I_MPI_PIN_MAP=24-47
$ mpirun -np 24 ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv I_MPI_PIN_MAP 24-47
$ mpirun -np 24 ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 2822387 gadi-cpu-clx-2124.gadi.nci.org.au 24
[0] MPI startup(): 1 2822388 gadi-cpu-clx-2124.gadi.nci.org.au 25
[0] MPI startup(): 2 2822389 gadi-cpu-clx-2124.gadi.nci.org.au 26
[0] MPI startup(): 3 2822390 gadi-cpu-clx-2124.gadi.nci.org.au 27
[0] MPI startup(): 4 2822391 gadi-cpu-clx-2124.gadi.nci.org.au 28
[0] MPI startup(): 5 2822392 gadi-cpu-clx-2124.gadi.nci.org.au 29
[0] MPI startup(): 6 2822393 gadi-cpu-clx-2124.gadi.nci.org.au 30
[0] MPI startup(): 7 2822394 gadi-cpu-clx-2124.gadi.nci.org.au 31
[0] MPI startup(): 8 2822395 gadi-cpu-clx-2124.gadi.nci.org.au 32
[0] MPI startup(): 9 2822396 gadi-cpu-clx-2124.gadi.nci.org.au 33
[0] MPI startup(): 10 2822397 gadi-cpu-clx-2124.gadi.nci.org.au 34
[0] MPI startup(): 11 2822398 gadi-cpu-clx-2124.gadi.nci.org.au 35
[0] MPI startup(): 12 2822399 gadi-cpu-clx-2124.gadi.nci.org.au 36
[0] MPI startup(): 13 2822400 gadi-cpu-clx-2124.gadi.nci.org.au 37
[0] MPI startup(): 14 2822401 gadi-cpu-clx-2124.gadi.nci.org.au 38
[0] MPI startup(): 15 2822402 gadi-cpu-clx-2124.gadi.nci.org.au 39
[0] MPI startup(): 16 2822403 gadi-cpu-clx-2124.gadi.nci.org.au 40
[0] MPI startup(): 17 2822404 gadi-cpu-clx-2124.gadi.nci.org.au 41
[0] MPI startup(): 18 2822405 gadi-cpu-clx-2124.gadi.nci.org.au 42
[0] MPI startup(): 19 2822406 gadi-cpu-clx-2124.gadi.nci.org.au 43
[0] MPI startup(): 20 2822407 gadi-cpu-clx-2124.gadi.nci.org.au 44
[0] MPI startup(): 21 2822408 gadi-cpu-clx-2124.gadi.nci.org.au 45
[0] MPI startup(): 22 2822409 gadi-cpu-clx-2124.gadi.nci.org.au 46
[0] MPI startup(): 23 2822410 gadi-cpu-clx-2124.gadi.nci.org.au 47

The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node (hwt 0)

Socket     NUMA nodes       Cores             Processes
Socket 0   Numa 0, Numa 1   core0 - core23    none
Socket 1   Numa 2, Numa 3   core24 - core47   proc0 - proc23 (one per core)
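
To run two MPI applications side by side in one PBS job, each launch needs its own disjoint core range. The following is a minimal sketch under several assumptions: Intel MPI is in use, ./app_a and ./app_b are hypothetical program names, and the pinning is expressed with the I_MPI_PIN_PROCESSOR_LIST variable. The helper run_pinned and the DRY_RUN switch are illustrative only. With DRY_RUN=1 (the default in this sketch) the commands are printed rather than executed, so the layout can be checked before a real run.

```shell
#!/bin/bash
# Sketch: launch two MPI applications on disjoint core ranges of one node.
# app_a and app_b are hypothetical program names.
DRY_RUN="${DRY_RUN:-1}"

run_pinned() {
    local cores="$1" nprocs="$2" prog="$3"
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command that would be run.
        echo "I_MPI_PIN_PROCESSOR_LIST=${cores} mpirun -np ${nprocs} ${prog}"
    else
        I_MPI_PIN_PROCESSOR_LIST="${cores}" mpirun -np "${nprocs}" "${prog}"
    fi
}

run_pinned 0-23  24 ./app_a &   # cores of the first socket
run_pinned 24-47 24 ./app_b &   # cores of the second socket
wait                            # block until both launches have finished
```

Whether two concurrent mpirun launches behave well together depends on the MPI version and job setup, so it is worth confirming the bindings with I_MPI_DEBUG=5 on a small test job first.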

24 MPI processes across 48 CPUs: twelve processes per socket 

In this example we distribute half the processes to another socket in the same node using the --map-by option in Open MPI. We specify a process-per-resource value of 12 per socket:

$ mpirun -np 24 --map-by ppr:12:socket --report-bindings ./my_program

[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 12 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 13 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 14 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 15 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 16 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 17 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 18 bound to socket 1[core 30[hwt 0]]: [./././././././././././././././././././././././.][././././././B/././././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 19 bound to socket 1[core 31[hwt 0]]: [./././././././././././././././././././././././.][./././././././B/./././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 20 bound to socket 1[core 32[hwt 0]]: [./././././././././././././././././././././././.][././././././././B/././././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 21 bound to socket 1[core 33[hwt 0]]: [./././././././././././././././././././././././.][./././././././././B/./././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 22 bound to socket 1[core 34[hwt 0]]: [./././././././././././././././././././././././.][././././././././././B/././././././././././././.]
[gadi-cpu-clx-1188.gadi.nci.org.au:196355] MCW rank 23 bound to socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././B/./././././././././././.]

And with the I_MPI_PIN_PROCESSOR_LIST option in Intel MPI:

# In bash
$ export I_MPI_DEBUG=5
$ export I_MPI_PIN_PROCESSOR_LIST=0-11,24-35
$ mpirun -np 24 ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv I_MPI_PIN_PROCESSOR_LIST 0-11,24-35
$ mpirun -np 24 ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 376581 gadi-cpu-clx-0790.gadi.nci.org.au 0
[0] MPI startup(): 1 376582 gadi-cpu-clx-0790.gadi.nci.org.au 1
[0] MPI startup(): 2 376583 gadi-cpu-clx-0790.gadi.nci.org.au 2
[0] MPI startup(): 3 376584 gadi-cpu-clx-0790.gadi.nci.org.au 3
[0] MPI startup(): 4 376585 gadi-cpu-clx-0790.gadi.nci.org.au 4
[0] MPI startup(): 5 376586 gadi-cpu-clx-0790.gadi.nci.org.au 5
[0] MPI startup(): 6 376587 gadi-cpu-clx-0790.gadi.nci.org.au 6
[0] MPI startup(): 7 376588 gadi-cpu-clx-0790.gadi.nci.org.au 7
[0] MPI startup(): 8 376589 gadi-cpu-clx-0790.gadi.nci.org.au 8
[0] MPI startup(): 9 376590 gadi-cpu-clx-0790.gadi.nci.org.au 9
[0] MPI startup(): 10 376591 gadi-cpu-clx-0790.gadi.nci.org.au 10
[0] MPI startup(): 11 376592 gadi-cpu-clx-0790.gadi.nci.org.au 11
[0] MPI startup(): 12 376593 gadi-cpu-clx-0790.gadi.nci.org.au 24
[0] MPI startup(): 13 376594 gadi-cpu-clx-0790.gadi.nci.org.au 25
[0] MPI startup(): 14 376595 gadi-cpu-clx-0790.gadi.nci.org.au 26
[0] MPI startup(): 15 376596 gadi-cpu-clx-0790.gadi.nci.org.au 27
[0] MPI startup(): 16 376597 gadi-cpu-clx-0790.gadi.nci.org.au 28
[0] MPI startup(): 17 376598 gadi-cpu-clx-0790.gadi.nci.org.au 29
[0] MPI startup(): 18 376599 gadi-cpu-clx-0790.gadi.nci.org.au 30
[0] MPI startup(): 19 376600 gadi-cpu-clx-0790.gadi.nci.org.au 31
[0] MPI startup(): 20 376601 gadi-cpu-clx-0790.gadi.nci.org.au 32
[0] MPI startup(): 21 376602 gadi-cpu-clx-0790.gadi.nci.org.au 33
[0] MPI startup(): 22 376603 gadi-cpu-clx-0790.gadi.nci.org.au 34
[0] MPI startup(): 23 376604 gadi-cpu-clx-0790.gadi.nci.org.au 35

The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node (hwt 0)

Socket     NUMA nodes       Cores             Processes
Socket 0   Numa 0, Numa 1   core0 - core11    proc0 - proc11 (one per core)
Socket 0   Numa 0, Numa 1   core12 - core23   none
Socket 1   Numa 2, Numa 3   core24 - core35   proc12 - proc23 (one per core)
Socket 1   Numa 2, Numa 3   core36 - core47   none
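
The processor list 0-11,24-35 used in this example can also be generated rather than typed by hand. The sketch below is illustrative only: pin_list is a hypothetical helper, and it assumes each socket holds consecutive core IDs (socket 0 holding cores 0-23 and socket 1 holding cores 24-47), as the Open MPI binding reports on this page suggest.

```shell
#!/bin/bash
# Sketch: build a processor list that pins the first N cores of each socket.
# Arguments: processes per socket, cores per socket, number of sockets.
pin_list() {
    local per_socket="$1" cores_per_socket="$2" sockets="$3"
    local list="" s first last
    for (( s = 0; s < sockets; s++ )); do
        first=$(( s * cores_per_socket ))
        last=$(( first + per_socket - 1 ))
        if (( first == last )); then
            list+="${list:+,}${first}"            # single core, no range needed
        else
            list+="${list:+,}${first}-${last}"    # range of consecutive cores
        fi
    done
    echo "$list"
}

pin_list 12 24 2   # prints 0-11,24-35
```

The result could then be passed to Intel MPI with export I_MPI_PIN_PROCESSOR_LIST="$(pin_list 12 24 2)".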

Uniform MPI: Distributing 24 MPI processes across two Gadi Cascade Lake nodes 


24 MPI processes across 96 CPUs: six tasks per socket 

Across two nodes, 96 physical CPU cores are available. A PBS resource request of ncpus=96 on the normal queue would provision two such nodes to a job.
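
The figure of six tasks per socket follows from simple arithmetic: two nodes with two sockets each give four sockets, and 24 processes spread evenly over four sockets is six per socket. A minimal bash sketch of that calculation follows; ppr_per_socket is a hypothetical helper name, and refusing uneven splits is a design choice of the sketch rather than an Open MPI rule.

```shell
#!/bin/bash
# Sketch: processes per socket for an even spread across all sockets in a job.
# Arguments: total MPI processes, number of nodes, sockets per node.
ppr_per_socket() {
    local nprocs="$1" nodes="$2" sockets_per_node="$3"
    local total_sockets=$(( nodes * sockets_per_node ))
    if (( nprocs % total_sockets != 0 )); then
        echo "ppr_per_socket: ${nprocs} processes do not divide evenly over ${total_sockets} sockets" >&2
        return 1
    fi
    echo $(( nprocs / total_sockets ))
}

ppr_per_socket 24 2 2   # prints 6
```

The value could be substituted into the mapping option, for example --map-by ppr:$(ppr_per_socket 24 2 2):socket.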

In this example we distribute six MPI processes to each socket of the two nodes. Again, we make use of the --map-by option in Open MPI:

$ mpirun -np 24 --map-by ppr:6:socket --report-bindings ./my_program

[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 6 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 7 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 8 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 9 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 10 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.]
[gadi-cpu-clx-0436.gadi.nci.org.au:188782] MCW rank 11 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 12 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 13 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 14 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 15 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 16 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 17 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 18 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 19 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 20 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 21 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 22 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.]
[gadi-cpu-clx-0437.gadi.nci.org.au:194999] MCW rank 23 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.]
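The value passed to -np follows from the ppr setting: processes per resource, times resources per node, times nodes. The sketch below is only a convenience for checking that arithmetic; the function name ppr_total is hypothetical and not part of Open MPI.

```bash
#!/bin/bash
# ppr_total NODES UNITS_PER_NODE PPR
# Total ranks when PPR processes are placed on every resource unit
# (socket, NUMA node, ...) of every node.
ppr_total() {
    echo $(( $1 * $2 * $3 ))
}

ppr_total 2 2 6    # two nodes, two sockets each, six ranks per socket: 24
```

The same helper gives 24 for three ranks on each of four NUMA nodes across two nodes (ppr_total 2 4 3).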

When using Intel MPI, a hostfile must be constructed to allow correct binding across multiple nodes. The following line constructs the hostfile called "hosts.txt", which can then be passed to mpirun using the -f option:

 $ uniq < $PBS_NODEFILE > hosts.txt
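To see what this step does, the sketch below applies the same uniq call to a small mock node file. The node names and the four entries per node are made up for illustration; it assumes, as on Gadi, that the PBS node file repeats each node name once per allocated CPU, with repeats on adjacent lines.

```bash
#!/bin/bash
# Build a mock node file: each node name repeated once per CPU
# (hypothetical names and counts, for illustration only).
nodefile=$(mktemp)
for node in node-a node-b; do
    for _ in 1 2 3 4; do
        echo "$node"
    done
done > "$nodefile"

# Same step as in the text: collapse adjacent duplicate lines so that
# each node appears once.
make_hostfile() {
    uniq < "$1"
}

make_hostfile "$nodefile"    # prints node-a, then node-b
rm -f "$nodefile"
```

Because uniq only merges adjacent duplicates, a node file in which the same name reappears later would need sort -u instead.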

With Intel MPI, use the hostfile together with I_MPI_PIN_PROCESSOR_LIST and the -ppn (processes per node) option:

# In bash
$ export I_MPI_DEBUG=5
$ export I_MPI_PIN_PROCESSOR_LIST=0-5,24-29
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np 24 -ppn 12 -f hosts.txt ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv I_MPI_PIN_PROCESSOR_LIST 0-5,24-29
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np 24 -ppn 12 -f hosts.txt ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 260625 gadi-cpu-clx-0278.gadi.nci.org.au 0
[0] MPI startup(): 1 260626 gadi-cpu-clx-0278.gadi.nci.org.au 1
[0] MPI startup(): 2 260627 gadi-cpu-clx-0278.gadi.nci.org.au 2
[0] MPI startup(): 3 260628 gadi-cpu-clx-0278.gadi.nci.org.au 3
[0] MPI startup(): 4 260629 gadi-cpu-clx-0278.gadi.nci.org.au 4
[0] MPI startup(): 5 260630 gadi-cpu-clx-0278.gadi.nci.org.au 5
[0] MPI startup(): 6 260631 gadi-cpu-clx-0278.gadi.nci.org.au 24
[0] MPI startup(): 7 260632 gadi-cpu-clx-0278.gadi.nci.org.au 25
[0] MPI startup(): 8 260633 gadi-cpu-clx-0278.gadi.nci.org.au 26
[0] MPI startup(): 9 260634 gadi-cpu-clx-0278.gadi.nci.org.au 27
[0] MPI startup(): 10 260635 gadi-cpu-clx-0278.gadi.nci.org.au 28
[0] MPI startup(): 11 260636 gadi-cpu-clx-0278.gadi.nci.org.au 29
[0] MPI startup(): 12 346329 gadi-cpu-clx-0280.gadi.nci.org.au 0
[0] MPI startup(): 13 346330 gadi-cpu-clx-0280.gadi.nci.org.au 1
[0] MPI startup(): 14 346331 gadi-cpu-clx-0280.gadi.nci.org.au 2
[0] MPI startup(): 15 346332 gadi-cpu-clx-0280.gadi.nci.org.au 3
[0] MPI startup(): 16 346333 gadi-cpu-clx-0280.gadi.nci.org.au 4
[0] MPI startup(): 17 346334 gadi-cpu-clx-0280.gadi.nci.org.au 5
[0] MPI startup(): 18 346335 gadi-cpu-clx-0280.gadi.nci.org.au 24
[0] MPI startup(): 19 346336 gadi-cpu-clx-0280.gadi.nci.org.au 25
[0] MPI startup(): 20 346337 gadi-cpu-clx-0280.gadi.nci.org.au 26
[0] MPI startup(): 21 346338 gadi-cpu-clx-0280.gadi.nci.org.au 27
[0] MPI startup(): 22 346339 gadi-cpu-clx-0280.gadi.nci.org.au 28
[0] MPI startup(): 23 346340 gadi-cpu-clx-0280.gadi.nci.org.au 29
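The processor list used above can also be generated rather than typed by hand. The following is a minimal sketch, assuming cores are numbered consecutively within each socket or NUMA node as in the bindings shown on this page; the helper name pin_list is hypothetical and not an Intel MPI tool.

```bash
#!/bin/bash
# pin_list COUNT DOMAIN_SIZE NDOMAINS
# Print a comma-separated list that selects the first COUNT cores of each
# of NDOMAINS consecutive domains, each DOMAIN_SIZE cores wide.
pin_list() {
    local count=$1 size=$2 ndomains=$3
    local list="" d first last
    for (( d = 0; d < ndomains; d++ )); do
        first=$(( d * size ))
        last=$(( first + count - 1 ))
        # Prefix a comma only when the list already has an entry.
        list+="${list:+,}${first}-${last}"
    done
    echo "$list"
}

pin_list 6 24 2    # six cores on each of two 24-core sockets: 0-5,24-29
```

The result could then be exported, for example export I_MPI_PIN_PROCESSOR_LIST=$(pin_list 6 24 2).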

The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node 0 (hwt 0)

Socket     NUMA node   Cores           MPI processes
Socket 0   Numa 0      core0-core11    proc0-proc5 on core0-core5
Socket 0   Numa 1      core12-core23   (none)
Socket 1   Numa 2      core24-core35   proc6-proc11 on core24-core29
Socket 1   Numa 3      core36-core47   (none)

Gadi Cascade Lake Node 1 (hwt 0)

Socket     NUMA node   Cores           MPI processes
Socket 0   Numa 0      core0-core11    proc12-proc17 on core0-core5
Socket 0   Numa 1      core12-core23   (none)
Socket 1   Numa 2      core24-core35   proc18-proc23 on core24-core29
Socket 1   Numa 3      core36-core47   (none)
 

24 MPI processes across 96 CPUs: three tasks per numa node 

In this example we distribute three MPI processes to each numa node of the two nodes. Again, we make use of the --map-by option in Open MPI:

 $ mpirun -np 24 --map-by ppr:3:numa --report-bindings ./my_program

[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 3 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 4 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 5 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 6 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 7 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 8 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 9 bound to socket 1[core 36[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/././././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 10 bound to socket 1[core 37[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././B/./././././././././.]
[gadi-cpu-clx-2128.gadi.nci.org.au:409212] MCW rank 11 bound to socket 1[core 38[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././B/././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 12 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 13 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 14 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 15 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 16 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 17 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 18 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 19 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 20 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 21 bound to socket 1[core 36[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/././././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 22 bound to socket 1[core 37[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././B/./././././././././.]
[gadi-cpu-clx-2129.gadi.nci.org.au:2455084] MCW rank 23 bound to socket 1[core 38[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././B/././././././././.]

With Intel MPI, use the hostfile together with I_MPI_PIN_PROCESSOR_LIST and the -ppn (processes per node) option:

# In bash
$ export I_MPI_DEBUG=5
$ export I_MPI_PIN_PROCESSOR_LIST=0-2,12-14,24-26,36-38
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np 24 -ppn 12 -f hosts.txt ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv I_MPI_PIN_PROCESSOR_LIST 0-2,12-14,24-26,36-38
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np 24 -ppn 12 -f hosts.txt ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 257154 gadi-cpu-clx-1198.gadi.nci.org.au 0
[0] MPI startup(): 1 257155 gadi-cpu-clx-1198.gadi.nci.org.au 1
[0] MPI startup(): 2 257156 gadi-cpu-clx-1198.gadi.nci.org.au 2
[0] MPI startup(): 3 257157 gadi-cpu-clx-1198.gadi.nci.org.au 12
[0] MPI startup(): 4 257158 gadi-cpu-clx-1198.gadi.nci.org.au 13
[0] MPI startup(): 5 257159 gadi-cpu-clx-1198.gadi.nci.org.au 14
[0] MPI startup(): 6 257160 gadi-cpu-clx-1198.gadi.nci.org.au 24
[0] MPI startup(): 7 257161 gadi-cpu-clx-1198.gadi.nci.org.au 25
[0] MPI startup(): 8 257162 gadi-cpu-clx-1198.gadi.nci.org.au 26
[0] MPI startup(): 9 257163 gadi-cpu-clx-1198.gadi.nci.org.au 36
[0] MPI startup(): 10 257164 gadi-cpu-clx-1198.gadi.nci.org.au 37
[0] MPI startup(): 11 257165 gadi-cpu-clx-1198.gadi.nci.org.au 38
[0] MPI startup(): 12 255294 gadi-cpu-clx-1202.gadi.nci.org.au 0
[0] MPI startup(): 13 255295 gadi-cpu-clx-1202.gadi.nci.org.au 1
[0] MPI startup(): 14 255296 gadi-cpu-clx-1202.gadi.nci.org.au 2
[0] MPI startup(): 15 255297 gadi-cpu-clx-1202.gadi.nci.org.au 12
[0] MPI startup(): 16 255298 gadi-cpu-clx-1202.gadi.nci.org.au 13
[0] MPI startup(): 17 255299 gadi-cpu-clx-1202.gadi.nci.org.au 14
[0] MPI startup(): 18 255300 gadi-cpu-clx-1202.gadi.nci.org.au 24
[0] MPI startup(): 19 255301 gadi-cpu-clx-1202.gadi.nci.org.au 25
[0] MPI startup(): 20 255302 gadi-cpu-clx-1202.gadi.nci.org.au 26
[0] MPI startup(): 21 255303 gadi-cpu-clx-1202.gadi.nci.org.au 36
[0] MPI startup(): 22 255304 gadi-cpu-clx-1202.gadi.nci.org.au 37
[0] MPI startup(): 23 255305 gadi-cpu-clx-1202.gadi.nci.org.au 38
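For larger jobs it can be easier to summarise the binding report than to read it line by line. The sketch below counts ranks per node from the I_MPI_DEBUG startup lines. It assumes the column layout shown in the report above (rank, pid, node name, pinned cpu), which may differ between Intel MPI versions; ranks_per_node is a hypothetical helper name.

```bash
#!/bin/bash
# ranks_per_node: read Intel MPI startup lines on stdin and print
# "<node> <number of ranks>" for each node, sorted by node name.
# The header line is skipped because its fourth field is not a number.
ranks_per_node() {
    awk '$1 == "[0]" && $2 == "MPI" && $3 == "startup():" && $4 ~ /^[0-9]+$/ {
             count[$6]++
         }
         END { for (n in count) print n, count[n] }' | sort
}

# Three lines copied from the report above.
ranks_per_node <<'EOF'
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 257154 gadi-cpu-clx-1198.gadi.nci.org.au 0
[0] MPI startup(): 12 255294 gadi-cpu-clx-1202.gadi.nci.org.au 0
EOF
```

In a job script the function would typically read the job's output file instead of a here-document.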


The corresponding placement of the processes is given in the table below:

Gadi Cascade Lake Node 0 (hwt 0)

Socket     NUMA node   Cores           MPI processes
Socket 0   Numa 0      core0-core11    proc0-proc2 on core0-core2
Socket 0   Numa 1      core12-core23   proc3-proc5 on core12-core14
Socket 1   Numa 2      core24-core35   proc6-proc8 on core24-core26
Socket 1   Numa 3      core36-core47   proc9-proc11 on core36-core38

Gadi Cascade Lake Node 1 (hwt 0)

Socket     NUMA node   Cores           MPI processes
Socket 0   Numa 0      core0-core11    proc12-proc14 on core0-core2
Socket 0   Numa 1      core12-core23   proc15-proc17 on core12-core14
Socket 1   Numa 2      core24-core35   proc18-proc20 on core24-core26
Socket 1   Numa 3      core36-core47   proc21-proc23 on core36-core38

Hybrid MPI/OpenMP: Distributing processes and threads across two Gadi Cascade Lake nodes


If an application is written to take advantage of shared memory parallel programming with OpenMP, it can be used in conjunction with MPI. This example also uses hardware threads. 

Programs written with OpenMP may run more efficiently across a single node compared to MPI programs. Therefore, when using hybrid MPI/OpenMP, computation should be distributed such that there is at most one MPI process per node with n OpenMP threads per process, where n is the number of cores available on the node.

2 MPI processes, 96 OpenMP threads across 96 CPUs: 1 process per node, 48 threads per process 

In this example we create a single MPI process on each node. Within each process, 48 OpenMP threads are launched, which corresponds to one thread per CPU core. The --map-by option is extended with the PE (processing element) modifier, which sets how many cores are bound to each process. The result is one process per node with 48 processing elements per process. Because the process count is derived from environment variables, the commands below work unchanged for any number of OpenMP threads that is a factor of the number of cores on a node.

 # In bash
$ export OMP_NUM_THREADS=48
$ export GOMP_CPU_AFFINITY=0-47
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) --map-by node:PE=$OMP_NUM_THREADS --rank-by core --report-bindings ./my_program
 
# In csh/tcsh
$ setenv OMP_NUM_THREADS 48
$ setenv GOMP_CPU_AFFINITY 0-47
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ mpirun -np $nprocs --map-by node:PE=$OMP_NUM_THREADS --rank-by core --report-bindings ./my_program
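The -np value in the commands above is the total CPU count divided by the threads per process. A minimal sketch of that arithmetic with a divisibility check is shown below; rank_count is a hypothetical helper, and the literal 96 stands in for $PBS_NCPUS on two 48-core nodes.

```bash
#!/bin/bash
# rank_count NCPUS THREADS
# Print the number of MPI ranks to launch, or fail if THREADS does not
# divide NCPUS evenly (some cores would otherwise be left without a thread).
rank_count() {
    local ncpus=$1 threads=$2
    if (( threads <= 0 || ncpus % threads != 0 )); then
        echo "error: $threads threads do not divide $ncpus CPUs evenly" >&2
        return 1
    fi
    echo $(( ncpus / threads ))
}

rank_count 96 48    # 2 ranks, one per node
rank_count 96 24    # 4 ranks, one per socket
rank_count 96 12    # 8 ranks, one per NUMA node
```

Plain shell integer division would silently round down for a thread count such as 36, which is why the sketch rejects it instead.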

When using Intel MPI, it is sufficient to use only command line options (i.e. no I_MPI_PIN_* environment variables need to be set) as binding each MPI process to as many cores as possible is Intel MPI's default behaviour. The use of the PBS_NCPUS variable allows this script to be run for any ncpus request without modification. Note that though OMP_NUM_THREADS can be set to an arbitrary value less than or equal to the number of cores on a node, setting it to a factor of the number of cores on a node will provide optimal resource usage.

# In bash
$ export I_MPI_DEBUG=5
$ export OMP_NUM_THREADS=48
$ export KMP_AFFINITY="explicit,proclist=[0-47]"
$ export PPN=$( grep -c $HOSTNAME $PBS_NODEFILE )
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) -ppn $(( $PPN / $OMP_NUM_THREADS )) -f hosts.txt ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv OMP_NUM_THREADS 48
$ setenv KMP_AFFINITY "explicit,proclist=[0-47]"
$ setenv PPN `grep -c $HOSTNAME $PBS_NODEFILE`
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ @ ppn = (  $PPN / $OMP_NUM_THREADS )
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np $nprocs -ppn $ppn -f hosts.txt ./my_program


[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 260901 gadi-cpu-clx-0278.gadi.nci.org.au {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): 1 346508 gadi-cpu-clx-0280.gadi.nci.org.au {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47}

Note the PPN environment variable, which is set by using grep to count the number of occurrences of a node's name in the PBS node file, effectively counting the number of cores allocated on that node. This allows scripts with complex binding options to be run unchanged in any of the job queues available at NCI.
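The PPN calculation can be tried outside a job with a mock node file. The sketch below is illustrative only: the node names are made up, ranks_on_node is a hypothetical helper, and the -F flag (match the host name as a fixed string) is an addition that the commands above do not use.

```bash
#!/bin/bash
# ranks_on_node NODEFILE HOST THREADS
# Count how many lines of NODEFILE contain HOST (the CPUs allocated on that
# node), then divide by the threads per rank to get the -ppn value.
ranks_on_node() {
    local nodefile=$1 host=$2 threads=$3
    local cpus
    cpus=$(grep -c -F -- "$host" "$nodefile")
    echo $(( cpus / threads ))
}

# Mock node file: two hypothetical nodes with 48 entries each.
nodefile=$(mktemp)
for node in node-a node-b; do
    for (( i = 0; i < 48; i++ )); do echo "$node"; done
done > "$nodefile"

ranks_on_node "$nodefile" node-a 12    # 48 / 12 = 4 ranks on this node
rm -f "$nodefile"
```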

Without controlling affinity, threads may be placed on the same core, e.g. thread0 and thread1 could be running on hwt0 and hwt1 respectively, both on core0. For more information on OpenMP thread affinity, please see https://software.intel.com/en-us/articles/openmp-thread-affinity-control.

The corresponding placement of processes and threads is given in the table below:

Gadi Cascade Lake Node 0 (hwt 0)

Socket     NUMA node   Cores           MPI process   OpenMP threads (one per core)
Socket 0   Numa 0      core0-core11    proc0         thread0-thread11
Socket 0   Numa 1      core12-core23   proc0         thread12-thread23
Socket 1   Numa 2      core24-core35   proc0         thread24-thread35
Socket 1   Numa 3      core36-core47   proc0         thread36-thread47

Gadi Cascade Lake Node 1 (hwt 0)

Socket     NUMA node   Cores           MPI process   OpenMP threads (one per core)
Socket 0   Numa 0      core0-core11    proc1         thread0-thread11
Socket 0   Numa 1      core12-core23   proc1         thread12-thread23
Socket 1   Numa 2      core24-core35   proc1         thread24-thread35
Socket 1   Numa 3      core36-core47   proc1         thread36-thread47

4 MPI processes, 96 OpenMP threads across 96 CPUs: 1 process per socket, 24 threads per process 

Similar to the previous example, 96 OpenMP threads are created across two nodes. However, in this example, there are 4 MPI processes with 24 threads per process. Note that the mpirun command is identical to the previous example for both Open MPI and Intel MPI.

 # In bash
$ export OMP_NUM_THREADS=24
$ export GOMP_CPU_AFFINITY=0-47
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) --map-by node:PE=$OMP_NUM_THREADS --rank-by core --report-bindings ./my_program
 
# In csh/tcsh
$ setenv OMP_NUM_THREADS 24
$ setenv GOMP_CPU_AFFINITY 0-47
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ mpirun -np $nprocs --map-by node:PE=$OMP_NUM_THREADS --rank-by core --report-bindings ./my_program

[gadi-cpu-clx-2671.gadi.nci.org.au:227672] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]], socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]], socket 0[core 16[hwt 0]], socket 0[core 17[hwt 0]], socket 0[core 18[hwt 0]], socket 0[core 19[hwt 0]], socket 0[core 20[hwt 0]], socket 0[core 21[hwt 0]], socket 0[core 22[hwt 0]], socket 0[core 23[hwt 0]]: [B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././././././././././././././.]
[gadi-cpu-clx-2671.gadi.nci.org.au:227672] MCW rank 1 bound to socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]], socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]], socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]], socket 1[core 34[hwt 0]], socket 1[core 35[hwt 0]], socket 1[core 36[hwt 0]], socket 1[core 37[hwt 0]], socket 1[core 38[hwt 0]], socket 1[core 39[hwt 0]], socket 1[core 40[hwt 0]], socket 1[core 41[hwt 0]], socket 1[core 42[hwt 0]], socket 1[core 43[hwt 0]], socket 1[core 44[hwt 0]], socket 1[core 45[hwt 0]], socket 1[core 46[hwt 0]], socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B]
[gadi-cpu-clx-2697.gadi.nci.org.au:594369] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]], socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]], socket 0[core 16[hwt 0]], socket 0[core 17[hwt 0]], socket 0[core 18[hwt 0]], socket 0[core 19[hwt 0]], socket 0[core 20[hwt 0]], socket 0[core 21[hwt 0]], socket 0[core 22[hwt 0]], socket 0[core 23[hwt 0]]: [B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././././././././././././././.]
[gadi-cpu-clx-2697.gadi.nci.org.au:594369] MCW rank 3 bound to socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]], socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]], socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]], socket 1[core 34[hwt 0]], socket 1[core 35[hwt 0]], socket 1[core 36[hwt 0]], socket 1[core 37[hwt 0]], socket 1[core 38[hwt 0]], socket 1[core 39[hwt 0]], socket 1[core 40[hwt 0]], socket 1[core 41[hwt 0]], socket 1[core 42[hwt 0]], socket 1[core 43[hwt 0]], socket 1[core 44[hwt 0]], socket 1[core 45[hwt 0]], socket 1[core 46[hwt 0]], socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B]

# In bash
$ export I_MPI_DEBUG=5
$ export OMP_NUM_THREADS=24
$ export KMP_AFFINITY="explicit,proclist=[0-47]"
$ export PPN=$( grep -c $HOSTNAME $PBS_NODEFILE )
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) -ppn $(( $PPN / $OMP_NUM_THREADS )) -f hosts.txt ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv OMP_NUM_THREADS 24
$ setenv KMP_AFFINITY "explicit,proclist=[0-47]"
$ setenv PPN `grep -c $HOSTNAME $PBS_NODEFILE`
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ @ ppn = (  $PPN / $OMP_NUM_THREADS )
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np $nprocs -ppn $ppn -f hosts.txt ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 261060 gadi-cpu-clx-0278.gadi.nci.org.au {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23}
[0] MPI startup(): 1 261061 gadi-cpu-clx-0278.gadi.nci.org.au {24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): 2 346600 gadi-cpu-clx-0280.gadi.nci.org.au {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23}
[0] MPI startup(): 3 346601 gadi-cpu-clx-0280.gadi.nci.org.au {24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47}
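A quick sanity check on such a report is that each rank's pin set holds as many CPUs as there are OpenMP threads per process. The sketch below counts the ids in one pin set; pin_width is a hypothetical helper, and the input format is assumed to match the braces and commas shown above.

```bash
#!/bin/bash
# pin_width CPUSET
# Count the CPU ids in a pin set printed as {a,b,c,...}.
pin_width() {
    local cpus=$1
    cpus=${cpus#\{}          # drop the leading brace
    cpus=${cpus%\}}          # drop the trailing brace
    local IFS=,              # split the remaining text on commas
    local ids=( $cpus )
    echo "${#ids[@]}"
}

pin_width "{24,25,26,27,28,29,30,31,32,33,34,35}"    # 12 ids
```

For the report above, each of the four pin sets should give 24, matching OMP_NUM_THREADS.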

The corresponding placement of processes and threads is given in the table below:

Gadi Cascade Lake Node 0 (hwt 0)

Socket     NUMA node   Cores           MPI process   OpenMP threads (one per core)
Socket 0   Numa 0      core0-core11    proc0         thread0-thread11
Socket 0   Numa 1      core12-core23   proc0         thread12-thread23
Socket 1   Numa 2      core24-core35   proc1         thread24-thread35
Socket 1   Numa 3      core36-core47   proc1         thread36-thread47

Gadi Cascade Lake Node 1 (hwt 0)

Socket     NUMA node   Cores           MPI process   OpenMP threads (one per core)
Socket 0   Numa 0      core0-core11    proc2         thread0-thread11
Socket 0   Numa 1      core12-core23   proc2         thread12-thread23
Socket 1   Numa 2      core24-core35   proc3         thread24-thread35
Socket 1   Numa 3      core36-core47   proc3         thread36-thread47

8 MPI processes, 96 OpenMP threads across 96 CPUs: 1 process per numa node, 12 threads per process 

Similar to the previous example, 96 OpenMP threads are created across two nodes. However, in this example, there are 8 MPI processes with 12 threads per process. Note that the mpirun command is identical to the previous example for both Open MPI and Intel MPI; the Intel MPI variant additionally sets I_MPI_PIN_ORDER=range.

 # In bash
$ export OMP_NUM_THREADS=12
$ export GOMP_CPU_AFFINITY=0-47
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) --map-by node:PE=$OMP_NUM_THREADS --rank-by core --report-bindings ./my_program
 
# In csh/tcsh
$ setenv OMP_NUM_THREADS 12
$ setenv GOMP_CPU_AFFINITY 0-47
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ mpirun -np $nprocs --map-by node:PE=$OMP_NUM_THREADS --rank-by core --report-bindings ./my_program

[gadi-cpu-clx-2671.gadi.nci.org.au:227819] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [B/B/B/B/B/B/B/B/B/B/B/B/./././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2671.gadi.nci.org.au:227819] MCW rank 1 bound to socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]], socket 0[core 16[hwt 0]], socket 0[core 17[hwt 0]], socket 0[core 18[hwt 0]], socket 0[core 19[hwt 0]], socket 0[core 20[hwt 0]], socket 0[core 21[hwt 0]], socket 0[core 22[hwt 0]], socket 0[core 23[hwt 0]]: [././././././././././././B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././././././././././././././.]
[gadi-cpu-clx-2671.gadi.nci.org.au:227819] MCW rank 2 bound to socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]], socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]], socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]], socket 1[core 34[hwt 0]], socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][B/B/B/B/B/B/B/B/B/B/B/B/./././././././././././.]
[gadi-cpu-clx-2671.gadi.nci.org.au:227819] MCW rank 3 bound to socket 1[core 36[hwt 0]], socket 1[core 37[hwt 0]], socket 1[core 38[hwt 0]], socket 1[core 39[hwt 0]], socket 1[core 40[hwt 0]], socket 1[core 41[hwt 0]], socket 1[core 42[hwt 0]], socket 1[core 43[hwt 0]], socket 1[core 44[hwt 0]], socket 1[core 45[hwt 0]], socket 1[core 46[hwt 0]], socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/B/B/B/B/B/B/B/B/B/B/B]
[gadi-cpu-clx-2697.gadi.nci.org.au:594455] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [B/B/B/B/B/B/B/B/B/B/B/B/./././././././././././.][./././././././././././././././././././././././.]
[gadi-cpu-clx-2697.gadi.nci.org.au:594455] MCW rank 5 bound to socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]], socket 0[core 16[hwt 0]], socket 0[core 17[hwt 0]], socket 0[core 18[hwt 0]], socket 0[core 19[hwt 0]], socket 0[core 20[hwt 0]], socket 0[core 21[hwt 0]], socket 0[core 22[hwt 0]], socket 0[core 23[hwt 0]]: [././././././././././././B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././././././././././././././.]
[gadi-cpu-clx-2697.gadi.nci.org.au:594455] MCW rank 6 bound to socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]], socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]], socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]], socket 1[core 34[hwt 0]], socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][B/B/B/B/B/B/B/B/B/B/B/B/./././././././././././.]
[gadi-cpu-clx-2697.gadi.nci.org.au:594455] MCW rank 7 bound to socket 1[core 36[hwt 0]], socket 1[core 37[hwt 0]], socket 1[core 38[hwt 0]], socket 1[core 39[hwt 0]], socket 1[core 40[hwt 0]], socket 1[core 41[hwt 0]], socket 1[core 42[hwt 0]], socket 1[core 43[hwt 0]], socket 1[core 44[hwt 0]], socket 1[core 45[hwt 0]], socket 1[core 46[hwt 0]], socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/B/B/B/B/B/B/B/B/B/B/B]

# In bash
$ export I_MPI_DEBUG=5
$ export I_MPI_PIN_ORDER=range
$ export OMP_NUM_THREADS=12
$ export KMP_AFFINITY="explicit,proclist=[0-47]"
$ export PPN=$( grep -c $HOSTNAME $PBS_NODEFILE )
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) -ppn $(( $PPN / $OMP_NUM_THREADS )) -f hosts.txt ./my_program
 
# In csh/tcsh
$ setenv I_MPI_DEBUG 5
$ setenv I_MPI_PIN_ORDER range
$ setenv OMP_NUM_THREADS 12
$ setenv KMP_AFFINITY "explicit,proclist=[0-47]"
$ setenv PPN `grep -c $HOSTNAME $PBS_NODEFILE`
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ @ ppn = (  $PPN / $OMP_NUM_THREADS )
$ uniq < $PBS_NODEFILE > hosts.txt
$ mpirun -np $nprocs -ppn $ppn -f hosts.txt ./my_program

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 256979 gadi-cpu-clx-1198.gadi.nci.org.au {0,1,2,3,4,5,6,7,8,9,10,11}
[0] MPI startup(): 1 256980 gadi-cpu-clx-1198.gadi.nci.org.au {12,13,14,15,16,17,18,19,20,21,22,23}
[0] MPI startup(): 2 256981 gadi-cpu-clx-1198.gadi.nci.org.au {24,25,26,27,28,29,30,31,32,33,34,35}
[0] MPI startup(): 3 256982 gadi-cpu-clx-1198.gadi.nci.org.au {36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): 4 255182 gadi-cpu-clx-1202.gadi.nci.org.au {0,1,2,3,4,5,6,7,8,9,10,11}
[0] MPI startup(): 5 255183 gadi-cpu-clx-1202.gadi.nci.org.au {12,13,14,15,16,17,18,19,20,21,22,23}
[0] MPI startup(): 6 255184 gadi-cpu-clx-1202.gadi.nci.org.au {24,25,26,27,28,29,30,31,32,33,34,35}
[0] MPI startup(): 7 255185 gadi-cpu-clx-1202.gadi.nci.org.au {36,37,38,39,40,41,42,43,44,45,46,47}
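The pin sets above line up with the node layout: each group of 12 consecutive cores is one NUMA node and each group of 24 is one socket. A minimal sketch of that mapping follows, assuming the consecutive numbering seen in the reports on this page; core_location is a hypothetical helper.

```bash
#!/bin/bash
# core_location CORE
# Map a core id (0-47) to its socket and NUMA node, assuming 24 cores per
# socket and 12 cores per NUMA node, numbered consecutively.
core_location() {
    local core=$1
    echo "socket $(( core / 24 )) numa $(( core / 12 ))"
}

core_location 0     # socket 0 numa 0
core_location 14    # socket 0 numa 1
core_location 36    # socket 1 numa 3
```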

The corresponding placement of processes and threads is given in the table below:

Gadi Cascade Lake Node 0 (hwt 0)

Socket    NUMA    Cores          Process  Threads
Socket 0  Numa 0  core0-core11   proc0    thread0-thread11
Socket 0  Numa 1  core12-core23  proc1    thread12-thread23
Socket 1  Numa 2  core24-core35  proc2    thread24-thread35
Socket 1  Numa 3  core36-core47  proc3    thread36-thread47

Gadi Cascade Lake Node 1 (hwt 0)

Socket    NUMA    Cores          Process  Threads
Socket 0  Numa 0  core0-core11   proc4    thread0-thread11
Socket 0  Numa 1  core12-core23  proc5    thread12-thread23
Socket 1  Numa 2  core24-core35  proc6    thread24-thread35
Socket 1  Numa 3  core36-core47  proc7    thread36-thread47
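
The -np and -ppn values passed to mpirun in the Intel MPI example above follow from simple integer arithmetic. The sketch below reproduces that arithmetic for a job of two 48-core nodes with 12 OpenMP threads per process. The function name hybrid_layout and its divisibility check are illustrative assumptions and are not part of Intel MPI or PBS.

```shell
#!/bin/bash
# Hypothetical helper (not part of Intel MPI or PBS): derive the values
# passed to "mpirun -np" and "mpirun -ppn" for a hybrid MPI/OpenMP job.
# Arguments: total CPUs in the job, CPUs per node, OpenMP threads per process.
hybrid_layout() {
    local ncpus=$1 cpus_per_node=$2 threads=$3
    # Reject layouts in which the threads of one process would not fit
    # evenly into a node.
    if [ "$threads" -le 0 ] || [ $(( cpus_per_node % threads )) -ne 0 ]; then
        echo "error: $threads threads do not divide $cpus_per_node CPUs per node" >&2
        return 1
    fi
    echo "np=$(( ncpus / threads )) ppn=$(( cpus_per_node / threads ))"
}

# Two 48-core nodes (PBS_NCPUS=96, PPN=48) with OMP_NUM_THREADS=12.
hybrid_layout 96 48 12    # prints: np=8 ppn=4
```

This agrees with the startup report above, which lists 8 ranks in total with 4 ranks on each of the two nodes.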

Process and Thread Affinity


As mentioned before, unless process and/or thread affinity is pre-defined in code or using environment variables, it may be controlled by several different entities such as the MPI/OpenMP runtime or NUMA library.

Without controlling affinity, threads may be placed on the same core. A relevant discussion is available at http://stackoverflow.com/questions/17604867/processor-socket-affinity-in-openmpi.

When a PBS job is scheduled with -l other=hyperthread, it is possible for both hardware threads of a core (hwt 0 and hwt 1) to be used by processes and threads created during the job.

For example, when distributing 48 OpenMP threads across a single node using 4 MPI processes with 12 threads per process, the following cases are possible:

4 MPI processes, 48 OpenMP threads across 48 CPUs: 2 processes per socket, 12 threads per process 

# In bash
$ export OMP_NUM_THREADS=12
$ export GOMP_CPU_AFFINITY=0-47
$ export NUM_NON_HYPER_THREADS=$(( $OMP_NUM_THREADS / 2 ))
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) --map-by socket:PE=$NUM_NON_HYPER_THREADS --rank-by numa --report-bindings ./my_program
 
# In csh/tcsh
$ setenv OMP_NUM_THREADS 12
$ setenv GOMP_CPU_AFFINITY 0-47
$ @ NUM_NON_HYPER_THREADS = ( $OMP_NUM_THREADS / 2 )
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ mpirun -np $nprocs --map-by socket:PE=$NUM_NON_HYPER_THREADS --rank-by numa --report-bindings ./my_program

In this case, proc0 creates 12 threads which map to both hwt 0 and hwt 1 on cores 0 to 5, and the same pattern repeats for each process. A better distribution is clearly possible: in this scenario, on each socket 24 OpenMP threads run on 12 physical CPU cores, two threads per core, and therefore incur hyperthread context-switching overhead.

[gadi-cpu-clx-0207.gadi.nci.org.au:312672] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-0207.gadi.nci.org.au:312672] MCW rank 1 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: [../../../../../../BB/BB/BB/BB/BB/BB/../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-0207.gadi.nci.org.au:312672] MCW rank 2 bound to socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]]: [../../../../../../../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-0207.gadi.nci.org.au:312672] MCW rank 3 bound to socket 1[core 30[hwt 0-1]], socket 1[core 31[hwt 0-1]], socket 1[core 32[hwt 0-1]], socket 1[core 33[hwt 0-1]], socket 1[core 34[hwt 0-1]], socket 1[core 35[hwt 0-1]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../BB/BB/BB/BB/BB/BB/../../../../../../../../../../../..]
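
The thread-per-core count described above can be worked out from the job parameters alone. The following sketch assumes two hardware threads per core, as on Gadi with hyperthreading enabled, and PE set to half of OMP_NUM_THREADS, as in the mpirun command above. The function name socket_load is hypothetical and is used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: per-socket load when each process is bound to
# PE = OMP_NUM_THREADS / 2 cores and every core offers 2 hardware threads.
# Arguments: MPI processes per socket, OpenMP threads per process.
socket_load() {
    local procs_per_socket=$1 omp_threads=$2 hwt_per_core=2
    # At least one full core per process is needed for the division below.
    if [ "$omp_threads" -lt "$hwt_per_core" ]; then
        echo "error: need at least $hwt_per_core threads per process" >&2
        return 1
    fi
    local pe=$(( omp_threads / hwt_per_core ))        # cores bound per process
    local cores=$(( procs_per_socket * pe ))          # cores in use per socket
    local threads=$(( procs_per_socket * omp_threads ))
    echo "cores=$cores threads=$threads threads_per_core=$(( threads / cores ))"
}

# 2 processes per socket, 12 threads per process, as in the example above.
socket_load 2 12    # prints: cores=12 threads=24 threads_per_core=2
```

The result matches the bindings reported above, where each rank is bound to 6 cores with both hardware threads (hwt 0-1) in use.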

The same distribution as above can also be obtained in other ways when the affinity of MPI processes to sockets is not defined.

# In bash
$ export OMP_NUM_THREADS=12
$ export GOMP_CPU_AFFINITY=0-47
$ export NUM_NON_HYPER_THREADS=$(( $OMP_NUM_THREADS / 2 ))
$ mpirun -np $(( $PBS_NCPUS / $OMP_NUM_THREADS )) --map-by socket:PE=$NUM_NON_HYPER_THREADS --rank-by slot --report-bindings ./my_program
 
# In csh/tcsh
$ setenv OMP_NUM_THREADS 12
$ setenv GOMP_CPU_AFFINITY 0-47
$ @ NUM_NON_HYPER_THREADS = ( $OMP_NUM_THREADS / 2 )
$ @ nprocs = ( $PBS_NCPUS / $OMP_NUM_THREADS )
$ mpirun -np $nprocs --map-by socket:PE=$NUM_NON_HYPER_THREADS --rank-by slot --report-bindings ./my_program

In this case, proc1 is placed on the next socket, as shown below.

[gadi-cpu-clx-0207.gadi.nci.org.au:312723] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-0207.gadi.nci.org.au:312723] MCW rank 1 bound to socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]]: [../../../../../../../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-0207.gadi.nci.org.au:312723] MCW rank 2 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: [../../../../../../BB/BB/BB/BB/BB/BB/../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../..]
[gadi-cpu-clx-0207.gadi.nci.org.au:312723] MCW rank 3 bound to socket 1[core 30[hwt 0-1]], socket 1[core 31[hwt 0-1]], socket 1[core 32[hwt 0-1]], socket 1[core 33[hwt 0-1]], socket 1[core 34[hwt 0-1]], socket 1[core 35[hwt 0-1]]: [../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../BB/BB/BB/BB/BB/BB/../../../../../../../../../../../..]
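
Because the --report-bindings lines are long, it can help to condense them when comparing the two rank orderings. The sketch below is one possible way to do this with awk. It assumes the line format shown in the outputs above; the function name summarise_bindings is hypothetical, and the format may differ in other Open MPI versions.

```shell
#!/bin/bash
# Hypothetical helper: condense Open MPI --report-bindings lines read from
# standard input into one short line per rank. It assumes the
# "MCW rank N bound to socket S[core C[hwt ...]], ..." format shown above.
summarise_bindings() {
    awk '/MCW rank [0-9]+ bound to/ {
        rank = $0; sub(/.*MCW rank /, "", rank); sub(/ .*/, "", rank)
        sock = $0; sub(/.*bound to socket /, "", sock); sub(/\[.*/, "", sock)
        n = 0; first = ""; last = ""; rest = $0
        # Walk through every "core <id>" entry on the line.
        while (match(rest, /core [0-9]+/)) {
            c = substr(rest, RSTART + 5, RLENGTH - 5)
            if (first == "") first = c
            last = c; n++
            rest = substr(rest, RSTART + RLENGTH)
        }
        printf "rank %s: socket %s, cores %s-%s (%d cores)\n", rank, sock, first, last, n
    }'
}

# The rank 1 line from the --rank-by slot output above.
line='[gadi-cpu-clx-0207.gadi.nci.org.au:312723] MCW rank 1 bound to socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]]: [../../../../../../../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/../../../../../../../../../../../../../../../../../..]'
printf '%s\n' "$line" | summarise_bindings
# prints: rank 1: socket 1, cores 24-29 (6 cores)
```

Since Open MPI prints the bindings in the job's error file, the function can be fed that file on standard input. The summary reports only the first socket and the first and last core on each line, which is sufficient here because each rank is bound to a contiguous range of cores on a single socket.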

Authors: Mohsin Ali