...

The three major binary executables of the package are dftb+, waveplot, and modes. There are also some auxiliary binaries; see the contents of $DFT_BASE/bin. Most of the binaries are provided in two versions: an OpenMP version (files with no file extension) and a hybrid MPI-OpenMP version (files with the .mpi extension). The hybrid MPI-OpenMP regime is preferred for large multi-node calculations on gadi. The main dftb+ binary also has a third version (dftb+.mpi-elsi), which is an MPI version linked against the ELSI 2.7.1 library, providing support for additional eigensolvers (ELPA, OMM, PEXSI and NTPoly). Since ELSI does not support OpenMP, dftb+.mpi-elsi is a pure MPI binary; therefore, OMP_NUM_THREADS=1 must be used in jobs running dftb+.mpi-elsi. All three versions of the dftb+ binary were built with support for transport calculations. The OpenMP version of the executable (binary dftb+) supports GPU calculations via the MAGMA eigensolver.
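
For example, the full set of installed executables can be inspected directly (assuming $DFT_BASE is set by the module, as noted above):

Code Block
languagebash
# List all executables shipped with the module; $DFT_BASE is assumed
# to be defined by the dftbplus module file, as mentioned above.
module load dftbplus/21.1
ls $DFT_BASE/bin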

To make the binaries easier to use for beginners, we provide an auxiliary script, run.sh, which decides which version of the binary (OpenMP or hybrid MPI-OpenMP) to run and sets up all OpenMP and MPI environment settings based on the number of CPUs requested in your PBS job. However, some parallel options must be provided via the input file; we leave it to the user to supply these options when necessary. If you find that the MPI settings for the job (shown in the first lines of the job log) are not what you want, you can configure them yourself by invoking the mpirun command directly, i.e. without using the run.sh script.
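
For orientation, the decision run.sh makes is roughly as follows. This is only a hedged sketch of the logic described above, not the installed script itself; the script shipped with the module may differ in detail:

Code Block
languagebash
#!/bin/bash
# Illustrative sketch only: pick the OpenMP or hybrid MPI-OpenMP binary
# from the PBS resource request, as run.sh is described to do above.
EXE="$1"; shift
if [ "${PBS_NNODES:-1}" -gt 1 ]; then
    # Multi-node job: hybrid binary, one MPI rank per NUMA node.
    RANKS=$((PBS_NNODES*PBS_NCI_NUMA_PER_NODE))
    export OMP_NUM_THREADS=$((PBS_NCPUS/RANKS))
    mpirun -np $RANKS "${EXE}.mpi" "$@"
else
    # Single-node job: pure OpenMP binary on all requested CPUs.
    export OMP_NUM_THREADS=$PBS_NCPUS
    "$EXE" "$@"
fi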

...

Code Block
languagebash
#!/bin/bash

#PBS -P a99
#PBS -l ncpus=16
#PBS -l mem=16GB
#PBS -l jobfs=1GB
#PBS -l walltime=01:00:00
#PBS -l wd

# Load module, always specify version number.
module load dftbplus/21.1

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

run.sh dftb+ > output

The input file dftb_in.hsd must be located in the directory from which the job is submitted. To submit the job to the queuing system:

Code Block
languagebash
$ qsub dftb+.pbs
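
Once submitted, the job can be monitored with the standard PBS commands, e.g. (the job ID below is illustrative):

Code Block
languagebash
$ qsub dftb+.pbs
12345678.gadi-pbs
$ qstat -swx 12345678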

The dftb+.mpi-elsi binary (the DFTB+ executable with the ELSI eigensolvers, MPI-enabled) can be employed via the standard mpirun command, e.g.

Code Block
languagebash
#!/bin/bash

#PBS -P a99
#PBS -l ncpus=16
#PBS -l mem=16GB
#PBS -l jobfs=1GB
#PBS -l walltime=01:00:00
#PBS -l wd

# Load module, always specify version number.
module load dftbplus/21.1

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

# dftb+.mpi-elsi is a pure MPI binary, so use one OpenMP thread per rank.
export OMP_NUM_THREADS=1

mpirun -np $PBS_NCPUS dftb+.mpi-elsi > output

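To actually engage one of the ELSI eigensolvers, it has to be requested in the input file. The sketch below is illustrative only; the Solver keyword and block names follow the DFTB+ manual and should be verified against the 21.1 documentation:

Code Block
languagebash
# Hedged illustration, not a complete input: with the ELSI build, an
# ELSI eigensolver is requested inside the Hamiltonian block of
# dftb_in.hsd (verify the exact keywords for version 21.1):
#
#   Hamiltonian = DFTB {
#     ...
#     Solver = ELPA {}    # or OMM {}, PEXSI {}, NTPoly {}
#   }
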

The dftb+.mpi binary (the DFTB+ executable with support for hybrid MPI-OpenMP parallelism, SDFTD3 and PLUMED) can be employed via the mpirun command, which must be given the environment settings that define the partitioning of the allocated CPUs between MPI ranks and the number of OpenMP threads within each MPI rank. See the example below.

Code Block
languagebash
#!/bin/bash

#PBS -P a99
#PBS -l ncpus=96
#PBS -l mem=92GB
#PBS -l jobfs=1GB
#PBS -l walltime=01:00:00
#PBS -l wd 

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

module load dftbplus/21.1
RANKS=$((PBS_NNODES*PBS_NCI_NUMA_PER_NODE))
NTHREADS=$((PBS_NCPUS/RANKS))
MPI_OPTIONS="-map-by node:SPAN,PE=$NTHREADS --bind-to core -x OMP_NUM_THREADS=$NTHREADS -report-bindings"
RUN_EXE="dftb+.mpi"
RUN_CMD="mpirun -np $RANKS ${MPI_OPTIONS} ${RUN_EXE}" 

echo "Job started on ${HOSTNAME}"
echo "The MPI command is: ${RUN_CMD}" 

${RUN_CMD}

The above submission script illustrates the allocation of one NUMA node per MPI rank. In the normal PBS queue on gadi, each 48-CPU node has 4 NUMA nodes with 12 CPUs per NUMA node. Within each NUMA node, the job will use OpenMP parallelism. For the normal queue, the example above therefore ends up with RANKS=8 and OMP_NUM_THREADS=12. It is up to the user to decide how many MPI ranks and OpenMP threads to use in a particular job; some preliminary testing is advisable to find the MPI-OpenMP setup that gives the best performance.
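
For instance, to try a different partitioning of the same 96 CPUs, only the rank count needs to change. A hedged variant with 4 ranks of 24 threads each might look like this:

Code Block
languagebash
# Illustrative variant: 4 MPI ranks x 24 OpenMP threads on 96 CPUs,
# instead of one rank per NUMA node as in the script above.
RANKS=4
NTHREADS=$((PBS_NCPUS/RANKS))
MPI_OPTIONS="-map-by node:SPAN,PE=$NTHREADS --bind-to core -x OMP_NUM_THREADS=$NTHREADS -report-bindings"
mpirun -np $RANKS ${MPI_OPTIONS} dftb+.mpi > output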

The input file must be named dftb_in.hsd and must provide the path to the Slater-Koster parameter sets. A large number of Slater-Koster parameter sets is available in the directory /apps/dftbplus/slako/.
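
As a starting point, a minimal dftb_in.hsd skeleton might look as follows. This is a hedged sketch: the geometry file name, the mio-1-1 set name and the element entries are assumptions for illustration, so adapt them to your system and to the sets actually present under /apps/dftbplus/slako/:

Code Block
languagebash
# Illustrative skeleton only; adjust the geometry file, parameter set
# and MaxAngularMomentum entries to your own system.
cat > dftb_in.hsd <<'EOF'
Geometry = GenFormat {
  <<< "geometry.gen"          # structure in GEN format (assumed name)
}
Hamiltonian = DFTB {
  SlaterKosterFiles = Type2FileNames {
    Prefix = "/apps/dftbplus/slako/mio-1-1/"   # assumed set name
    Separator = "-"
    Suffix = ".skf"
  }
  MaxAngularMomentum {
    O = "p"
    H = "s"
  }
}
EOF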

A sample input file, PBS submission script, and output files for the DFTB+ and Waveplot programs are available in the directory /apps/dftbplus/21.1/first-calc. Read the read_me file inside that directory for the protocol for running DFTB+ and Waveplot on NCI machines.
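
For example, the sample calculation can be copied to your own scratch space and run from there (the destination path is illustrative; $PROJECT and $USER are standard gadi environment variables):

Code Block
languagebash
# Copy the worked example to scratch and inspect the instructions.
cp -r /apps/dftbplus/21.1/first-calc /scratch/$PROJECT/$USER/
cd /scratch/$PROJECT/$USER/first-calc
cat read_me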

...