Usage

Version 21.1 is the latest release of DFTB+ installed on gadi. To use it, please load the appropriate dftbplus modulefile with the command

$ module load dftbplus/21.1

See the NCI documentation for more information on the use of the module command.
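
For example, to see which DFTB+ versions are installed, you can run

$ module avail dftbplus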

The package provides three major binary executables: dftb+, waveplot, and modes. There are also some auxiliary binaries; see the contents of $DFT_BASE/bin. Most binaries are provided in two versions: an OMP version (files with no file extension) and a hybrid MPI-OMP version (files with the .mpi extension). The hybrid MPI-OMP mode is preferred for large multi-node calculations on gadi. The main dftb+ binary also has a third version, dftb+.mpi-elsi, which is an MPI version linked against the ELSI 2.7.1 library and provides support for additional eigensolvers (ELPA, OMM, PEXSI and NTPoly). Since ELSI does not support OMP, dftb+.mpi-elsi is a pure MPI binary, so OMP_NUM_THREADS=1 must be used in jobs running dftb+.mpi-elsi. All three versions of the dftb+ binary were built with support for transport calculations. The OMP version of the executable (binary dftb+) supports GPU calculations via the MAGMA eigensolver.

To make the binaries easier to use for beginners, we provide an auxiliary script run.sh which decides which version of the binary (OMP or MPI-OMP) to run and sets up all OMP and MPI environment settings based on the number of CPUs requested through PBS. However, some parallel options must be provided via the input file; we leave it to the user to supply these options when necessary. If the MPI settings chosen for the job (printed in the first lines of the job log) are not what you want, you can make your own settings by calling the mpirun command directly, i.e. without using the run.sh script.

The script input arguments are:

%1 is the binary name; the default is dftb+.
%2 is the number of MPI ranks; the default is equal to the number of nodes.
%3 is the number of OMP threads per MPI rank; the default is the lesser of the number of cores per node and the number of CPUs requested through PBS.

In most cases, the first argument (the binary name) is enough; the script will set the number of MPI ranks and OMP threads for you based on the available PBS resources.
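
As an illustration only (a hypothetical invocation, assuming run.sh is on your PATH after the module is loaded, with output redirection added for concreteness), the script could be called from a job script with all three arguments given explicitly:

run.sh dftb+.mpi 8 12 > output

This would request 8 MPI ranks with 12 OMP threads each, matching a 96-CPU job.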

Here is an example of a parallel DFTB+ job run under PBS. The example file dftb+.pbs uses a fictitious project a99 and asks for 16 CPUs, 1 hour of wall-clock time, 16GB of memory and 1GB of fast jobfs disk space.

#!/bin/bash

#PBS -P a99
#PBS -l ncpus=16
#PBS -l mem=16GB
#PBS -l jobfs=1GB
#PBS -l walltime=01:00:00
#PBS -l wd

# Load module, always specify version number.
module load dftbplus/21.1

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained
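
# Tip: instead of calling the binary directly, the helper script run.sh
# described above can be used here; it selects the binary version and sets
# the OMP/MPI environment according to the requested PBS resources.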

dftb+ > output

The input file dftb_in.hsd must be located in the directory from which the job is submitted. To submit the job to the queuing system:

$ qsub dftb+.pbs
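
The status of the submitted job can then be checked with the standard PBS qstat command, for example

$ qstat -u $USER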

The dftb+.mpi-elsi binary (the DFTB+ executable binary with the ELSI eigensolvers, MPI-enabled) can be employed via the standard mpirun command, e.g.

#!/bin/bash

#PBS -P a99
#PBS -l ncpus=16
#PBS -l mem=16GB
#PBS -l jobfs=1GB
#PBS -l walltime=01:00:00
#PBS -l wd

# Load module, always specify version number.
module load dftbplus/21.1

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained
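
# Reminder: dftb+.mpi-elsi is a pure MPI binary (ELSI does not support OMP),
# so the job must run with a single OpenMP thread per MPI rank; this is
# enforced on the mpirun line below.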

mpirun -np $PBS_NCPUS -x OMP_NUM_THREADS=1 dftb+.mpi-elsi > output


The dftb+.mpi binary (the DFTB+ executable binary supporting hybrid MPI-OpenMP parallelism, SDFTD3 and PLUMED) can be employed via the mpirun command, which is passed the environment settings needed to define the partitioning of the allocated CPUs between MPI ranks and the number of OpenMP threads within each MPI rank. See the example below.

#!/bin/bash

#PBS -P a99
#PBS -l ncpus=96
#PBS -l mem=92GB
#PBS -l jobfs=1GB
#PBS -l walltime=01:00:00
#PBS -l wd 

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

module load dftbplus/21.1
RANKS=$((PBS_NNODES*PBS_NCI_NUMA_PER_NODE))
NTHREADS=$((PBS_NCPUS/RANKS))
MPI_OPTIONS="-map-by node:SPAN,PE=$NTHREADS --bind-to core -x OMP_NUM_THREADS=$NTHREADS -report-bindings"
RUN_EXE="dftb+.mpi"
RUN_CMD="mpirun -np $RANKS ${MPI_OPTIONS} ${RUN_EXE}" 

echo "Job started on ${HOSTNAME}"
echo "The MPI command is: ${RUN_CMD}" 

${RUN_CMD}

The above submission script illustrates allocating one NUMA node per MPI rank. In the normal PBS queue on gadi, each 48-CPU node has 4 NUMA nodes with 12 CPUs per NUMA node. Within each NUMA node the job uses OpenMP parallelism. For the normal queue on gadi, the example above therefore ends up with RANKS=8 and OMP_NUM_THREADS=12. It is up to the user to decide how many MPI ranks and OpenMP threads to use in a particular job; some preliminary testing is advised to find the MPI-OMP setup that gives the best performance.
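
For instance, a minimal sketch (an assumed alternative, not one of the installed examples) of placing one MPI rank per node, with 48 OpenMP threads per rank for the same 96-CPU request, only changes the rank and thread arithmetic:

# Assumed alternative partitioning: one MPI rank per node.
RANKS=$PBS_NNODES                      # 2 ranks for a 96-CPU request in the normal queue
NTHREADS=$((PBS_NCPUS/RANKS))          # 48 OpenMP threads per rank
MPI_OPTIONS="-map-by node:SPAN,PE=$NTHREADS --bind-to core -x OMP_NUM_THREADS=$NTHREADS -report-bindings"
mpirun -np $RANKS ${MPI_OPTIONS} dftb+.mpi > output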

The input file must be named dftb_in.hsd and must provide the path to the Slater-Koster parameter sets. A large number of Slater-Koster parameter sets is available in the directory /apps/dftbplus/slako/.
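
As a minimal illustration only (the Slater-Koster subdirectory, elements and geometry file below are hypothetical; consult the manual and the sample inputs mentioned below for the exact form), the relevant part of a dftb_in.hsd file could look like:

Geometry = GenFormat {
    <<< "geometry.gen"                 # hypothetical geometry file in GEN format
}
Hamiltonian = DFTB {
    SCC = Yes
    SlaterKosterFiles = Type2FileNames {
        Prefix = "/apps/dftbplus/slako/mio/mio-1-1/"   # assumed location of one Slater-Koster set
        Separator = "-"
        Suffix = ".skf"
    }
    MaxAngularMomentum {
        O = "p"                        # highest angular momentum per element (example values)
        H = "s"
    }
}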

A sample input file, PBS submission script, and output files for the DFTB+ and Waveplot programs are available in the directory /apps/dftbplus/21.1/first-calc. Read the file read_me inside that directory for the protocol of running DFTB+ and Waveplot on NCI machines.
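
For example, the sample calculation can be copied into your own space (the destination below uses the fictitious project a99 from the examples above) and inspected with

$ cp -r /apps/dftbplus/21.1/first-calc /scratch/a99/$USER/
$ cd /scratch/a99/$USER/first-calc
$ cat read_me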

A number of Python utilities for processing DFTB+ results is provided; see the contents of /apps/dftbplus/21.1/bin. These utilities require the python/3.7.4 module to be loaded first.
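
For instance, a hypothetical invocation of the dptools utility dp_dos, which converts band-structure output into a density of states (file names are illustrative):

$ module load python/3.7.4
$ dp_dos band.out dos.dat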

The user manual manual.pdf for DFTB+ and the included utility programs can be found in the directory $DFTBPLUS_ROOT/bin. A large set of documentation, including the manual, recipes and tutorials, is available online on the developers' website.