Overview

Python is an interpreted, interactive, object-oriented programming language.

More information: https://www.python.org/

Usage

You can check the versions installed system-wide or available as modules on Gadi with the following commands:

$ python2 --version

or

$ python3 --version

or

$ module avail python

or

$ module avail intel-python

We normally recommend using the latest version available, and we always recommend specifying the version number when loading the module:

$ module load python3/3.9.2

For more details on using modules see our modules help guide at https://opus.nci.org.au/display/Help/Environment+Modules.

The Python versions installed on Gadi include the following packages:

  • Numpy
  • Scipy
  • Matplotlib
  • Ipython
  • Cython
  • Pip

We expect other packages to be installed by users in their own directories on the /g/data or /home file systems. Numpy and Scipy have been built against Intel MKL to make them run faster.
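If you want to confirm that the Numpy and Scipy builds you are using are the MKL-backed ones, a quick interactive check along the following lines can help (a sketch, assuming the python3/3.9.2 module is loaded):

# Print where Numpy/Scipy are loaded from and which BLAS/LAPACK they use.
# For the MKL-backed builds, the mkl libraries appear in the config output.
import numpy
import scipy

print(numpy.__version__, numpy.__file__)
print(scipy.__version__, scipy.__file__)
numpy.show_config()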

Note that python2 and python3 are also installed in the /bin directory. These installations come with the operating system and are old versions. We do not normally recommend using them to build packages, but they are fine for running simple Python scripts.

Serial Job

An example PBS job submission script named python_serial_job.sh is provided below. It requests 1 CPU core, 2 GiB memory, and 8 GiB local disk on a compute node on Gadi from the normal queue, for 30 minutes, charged to the project a00. It also requests that the job start in the directory from which it was submitted. The script should be saved in the working directory from which the analysis will be run. To change the walltime, memory, or jobfs requested, simply modify the corresponding PBS resource requests at the top of the file according to the information available at https://opus.nci.org.au/display/Help/Queue+Structure.

#!/bin/bash

#PBS -P a00
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=2GB
#PBS -l jobfs=8GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load module, always specify version number.
module load python3/3.9.2

# Either update your PYTHONPATH variable if some packages are
# installed under your non-HOME directory or activate your virtual
# environment if some packages are installed under the virtual
# environment.

# Set number of OMP threads
export OMP_NUM_THREADS=$PBS_NCPUS

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

# Run Python applications
python3 python_serial_script.py > $PBS_JOBID.log

# Deactivate virtual environment, if any.

To run the job you would use the PBS command:

$ qsub python_serial_job.sh
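The job script above runs a file named python_serial_script.py, whose contents depend entirely on your analysis. As a minimal sketch only (the computation and file name here are placeholders), such a script might look like:

# python_serial_script.py -- a minimal serial placeholder.
# Writes intermediate output to the job's local disk requested via jobfs.
import os
import numpy as np

jobfs = os.environ.get("PBS_JOBFS", "/tmp")  # local disk from `#PBS -l jobfs=8GB`

# Placeholder computation: diagonalise a random matrix.
rng = np.random.default_rng(seed=0)
matrix = rng.standard_normal((1000, 1000))
eigenvalues = np.linalg.eigvals(matrix)

np.save(os.path.join(jobfs, "eigenvalues.npy"), eigenvalues)
print("largest |eigenvalue|:", abs(eigenvalues).max())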

Parallel Job

An example PBS job submission script named python_parallel_job.sh is provided below. It requests 48 CPU cores, 128 GiB memory, and 400 GiB local disk on a compute node on Gadi from the normal queue, for its exclusive access for 30 minutes, charged to the project a00. It also requests that the job start in the directory from which it was submitted. The script should be saved in the working directory from which the analysis will be run. To change the number of CPU cores, walltime, memory, or jobfs requested, simply modify the corresponding PBS resource requests at the top of the file according to the information available at https://opus.nci.org.au/display/Help/Queue+Structure. Note that if your application does not run in parallel, you must set the number of CPU cores to 1 and reduce the memory and jobfs requests accordingly to avoid wasting compute resources.

#!/bin/bash

#PBS -P a00
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=128GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load modules, always specify version number.
module load python3/3.9.2
module load openmpi/4.0.2

# Either update your PYTHONPATH variable if some packages are
# installed under your non-HOME directory or activate your virtual
# environment if some packages are installed under the virtual
# environment.

# Set number of OMP threads
export OMP_NUM_THREADS=$PBS_NCPUS

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

# Run Python applications
mpirun -np $PBS_NCPUS python3 python_parallel_script.py > $PBS_JOBID.log

# Deactivate virtual environment, if any.

To run the job you would use the PBS command:

$ qsub python_parallel_job.sh
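In this example mpirun starts $PBS_NCPUS copies of python_parallel_script.py, one per MPI rank; the script itself is responsible for dividing up the work, for example with mpi4py (installation instructions are given below). A minimal sketch, assuming mpi4py is available, might look like:

# python_parallel_script.py -- a minimal mpi4py placeholder.
# Each MPI rank computes a partial sum; the results are combined with allreduce.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank sums every size-th integer below 10 million, starting at its rank.
local_sum = np.arange(rank, 10_000_000, size, dtype=np.int64).sum()
total = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print("ranks:", size, "total:", total)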

Python Package Installation

General considerations:

  1. We recommend using the system gcc to compile packages. If you have Intel compilers loaded, unload them before doing the installation.
  2. We strongly recommend not installing binary packages; please compile everything on Gadi. This ensures that packages work properly in our environment, as some binary installations do not work correctly on Gadi.

Installing Python Packages under HOME Directory

If you want to install Python packages under your HOME directory, follow the procedure below:

# Unload modules
$ module unload intel-compiler intel-mkl python python2 python3 openmpi hdf5

# Load modules, always specify version number.
$ module load python3/3.9.2
$ module load openmpi/4.0.2
$ module load hdf5/1.10.5

$ python3 -m pip install -v --no-binary :all: --user mpi4py
$ python3 -m pip install -v --no-binary :all: --user h5py
$ python3 -m pip install -v --no-binary :all: --user requests
$ python3 -m pip install -v --no-binary :all: --user google-auth

The above will install the mpi4py, h5py, requests, and google-auth packages in the ~/.local/lib/python3.9/site-packages directory under your HOME directory.
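To confirm that Python is picking these packages up from ~/.local rather than from somewhere else on your path, a small throwaway check like the one below can be run after loading the same modules (a sketch; the script name is arbitrary):

# check_user_install.py -- report where each user-installed package resolves from.
# The printed paths should sit under ~/.local/lib/python3.9/site-packages.
import mpi4py
import h5py
import requests

for module in (mpi4py, h5py, requests):
    print(module.__name__, module.__version__, module.__file__)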

To use the Python packages installed above under your HOME directory, do the following:

# Load modules, always specify version number.
$ module load python3/3.9.2
$ module load openmpi/4.0.2

# Run your Python script
$ mpirun -np 2 python3 /path/to/my/python/script.py

Installing Python Packages under non-HOME Directory

If you want to install Python packages under a non-HOME directory, say /g/data/your/directory/name, follow the procedure below:

# Unload modules
$ module unload intel-compiler intel-mkl python python2 python3 openmpi hdf5

# Load modules, always specify version number.
$ module load python3/3.9.2
$ module load openmpi/4.0.2
$ module load hdf5/1.10.5

# Update your PYTHONPATH variable
$ export PYTHONPATH=/g/data/your/directory/name/lib/python3.9/site-packages:$PYTHONPATH

$ python3 -m pip install -v --no-binary :all: --prefix=/g/data/your/directory/name mpi4py
$ python3 -m pip install -v --no-binary :all: --prefix=/g/data/your/directory/name h5py
$ python3 -m pip install -v --no-binary :all: --prefix=/g/data/your/directory/name requests
$ python3 -m pip install -v --no-binary :all: --prefix=/g/data/your/directory/name google-auth

The above will install the mpi4py, h5py, requests, and google-auth packages in the /g/data/your/directory/name/lib/python3.9/site-packages directory under the /g/data directory.

To use the Python packages installed above under a non-HOME directory, do the following:

# Load modules, always specify version number.
$ module load python3/3.9.2
$ module load openmpi/4.0.2

# Update your PYTHONPATH variable
$ export PYTHONPATH=/g/data/your/directory/name/lib/python3.9/site-packages:$PYTHONPATH

# Run your Python script
$ mpirun -np 2 python3 /path/to/my/python/script.py

Installing Python Packages under Virtual Environment

We recommend always using a virtual environment for your Python applications. A virtual environment gives each project its own isolated set of Python packages, so you can switch between projects without worrying about breaking the packages installed for another.

If you want to install Python packages under a virtual environment, follow the procedure below:

# Unload modules
$ module unload intel-compiler intel-mkl python python2 python3 openmpi hdf5

# Load modules, always specify version number.
$ module load python3/3.9.2
$ module load openmpi/4.0.2
$ module load hdf5/1.10.5

# Create a virtual environment from python3/3.9.2, if not already created.
$ python3 -m venv --system-site-packages /path/to/my/virtual/environment/my_venv_name_3.9.2

# Activate your virtual environment
$ source /path/to/my/virtual/environment/my_venv_name_3.9.2/bin/activate

# Install Python packages under the activated virtual environment
$ python3 -m pip install -v --no-binary :all: mpi4py
$ python3 -m pip install -v --no-binary :all: h5py
$ python3 -m pip install -v --no-binary :all: requests
$ python3 -m pip install -v --no-binary :all: google-auth

# Deactivate the virtual environment
$ deactivate

The above will install the mpi4py, h5py, requests, and google-auth packages in the /path/to/my/virtual/environment/my_venv_name_3.9.2/lib/python3.9/site-packages directory under the virtual environment.

To use the Python packages installed above under the virtual environment, do the following:

# Load modules, always specify version number.
$ module load python3/3.9.2
$ module load openmpi/4.0.2

# Activate your virtual environment
$ source /path/to/my/virtual/environment/my_venv_name_3.9.2/bin/activate

# Run your Python script
$ mpirun -np 2 python3 /path/to/my/python/script.py

# Deactivate the virtual environment
$ deactivate
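If you want your job to fail early when the virtual environment has not been activated (for example, because the activate line was left out of the job script), a small guard like the following can be run before the main application (a sketch; the script name is arbitrary):

# check_venv.py -- abort unless the interpreter is running inside a virtual environment.
import sys

# Inside an activated venv, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the base python3 installation.
print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)

if sys.prefix == sys.base_prefix:
    raise SystemExit("No virtual environment is active.")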

Binary Package Installation

The authors of some packages do not provide source code. In that case, a pip installation with the --no-binary :all: option will not work, and you will see an error saying that no compatible version of the package could be found. In these cases, omit the --no-binary :all: option from the pip command; this will install a binary version of the package.

Python2 Package Installation

The procedure for Python 2 packages is the same, but load the most recent python2 module available on Gadi and use python2 -m pip instead of python3 -m pip.