This tutorial is designed to show you how to download, compile, and run a simple job on Gadi, allowing you to practice submission before running your own binary.
In this instance you will be running a simple 'Hello world' job, which will teach you the basics of building and submitting a job, while also showing you some tips on monitoring the job as it runs.
For the majority of jobs, you can follow a workflow similar to this tutorial: transfer your program to Gadi, compile it, write a PBS submission script, submit the job, and monitor it while it runs.
To begin, open your preferred SSH client and log in to Gadi.
Once you are logged in, you will need to download the program. This can be done by clicking this link: hello_mpi.c. You will then need to transfer this file into your home directory; for guidance on this, please see our file transfer guide.
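If you would rather create the file directly on Gadi, you can paste it in from the command line with a heredoc. Note that the listing below is an assumed, typical MPI 'hello world' sketch; the actual hello_mpi.c behind the link may differ slightly.

```shell
# Write a typical MPI "hello world" to hello_mpi.c in the current
# directory. (Assumed content -- the file from the link may differ.)
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank (ID)  */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
```

Each MPI process (rank) prints one line, which is why the job output later in this tutorial shows one 'Hello' per CPU requested.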
To compile this program into something that can run on Gadi, we will be using the command mpicc, which compiles MPI programs written in C. We will also be using openmpi to help with the compilation. To do so, we need to load the openmpi module into the environment. You can read our software applications guide for an in-depth look at loading and unloading software, but for now we will run the command
$ module avail openmpi
OpenMPI is an open-source implementation of the Message Passing Interface (MPI) standard, which assists with running parallel jobs on a supercomputer.
As you can see on the right, the 'module avail openmpi' command lists all of the available versions of openmpi on Gadi. NCI recommends using a version that you know is compatible with your binary, limiting the risk of unnecessary errors. In this case, we will use the latest version of openmpi by running the command
$ module load openmpi/4.1.5
You can check what modules you have loaded into the environment by running the command
$ module list
You should see that openmpi has been loaded, along with PBS, which is automatically loaded upon login.
Once the module is successfully loaded, we can begin compiling the program. To do this, run the command
$ mpicc -o hello_mpi hello_mpi.c
This will compile a binary named hello_mpi, which will be created in your home directory. If you want to check that it was produced successfully, you can run
$ ls
On the right you can see the original file, hello_mpi.c, and the new file, hello_mpi. This is the file you will use to run your job, and it will be referenced when writing a PBS script for the job.
To run jobs on Gadi, users need to create a PBS submission script. A PBS script is a list of parameters that tells Gadi what resources you want to use while running the job. You can outline how much walltime your job will take, the number of CPUs required, and how much memory to allocate to it. You can read our job submission guide for an in-depth look at how to write submission scripts, along with our PBS environment guide for useful ways to customise your scripts.
For this job we will be writing a very simple script. It can be written in whichever editor you wish, e.g. vim or nano, and contains simple lines that tell PBS how you want to run your job. We will use vim in this case, which you can open by simply running the command
$ vim
For people new to Linux and text editors in general, you can get a great breakdown of vim and its uses, including short lessons, by running
$ vimtutor
Once vim is open, you can start preparing your script. It should look similar to the simple outline below.
#!/bin/bash
#PBS -P <Project code>
#PBS -q normal
#PBS -l ncpus=4
#PBS -l mem=16gb
#PBS -l walltime=00:10:00

module load openmpi/4.1.5
mpirun ./hello_mpi
These parameters tell the PBS scheduler how you would like to run your job, in this case: which project to charge (-P), the queue to submit to (-q normal), 4 CPUs (-l ncpus=4), 16GB of memory (-l mem=16gb), and a maximum walltime of 10 minutes (-l walltime=00:10:00). The final two lines load the openmpi module and launch the compiled binary with mpirun.
To save your script, enter :wq script.sh to write (w), quit (q), and name your file script.sh. This will save your script and return you to your home directory, where you will be ready to submit your job to Gadi.
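As an aside, if you prefer not to use an interactive editor, the same script can be created non-interactively with a heredoc. This is a sketch; substitute your own project code for <Project code>.

```shell
# Write the PBS submission script without opening an editor.
# Quoting 'EOF' stops the shell expanding anything inside the heredoc.
cat > script.sh <<'EOF'
#!/bin/bash
#PBS -P <Project code>
#PBS -q normal
#PBS -l ncpus=4
#PBS -l mem=16gb
#PBS -l walltime=00:10:00

module load openmpi/4.1.5
mpirun ./hello_mpi
EOF
```

This produces exactly the same script.sh as the vim session above, which can be handy when scripting job setup.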
Now that you have compiled your program and written your job script, it's time to submit your job to the PBS scheduler.
Jobs are submitted using the command qsub. For this job, we can run the command
$ qsub script.sh
This will submit the job and give you a jobID, as you can see below.
You can now begin monitoring the job and gathering data on what is happening while it is running on Gadi.
There are several methods for monitoring jobs and analysing data; for this tutorial, we will focus on one of the simpler ones. For a more in-depth look at ways to monitor jobs, please see our job monitoring page.
To monitor your job while it is running, you can enter the command
$ qstat -swx <jobID>
This will print information like the following.
As you can see in this screenshot, the job entered the queue and finished; as it is a very small job, it took only one second to complete.
Once the job has finished, it will produce two new files in your home directory, named script.sh.o<JobID> and script.sh.e<JobID>.
The first file, with 'o' in the file name, is the output of the job, which should have run successfully on Gadi. The second file, with 'e' in the file name, is the error stream; it documents any errors that occurred while the job was running. In this case, the error stream file should be empty.
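The split between these two files mirrors the standard output and standard error streams of any Unix process. The snippet below is a purely local illustration of that separation; the filenames and jobID are made up, not real PBS output.

```shell
# Redirect a command's stdout and stderr to separate files, the way
# PBS writes the .o and .e files for a job. Illustrative names only.
(
    echo "Hello from rank 0"       # written to standard output
    echo "example warning" >&2     # written to standard error
) > script.sh.o12345 2> script.sh.e12345

cat script.sh.o12345   # contains only the stdout line
cat script.sh.e12345   # contains only the stderr line
```

A job that writes nothing to standard error, like the hello_mpi job here, therefore leaves an empty .e file.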
The output file can be opened by running the command
$ less script.sh.o<JobID>
This gives you the following information.
This shows the job reporting back 'Hello' from the 4 CPUs that we requested, along with other information that will be useful when running your own jobs. If your job script had asked for more than four CPUs, they would all echo 'Hello' along with the ones you see here.
You have run your first job on Gadi!
Although it might seem small, it is a great stepping stone to learning more about high performance computing.