Job Monitoring

Jobs submitted to Gadi are given a jobID. It is shown to you as soon as the job has been accepted and is a string of eight digits, e.g. 12345678.

NCI encourages users to monitor their jobs at every stage, to check their health and to help detect errors and failures.

However, please refrain from checking your jobs excessively. Repeated queries, especially in quick succession, will be considered attacks. Our recommendation is to query a job's status at most once every 10 minutes, which should be more than enough.
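As an illustration of the 10-minute guidance, the following is a minimal sketch of a polling loop, not an NCI-provided tool. The function names job_state and wait_for_job and the variable POLL_INTERVAL are hypothetical. The sketch assumes that qstat -fx prints a line of the form "job_state = X" and that a finished job reports state F.

```shell
#!/usr/bin/env bash
# Seconds between queries; 600 s matches the 10-minute recommendation.
POLL_INTERVAL="${POLL_INTERVAL:-600}"

# Print the one-letter state of a job (assumed: Q queued, R running, F finished).
job_state() {
    qstat -fx "$1" 2>/dev/null |
        awk -F' = ' '/^[[:space:]]*job_state = /{ print $2; exit }'
}

# Wait until the job reports state F, sleeping POLL_INTERVAL between queries.
# Returns 1 if no state can be read (for example, an unknown job ID).
wait_for_job() {
    local jobid="$1" state
    while true; do
        state="$(job_state "$jobid")"
        if [ -z "$state" ]; then
            echo "job ${jobid}: no state returned" >&2
            return 1
        fi
        echo "$(date '+%F %T') job ${jobid}: state=${state}"
        if [ "$state" = "F" ]; then
            return 0
        fi
        sleep "$POLL_INTERVAL"
    done
}

# Usage (hypothetical job ID): wait_for_job 12345678
```

Keeping the interval in one variable makes it easy to lengthen for long-running jobs; it should not be set below 600 seconds.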

Queue Status

To query a job's status, run the command

$ qstat -swx <jobID>

The option string -swx is made up of:

-s: Status comment - prints any comment set by the scheduler or an administrator on the line below each job

-w: Wide format - widens the output columns so that long values are not truncated

-x: Extended format - includes finished and moved jobs in addition to queued and running ones

The screenshot below shows job 12345678.gadi-pbs:

User aaa777 submitted the job
It was submitted to the normal-exec queue
It requested 48 cores and 190 GiB of memory
It requested 2:00 hours of walltime and has been running for 0:35:21
The line at the bottom shows when the job started, which Gadi node it is running on (2697), and the space reserved on jobfs

For more detail, run the command

$ qstat -fx <jobID> | less

Here, -f requests the full job information. Combined with -x, it also prints the complete record for jobs that have already finished.
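The full output is long, so it can help to filter it down to a few fields. The following is a minimal sketch: job_summary is a hypothetical helper, and the attribute names (job_state, queue, Resource_List.*, resources_used.*) are assumed to appear in the "name = value" form that PBS Professional prints. Long values that wrap onto continuation lines are not handled.

```shell
#!/usr/bin/env bash
# Read full job information on stdin and keep only a few commonly checked
# attributes, with leading indentation stripped.
job_summary() {
    grep -E '^[[:space:]]*(job_state|queue|Resource_List\.(ncpus|mem|walltime)|resources_used\.(walltime|mem)) = ' |
        sed -E 's/^[[:space:]]+//'
}

# Usage (hypothetical job ID): qstat -fx 12345678 | job_summary
```

Reading from stdin keeps the helper independent of how the job information was obtained, so a saved copy of the output can be filtered without querying the scheduler again.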

If you would like to see a list of qstat options and their functions, check the qstat manual by running

$ man qstat

CPU and Memory Utilisation

Users should continue to monitor their running jobs, especially the utilisation rate. A job that runs into errors will often show a drop in utilisation. A low utilisation rate is a useful sign that compute time is being underused, but a 100% utilisation rate does not necessarily mean the requested resources are being used efficiently. Further investigation can show whether performance can be improved.

To see how much CPU and memory your job has actually been using, run the command

$ nqstat_anu <jobID>

In the example output, the job used only 23% of the compute capacity of the 48 cores that were requested, and 36:47 had elapsed.

It also shows the peak memory usage in the columns RSS and MEM.

Depending on the tasks running within this job, the percentage may increase as its lifespan continues. NCI recommends that users aim for at least 80% overall CPU utilisation rate.
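To make the percentage concrete, the following sketch computes a utilisation rate by hand. It assumes utilisation is defined as CPU time divided by (walltime × number of cores); nqstat_anu may calculate its figure differently. The function names to_seconds and cpu_utilisation are hypothetical, and the CPU time in the usage line is an illustrative value chosen to match the 23% example, not real job output.

```shell
#!/usr/bin/env bash
# Convert an H:MM:SS duration to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print CPU utilisation as a percentage.
# Arguments: CPU time (H:MM:SS), walltime (H:MM:SS), number of cores.
cpu_utilisation() {
    local cput walltime
    cput="$(to_seconds "$1")"
    walltime="$(to_seconds "$2")"
    awk -v c="$cput" -v w="$walltime" -v n="$3" 'BEGIN {
        if (w * n == 0) exit 1          # avoid dividing by zero
        printf "%.1f\n", 100 * c / (w * n)
    }'
}

# Usage with illustrative values: 48 cores, 36:47 elapsed
# cpu_utilisation 06:46:05 00:36:47 48    # prints 23.0
```

A result well below 80 suggests the job is not keeping its requested cores busy.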

Other Useful Monitoring Commands

Process Status in Job

To monitor the processes running inside a job, take a snapshot of their status by running

$ qps 12345678

Files in Folder $PBS_JOBFS

To list the files in the folder $PBS_JOBFS on a compute node, run the following command from the login node

$ qls 12345678

To copy a file from $PBS_JOBFS into your current folder, use the command qcp, for example

$ qcp -n 0 12345678/testjob_outdir/job.timing ./job.timing.bk1

Commands to help monitor your jobs
man qstat	View the manual for qstat and a range of helpful commands
qdel <jobid>	Delete the job with jobID <jobid>
qstat -swx <jobid>	Display the job status in the queue with comment
qstat -fx <jobid>	Display full job status information
qps <jobid>	Take a snapshot of the process status of all current processes in the running job
qcat [-s/-o/-e] <jobid>	Display [submission script/STDOUT/STDERR] of the running job
qls <jobid>	List contents of the folder $PBS_JOBFS
qcp <jobid> <dst>	Copy files and directories from the folder $PBS_JOBFS to the destination folder <dst>

Yue Sun, Andrew Wellington, Mohsin Ali, Javed Shaikh, Adam Huttner-Koros, Andrew Johnston, El-Abed Haidar

El-Abed Haidar
