
Panel
borderColor#21618C
bgColor#ABEBC6
borderWidth1
borderStyleridge

Do

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
borderStyleridge
Expand
titleSubmit large data transfers to copyq

Large data transfers, roughly 500 GiB and above, should be submitted through the copyq queue to avoid overloading the login nodes and impacting other users.
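As a sketch, a minimal PBS job script for such a transfer might look like the following. The project code, paths, and resource values are placeholders, and the directives follow common PBS Pro conventions; adjust everything to your own project and data:

```shell
#!/bin/bash
# Submit with: qsub transfer.sh
#PBS -q copyq                 # copyq: the queue intended for data transfers
#PBS -l ncpus=1               # transfers are I/O-bound; one CPU is enough
#PBS -l mem=4GB
#PBS -l walltime=02:00:00     # allow enough wall time for the full transfer
#PBS -l wd                    # start in the directory the job was submitted from

# Placeholder project code and paths -- substitute your own.
cp -r /scratch/ab12/results /g/data/ab12/results
```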

Expand
titleMove files out of scratch into g/data or massdata

Scratch is intended as the working space for your compute jobs. Once your jobs have finished, please move your data elsewhere, such as g/data or massdata. Files left in scratch for 100 days will be quarantined and may be deleted permanently.
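For example, moving finished results out of scratch is a single command. The project code and directory names below are hypothetical; substitute your own:

```shell
# ab12 and abc123 are placeholder project/user names.
mv /scratch/ab12/abc123/run_output /g/data/ab12/abc123/
```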

Expand
titleFind the sweet spot for your compute needs

Run tests on your submission scripts. Aim to run your tasks near the point where the job still benefits from parallelism and achieves a shorter execution time, while utilising at least 80% of the requested compute capacity.

While searching for the sweet spot, please be aware that it is common to see components in a task that run only on a single core and cannot be parallelised. These sequential parts drastically limit the parallel performance.

For example, a workload with just 1% sequential parts, run in parallel on 48 cores, is limited to an overall CPU utilisation rate of less than 70%. Moreover, parallelism adds overhead that generally grows with core count; beyond the 'sweet spot', this wastes time on unnecessary task coordination.
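This follows from Amdahl's law: with sequential fraction s, the speedup on n cores is 1/(s + (1 - s)/n), and utilisation is that speedup divided by n. A quick check of the 1%-on-48-cores figure, sketched with awk:

```shell
# Amdahl's law: utilisation on n cores with sequential fraction s
awk 'BEGIN {
    s = 0.01; n = 48                 # 1% sequential work, 48 cores
    speedup = 1 / (s + (1 - s) / n)  # parallel speedup
    printf "%.0f%%\n", 100 * speedup / n
}'
# prints 68% -- below the 70% figure noted above
```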

A way to test this is to limit the wall time of your job to a very low value during the testing phase. This lets you test without changing parameters that would affect the job's final results.
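For instance, the wall-time limit can be overridden at submission time without editing the script (the script name below is a placeholder):

```shell
# Request only 10 minutes of wall time for a test run; qsub's -l option
# overrides the matching #PBS directive inside the script.
qsub -l walltime=00:10:00 my_job.sh
```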

Expand
titleUse copyq if your job needs access to the internet

If your job needs access to the internet at any stage of its execution, it must be submitted to the copyq queue, as it is the only queue with external internet access.



Panel
borderColor#21618C
bgColor#ff0022
borderWidth1
borderStyleridge

Don't

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
borderStyleridge
Expand
titleDon't run jobs on the login nodes

The login nodes are not intended for running jobs. Doing so degrades their responsiveness and takes resources away from other users.

Expand
titleDon't transfer large amounts of data through the login nodes

Large data transfers should be submitted through the copyq queue to avoid overloading the login nodes and impacting other users.

Expand
titleDon't keep files in scratch

Scratch is not long-term storage. Files left in scratch for 100 days will be quarantined and may be deleted permanently, so move your data to g/data or massdata once your jobs have finished.

Expand
titleDon't request resources that you won't need

Don't request resources that you won't need; it only holds up your job and other users' jobs. The PBS scheduler will find a slot for 2 CPUs faster than for 4 CPUs, so think carefully about how many resources you request.
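As an illustration, the request lives in the job script's resource directives, so trimming it is a one-line change. The values below are examples only:

```shell
#PBS -l ncpus=2           # a 2-CPU request is typically scheduled sooner...
#PBS -l mem=8GB           # ...than an equivalent 4-CPU, 16GB request
#PBS -l walltime=01:00:00 # a realistic wall-time estimate also helps the scheduler
```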

Expand
titleDon't check the status of your job constantly

Repeatedly checking the status of your job can be flagged as a malicious attack. Checking now and then is fine, but please limit how often you query the job, especially in quick succession. Wait at least 10 minutes before querying your job again.
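If you want to script a wait, a sketch that respects the 10-minute interval is shown below. It assumes a PBS `qstat` that exits non-zero once the job has left the queue, which may vary with the system's job-history settings:

```shell
JOBID=1234567    # placeholder job ID -- use the ID qsub printed
# Poll no more often than every 10 minutes.
while qstat "$JOBID" > /dev/null 2>&1; do
    sleep 600
done
echo "job $JOBID has finished"
```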


Authors: Andrew Johnston