Gadi is Australia’s most powerful supercomputer, a highly parallel cluster comprising more than 200,000 processor cores on ten different types of compute nodes. Gadi accommodates a wide range of tasks, from running climate models to genome sequencing, from designing molecules to astrophysical modelling. To start using Gadi, you should read this page, which covers most of the basics you need to know before submitting your first job.

...

Compute Hours
  • Owner: project
  • Accessible from: n.a.
  • Size Limit: amount set by scheme manager
  • Allocation Valid Until: end of quarter

Storage

$HOME
  • Owner: user
  • Accessible from: PBS jobs / login nodes
  • Size Limit: 10 GiB, with no possible extension
  • Allocation Valid Until: user account deactivation
  • Comments: backed up, with snapshots in $HOME/.snapshot

/scratch/$PROJECT
  • Owner: project
  • Accessible from: PBS jobs† / login nodes
  • Size Limit: 72 GiB by default, more on job demand
  • Allocation Valid Until: project retirement / job demand changes
  • Comments: designed for jobs with large data IO; no backups; data expires 90 days after creation [tbc when details available]; number-of-files limit applied

/g/data/$PROJECT
  • Owner: project
  • Accessible from: PBS jobs† / login nodes
  • Size Limit: amount set by scheme manager
  • Allocation Valid Until: project retirement
  • Comments: designed for hosting persistent data; no backups; number-of-files limit applied; also accessible from other NCI services, like cloud

massdata (mdss)
  • Owner: project
  • Accessible from: PBS copyq jobs† / login nodes
  • Size Limit: amount set by scheme manager
  • Allocation Valid Until: project retirement
  • Comments: tape-based archival data storage; two copies created in two different buildings

$PBS_JOBFS
  • Owner: user
  • Accessible from: PBS jobs *
  • Size Limit: disk space available on the job's hosting node(s)
  • Allocation Valid Until: job termination
  • Comments: designed for jobs with frequent and small IO; no backups

Software applications
  • Owner: NCI
  • Accessible from: PBS jobs / login nodes
  • Size Limit: n.a.
  • Allocation Valid Until: n.a.
  • Comments: built from source on Gadi when possible; more can be added on request ‡

License
  • Owner: software group owner
  • Accessible from: PBS jobs / login nodes
  • Size Limit: available seats on the licensing server
  • Allocation Valid Until: license expiry date
  • Comments: access controlled by software group membership ††; NCI-owned licenses are for academic use only; projects, institutions and universities can bring in their own licenses

...

This shows the total, used, reserved, and available compute grant of the project in the current billing period at the time of the query. SU is short for Service Unit, the unit in which Gadi compute hours are measured. Jobs run in the Gadi normal queue are charged 2 SUs to run for an hour on a single core with a memory allocation of up to 4 GiB. Jobs on Gadi are charged for the proportion of a node’s total memory that they request, or the proportion of its cores, whichever is larger: see more examples of job cost here. In addition, different Gadi queues have different charge rates: see the breakdown of charge rates here.
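As a rough, illustrative sketch of that rule (the job size, memory and hours below are invented for the example, and the linked cost pages remain the authoritative reference): on the normal queue, where 2 SUs buy one core-hour with roughly 4 GiB of memory, a 5-hour job asking for 12 cores but 96 GiB of memory is charged as if it occupied 24 cores, because 96 GiB is about half of a node's memory.

    # hypothetical job: normal queue, 12 cores, 96 GiB memory, 5 hours walltime
    awk 'BEGIN {
        rate = 2; ncpus = 12; mem_gib = 96; hours = 5
        mem_cores = mem_gib / 4                          # memory expressed as core-equivalents (~4 GiB per core)
        cores = (ncpus > mem_cores) ? ncpus : mem_cores  # charged on whichever proportion is larger
        printf "%.0f SU\n", rate * cores * hours         # prints: 240 SU
    }'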

The Grant amount listed is always equal to the sum of Used, Reserved and Avail. Every time a project submits a job, the job's potential cost, calculated according to the requested resources and walltime, is reserved from the total Grant and reduces the Avail amount. The actual cost, based on the walltime actually used, is determined only after the job completes. If the job finishes within its requested walltime limit, the over-reserved allocation is returned to Avail and the actual cost is added to Used. When the project has no jobs waiting or running, the Reserved amount returns to zero.

If there are not enough SUs available for a job to run with its requested resource amounts, the job will wait in the queue indefinitely. The project lead CI should contact their allocation scheme manager to apply for more. If you are not sure which scheme manager to contact, check the verbose output of the command `nci_account`: passing the `-v` flag prints more granular information on a per-user and per-stakeholder basis.
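For example, on a login node:

    nci_account        # compute grant summary for the current billing period
    nci_account -v     # verbose output, broken down per user and per stakeholder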

The Home Folder $HOME

Each user has a project-independent home directory. The storage limit of the home folder is fixed at 10 GiB. We recommend using it to host scripts and files you want to keep to yourself. Users are encouraged to share their data elsewhere; see our Data Sharing Suggestions. All data on /home is backed up. In case of accidental deletion of data inside the home folder, you can retrieve it by following the example here.
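As a minimal sketch of such a retrieval (the snapshot and file names below are placeholders; list the contents of $HOME/.snapshot to see what actually exists):

    ls $HOME/.snapshot/                                      # list the available snapshots
    ls $HOME/.snapshot/<snapshot_name>/                      # browse your home folder as it was at that time
    cp $HOME/.snapshot/<snapshot_name>/myscript.sh $HOME/    # copy the accidentally deleted file back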

...

The first column in the output shows the permissions set for the folder/file. For more information on unix file permissions, see this page.
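For instance, a listing such as the following (the user name abc123 and the timestamp are invented for illustration)

    $ ls -ld /scratch/a00
    drwxr-x--- 4 abc123 a00 4096 Jan 01 12:00 /scratch/a00

reads as: `d` marks a directory, `rwx` grants the owner read, write and execute permission, `r-x` grants members of the a00 group read and execute permission, and `---` denies access to everyone else.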

To look up how much storage you have access to, and through which projects, run the command `lquota` on the login node. It prints the storage allocation information together with live usage data. For example, the return message

...

The second section, with all the lines that start with `#PBS`, specifies how much of each resource the job needs. It requests an allocation of 48 CPU cores, 190 GiB of memory, and 200 GiB of local disk on a compute node in the normal queue, for its exclusive access for 2 hours. It also asks the system to mount the a00 project folders on both the /scratch and /g/data filesystems inside the job, and to enter the working directory once the job starts. Please see more PBS directives explained here. Note that a `-lstorage` directive must be included if you need access to /g/data; otherwise, these folders will not be accessible while the job is running.
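A sketch of such a directive section might look like the following, using a00 as the example project code from this page; adjust the queue, limits and storage list to your own job:

    #PBS -q normal
    #PBS -l ncpus=48
    #PBS -l mem=190GB
    #PBS -l jobfs=200GB
    #PBS -l walltime=02:00:00
    #PBS -l storage=scratch/a00+gdata/a00
    #PBS -l wd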

To find the right queue for your jobs, please browse the Gadi Queue Structure and Gadi Queue Limit pages. 

Info

Users are encouraged to request resources that let their task(s) run near the ‘sweet spot’, where the job benefits from parallelism and achieves a shorter execution time while still utilising at least 80% of the requested compute capacity.

While searching for the sweet spot, please be aware that it is common for a task to contain components that run only on a single core and cannot be parallelised. These sequential parts drastically limit the parallel performance. For example, a workload with just 1% sequential work has its overall CPU utilisation rate limited to less than 70% when running in parallel on 48 cores. Moreover, parallelism adds overhead which in general grows with the core count and, beyond the ‘sweet spot’, results in time wasted on unnecessary task coordination.
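One way to see this is Amdahl's law: with a serial fraction s, the best possible speed-up on n cores is 1/(s + (1 - s)/n), and the utilisation rate is that speed-up divided by n. A quick check of the figure quoted above:

    # CPU utilisation of 48 cores for a workload with a 1% serial fraction
    awk 'BEGIN { s = 0.01; n = 48; speedup = 1 / (s + (1 - s) / n); printf "%.0f%%\n", 100 * speedup / n }'    # prints: 68%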

...

Info

Users are encouraged to run their tasks in bigger jobs where possible, to take advantage of the massive parallelism that can be achieved on Gadi. However, depending on the application, it may not be possible for the job to run on more than a single core/node. For applications that do run on multiple cores/nodes, the commands and scripts/binaries called in the third section determine whether the particular job can utilise the requested amount of resources. Users need to edit the script/input files that define the compute task to allow it, for example, to run on multiple cores/nodes. It may take several iterations to find the ideal details for sections two and three of the submission script while exploring around the job's sweet spot.
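For example, a program built with MPI typically has to be launched through an MPI wrapper in the third section of the script before it will actually spread across the requested cores. A minimal sketch, assuming an MPI-enabled binary called my_program and that the job environment exposes the requested core count as $PBS_NCPUS:

    module load openmpi                                 # load an MPI implementation; check `module avail openmpi` for versions
    mpirun -np $PBS_NCPUS ./my_program > output.log     # run one MPI rank per requested core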

...