Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleGadi Resources

Within Gadi there are 10 login nodes, 6 data-mover nodes, over 4000 compute nodes, and NCI's massdata tape storage.

Below is a deeper look at these resources: what they are, what they do, and what they connect to. 

Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleOn this page
Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth2
borderStyleridge
Expand
titleLogin nodes

The 10 login nodes are where you land when you log in to Gadi. They act as the remote hosts through which you interact with the system.

Which login node you are placed on is decided by a round-robin process that allocates you a node with each login. 

You can use login nodes to test small pieces of your code and to gather data on your job's efficiency. However, no full jobs or anything with high compute demands should be run on a login node. These nodes are a shared resource, and running jobs here will impact other users. 

Any process running for longer than 30 minutes, or using more than 4 GiB of memory, will be terminated. 

Expand
titleData-mover nodes
Data-mover nodes are used for exactly that: moving data. You can use these nodes to transfer files to and from Gadi at high speed, following the steps outlined here.
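As a sketch, transfers go through the dedicated data-mover hostname rather than the login nodes. The username `aaa777` and project code `ab12` below are hypothetical placeholders; check your own project details before copying.

```shell
# Push a local results directory to your /scratch area via the data-mover nodes.
# "aaa777" and "ab12" are hypothetical username and project code.
rsync -avP ./results/ aaa777@gadi-dm.nci.org.au:/scratch/ab12/aaa777/results/

# Pull a single file back from Gadi to the current directory.
scp aaa777@gadi-dm.nci.org.au:/scratch/ab12/aaa777/output.nc .
```

`rsync -avP` preserves file attributes and can resume interrupted transfers, which is useful for large datasets.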
Expand
titleCompute nodes

Compute nodes are the workhorses of Gadi. You can think of them as thousands of high-powered PCs all designed to work together.

You can find a detailed breakdown of their specifications and types here.

If you look at the chart to the right, you will notice that the compute nodes don't have access to the external internet. If any tasks within a submitted job need internet access at any stage, they should be packed into a separate job on a copyq node, which does have internet access. We will go over how to do this in the job submission guide.
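For example, a download step can be split into its own copyq job along these lines. The project code, storage flag, username, and URL are illustrative placeholders only:

```shell
#!/bin/bash
# Hypothetical copyq job: fetch input data before the compute job runs.
#PBS -q copyq
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=00:30:00
#PBS -l storage=scratch/ab12
#PBS -l wd

# copyq nodes have external internet access, so downloads work here.
wget -P /scratch/ab12/aaa777/inputs https://example.org/dataset.tar.gz
```

The compute job that consumes the downloaded data can then be submitted separately to a compute queue.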

Expand
title/apps
/apps is where all of Gadi's software applications are stored. If you would like to see what software is available to your project, you can use the command `module avail` to list them. Please take a look at the software applications guide for more information about them. 
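A typical login-node session might look like the following; the compiler name and version string are only an illustration of the module syntax, not a guarantee of what is installed:

```shell
# List every application available under /apps.
module avail

# Narrow the listing to one application, then load a specific version.
module avail gcc
module load gcc/12.2.0   # hypothetical version string

# Show which modules are currently loaded.
module list
```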
Expand
title/home

/home is your personal directory. It has a 10 GiB storage limit that cannot be expanded. /home is a great place to host any scripts that you want to keep to yourself. 

All data on /home is backed up, and accidentally deleted files can be retrieved via $HOME/.snapshot. Please see the data storage FAQ for more information.

Expand
title/scratch

/scratch is your playground, where all of your high-speed computing takes place. /scratch has its own limits, including a cap on the number of files, and files that are not accessed for an extended period are moved into quarantine. Please see the table below for all resource capacities.

Expand
titlemassdata
massdata is NCI's tape storage system. Not every project has access to massdata; only those with a storage allocation do. 
Expand
titleg/data
/g/data, or global data, is a storage area intended for long-term storage of research data. As you can see in the diagram, /g/data is accessible outside the Gadi system, meaning that users from AARNet can access it without needing to use Gadi. However, /scratch and the rest of the directories are not available to them. 
Expand
titleAARNet and other services
Some projects are run outside of the NCI system, meaning that they don't need to be allocated compute time. As you can see on the right, these projects still have access to data services, VDI/cloud, NFS servers, and /g/data.

Click on the headings on the left-hand side to expand them and learn about Gadi's resources.


Anchor
Navigation
Navigation

Navigating through Gadi


Navigating through the directories on the login nodes is simple if you can remember the rules outlined below.

If you keep these formats in mind as you use Gadi, you will have no problem navigating to the directories that you need.

Your home directory will always be located at /home/institution/username

/scratch will always be at /scratch/project/username

/g/data follows the same format as /scratch: /g/data/project/username

All software applications are found under /apps/software/version
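A minimal sketch of those path conventions, using a made-up institution, project code, and username:

```shell
# Hypothetical identifiers -- substitute your own institution, project and username.
INSTITUTION=anu
PROJECT=ab12
USERNAME=aaa777

echo "/home/${INSTITUTION}/${USERNAME}"   # home directory
echo "/scratch/${PROJECT}/${USERNAME}"    # scratch space
echo "/g/data/${PROJECT}/${USERNAME}"     # global data
echo "/apps/openmpi/4.1.5"                # software lives at /apps/software/version
```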

Anchor
Limits
Limits

Resource capacity and limitations




Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
title$HOME

Owner> User

Accessible from> PBS jobs and login nodes

Size limit> 10 GiB with zero extensions available

Allocation valid until> User's account is deactivated 

Resource specific attributes:


  • Backups located in $HOME/.snapshot
Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
title/scratch

Owner> Project

Accessible from> PBS Jobs† and login nodes

Size limit> 1 TiB by default with more available on request 

Allocation valid until> Project completion or job demand changes

Resource specific attributes:


  • Designed for jobs with large data I/O
  • No backups
  • Files not accessed for 100 days are moved from /scratch into a quarantine location
  • Any files quarantined for longer than 14 days are automatically deleted
  • A limit on the number of files applies to /scratch 

† Needs to be explicitly mounted using the PBS directive `-lstorage`. Please see our PBS directives listing for more information. 
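For example, a job that needs both its /scratch and /g/data areas would declare them in one directive (the project code `ab12` is a placeholder):

```shell
# Mount project storage inside the job; paths not listed here are
# invisible to the job even if they exist on the filesystem.
#PBS -l storage=scratch/ab12+gdata/ab12
```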

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
title/g/data

Owner> Project

Accessible from> PBS Jobs† and login nodes

Size limit> Amount is set by the scheme manager

Allocation valid until> Project completion 

Resource specific attributes:


  • Designed to store persistent data
  • No backups
  • Number of files limit applies
  • g/data is also accessible from other NCI services e.g. Nirin cloud

† Need to be explicitly mounted using the PBS directive `-lstorage`. Please see the jobs submission page for more information. 

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titlemassdata

Owner> Project

Accessible from> PBS copyq Jobs and login nodes

Size limit> Amount set by scheme manager

Allocation valid until> Project completion

Resource specific attributes:


  • Backup files are stored in two different buildings
  • Tape-based archival data storage

Read more about massdata here.

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
title$PBS_JOBFS

Owner> User

Accessible from> PBS Jobs*

Size limit> SSD disk space available on the job's hosting node(s); 100 MB by default

Allocation valid until> Job termination 

Resource specific attributes:


  • No backups
  • Designed for jobs with frequent and small IO

* Job owner can access the folder through commands like `qls` and `qcp` on the login node during the job.

Read more about $PBS_JOBFS here.

Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleI/O Intensive

 Owner> User

Accessible from> PBS Jobs

Size limit>  All-flash NetApp EF600 storage, volumes available on request

Allocation valid until> Job termination 

Resource specific attributes:


  • No backups
  • Designed for jobs with frequent and small IO
  • Does not currently work in normalsr and expresssr queues

Please refer to our I/O Intensive page for more information about this system.  

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleSoftware applications

Owner> NCI

Accessible from> PBS jobs and login nodes

Size limit> N/A

Allocation valid until> N/A

Resource specific attributes:


  • Built from source on Gadi where possible
  • More applications can be added according to demand, dependencies, and scalability. Users can request that an application be added to the Gadi /apps repository.
Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleLicences

Owner> Software group owner

Accessible from> PBS jobs and login nodes

Size limit> Available seats on the licencing server 

Allocation valid until> Licence expiry date

Resource specific attributes:


  • Access is controlled by the software group owner. The module file and the PBS `-lsoftware` directive are used to control access to a licence
  • NCI owned licences are for academic use only
  • Projects, institutions and universities can bring their own licences 
  • See our live licence status page for more information
Tip

There is also a quota called iQuota that applies to /scratch and /g/data. It limits the maximum number of files and folders allowed within a project. You can see your iQuota usage by running the command

Code Block
themeFadeToGrey
$ lquota

Please try to keep the number of files as low as possible, as a high file count can affect the I/O performance of your job. Gadi is efficient at handling large-scale parallel I/O, but performance becomes significantly worse when doing frequent small operations.

A main culprit for creating a large number of files is the Python packaging system conda. Please use pip and the available modules that are already tuned for Gadi to keep the file and folder count as low as possible. 

Anchor
jobfs
jobfs
Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
title$PBS_JOBFS

Any job submitted to Gadi is allocated a default 100 MB of storage space on the hosting node's SSD. NCI encourages users to utilise the folder $PBS_JOBFS in jobs that generate a large number of small I/O operations. This boosts your job's performance by saving the time that would be spent running those small operations on the shared filesystems, and it frees up space for your project in /scratch and /g/data.

You can request more space on the hosting node(s) by adding the directive `-l jobfs` to your job script, for example,

Code Block
themeFadeToGrey
#PBS -l jobfs=100GB

would request 100 GB on the nodes. If the job runs on multiple nodes, this 100 GB is distributed equally among them. Jobs that request more disk space than is available on the nodes will fail. Please check the queue structure and queue limits pages for information on how much local disk is available. 

The limit on $PBS_JOBFS is 400 GiB.

Warning
Note that the folder $PBS_JOBFS is physically deleted upon job completion or failure. It is therefore crucial to copy any data you need from $PBS_JOBFS back to a directory on the shared filesystem while the job is still running. 
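That stage-in, compute, stage-out pattern can be sketched in a job script as follows; the project code, paths, and the `process` executable are hypothetical placeholders:

```shell
#!/bin/bash
#PBS -l jobfs=100GB
#PBS -l storage=scratch/ab12
#PBS -l wd

# Stage input onto the node-local SSD, work there, then copy results
# back before the job ends -- $PBS_JOBFS is wiped at job completion.
cp /scratch/ab12/aaa777/input.dat "$PBS_JOBFS/"
cd "$PBS_JOBFS"
./process input.dat -o output.dat      # hypothetical workload
cp output.dat /scratch/ab12/aaa777/
```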
Anchor
massdata
massdata
Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleTape Filesystem - massdata

NCI operates a tape filesystem called massdata to provide a reliable archive for projects to back up their data. The data is held on magnetic tape in machine rooms in two separate buildings. The tapes are accessed and transported by a small robot that works tirelessly for NCI.

Tip
Here is a video of the robot hard at work.

While projects have their own path on massdata, i.e. massdata/<projectcode>, there is no direct access to it from Gadi. Data requests to the tape library must be launched from the login nodes or via a copyq job. You can read our job submission page to learn how to submit copyq jobs. 

NCI provides the `mdss` utility for users to manage the migration and retrieval of files between multiple levels of a storage hierarchy: from online disk cache to offline tape archive. It connects to massdata and launches the corresponding requests. For example, `mdss get` first launches a request to stage the remote files from the massdata repository into the disk cache; once the data is online, it then transfers the data back to your local directory, for example a project folder on /scratch or /g/data. 

Panel
borderColor#21618C
bgColor#F6F7F7
borderWidth1
borderStyleridge

Below are some simple commands that can help while navigating massdata. These commands can be run from the login nodes and begin with the prefix `mdss`, for example

Code Block
borderColor#21618C
bgColor#ffffff
borderWidth1
themeFadeToGrey
borderStyleridge
$ mdss get

 You can read the manual for mdss by running the command

Code Block
borderColor#21618C
bgColor#ffffff
borderWidth1
themeFadeToGrey
borderStyleridge
$ man mdss

which provides several more ways to interact with the storage library. 

put: copy files to the MDSS

get: copy files from the MDSS

mkdir/rmdir: create/delete directories in your massdata directory

ls: list directories
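Putting these together, a typical archive-and-retrieve session from a login node might look like this; the paths and filenames are hypothetical:

```shell
# Archive a results tarball to the project's massdata area.
mdss put results.tar.gz archive/results.tar.gz

# List what is stored under the project's massdata path.
mdss ls archive

# Stage the file back from tape to the current directory on /scratch.
mdss get archive/results.tar.gz .
```

Because `get` may need to stage data from tape, retrievals can take a while; long-running or bulk transfers belong in a copyq job rather than on a login node.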


Authors: Yue Sun, Andrew Wellington, Andrey Bliznyuk, Ben Menadue, Mohsin Ali, Andrew Johnston