I/O Intensive Systems

PBS_JOBFS is useful for jobs that perform a lot of small I/O operations: you can often boost the performance of these sorts of jobs by directing that I/O to the node-local filesystem.

However, the SSD drives mounted in the nodes are relatively small, usually only 400 GiB, so some jobs may run out of local space.

To prevent these drives from being overwhelmed, NCI has created a new iointensive system on Gadi.

From your job's point of view, the new iointensive system operates much like PBS_JOBFS: it is mounted on each node and only exists for the duration of your PBS job. However, rather than using a local SSD in each node, it uses volumes presented from ultra-high-performance, all-flash NetApp EF600 storage arrays attached to Gadi's InfiniBand fabric.

Even though this storage is accessed over the network, it uses the NVMe-over-Fabrics protocol; coupled with the InfiniBand network's ultra-low latency, this provides even lower-latency access and higher I/O concurrency than is possible from the SATA-connected node-local SSDs.

Additionally, since these volumes are dynamically managed by PBS, you're not limited to a single drive per compute node: if you request more than one volume per node, PBS will transparently combine them into a single filesystem on each node of your job.
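For example, from inside a running job you can confirm how much space was assembled on the current node with a standard df query (the mount point and the directive used to request this storage are described in the next section):

df -h /iointensive    # reports the combined size of the volumes attached to this node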

The iointensive system does not currently work in the normalsr and expresssr queues (i.e. the Sapphire Rapids nodes) due to hardware incompatibilities. We're currently investigating possible workarounds.

How to Use iointensive

To request this storage, users must use the directive

-liointensive=<number of TiB>

The value is the total number of volumes that you would like attached to the job, and since each volume is 1 TiB in size, you can simply enter the number of TiB you want associated with the job. 

This number must be divisible by the number of nodes in the job request, as the volumes are distributed evenly among them.

Once your job starts, the storage will be available at /iointensive – just remember that each node has its own dedicated storage (the same as for PBS_JOBFS), so files written here on one node will not be visible on other nodes.
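As a concrete illustration, here is a minimal sketch of a single-node job script that requests one volume and stages its work through /iointensive. The project code, storage directive, memory, walltime, paths and program name are placeholders – adapt them to your own job:

#!/bin/bash
#PBS -P ab12                          # placeholder project code
#PBS -q normal
#PBS -l ncpus=48                      # one full node
#PBS -l mem=190GB
#PBS -l walltime=02:00:00
#PBS -l storage=scratch/ab12          # filesystems the job needs to access
#PBS -l iointensive=1                 # one 1 TiB volume mounted at /iointensive
#PBS -l wd

# Stage the input onto the fast per-node volume (paths are placeholders)
cp -r /scratch/ab12/$USER/input /iointensive/

# Run the I/O-heavy work with its working data on the local volume
cd /iointensive
/scratch/ab12/$USER/my_program input   # placeholder program

# Copy any results back before the job ends and /iointensive is removed
cp -r /iointensive/output /scratch/ab12/$USER/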

Some examples of matching ncpus and iointensive requests:

A 1-node job requesting 1 volume: -lncpus=48 would require -liointensive=1

A 5-node job requesting 2 volumes (i.e. 2 TiB) per node: -lncpus=240 would require -liointensive=10

Note that at present, jobs are limited to a maximum of 64 volumes total (i.e. 64 nodes at 1 volume per node, 32 nodes at 2 volumes per node, ...).
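To mirror the second example above, a sketch of the request header for a 5-node job with 2 volumes per node might look like the following; the project code, memory, walltime, MPI launch and application name are placeholders, and the iointensive value is simply 5 nodes x 2 volumes:

#!/bin/bash
#PBS -P ab12                          # placeholder project code
#PBS -q normal
#PBS -l ncpus=240                     # 5 nodes x 48 cores
#PBS -l mem=950GB
#PBS -l walltime=02:00:00
#PBS -l iointensive=10                # 5 nodes x 2 volumes = 2 TiB per node at /iointensive
#PBS -l wd

# Each node sees its own 2 TiB filesystem at /iointensive; any staging into it
# must be done by a process running on that node (for example, by the per-node
# ranks of an MPI application). The application below is a placeholder.
mpirun -np $PBS_NCPUS ./my_mpi_program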

Authors: Andrew Wellington, Ben Menadue