
Gadi is Australia’s most powerful supercomputer, a highly parallel cluster comprising more than 180,000 processor cores on ten different types of compute nodes. Gadi accommodates various tasks, from running climate models to genome sequencing, from designing molecules to astrophysical modelling. 

Introduction to Gadi is designed for new users and for users who want a refresher on the basics of Gadi.

If you have any questions regarding this training, please contact training.nci@anu.edu.au.


Introduction to Gadi is now available online on the NCI Teachable website.


Date/Time

The fourth Friday of every month, April to November 2024
11 am - 1 pm

Register Now


Prerequisites

The only prerequisite for this course is an active NCI user account that is ready for login. If you do not have an NCI user account, you may still register for this course; however, you will not be able to take full advantage of the hands-on exercises.

Attendees are strongly encouraged to review the following pages, which contain essential background information, before the course.

Objectives

The training aims to empower attendees to work confidently on Gadi, with a basic understanding of

  • resource accounting

  • the differences between login, compute and data-mover nodes

  • job submission and management

  • the module environment for using software applications

  • essential skills to plan, track and manage their jobs on Gadi.


Learning outcomes

At the completion of this training, you will be able to 

  • log in to Gadi
  • transfer data on and off Gadi
  • run module commands to customise user environment and configure software applications
  • submit jobs
  • check and maintain compute, storage, and job status
  • estimate job cost
  • request resources adequate for your jobs
  • monitor job status, progress and resource utilisation
  • understand common reasons why jobs finish with errors
  • ask questions about jobs like a pro

Topics covered

  • Login nodes and login environment
  • Shared filesystems and jobfs
  • Home, Lustre, and tape filesystems
  • Home and project folders 
  • Data transfer and data mover nodes
  • Compute grant, resource hours and PBS queues
  • Job submission and output/error logs
  • Applications, modules and software groups
  • Login, copyq, and different compute nodes
  • Job cost and resource hours
  • PBS directives
  • Tools for job monitoring before, during and after the run
  • Common reasons why jobs are not running
  • Common reasons why jobs fail


Course Contents

The course content is available as a PDF below. Last updated November 2023.
