Last updated 08 (AEDT)


UPDATE 08 ANU re-opens main campus. Gadi now operational.

The ANU has reopened its main Acton ACT campus today following a multi-day closure due to hazardous smoke conditions. Smoke, intermittent power outages, and fire risk continue to be concerns in the ACT and region; hazard warnings are currently in effect. If conditions deteriorate, the ANU may opt to stand down staff and re-close the Acton campus, which could affect Gadi operations. NCI will update its online information and issue all-user communications if there are further campus closures or service interruptions. 

...

Gadi Status Summary

UPDATED 08

  • Gadi is operational as of Thursday
  • Raijin /short and /home directories will be available on Gadi via the paths /raijin/short and /raijin/home, respectively, until . These paths can be accessed read-only on login nodes and copyq/datamover nodes only.
  • On Gadi, use the PBS directive "-lstorage=<path>" if your job accesses a /g/data directory or the /scratch directory of another project. (Note that POSIX permissions still apply.) A job that omits this directive will fail with a run-time error. See the "Filesystems - /g/data" section below for more information.
  • On Gadi, user workflows should reference /g/data directory paths using the form "/g/data/projectcode", i.e. without the alphanumeric filesystem descriptors 1a, 1b, 2, 3, or 4. 
  • A Raijin run-time compatibility image is provided on Gadi. To use it, add the "-limage=raijin" directive to your PBS job script and, if you are using more than one full node, change your CPU request to be Gadi compliant, i.e. a multiple of 48 CPUs. Use of the Raijin compatibility image will incur a performance penalty. All users are advised to recompile their applications on Gadi as soon as possible.
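
The path and CPU rules in the list above can be checked before a job is submitted. The following is a minimal sketch, not an NCI-provided tool: the function names gadi_gdata_path and gadi_ncpus are hypothetical, the Raijin path form "/g/data1a/projectcode" is an assumption based on the descriptors listed above, and rounding up (rather than down) to a whole number of nodes is only one compliant choice.

```python
import re

# Cores per Gadi node, as implied by the "multiple of 48" rule above.
CORES_PER_NODE = 48

# Assumption: Raijin-style paths look like /g/data1a/<projectcode>/...
# with one of the descriptors 1a, 1b, 2, 3 or 4 directly after "data".
_GDATA_DESCRIPTOR = re.compile(r"^/g/data(?:1a|1b|2|3|4)(?=/|$)")


def gadi_gdata_path(path: str) -> str:
    """Rewrite /g/data<descriptor>/... to the Gadi form /g/data/..."""
    return _GDATA_DESCRIPTOR.sub("/g/data", path)


def gadi_ncpus(requested: int) -> int:
    """Round a multi-node CPU request up to a whole number of 48-core nodes."""
    if requested <= CORES_PER_NODE:
        # Requests that fit in one node are left unchanged.
        return requested
    nodes = -(-requested // CORES_PER_NODE)  # ceiling division
    return nodes * CORES_PER_NODE


if __name__ == "__main__":
    # "ab12" is a placeholder project code, not a real project.
    print(gadi_gdata_path("/g/data1a/ab12/run01"))
    print(gadi_ncpus(64))
```

Paths that already use the "/g/data/projectcode" form pass through unchanged, so the helper can be applied to every path in a workflow without first checking which form it uses.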

...

Date(s)

Events

 

NCI data centre preparation and Gadi installation phase one - COMPLETED.

 

Gadi stability and acceptance testing underway. Users preparing for Gadi.

 

Raijin user home directories will be copied to Gadi home directories ($HOME/raijin_home). 

 

Transition Phase One
Gadi and Raijin available to users. Gadi pre-production configuration is expected to include one rack of V100 GPU nodes.
Gadi allocations will match Raijin Q4 pro-rata allocations.
Jobs can be run (independently) on both systems.

 -  

Raijin /short available read-only on Gadi login and data mover nodes for user file transfers.
Progressive deployment of Gadi nodes to full specification, and phased retirement of Raijin Sandy Bridge nodes.

 

50% of Raijin Sandy Bridge nodes decommissioned to allow power work for Gadi - DONE

 

Raijin Broadwell nodes offline for power reconfiguration work - DONE

 

Raijin operational with Broadwell and Skylake nodes only - DONE
All Raijin Sandy Bridge nodes decommissioned; "normal" and "express" queues no longer available - DONE

 

Raijin run-time compatibility environment available on Gadi - DONE

 -  

Scheduled Downtime - Gadi
Scheduled downtime for final Gadi configuration and pre-production acceptance testing. 

 -  

Raijin operational with Broadwell and Skylake compute nodes. 

 

Production Phase Two
Gadi operational in production configuration.
Broadwell and Skylake nodes offline for Gadi integration.
UPDATE: The ANU has re-opened its main campus following a multi-day shutdown due to hazardous smoke conditions in the ACT. Gadi is expected to be available to users on Thursday.

 

Raijin /short filesystem decommissioned. 
Jan 2020 - TBD: Broadwell and Skylake nodes migrated to Gadi.

 -  

2020 Q2 Scheduled Maintenance Downtime - Gadi
Q2 scheduled quarterly maintenance downtime will be extended to accommodate configuration tuning for Gadi. Details will be provided at a later date.

...