Data Centre

19 Sep 2024 10:10am

Systems Impacted: Compute nodes in normal, express, normalsr, expresssr, and gpuvolta queues

Dear NCI Users,

A cooling fault in the NCI Data Centre has led to the nodes backing the normal, express, normalsr, expresssr, and gpuvolta queues being powered off to protect the hardware. Cooling has now been restored and NCI staff are bringing these nodes back online.

Any jobs that were running on these nodes will have been lost.


Regards,
NCI User Services

/g/data

24 Apr 2024

Systems Impacted: gdata1a

Filesystem Fault

Dear NCI Users,

NCI system admins have reported problems with the gdata1a filesystem. If your session to Gadi is hanging, it is possible that your projects are on that filesystem. System admins are working on finding and fixing the issue. This page will be updated as soon as there is more information.

Update: The filesystem has been operating normally since 1:19pm.


Regards,
NCI User Services


/g/data

25 Mar 2024

Systems Impacted: gdata6

Filesystem Fault

Dear NCI Users,

NCI system admins have reported Lustre problems with the gdata6 filesystem. If your session to Gadi is hanging, it is possible that your projects are on that filesystem. System admins are working on finding and fixing the issue. This page will be updated as soon as there is more information.

Update: The filesystem has been operating normally since 3:01pm.


Regards,
NCI User Services


/g/data

7 Mar 2024

Systems Impacted: gdata1a

Filesystem Fault

Dear NCI Users,

NCI system admins have reported a problem with one of the storage servers in gdata1a. If your session to Gadi is hanging, it is possible that your projects are on that filesystem. This is currently being investigated. This page will be updated as soon as there is more information.

Update: The filesystem has resumed normal operations and has been stable since 12:30pm.


Regards,
NCI User Services


Core cloud infrastructure

4 Jan 2024 9:00am

Systems Impacted: ARE, Nirin VM, accessdev and other services relying on cloud infrastructure

Hardware Faults

Dear NCI Users,

Hardware issues on the core cloud infrastructure caused the impacted systems to become unresponsive.

We have identified a fix for this issue and we are implementing it now.

If you require further assistance, please contact NCI User Services via the Helpdesk at https://help.nci.org.au or help@nci.org.au.

Update 4 Jan 12:48pm: All the impacted compute nodes are back up. Users should verify that their services are functioning properly. Any users with instances that went down can restart them now.

Regards,
NCI User Services