Updated (10:30am AEDT)
Please note that this timeline will be updated as often as necessary to reflect progress in data centre preparations, installation activities, and dependencies. NCI must decommission Raijin before Gadi can be configured in its full production capacity.
- The user's default shell and project will be controlled by the file gadi-login.conf in each user's $HOME/.config directory.
- Gadi /home quotas are applied on a per-user basis, as on Raijin.
- Gadi /home quotas will be 10 GB.
- Gadi login and compute nodes will run the CentOS 8 operating system.
- NCI is currently considering the best mechanism to deliver files from Raijin /home directories to Gadi. The solution is likely to be either (a) users copy files themselves from a read-only archive copy of Raijin /home, or (b) NCI performs a bulk copy of users' Raijin /home files to Gadi /home directories. More information will be provided as soon as possible.
User Environment: Compilers and MPI
NCI plans to provide OpenMPI version 4.0.2 at the time of Gadi pre-production, subject to testing and validation.
Processor Comparison: Raijin vs Gadi
|Raijin||Gadi|
|Intel Xeon E5-2670 (Sandy Bridge)||Intel Xeon Platinum 8274 (Cascade Lake)|
|Two physical processors per node||Two physical processors per node|
|2.6 GHz clock speed||3.2 GHz clock speed|
|16 cores per node||48 cores per node|
|332 GFLOPs per node||4915 GFLOPs per node|
The computing charge rate on Gadi is 2.0 service units (SU) per CPU-hour. This rate broadly reflects Gadi's performance relative to Raijin.
During the Gadi pre-production period, compute (job) accounting on Raijin and Gadi will be independent.
To log in from your local desktop or another NCI computer, run ssh:
where abc123 is your own username. Your ssh connection will be to one of ten possible login nodes. As usual, for security reasons we ask that you do not set up passwordless ssh to Gadi. Entering your password every time you log in is more secure; alternatively, use a specialised secure ssh agent.
File Systems - /home
Gadi /home is a new, independent file system.
Users are strongly encouraged to retain only essential files from their Raijin home directories on Gadi.
File Systems - /scratch
The temporary file system for Gadi users is /scratch. Note that the path '/short', as used on Raijin, will not exist on Gadi.
Plan to modify your workflow(s) to place temporary files on /scratch, and persistent files on /g/data.
File Systems - /g/data
The /g/data file systems will continue to be available on Gadi and Raijin during the Gadi transition phase. Infrastructure work may temporarily impact file system performance during pre-production. Please also note that during transition, while Raijin and Gadi systems are both connected to the /g/data file systems, the file system performance may be impacted, as bandwidth is shared across both systems.
Project data on the /g/data2 file system was recently migrated to a new file system, /g/data4. A symbolic link /g/data2→/g/data4 has been provided for backward compatibility on Raijin. This /g/data2 symbolic link will not be provided on Gadi. All Gadi users are expected to update scripts and workflows to include the new /g/data4 path where needed.
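As an illustration of the script update described above, the following Python sketch rewrites /g/data2 references to /g/data4 and reports any lines that still refer to /short. The function name update_script_paths, the regular expressions, and the project code ab12 are hypothetical and are not NCI tools. References to /short are only reported, not rewritten, because the appropriate /scratch location depends on your workflow.

```python
import re

# Match /g/data2 only as a whole path component (not /g/data21, for example).
GDATA2 = re.compile(r"(?<![\w.-])/g/data2(?!\w)")
# Match /short as a top-level path (not /home/short or /shortcuts).
SHORT = re.compile(r"(?<![\w.-])/short(?![\w.-])")


def update_script_paths(script_text):
    """Return (updated_text, short_line_numbers).

    updated_text has /g/data2 replaced by /g/data4.
    short_line_numbers lists the 1-based lines that still mention /short,
    so they can be reviewed and moved to /scratch by hand.
    """
    updated = GDATA2.sub("/g/data4", script_text)
    short_lines = [
        number
        for number, line in enumerate(updated.splitlines(), start=1)
        if SHORT.search(line)
    ]
    return updated, short_lines


# Hypothetical job script fragment; "ab12" is a made-up project code.
example = "#!/bin/bash\ncd /short/ab12/run\ncp /g/data2/ab12/input.nc .\n"
updated, short_lines = update_script_paths(example)
print(updated)
print("Lines still using /short:", short_lines)
```

Review the reported lines before running anything on Gadi; a blind search-and-replace of /short is not recommended.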
Gadi Cascade Lake nodes have 48 CPUs and 192 GB memory.
Broadwell and Skylake nodes are expected to be offline for approximately three (3) working days in late November while they are migrated to Gadi. Users who rely on these nodes should plan for this downtime. Unfortunately, a testing/pre-production period will not be available for Broadwell and Skylake workflows.
Job Charging - Examples
Gadi Cascade Lake node = 48 CPUs, 192 GB memory
|Queue||CPUs||Memory||Walltime||Charge||Notes|
|Normal||4||16 GB||5 hours||4 x 5 x 2 = 40 SU||Satisfies 1 CPU <= 4 GB memory.|
|Normal||8||16 GB||5 hours||8 x 5 x 2 = 80 SU||CPU request dominates.|
|Normal||8||128 GB||5 hours||32 x 5 x 2 = 320 SU||Memory request dominates. 128 GB is two-thirds of node memory, equivalent to 32 CPUs.|
|Normal||8||192 GB||5 hours||48 x 5 x 2 = 480 SU||Memory request dominates. 192 GB = 100% of node memory.|
|Express||8||16 GB||5 hours||8 x 5 x 2 x 3 = 240 SU||CPU request dominates (as above). Express multiplier is x3.|
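The examples above are consistent with a simple rule: a job is charged for whichever is larger, the CPUs requested or the CPU-equivalent of the memory requested, at 4 GB per CPU (192 GB / 48 CPUs). The following Python sketch reproduces the example charges. It is not NCI's accounting implementation: the function and parameter names are hypothetical, and it assumes that no rounding is applied and that the Normal queue multiplier is 1.

```python
def job_charge_su(ncpus, mem_gb, walltime_hours, queue_multiplier=1,
                  rate_su_per_cpu_hour=2.0, node_cpus=48, node_mem_gb=192):
    """Estimate the SU charge for a job on a Gadi Cascade Lake node.

    Assumption: the charge is based on the larger of the CPU request and
    the memory request expressed in CPU-equivalents.
    """
    gb_per_cpu = node_mem_gb / node_cpus        # 4 GB per CPU
    cpus_by_memory = mem_gb / gb_per_cpu        # memory as CPU-equivalents
    effective_cpus = max(ncpus, cpus_by_memory)
    return (effective_cpus * walltime_hours
            * rate_su_per_cpu_hour * queue_multiplier)


# The five example jobs from the table (Express multiplier is 3).
print(job_charge_su(4, 16, 5))                       # Normal, 4 CPUs, 16 GB
print(job_charge_su(8, 16, 5))                       # Normal, 8 CPUs, 16 GB
print(job_charge_su(8, 128, 5))                      # Normal, 8 CPUs, 128 GB
print(job_charge_su(8, 192, 5))                      # Normal, 8 CPUs, 192 GB
print(job_charge_su(8, 16, 5, queue_multiplier=3))   # Express, 8 CPUs, 16 GB
```

To keep the CPU request as the dominant term, request no more than 4 GB of memory per CPU.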
NCI strongly recommends that all users recompile their applications to obtain optimum performance and compatibility with the Gadi run-time environment.
Work is in progress on a container environment to support Raijin backward compatibility on Gadi. This is intended to be a stop-gap solution for projects that require more time to adapt to Gadi. This "Raijin in a container" is expected to be available to users in 2019 Q4 and 2020 Q1 for a limited time only, with details to be confirmed. Users are again strongly encouraged to rebuild all applications on Gadi for long-term stability and performance.
NCI's VDI service will continue to be available to users as Gadi enters service in 2019 Q4 and 2020 Q1. Overall VDI functionality is expected to remain unchanged.
As is the case now on Raijin, user home directories on VDI will continue to be separate from home directories on Gadi.
VDI-to-Gadi job submission functionality is now available. Please note that this will be implemented as the default option overnight on Monday 9 December. For more information about how to use this feature during the transition period, see the VDI User Guide: https://opus.nci.org.au/display/Help/VDI+User+Guide#VDIUserGuide-4.2.PBS
Gadi users who require access to NCI data collections should ensure they are members of the required data collection projects.
If you have further questions or concerns about the transition from Raijin to Gadi please contact NCI user support at email@example.com.