Gadi Timeline

Updated  (10:30am AEDT)

...

Please note that this timeline will be updated as often as necessary to reflect progress in data centre preparations, installation activities, and dependencies. NCI must decommission Raijin before Gadi can be configured in its full production capacity.

User Environment

  1. The user's default shell and project will be controlled by the file gadi-login.conf in each user's $HOME/.config directory.
  2. Gadi /home quotas are applied on a per-user basis, as on Raijin.
  3. Gadi /home quotas will be 10 GB.
  4. Gadi login and compute nodes will run the CentOS 8 operating system.
  5. NCI is currently considering the best mechanism to deliver files from Raijin /home directories to Gadi. The solution is likely to be either (a) users copy files from a read-only archive copy of Raijin /home, or (b) a bulk copy of users' Raijin /home files to their Gadi /home directories. More information will be provided as soon as possible.

User Environment: Compilers and MPI

...

NCI plans to provide OpenMPI version 4.0.2 at the time of Gadi pre-production, subject to testing and validation.

Processor Comparison: Raijin vs Gadi

| Raijin | Gadi |
|---|---|
| Intel Xeon E5-2670 (Sandy Bridge) | Intel Xeon Platinum 8274 (Cascade Lake) |
| Two physical processors per node | Two physical processors per node |
| 2.6 GHz clock speed | 3.2 GHz clock speed |
| 16 cores per node | 48 cores per node |
| 332 GFLOPs per node (theoretical peak) | 4915 GFLOPs per node (theoretical peak) |
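
The theoretical peak figures in the comparison above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes 8 double-precision FLOPs per core per cycle for Sandy Bridge (AVX) and 32 for Cascade Lake (AVX-512 with two FMA units). These per-cycle values are not stated on this page, and the function name peak_gflops is hypothetical.

```python
def peak_gflops(cores_per_node: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak GFLOPs for one node: cores x clock (GHz) x FLOPs per cycle."""
    return cores_per_node * clock_ghz * flops_per_cycle


# Assumed FLOPs per core per cycle (not taken from this page):
# 8 for Sandy Bridge (AVX), 32 for Cascade Lake (AVX-512, two FMA units).
raijin = peak_gflops(16, 2.6, 8)   # about 332.8
gadi = peak_gflops(48, 3.2, 32)    # about 4915.2

print(f"Raijin: {raijin:.1f} GFLOPs per node")
print(f"Gadi:   {gadi:.1f} GFLOPs per node")
print(f"Ratio:  {gadi / raijin:.1f}x per node")
```

These are upper bounds derived from the nominal clock speed; sustained application performance will be lower.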

Resources

The computing charge rate on Gadi is 2.0 service units (SU) per CPU-hour. This rate broadly reflects Gadi's performance relative to Raijin.

...

During the Gadi pre-production period, compute (job) accounting on Raijin and Gadi will be independent.

Logging in

To log in from your local desktop or another NCI computer, run ssh:

...


where abc123 is your own username. Your ssh connection will be to one of ten possible login nodes. As usual, for security reasons we ask that you do not set up passwordless ssh to Gadi. Entering your password every time you log in is more secure; alternatively, use a specialised secure ssh agent.

File Systems - /home

Gadi /home is a new, independent file system.

...

Users are strongly encouraged to bring only essential files from their Raijin home directories to Gadi.

File Systems - /scratch

The temporary file system for Gadi users is /scratch. Note that the path '/short', as used on Raijin, will not exist on Gadi. 

...

Plan to modify your workflow(s) to place temporary files on /scratch, and persistent files on /g/data.
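
As a starting point for that review, the sketch below lists the lines of a script that still refer to /short so they can be updated by hand. It is a minimal example, not an NCI tool: the function name find_short_references, the project code ab1, and the sample script are hypothetical.

```python
import re

# Matches "/short" as a top-level path (e.g. "/short/ab1/abc123"),
# but not "/shortcut" or "/tmp/shortlog".
SHORT_PATH = re.compile(r"(?<![\w./-])/short(?![\w.-])")


def find_short_references(script_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that reference /short."""
    return [
        (number, line)
        for number, line in enumerate(script_text.splitlines(), start=1)
        if SHORT_PATH.search(line)
    ]


# Hypothetical job script used only to demonstrate the check.
example = """#!/bin/bash
cd /short/ab1/abc123/run01
cp results.dat /g/data/ab1/archive/
echo "shortcut taken" > /tmp/shortlog
"""

for number, line in find_short_references(example):
    print(f"line {number}: {line}")
```

Each reported line still needs a manual decision: temporary files belong on /scratch and persistent files on /g/data.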

File Systems - /g/data

The /g/data file systems will continue to be available on Gadi and Raijin during the Gadi transition phase. Infrastructure work may temporarily impact file system performance during pre-production. In addition, while both Raijin and Gadi are connected to the /g/data file systems during the transition, performance may be reduced because bandwidth is shared across the two systems.

...

Project data on the /g/data2 file system was recently migrated to a new file system, /g/data4. A symbolic link /g/data2→/g/data4 has been provided for backward compatibility on Raijin. This /g/data2 symbolic link will not be provided on Gadi. All Gadi users are expected to update scripts and workflows to use the new /g/data4 path where needed.
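
One way to apply this path change to the text of a script is sketched below. It is a minimal illustration rather than an NCI-provided tool: the function name update_gdata_paths and the project code ab1 are hypothetical, and the result should be checked before it replaces the original file.

```python
import re

# "/g/data2" followed by "/" or the end of the path, so a longer
# name such as "/g/data21" is left untouched.
OLD_PREFIX = re.compile(r"/g/data2(?![\w.-])")


def update_gdata_paths(text: str) -> str:
    """Rewrite /g/data2 paths to /g/data4 in the given script text."""
    return OLD_PREFIX.sub("/g/data4", text)


# Hypothetical script fragment used only to demonstrate the rewrite.
before = "INPUT=/g/data2/ab1/inputs\nOUTPUT=/g/data/ab1/outputs\n"
after = update_gdata_paths(before)
print(after)
```

Only the /g/data2 prefix is rewritten; other /g/data paths pass through unchanged.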

Jobs

Gadi Cascade Lake nodes have 48 CPUs and 192 GB memory.

...

Broadwell and Skylake nodes are expected to be offline for approximately three (3) working days in late November while they are migrated to Gadi. Users who rely on these nodes should prepare for this downtime. Unfortunately, a testing/pre-production period will not be available for Broadwell and Skylake workflows.

Job Charging - Examples

Gadi Cascade Lake node = 48 CPUs, 192 GB memory

...

| Queue | CPUs | Memory (GB) | Walltime | Charge | Comments |
|---|---|---|---|---|---|
| Normal | 4 | 16 GB | 5 hours | 4 x 5 x 2 = 40 SU | Satisfies 1 CPU <= 4 GB memory. |
| Normal | 8 | 16 GB | 5 hours | 8 x 5 x 2 = 80 SU | CPU request dominates. |
| Normal | 8 | 128 GB | 5 hours | 32 x 5 x 2 = 320 SU | Memory request dominates. 32 CPUs is the proportion of node resources. |
| Normal | 8 | 192 GB | 5 hours | 48 x 5 x 2 = 480 SU | Memory request dominates. 192 GB = 100% of node memory. |
| Express | 8 | 16 GB | 5 hours | 8 x 5 x 2 x 3 = 240 SU | CPU request dominates (as above). Express multiplier is x3. |
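
The charging examples above appear to follow one rule: a job is charged for whichever is larger, its CPU request or the number of CPUs that its memory request corresponds to at 4 GB per CPU (192 GB / 48 CPUs). The sketch below reproduces the table under that assumption. The formula is inferred from the examples rather than taken from the scheduler's accounting code, and the function name job_charge_su is hypothetical.

```python
NODE_CPUS = 48          # CPUs per Cascade Lake node
NODE_MEM_GB = 192       # memory per Cascade Lake node
SU_PER_CPU_HOUR = 2.0   # Gadi charge rate
QUEUE_MULTIPLIER = {"normal": 1, "express": 3}  # only the queues shown above


def job_charge_su(queue: str, ncpus: int, mem_gb: float, walltime_hours: float) -> float:
    """Estimate the SU charge for a job (formula inferred from the examples)."""
    # Convert the memory request into the share of node CPUs it occupies.
    mem_equivalent_cpus = mem_gb * NODE_CPUS / NODE_MEM_GB
    effective_cpus = max(ncpus, mem_equivalent_cpus)
    return effective_cpus * walltime_hours * SU_PER_CPU_HOUR * QUEUE_MULTIPLIER[queue]


examples = [
    ("normal", 4, 16, 5),
    ("normal", 8, 16, 5),
    ("normal", 8, 128, 5),
    ("normal", 8, 192, 5),
    ("express", 8, 16, 5),
]
for queue, ncpus, mem_gb, hours in examples:
    charge = job_charge_su(queue, ncpus, mem_gb, hours)
    print(f"{queue:8s} {ncpus:2d} CPUs {mem_gb:4d} GB {hours} h -> {charge:.0f} SU")
```

Requesting no more than 4 GB per CPU keeps the charge proportional to the CPU count; beyond that ratio the memory request determines the charge.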

Software

NCI strongly recommends that all users recompile their applications to obtain optimum performance and compatibility with the Gadi run-time environment.

...

Work is in progress on a container environment to support Raijin backward compatibility on Gadi. This is intended to be a stop-gap solution for projects which require more time to adapt to Gadi. This "Raijin in a container" environment is expected to be available to users in 2019 Q4 and 2020 Q1, for a limited time only (details to be confirmed). Users are again strongly encouraged to rebuild all applications on Gadi for long-term stability and performance.

Virtual Desktop Infrastructure (VDI)

NCI's VDI service will continue to be available to users as Gadi enters service in 2019 Q4 and 2020 Q1. Overall VDI functionality is expected to remain unchanged.

...

As is the case now on Raijin, user home directories on VDI will continue to be separate from home directories on Gadi.

Development of VDI-to-Gadi job submission functionality has been a high priority, and the feature is now available. Please note that it will become the default option overnight on Monday 9 December. For more information about how to use this feature during the transition period, see the VDI User Guide: https://opus.nci.org.au/display/Help/VDI+User+Guide#VDIUserGuide-4.2.PBS.

Data Collections

Gadi users who require access to NCI data collections should ensure they are members of the required data collection projects. 

...

More comprehensive updates on VDI will be provided to users in January-February 2020. If you have specific questions or concerns about VDI, please contact NCI user support at help@nci.org.au.

Training

Transition to Gadi - as presented at the ALCS 2019 Training Day.

Questions?

If you have further questions or concerns about the transition from Raijin to Gadi, please contact NCI user support at help@nci.org.au.

...