

Last updated  (3:00pm AEST)


This page provides an overview of what users can expect as they make the transition from Raijin to Gadi in 2019 Q4. There is a lot of information here, so please take the time to read this page carefully. NCI will regularly update this page and provide more detailed information here as it becomes available.

If you have questions or special concerns about the transition from Raijin to Gadi please let us know as soon as possible; send an email to NCI user support at help@nci.org.au.

Timeline

Updated  2:00pm AEST

Date(s)

Events
NOW: NCI data centre preparation and Gadi installation in progress. Users preparing for Gadi.

 

Gadi and Raijin available to users.
Gadi allocations will match Raijin Q4 allocations.
Jobs can be run (independently) on both systems.

 

Raijin job submission ends.  

 

Raijin nodes go offline.
Sandy Bridge nodes decommissioned.
Broadwell, Skylake and K80 GPU nodes offline for migration to Gadi. The expected downtime for these nodes is three days.
Raijin /short file system available on Gadi login nodes (alternative path) for user transfers.

 

Broadwell, Skylake and K80 GPU nodes available on Gadi. 

 

Gadi resource allocations for 2020 Q1 installed.

 

Gadi operational at full production specification.

 

Raijin /short file system decommissioned.

Please note that this timeline will be updated as often as necessary to reflect installation activities and dependencies.

User Environment

  1. The user's default shell and project will be controlled by the file gadi-login.conf in each user's $HOME/.config directory.
  2. Gadi /home quotas are applied on a per-user basis, as on Raijin.
  3. Gadi /home quotas will be 10 GB.
  4. Gadi users will be able to copy data from their Raijin home directories via an alternative, read-only path.
  5. Gadi login and compute nodes will run the CentOS 8 operating system.

Resources

  1. The basic compute charge rate for Gadi is 2.0 SU per core-hour. This charging reflects Gadi's CPU performance relative to Raijin.
  2. All NCI allocations for 2020, including NCMAS, will be on Gadi only.
  3. Compute allocations on Gadi are managed by stakeholder scheme managers. Check this page (link) to find out who your scheme manager is.
  4. Compute allocations on Gadi will apply to projects, as on Raijin.
  5. In 2019 Q4, all active projects will be given Gadi compute quotas which match (pro-rata) their 2019 Q3 or Q4 Raijin allocations.
  6. During the Raijin-Gadi pre-production period, compute (job) accounting is independent on each system. A project may consume its full allocation on Raijin, and also its full allocation on Gadi, with no penalty.

File Systems - /home

  1. Gadi /home is a new, independent file system.
  2. The quota on Gadi home directories will be 10 GB, as compared to a 2 GB quota on Raijin. Home directories are intended for irreproducible files, e.g. source code and configuration files, and users are expected to utilise /scratch, /g/data and JOBFS file systems for working data.
  3. The contents of Raijin user home directories will not be migrated to Gadi. 
  4. Raijin /home will be available on Gadi via a temporary, read-only path to help users manage valuable home directory files until Gadi is fully operational. Users are strongly encouraged to copy only essential files from Raijin to their home directories on Gadi.

File Systems - /scratch

  1. The temporary file system for Gadi users is /scratch. Note that the path '/short', as used on Raijin, will not exist on Gadi. 
  2. The contents of Raijin /short will not be migrated to Gadi /scratch. 
  3. Raijin /short will be available on Gadi via a temporary, read-only path on login and data mover nodes only until . Users are strongly encouraged to copy only essential files to Gadi /scratch.
  4. Data transfer rates from Raijin /short to Gadi /scratch are expected to be approximately 1 TB per hour. Please plan your transfers accordingly, and do not wait until the last minute.
  5. Gadi /scratch will be subject to an automated file purging policy: files will be removed 90 days after the time of last modification. In the interest of fairness, exceptions to this policy are not permitted.
  6. NCI is developing tools and notifications to help users track the status of their files in /scratch. 
  7. Any attempts to circumvent the 90-day scratch purge policy by using the touch command or other strategies will result in account deactivation.
  8. The Gadi /scratch purging policy is expected to be activated at or near . Users will have ~3 months to clean up and organise files in /scratch before activation of the purging regime.
  9. All projects will be provided with a default /g/data directory for storage of persistent data. The default quota for /g/data project directories remains to be finalised. Note that allocations for projects which already have /g/data access will not change.
  10. Plan to modify your workflow(s) to place temporary files on /scratch, and persistent files on /g/data.
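
As a rough aid for planning around the 90-day purge and the ~1 TB per hour transfer rate described above, the following sketch lists files whose last-modification time is approaching the purge age and estimates how long a transfer might take. It is not the NCI tooling mentioned in item 6; the function names, the 14-day warning window and the assumption that purging is keyed purely on mtime are illustrative. The script only reads timestamps and never modifies them.

```python
import os
import time

PURGE_AGE_DAYS = 90         # purge age stated on this page
TRANSFER_TB_PER_HOUR = 1.0  # approximate /short -> /scratch rate stated on this page


def files_near_purge(root, warn_days=14, now=None):
    """Return (path, age_in_days) for files within warn_days of the purge age.

    Hypothetical helper: assumes purging is based only on last-modification time.
    """
    now = time.time() if now is None else now
    threshold_days = PURGE_AGE_DAYS - warn_days
    at_risk = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime  # read-only; does not touch the file
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - mtime) / 86400.0
            if age_days >= threshold_days:
                at_risk.append((path, age_days))
    # Oldest files first, since they will be purged soonest.
    return sorted(at_risk, key=lambda item: item[1], reverse=True)


def transfer_hours(size_tb):
    """Estimate transfer time in hours at the approximate rate quoted above."""
    return size_tb / TRANSFER_TB_PER_HOUR
```

For example, calling `files_near_purge` on a project's /scratch directory would return the files last modified 76 or more days ago, and `transfer_hours(12.0)` suggests allowing roughly 12 hours to move 12 TB from Raijin /short.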

File Systems - /g/data

  1. The /g/data file systems will be available on Gadi, and on Raijin during the Gadi pre-production phase. Infrastructure and deployment work may temporarily impact file system performance until Gadi is running in production mode.
  2. No changes to existing data services and data collections are expected as Gadi is brought into production.

Jobs

  1. Gadi will run PBS Pro version 19.
  2. Gadi queues... (link to queue page)
  3. Gadi resource limits... (link to queue limits page). 
  4. Project job resource exemptions (for example, wall time extensions) established on Raijin will not be carried across to Gadi. Exemptions on Gadi will need to be compellingly justified.
  5. The PBS_JOBFS size on Gadi will be limited to 400 GB per node. Jobs that require more than 400 GB/node are expected to use /scratch.
  6. Jobs on Gadi must explicitly declare, via PBS directives, which file systems are to be accessed during the job. As an example, a job which will read or write data in the /scratch/<project> and /g/data/<project> directories must include the directive "-lstorage=scratch/<project>+gdata/<project>". A job that attempts to access a /g/data or /scratch directory without this directive will fail.
  7. Job scheduling will be determined at project granularity, as was the case on Raijin. It is not possible to schedule jobs on a per-user basis.
  8. Jobs will be charged according to the resources requested, that is, by number of CPUs or amount of memory requested, whichever is larger. Note that charging on Raijin was based on cpu-hours only, without consideration of memory used. 
  9. As a result of the new charging model, projects with memory-dominant, minimal cpu workflows may consume SUs more rapidly on Gadi than they did on Raijin. 
  10. Broadwell, Skylake and K80 nodes will be offline for several days (expected: 3 days) as they are migrated to Gadi in late November.
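
The storage declaration in item 6 and the JOBFS limit in item 5 can be sketched as a small job-header generator. This is an illustration only: the helper names and the project code `ab12` are hypothetical, the job is assumed to run on a single node, and the directive names other than `storage` (`-P`, `ncpus`, `mem`, `jobfs`, `walltime`) follow common PBS Pro usage and are assumptions rather than confirmed Gadi syntax.

```python
MAX_JOBFS_GB = 400  # per-node PBS_JOBFS limit stated on this page


def storage_directive(scratch_projects=(), gdata_projects=()):
    """Build the storage value, e.g. 'scratch/ab12+gdata/ab12'."""
    parts = [f"scratch/{p}" for p in scratch_projects]
    parts += [f"gdata/{p}" for p in gdata_projects]
    if not parts:
        raise ValueError("at least one project directory is required")
    return "+".join(parts)


def pbs_header(project, ncpus, mem_gb, jobfs_gb, walltime, scratch=(), gdata=()):
    """Return PBS directive lines for a single-node job (hypothetical helper)."""
    if jobfs_gb > MAX_JOBFS_GB:
        raise ValueError(
            f"jobfs request of {jobfs_gb} GB exceeds {MAX_JOBFS_GB} GB per node; "
            "use /scratch for the excess"
        )
    lines = [
        "#!/bin/bash",
        f"#PBS -P {project}",             # assumed project directive
        f"#PBS -l ncpus={ncpus}",         # assumed resource names
        f"#PBS -l mem={mem_gb}GB",
        f"#PBS -l jobfs={jobfs_gb}GB",
        f"#PBS -l walltime={walltime}",
        f"#PBS -l storage={storage_directive(scratch, gdata)}",  # from item 6
    ]
    return "\n".join(lines)


# Example: a job that reads and writes /scratch/ab12 and /g/data/ab12.
print(pbs_header("ab12", 48, 190, 100, "02:00:00", scratch=["ab12"], gdata=["ab12"]))
```

Generating the header from one list of projects keeps the storage declaration in step with the directories the job actually uses, which matters because a missing entry causes the job to fail.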

Processors

Raijin | Gadi
Intel Xeon E5-2670 (Sandy Bridge) | Intel Xeon Platinum 8274 (Cascade Lake)
Two physical processors per node | Two physical processors per node
2.6 GHz clock speed | 3.2 GHz clock speed
16 cores per node | 48 cores per node
332 GFLOPs per node (theoretical peak) | 4915 GFLOPs per node (theoretical peak)
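
The theoretical peak figures can be reproduced as cores per node × clock speed × floating-point operations per cycle per core. The sketch below assumes 8 double-precision FLOPs per cycle on Sandy Bridge (AVX) and 32 on Cascade Lake (AVX-512 with two fused multiply-add units); those per-cycle values are assumptions not stated on this page, chosen because they match the quoted totals.

```python
def peak_gflops(cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak double-precision GFLOPs for one node."""
    return cores_per_node * clock_ghz * flops_per_cycle


# Per-cycle figures below are assumptions (see lead-in), not values from this page.
raijin = peak_gflops(16, 2.6, 8)   # Sandy Bridge, AVX
gadi = peak_gflops(48, 3.2, 32)    # Cascade Lake, AVX-512 with two FMA units

print(f"Raijin: {raijin:.1f} GFLOPs, Gadi: {gadi:.1f} GFLOPs, ratio: {gadi / raijin:.1f}x")
# Raijin: 332.8 GFLOPs, Gadi: 4915.2 GFLOPs, ratio: 14.8x
```

These are upper bounds; sustained application performance depends on vectorisation, memory bandwidth and the clock speed actually reached under load.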

Software

  1. Executables will be binary compatible between Raijin and Gadi in most cases, if required libraries and dependencies are available. To obtain optimum performance, all executables should be rebuilt for Gadi's processor architecture. 
  2. The most recent versions of widely used third-party software packages will be built by NCI staff and installed in the Gadi /apps directory. The criterion for 'widely used' is more or less continuous usage by three or more independent research groups.
  3. Unfortunately, it is not possible to build and install all older versions of third-party software. NCI may consider cases of older software if there is a compelling demonstration of need and there are no issues with regard to dependencies or processor architecture. 
  4. NCI can assist with local builds of third-party software for individual research groups on Gadi, as on Raijin. Please note that during the transition to Gadi staff time may be limited and software assistance may be deferred until Gadi is fully operational. 
  5. The modules command will work essentially the same on Gadi as on Raijin.
  6. Python 2.7.16 will be provided; however, this will be the final version of Python 2 installed on Gadi. Development of Python 2 will officially cease on .  All users are encouraged to move to Python 3 as soon as possible.
  7. Work is in progress on a Raijin backward compatibility job environment for Gadi. This is expected to be available to users in 2019 Q4 and 2020 Q1 for a limited time only; details are to be confirmed. As always, users are strongly encouraged to rebuild all applications on Gadi for best stability and performance.
  8. Porting of third-party application software to Gadi is in progress. See Gadi Software Catalogue - DRAFT for more information.

Workflows

  1. Users are advised to adjust workflow timeframes to account for Gadi's faster and more efficient CPUs.
  2. Gadi nodes have 48 CPUs and 192 GB memory. Single node workflows can now use up to 48 CPUs and 192 GB of memory.
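
Combining this node shape with the charging rule in the Jobs section (CPUs or memory requested, whichever is larger) and the 2.0 SU per core-hour rate gives a rough way to estimate job cost. The sketch assumes memory is converted to "CPU equivalents" at the node's ratio of 192 GB / 48 CPUs = 4 GB per CPU; this page does not spell out the conversion, so treat the formula and the function names as assumptions.

```python
NODE_CPUS = 48             # per-node CPU count stated on this page
NODE_MEM_GB = 192          # per-node memory stated on this page
SU_PER_CORE_HOUR = 2.0     # basic Gadi charge rate stated on this page
MEM_GB_PER_CPU = NODE_MEM_GB / NODE_CPUS  # 4 GB per CPU (assumed conversion)


def charged_cpus(ncpus, mem_gb):
    """CPUs charged: the larger of CPUs requested and memory in CPU equivalents."""
    return max(ncpus, mem_gb / MEM_GB_PER_CPU)


def estimate_su(ncpus, mem_gb, walltime_hours):
    """Estimate SU for a job under the assumed memory-aware charging model."""
    return SU_PER_CORE_HOUR * charged_cpus(ncpus, mem_gb) * walltime_hours


# A memory-dominant job: 4 CPUs, 96 GB, 5 hours.
print(estimate_su(4, 96, 5))  # 240.0, charged as 24 CPU equivalents
# The same job with 16 GB fits within 4 CPUs' share of memory.
print(estimate_su(4, 16, 5))  # 40.0
```

Under this assumption, requesting only the memory a job needs (up to 4 GB per CPU) avoids paying for idle CPU equivalents.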

Other

  1. Containers will be available on Gadi; however, NCI staff will need to build the container image to ensure it satisfies security and compatibility criteria. Users who wish to use containers should contact NCI user support at help@nci.org.au, who will put them in contact with the NCI HPC group.
  2. NCI's VDI service is independent of Raijin and Gadi. With Raijin, the /apps third-party application tree was copied to the VDI environment. With Gadi, there will initially be no change to the VDI operating environment; however, differences in operating systems and architectures will eventually lead to divergence between the Gadi and VDI application software stacks. Users with questions about VDI software are advised to contact NCI user support.
  3. NCI's cloud is independent of Gadi. No changes to NCI cloud operations are expected as we bring Gadi into production.

Questions?

If you have further questions or concerns about the transition from Raijin to Gadi please let us know - contact NCI user support at help@nci.org.au.




