This page provides an overview of what users can expect as they make the transition to Gadi in 2019 Q4. There is a lot of information here, so please take the time to read this page carefully. NCI will update this page and provide more detailed information here as it becomes available.

Timeline

Updated  4:00pm AEST

Date(s)

Events
IN PROGRESS: NCI data centre infrastructure enhancements and Gadi installation.

 

Raijin and Gadi transition period begins. Jobs can be run on both systems.

 

Raijin job submission disabled.

 

Raijin nodes go offline for decommissioning.
Raijin /short file system available on Gadi login nodes.

 

Broadwell, Skylake and P100 and K80 GPU nodes from Raijin added to Gadi operational configuration.

 

Gadi resource allocations for 2020 Q1 installed.

 

Gadi begins operation at full production specification.

 

Raijin /short file system decommissioned.

Please note that this timeline will be updated as often as necessary to reflect installation activities and dependencies.

User Environment

  1. The mechanism for changing a user's default shell and project will change. On Gadi, both are controlled by the file Gadi-login.conf in each user's $HOME/.config directory.
  2. Gadi /home quotas are applied on a per-user basis, as on Raijin.
  3. Gadi login and compute nodes will run the CentOS 8 operating system.
  4. N login nodes... Same round-robin login service?

Resources

  1. The basic compute charge rate for Gadi is 2.0 SU per core-hour. This rate reflects Gadi's CPU performance relative to Raijin.
  2. All NCI allocations for 2020, including NCMAS, will be on Gadi only.
  3. Compute allocations on Gadi are managed by stakeholder scheme managers. See this page (link) to get contact details for your scheme manager. 
  4. Compute allocations on Gadi apply to projects, as on Raijin.
  5. In 2019 Q4, all active projects will be given Gadi compute quotas which match (pro-rata) their 2019 Q3 or Q4 Raijin allocations.
  6. During the Raijin-Gadi transition period, compute (job) accounting is independent on each system. A project may consume its full allocation on Raijin, and its full allocation on Gadi, with no penalty.
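
As a rough illustration of the basic charge rate above, the following sketch estimates the SU cost of a job from its core count and walltime. The function name `estimate_su` and its parameters are hypothetical, and the sketch applies only the basic 2.0 SU per core-hour rate; it does not model any other factor that may affect actual charging.

```python
GADI_SU_PER_CORE_HOUR = 2.0  # basic charge rate stated in this section


def estimate_su(ncpus, walltime_hours, rate=GADI_SU_PER_CORE_HOUR):
    """Estimate the SU cost of a job at the basic rate (hypothetical helper)."""
    if ncpus < 0 or walltime_hours < 0:
        raise ValueError("ncpus and walltime_hours must be non-negative")
    return ncpus * walltime_hours * rate


# Example: 48 cores for 10 hours at the basic rate.
print(estimate_su(48, 10.0))  # prints 960.0
```

Because accounting is independent on each system during the transition period, an estimate like this applies to the Gadi allocation only.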

File Systems - /home

  1. Gadi /home is a new, independent file system.
  2. The quota on Gadi home directories will be 10 GB, compared with a 2 GB quota on Raijin. Home directories are intended for irreproducible files, e.g. source code and configuration files. Users are expected to use the /scratch, /g/data and JOBFS file systems for working data.
  3. The contents of Raijin user home directories will not be migrated to Gadi. 
  4. Raijin /home will be available on Gadi via a temporary, read-only path to help users manage valuable home directory files until Gadi is fully operational. Users are strongly encouraged to copy only essential files from Raijin to their home directories on Gadi.

File Systems - /scratch

  1. The temporary file system for Gadi users is /scratch. Note that the path '/short', as used on Raijin, will not exist on Gadi. 
  2. The contents of Raijin project /short directories will not be migrated to Gadi /scratch. 
  3. Raijin /short will be available on Gadi via a temporary, read-only path on login and data mover nodes only until . Users are strongly encouraged to copy only essential files to Gadi /scratch.
  4. Data transfer from Raijin /short to Gadi /scratch is expected to 
  5. Gadi /scratch will be subject to an automated file purging policy: files will be removed 90 days after the time of last modification. In the interest of fairness, exceptions to this policy are not permitted. Any attempt to circumvent the 90-day purge policy, whether by using the touch command or by other means, will result in account deactivation.
  6. All projects will be provided with a default /g/data directory for storage of persistent data. The default quota for /g/data project directories remains to be finalised. Note that allocations for projects which already have /g/data access will not change.
  7. Plan to modify your workflow(s) to place temporary files on /scratch, and persistent files on /g/data.
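
To help plan around the 90-day purge policy, a script along the following lines could report files that have not been modified for some time, so that anything persistent can be copied to /g/data before it is removed. This is only a sketch: the function name `files_nearing_purge` and the 75-day warning threshold are assumptions made for illustration, and the script only reads file metadata.

```python
import os
import time

PURGE_AGE_DAYS = 90  # purge threshold stated in the policy above
SECONDS_PER_DAY = 86400.0


def files_nearing_purge(root, warn_days=75, now=None):
    """Yield (path, age_in_days) for files under root whose last
    modification is at least warn_days old. warn_days is a hypothetical
    early-warning threshold, not an NCI setting."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - mtime) / SECONDS_PER_DAY
            if age_days >= warn_days:
                yield path, age_days
```

A typical use would be to loop over `files_nearing_purge("/scratch/<project>")`, with the project directory substituted, and review the reported paths.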

File Systems - /g/data

  1. The /g/data file systems will be available on Gadi, as 

Jobs

  1. PBS Pro version 1X...
  2. Gadi queues...
  3. Gadi queue limits...
  4. Resource exemptions (cf. nf_limits) established on Raijin will not be carried across to Gadi. Any request for a job resource exemption on Gadi will need a compelling justification.
  5. The PBS_JOBFS size on Gadi will be limited to 400 GB per node. Jobs that require more than 400 GB/node are expected to use /scratch.
  6. Jobs on Gadi must explicitly declare (via PBS '-lother=...' directives) which file systems are to be accessed. As an example, a job which will read or write data in the /g/data1a/project directory must include the directive '-lother=gdata1a'; the job will fail without the directive.
  7. Job scheduling will be determined at project granularity, as was the case on Raijin. It is not possible to schedule jobs on a per-user basis.
  8. Jobs will be charged according to the resources requested, that is, by number of CPUs or amount of memory requested, whichever is larger. Note that use of memory was not explicitly charged on Raijin. 
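
Since a job fails when a file system directive is missing, it may be worth checking job scripts before submission. The sketch below compares the /g/data paths a job uses against the '-lother=...' directives in its script. It assumes that the directive value is the mount name with the slashes removed, generalising from the single example above (/g/data1a gives gdata1a); that naming rule, the helper names, and the one-value-per-directive parsing are all assumptions, not confirmed PBS or NCI behaviour.

```python
import re


def required_tags(paths):
    """Derive directive values from /g/data* paths (assumed naming rule)."""
    tags = set()
    for path in paths:
        match = re.match(r"/g/(data[^/]*)(?:/|$)", path)
        if match:
            tags.add("g" + match.group(1))
    return sorted(tags)


def declared_tags(script_text):
    """Collect the values of -lother=... found on #PBS lines."""
    tags = set()
    for line in script_text.splitlines():
        if line.lstrip().startswith("#PBS"):
            tags.update(re.findall(r"-l\s*other=(\S+)", line))
    return tags


def missing_tags(script_text, paths):
    """Return the required values that the script does not declare."""
    declared = declared_tags(script_text)
    return [tag for tag in required_tags(paths) if tag not in declared]
```

For example, a script containing only `#PBS -lother=gdata1a` that also reads from a second (hypothetical) /g/data mount would have that second mount reported by `missing_tags`.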

Software

  1. Executables will be binary compatible between Raijin and Gadi. To obtain optimum performance, all executables should be rebuilt for Gadi's processor architecture.
  2. The most recent versions of widely used third-party software packages will be built by NCI staff and installed in the Gadi /apps directory. The criterion for 'widely used' is more or less continuous usage by three or more independent research groups.
  3. Unfortunately, it is not possible to build and install older versions of third-party software. NCI may consider cases of older software if there is a compelling demonstration of need and there are no issues with regard to dependencies or processor architecture. 
  4. Software list...
  5. NCI can assist with local builds of third-party software for individual research groups on Gadi, as on Raijin. Please note that during the transition to Gadi staff time may be limited and software assistance may be deferred until Gadi is fully operational. 
  6. The modules command will work essentially the same on Gadi as on Raijin.
  7. Python 2.7.16 will be provided; however, this will be the final version of Python 2 installed on Gadi. All users are encouraged to move to Python 3 as soon as possible.

Workflows

  1. Adjust your timeframes to account for Gadi's faster and more efficient CPUs.
  2. Gadi nodes have 48 CPUs and 192 GB memory. Single node workflows can now use up to 48 CPUs and 192 GB of memory.
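
The node specifications above can be combined with the charging rule in the Jobs section (CPUs or memory requested, whichever is larger) to sanity-check a single-node request. The sketch below assumes that memory is converted to an equivalent CPU count at 192 GB / 48 CPUs = 4 GB per CPU, rounded up; this conversion is an assumption made for illustration, as the exact formula is not stated on this page, and the function names are hypothetical.

```python
import math

NODE_CPUS = 48      # CPUs per Gadi node, as stated above
NODE_MEM_GB = 192   # memory per Gadi node in GB, as stated above
MEM_PER_CPU_GB = NODE_MEM_GB / NODE_CPUS  # 4.0 GB per CPU (assumed conversion)


def fits_single_node(ncpus, mem_gb):
    """Return True if the request fits within one node's CPUs and memory."""
    return 0 < ncpus <= NODE_CPUS and 0 < mem_gb <= NODE_MEM_GB


def chargeable_cpus(ncpus, mem_gb):
    """Return the larger of the requested CPUs and the CPU-equivalent of
    the requested memory (assumed: 4 GB per CPU, rounded up)."""
    return max(ncpus, math.ceil(mem_gb / MEM_PER_CPU_GB))


# Example: 4 CPUs with 64 GB of memory.
print(chargeable_cpus(4, 64))  # prints 16
```

Under this assumption, a job requesting few CPUs but a large share of a node's memory would be charged for the memory share.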

Other

  1. Containers will be available on Gadi; however, for security and compatibility reasons, NCI staff will need to build the container image. Which containers...?
  2. NCI's VDI service is independent of Raijin and Gadi. With Raijin, the /apps third-party application tree was copied to the VDI environment. With Gadi, there will initially be no change to the VDI operating environment; however, differences in operating systems and architectures will eventually lead to divergence between the Gadi and VDI application software stacks. Users with questions about VDI software are advised to contact NCI user support.
  3. NCI's cloud is independent of Gadi. No changes to NCI cloud operations are expected as we bring Gadi into production.

Questions?

If you have further questions or other specialised concerns about the transition from Raijin to Gadi, let us know as soon as possible. Contact NCI user support at help@nci.org.au.




