Updated: – DRAFT v2
To meet the increasing demand from our researchers for more space on /scratch, and to reduce the number of 'forgotten' temporary files, NCI is introducing a new file management policy for the Gadi /scratch file system. The new policy will automatically clean up files older than 100 days, making more space available to research projects.
This policy change will facilitate greater fairness in the use of temporary scratch storage for all NCI projects.
The removal of old /scratch files is a three-step process.
Step 1 - Files older than 100 days are moved from project directories on /scratch into a quarantine space. Once a file has been moved to quarantine it will no longer be accessible to the project, or to any HPC jobs run by the project or collaborating projects, regardless of file permission settings.
Step 2 - Files remain in quarantine for 14 days. During this quarantine period, files may be recovered by members of the project and restored to active use if needed.
Step 3 - Any files remaining in quarantine at the end of the 14-day quarantine period will be deleted. Deletion from the quarantine space is automatic and final. Once a file is deleted, it cannot be recovered.
All users are reminded that the /scratch file system is intended to store working files only. Data that researchers or projects wish to keep for an extended period of time must be copied from the /scratch file system to the project's /g/data space, archived to massdata (tape), or downloaded to local storage.
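The timing of the three steps above can be sketched with a little date arithmetic. This is only an illustration: the function name `expiry_timeline` and the `last_used` argument are hypothetical, and the sketch assumes a file is quarantined on the exact day it reaches 100 days of age. The policy text does not say which file timestamp is used or how often the clean-up scan runs, so actual dates may differ.

```python
from datetime import datetime, timedelta

# Values taken from the policy text.
EXPIRY_AGE = timedelta(days=100)        # files older than this are quarantined
QUARANTINE_PERIOD = timedelta(days=14)  # quarantined files are deleted after this

def expiry_timeline(last_used: datetime) -> dict:
    """Return the earliest quarantine and deletion dates for a file.

    `last_used` is a hypothetical stand-in for whichever timestamp the
    real system uses to measure a file's age (an assumption here).
    """
    quarantine_on = last_used + EXPIRY_AGE
    delete_on = quarantine_on + QUARANTINE_PERIOD
    return {"quarantine_on": quarantine_on, "delete_on": delete_on}

timeline = expiry_timeline(datetime(2022, 1, 1))
print(timeline["quarantine_on"].date())  # 2022-04-11
print(timeline["delete_on"].date())      # 2022-04-25
```

Under these assumptions, a file last used on 1 January 2022 would have a recovery window of 11 to 25 April 2022, after which it could no longer be restored.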
This new /scratch file management procedure is a significant shift in the way the /scratch file system is managed, so it will be introduced progressively in May-June 2022, with full implementation from 1 July 2022. NCI must implement this change before the Q3 (July) downtime because it supports essential tuning and reconfiguration of the /scratch file system for full-production, peak-performance operation.
The introduction of this /scratch file system policy may quarantine a large number of files for projects that have accrued substantial /scratch usage on Gadi. To make this process more manageable for users, the policy will be implemented in stages according to the following schedule:
A new utility, nci-file-expiry, can be used to identify files in quarantine space and restore them. The document below contains more information and usage examples. The command option "--help" also provides usage and syntax information.
Note that for projects that have accrued many files on /scratch, nci-file-expiry may run slowly, or may run out of working memory before it prints a report. A project with on the order of 10^5 or more files in the "Warning" state may need to run nci-file-expiry via a PBS job script to allow sufficient CPU time and memory to generate an inventory of scratch files. This potential issue is expected to ease after the initial quarantine-expiry cycles in May-June.
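To get a rough idea of whether a project is near that scale, the number of old files under a directory can be estimated independently of nci-file-expiry. The sketch below is not part of the NCI tooling: `count_old_files` is a hypothetical helper, and it assumes file age is measured by modification time, which may not match the timestamp the policy actually uses. It counts files one at a time rather than building a list, so memory use stays small even for very large trees.

```python
import os
import time

def count_old_files(root, min_age_days, now=None):
    """Count files under `root` whose modification time is older than
    `min_age_days`. Assumption: age is judged by mtime; the real policy
    may use a different timestamp."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                count += 1
    return count

# Example call (the path is hypothetical; substitute your project directory):
# print(count_old_files("/scratch/ab12", 100))
```

Walking a large directory tree can itself take a long time, so for big projects this kind of scan is also better run inside a batch job than on a login node.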
Important points to remember about the new /scratch file management process: