The /scratch file system is now regularly scanned for files that have not been accessed in the last 100 days. Such files are moved into quarantine. If they are not recovered within 14 days of the day they entered quarantine, they are permanently deleted. More details are available at: Gadi /scratch File Management
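To get a rough idea of which of your files might be affected, the sketch below walks a directory and lists files whose last access time is more than 100 days old. It assumes the scan is based on the file access time (atime) reported by the file system; the function name find_stale_files and its parameters are hypothetical and not part of any NCI tool. Treat the output of nci-file-expiry as the authoritative answer.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days=100, now=None):
    """Return files under `root` whose access time is older than `max_idle_days`.

    Assumption: "not accessed" is judged by atime; the real scan may differ.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                atime = path.lstat().st_atime  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, calling find_stale_files on one of your own directories under /scratch returns a sorted list of pathlib.Path objects that you can review before the next scan.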
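NCI has provided a utility, nci-file-expiry, that lists quarantined files, lists files that will soon expire, and requests and tracks the recovery of quarantined files.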
Note that quarantined files still count towards a project's quota, so the lquota and nci_account commands will include them, but the nci-files-report and du -sh commands will not.
There are several options available with nci-file-expiry:
$ nci-file-expiry -h
usage: nci-file-expiry [-h] {list-filesystems,list-quarantined,list-warnings,recover,batch-recover,status} ...

optional arguments:
  -h, --help            show this help message and exit

actions:
  {list-filesystems,list-quarantined,list-warnings,recover,batch-recover,status}
    list-filesystems    list known filesystems
    list-quarantined    list quarantined objects that are still recoverable
    list-warnings       list objects that will soon expire
    recover             request recovery of recently quarantined object
    batch-recover       request recovery of multiple quarantined objects
    status              show status of recovery requests
You can get further help for each of these options. For example, if you want to recover a quarantined file:
$ nci-file-expiry recover --help
usage: nci-file-expiry recover [-h] UUID PATH

positional arguments:
  UUID  UUID of quarantine record to recover
  PATH  location to store recovered object at (not the directory)
...
That is, you need to supply the UUID of the file and the complete path, including the filename, where the file should be restored. If the target directory does not exist, you may have to create it manually. Example:
$ nci-file-expiry recover ca7547b7-2066-4c24-9feb-864b23c038de /scratch/xxy/ab1234/dir1/dirTwo/geometry.py
If your target path includes special characters (such as spaces or round brackets), escape them in the path or enclose the entire path in double quotes.
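If you generate recover commands from a script, a helper such as the one below can take care of the quoting for you. This is a minimal sketch using Python's standard shlex module; the function name build_recover_command and the example path are hypothetical. Note that shlex wraps unsafe arguments in single quotes rather than double quotes, which the shell treats as equally valid.

```python
import shlex


def build_recover_command(uuid, target_path):
    """Return a shell-safe `nci-file-expiry recover` command line as a string.

    Arguments containing spaces, brackets, etc. are single-quoted by shlex.
    """
    return shlex.join(["nci-file-expiry", "recover", uuid, target_path])


# Hypothetical target path containing a space and round brackets.
command = build_recover_command(
    "ca7547b7-2066-4c24-9feb-864b23c038de",
    "/scratch/xxy/ab1234/dir1/my results (v2)/geometry.py",
)
print(command)
```

The printed line can be pasted into a shell as is; paths without special characters are left unquoted.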
If there are many files to retrieve, consider using the batch-recover option. This, however, involves a few steps to set up:
Get the full list of files in quarantine and redirect it to a file:
$ nci-file-expiry list-quarantined > list_of_files.txt
Use a combination of Unix utilities such as awk and grep to reduce the output file so that each line contains only: UUID target_path_with_filename
Example: awk '{print $1,$6}' < list_of_files.txt > final_list_of_files.txt
The final entries in this file should look something like this:
395087c4-0555-4521-8fe4-6899796aef93 /scratch/xxy/ab1234/abc/xyz/FDS6.7.7.tar.gz
397465c4-0532-4957-8af4-6899678eaf39 /scratch/xxy/ab1234/dec/a3f/random.txt
...
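The same extraction can be sketched in Python, which may be convenient if some paths contain spaces (awk splits on every run of whitespace, so $6 would hold only the first part of such a path). The sketch assumes, as the awk example implies, that the UUID is the first whitespace-separated column and the path starts at the sixth; it further assumes the path is the last column on the line. The function names are hypothetical, so check the result against your own list-quarantined output before submitting it.

```python
import uuid


def extract_uuid_and_path(lines, path_field=6):
    """Return (uuid, path) pairs from list-quarantined text output.

    Assumptions: column 1 is the UUID and the path runs from column
    `path_field` to the end of the line. Lines that do not start with
    a valid UUID (headers, blank lines) are skipped.
    """
    entries = []
    for line in lines:
        # Split off only the leading columns so spaces inside the path survive.
        fields = line.split(None, path_field - 1)
        if len(fields) < path_field:
            continue
        try:
            uuid.UUID(fields[0])
        except ValueError:
            continue
        entries.append((fields[0], fields[path_field - 1].rstrip()))
    return entries


def write_batch_file(entries, out_path):
    """Write one 'UUID target_path_with_filename' entry per line."""
    with open(out_path, "w") as fh:
        for record_uuid, path in entries:
            fh.write(f"{record_uuid} {path}\n")
```

A typical use would be to open list_of_files.txt, pass the file object to extract_uuid_and_path, and hand the result to write_batch_file together with the name final_list_of_files.txt.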
Note that if the target directory does not exist, you may have to create it manually; nci-file-expiry will not create the path for you. Also, for the batch-recover option, special characters in target paths do not need to be escaped or quoted.
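If many target directories are missing, creating them by hand gets tedious. The sketch below reads a batch file in the UUID target_path_with_filename format and creates the parent directory of every target path that does not yet exist. The function name ensure_target_dirs is hypothetical, and the sketch assumes you have write permission at the target locations.

```python
from pathlib import Path


def ensure_target_dirs(batch_file):
    """Create missing parent directories for each target path in `batch_file`.

    Each line is expected to be 'UUID target_path_with_filename'.
    Returns the list of directories that were created.
    """
    created = []
    with open(batch_file) as fh:
        for line in fh:
            # Split once so that spaces inside the path are preserved.
            parts = line.rstrip("\n").split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            parent = Path(parts[1]).parent
            if not parent.is_dir():
                parent.mkdir(parents=True, exist_ok=True)
                created.append(parent)
    return created
```

Running ensure_target_dirs on final_list_of_files.txt before submitting the batch should avoid most target-path errors, assuming the paths in the file are the ones you actually want.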
You may also consider passing the --json flag to the list-quarantined action if you would like to filter the output using Python or a similar tool.
When this file is ready, just submit it for processing:
$ nci-file-expiry batch-recover final_list_of_files.txt
It will scan the file for errors and let you know if there are any (these will mostly be related to target paths).
Depending on the number of files in the request, it may take some time to recover all of them. You can check the status of the recovery from time to time by running:
$ nci-file-expiry status
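If you would rather not rerun the status command by hand, a small loop can do it at a fixed interval. This is only a sketch: the function name poll_status, the number of checks and the ten-minute interval are arbitrary choices, and because the format of the status output is not described here, the code simply prints it rather than trying to detect completion.

```python
import subprocess
import time


def poll_status(command=("nci-file-expiry", "status"), checks=3, interval_s=600):
    """Run the status command `checks` times, `interval_s` seconds apart.

    Prints and returns the captured standard output of each run.
    """
    outputs = []
    for i in range(checks):
        result = subprocess.run(
            list(command), capture_output=True, text=True, check=False
        )
        outputs.append(result.stdout)
        print(result.stdout, end="")
        if i < checks - 1:
            time.sleep(interval_s)  # wait before the next check
    return outputs
```

Calling poll_status() with the defaults runs nci-file-expiry status three times over twenty minutes; keep the interval generous so that you do not put unnecessary load on the login nodes.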