NCI's massdata filesystem uses two large tape silos, housed in separate machine rooms in two separate buildings, which provides automatic offsite backup of all data.
The Mass Data Storage System (MDSS) provides services that automatically manage the migration and retrieval of files between the levels of a storage hierarchy, from online disk cache to offline tape archive. Archive files or complete filesystems can be moved or duplicated on multiple MDSS servers, or on off-site servers, for additional data protection.
Use the “dmls -l” command instead of the normal Linux “ls” command to show the status of your files. The possible states are offline (OFL), regular (REG) and dual state (DUL):
- offline (OFL): the file is on tape only.
- regular (REG): the file is online (on disk only).
- dual state (DUL): the file is on both tape and disk.
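The three states can also be checked from a script. The Python sketch below is a minimal example, assuming that each line of “dmls -l” output shows the state as a parenthesised token such as “(DUL)”; that output format and the helper names `file_state` and `needs_recall` are assumptions for illustration, not part of the MDSS tools.

```python
import re
from typing import Optional

# Meanings follow the state list above; only these three states are recognised.
DMF_STATES = {
    "OFL": "offline: the file is on tape only",
    "REG": "regular: the file is on disk only",
    "DUL": "dual state: the file is on both tape and disk",
}

# Assumption: dmls -l prints the state as a token like "(OFL)" on each line.
_STATE_RE = re.compile(r"\((OFL|REG|DUL)\)")


def file_state(dmls_line: str) -> Optional[str]:
    """Return the state code found in one line of dmls -l output, or None."""
    match = _STATE_RE.search(dmls_line)
    return match.group(1) if match else None


def needs_recall(dmls_line: str) -> bool:
    """True if the file is offline, so reading it means a recall from tape."""
    return file_state(dmls_line) == "OFL"


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    sample = "-rw-r--r--  1 abc123 x00  52428800 2019-03-01 10:00 (OFL) run01.tar"
    code = file_state(sample)
    print(code, "->", DMF_STATES[code])
```

Checking for OFL before a bulk read makes it easy to spot which files will trigger slow tape recalls.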
Projects that handle very large volumes of data need to plan every aspect of data acquisition, storage, retrieval, navigation and interpretation.
Give particular attention to the data flow requirements of retrieval and navigation, since these are the parts of MDSS that will be used most heavily over the long term.
Consider the following issues before writing the first byte of data to DMF:
- data organisation;
- directory structure for efficient recall;
- file layout for efficient recall;
- data navigation (aka metadata) database for archived directories and files;
- projected access/retrieval modes;
- user and application interface to data retrieval process.
- massdata is intended for archiving large data files, particularly those created or used by batch jobs. (Storing large numbers of small files is a misuse of the system; please do NOT do this. See the -t option of netcp in man netcp.)
- Each project has a directory on MDSS with the pathname /massdata/projectid on that system. This path CANNOT be accessed directly from the Raijin login nodes. Remote access to your massdata directory is through the mdss utility or the netcp and netmv commands (see man mdss, man netcp and man netmv for full details). The mdss commands operate on files in that remote directory.
- Members of the project have rwx permissions in that directory and so may create their own files there.
- massdata is NOT to be used as an extension of home directories: files changed or removed in the massdata area are generally not recoverable, because no backups of previous revisions are kept.
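For the data navigation (metadata) database item in the planning list above, one simple approach is a small local index recording which archive holds each file, so that only the needed archive has to be recalled from tape. The sketch below uses Python's built-in `sqlite3`; the table layout and the names `create_index`, `record_archive` and `find_archive` are hypothetical, not an NCI-provided tool.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def create_index(conn: sqlite3.Connection) -> None:
    """Create a table mapping each archived file to the archive that contains it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS members (
               member_path TEXT PRIMARY KEY,
               archive     TEXT NOT NULL,
               size_bytes  INTEGER NOT NULL
           )"""
    )


def record_archive(conn: sqlite3.Connection, archive: str,
                   members: Iterable[Tuple[str, int]]) -> None:
    """Record every (path, size) member of one archive in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO members (member_path, archive, size_bytes) "
            "VALUES (?, ?, ?)",
            [(path, archive, size) for path, size in members],
        )


def find_archive(conn: sqlite3.Connection, member_path: str) -> Optional[str]:
    """Return the archive holding member_path, or None if it was never recorded."""
    row = conn.execute(
        "SELECT archive FROM members WHERE member_path = ?", (member_path,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the index
    create_index(conn)
    record_archive(conn, "foo/run01.tar", [("run01/a.nc", 1200), ("run01/b.nc", 3400)])
    print(find_archive(conn, "run01/b.nc"))  # foo/run01.tar
```

Keeping the index on ordinary disk (not on massdata) means it can be queried without touching tape at all.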
Currently, batch jobs other than copyq jobs cannot use the MDSS utilities.
Note: always request -l other=mdss when running mdss commands in copyq jobs. This ensures that jobs run only when the MDSS system is available, so they do not fail.
- Quotas apply: use nci_account on the compute machines to see your MDSS quota and usage. See the Disk Quota Policy document for the consequences of exceeding your quota.
- Access through mdss is intended for relatively modest mass data storage needs. Users who need larger storage capacity or more sophisticated access should contact us to obtain an account on the data cluster.
List all files and directories in your massdata directory:
Create a directory ‘foo’ in your massdata directory:
Submit a copyq batch job that creates a gzipped tarball (e.g. named ‘mytarball.tar.gz’) from directory ‘mydir/’ and copies it to your massdata subdirectory ‘foo/’:
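As a hedged illustration of the tarball-creation step only, Python's standard `tarfile` module can build the gzipped archive. The function name `make_tarball` is hypothetical, and copying the result to ‘foo/’ on massdata would still be done with the mdss or netcp commands inside the copyq job (see man mdss and man netcp).

```python
import tarfile
from pathlib import Path


def make_tarball(src_dir: str, tarball: str) -> Path:
    """Create a gzipped tarball of src_dir, e.g. mydir/ -> mytarball.tar.gz."""
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    out = Path(tarball)
    with tarfile.open(str(out), "w:gz") as tar:
        # arcname stores members as "mydir/..." rather than as absolute paths
        tar.add(str(src), arcname=src.name)
    return out
```

Usage: `make_tarball("mydir/", "mytarball.tar.gz")` writes the archive in the current directory, ready to be copied by the job's mdss or netcp step.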
Bundling Small Files
Archiving small files (< 20 MB) places a processing burden on MDSS and so reduces overall throughput for all users. Users with many small files are asked to bundle them into larger files. Common tools for bundling files are tar(1) and cpio(1). Although it is preferable to bundle files on your local machine, data-intensive projects will also have a short-term area on Raijin that can be used to bundle files before saving them to the MDSS filesystem.
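One way to follow this advice is to sweep a directory for small files and pack them into a few larger tar archives before copying anything to MDSS. The Python sketch below assumes ‘small’ means under 20 MB (as stated above) and uses an illustrative target of roughly 1 GiB per bundle; both thresholds, the bundle_NNNN.tar naming and the function names are hypothetical choices, not NCI requirements.

```python
import tarfile
from pathlib import Path
from typing import List

# Assumed thresholds, for illustration only.
SMALL_LIMIT = 20 * 1024 * 1024   # "small" = under 20 MB, as stated above
TARGET_BUNDLE = 1024 ** 3        # aim for roughly 1 GiB per bundle


def plan_bundles(paths: List[Path], target: int = TARGET_BUNDLE) -> List[List[Path]]:
    """Greedily group files (in sorted order) into bundles of about `target` bytes."""
    bundles: List[List[Path]] = []
    current: List[Path] = []
    current_size = 0
    for p in sorted(paths):
        size = p.stat().st_size
        # Start a new bundle once adding this file would exceed the target.
        if current and current_size + size > target:
            bundles.append(current)
            current, current_size = [], 0
        current.append(p)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


def bundle_small_files(src_dir: str, out_dir: str,
                       small_limit: int = SMALL_LIMIT,
                       target: int = TARGET_BUNDLE) -> List[Path]:
    """Pack files smaller than small_limit under src_dir into bundle_NNNN.tar files.

    Larger files are left untouched. Keep out_dir outside src_dir.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    small = [p for p in src.rglob("*")
             if p.is_file() and not p.is_symlink() and p.stat().st_size < small_limit]
    written: List[Path] = []
    for i, group in enumerate(plan_bundles(small, target)):
        tar_path = out / f"bundle_{i:04d}.tar"
        with tarfile.open(str(tar_path), "w") as tar:
            for p in group:
                # Store paths relative to src_dir so the tree can be restored.
                tar.add(str(p), arcname=str(p.relative_to(src)))
        written.append(tar_path)
    return written
```

Files at or above the limit are deliberately skipped, since they can be archived individually without burdening MDSS.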
Transfer files to MDSS from Raijin (perhaps as part of a jobscript):
To quantify ‘many’ and ‘small’: an account with more than 30% of its files smaller than 10 MB has reached our cautionary limit. Once this threshold is reached, the user is required to start bundling the small files into larger container files (e.g. using ‘tar’ or ‘cpio’).
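A quick local self-check against this limit might look like the Python sketch below. It assumes 10 MB means 10,000,000 bytes and counts regular files in a local directory tree before they are archived; NCI's own accounting of an MDSS account may work differently, and `small_file_fraction` and `over_cautionary_limit` are hypothetical names.

```python
import os
from typing import Tuple

# Values from the policy above; treating 10 MB as 10,000,000 bytes is an assumption.
SMALL_FILE_BYTES = 10 * 1000 * 1000
CAUTION_FRACTION = 0.30


def small_file_fraction(root: str, small_bytes: int = SMALL_FILE_BYTES) -> Tuple[int, int, float]:
    """Return (small files, total files, fraction small) for regular files under root."""
    total = small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            total += 1
            if os.path.getsize(path) < small_bytes:
                small += 1
    return small, total, (small / total if total else 0.0)


def over_cautionary_limit(root: str, small_bytes: int = SMALL_FILE_BYTES,
                          fraction: float = CAUTION_FRACTION) -> bool:
    """True if more than `fraction` of the files under root are smaller than small_bytes."""
    _small, _total, frac = small_file_fraction(root, small_bytes)
    return frac > fraction
```

Running this on a staging directory before transfer gives an early warning that bundling is needed.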
Backups of Quickly Changing Source Code
Archive copies of source trees are a common use of archival systems. The preferred backup model is to manage the source with a configuration management tool and back up only the major software milestones. The configuration management tool then tracks the day-to-day changes on the user’s local disk.
Contact NCI User Services for advice on selecting, integrating and using widely available configuration management tools (e.g. CVS, SCCS, RCS). These tools track daily changes, making it easy to back out unwanted modifications, and record major branches in code development.
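Following the preferred model above, a milestone backup can be a single compressed archive named after the release rather than a copy of every daily change. This Python sketch is one possible approach: it assumes the version-control bookkeeping directories (CVS, RCS, SCCS) are kept out of the snapshot and uses a hypothetical project-milestone.tar.gz naming scheme; neither is an NCI convention.

```python
import tarfile
from pathlib import Path

# Version-control directories mentioned above; excluding them is a design choice.
VCS_DIRS = {"CVS", "RCS", "SCCS"}


def _skip_vcs(info: tarfile.TarInfo):
    """tarfile filter: drop any member inside a version-control directory."""
    if any(part in VCS_DIRS for part in Path(info.name).parts):
        return None  # None excludes the member (and, for a directory, its contents)
    return info


def archive_milestone(src_dir: str, project: str, milestone: str, out_dir: str = ".") -> Path:
    """Write <project>-<milestone>.tar.gz containing a snapshot of src_dir."""
    label = f"{project}-{milestone}"
    out = Path(out_dir) / f"{label}.tar.gz"
    with tarfile.open(str(out), "w:gz") as tar:
        tar.add(str(src_dir), arcname=label, filter=_skip_vcs)
    return out
```

For example, `archive_milestone("src", "proj", "v1.0")` produces proj-v1.0.tar.gz, which is then the only file that needs to go to massdata for that milestone.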
Locking Files onto Disk Cache
Locking the first few kilobytes of a file onto the disk cache speeds up the initial flow of data to the requester. However, the disk cache could easily fill up with locked data if users routinely set this attribute, so only a small number of locked files is permitted. NCI staff closely monitor the use of this attribute; if your project absolutely requires it, please contact NCI User Services.
To avoid delays at login, standard startup files found during a weekly automated search are automatically locked onto disk in their entirety. The default locked files are: .cshrc, .profile, .login, .rhosts, .logout, .history, and .sshrc/*.
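The default list above includes a shell-style wildcard, .sshrc/*. As a small illustration of how such a list matches paths, the Python sketch below uses `fnmatch`; it assumes paths are given relative to the directory the weekly search examines (which is not documented here), and `is_default_locked` is a hypothetical name. It only mirrors the documented list and does not lock anything.

```python
import fnmatch
from pathlib import PurePosixPath

# Default locked startup files, as listed above.
DEFAULT_LOCKED = [".cshrc", ".profile", ".login", ".rhosts", ".logout", ".history", ".sshrc/*"]


def is_default_locked(rel_path: str) -> bool:
    """True if rel_path (relative to the searched directory) matches the default list.

    Note: in fnmatch, "*" also matches "/", so files nested below .sshrc match too.
    """
    normalised = PurePosixPath(rel_path).as_posix()  # e.g. "./.profile" -> ".profile"
    return any(fnmatch.fnmatchcase(normalised, pattern) for pattern in DEFAULT_LOCKED)
```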
Recovery of Lost Files
Files that have been accidentally removed or corrupted can be recovered only within one week. The window is short because, after that, the tape holding the original file may be returned to the tape pool for reuse.
The recovery process is complex and manual. MDSS does not keep versions of existing files with the same name, so recovery involves reloading an old snapshot of the MDSS database and then retrieving the desired entry. Please submit a file recovery request only for critical, irreplaceable data.
In the event of data loss, contact NCI User Services and include the following:
- your name
- contact number
- full pathname of the lost file
- date the file was removed
- best guess of when the file was created and/or last modified.
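Before sending a request, it may help to check the one-week window and gather the listed details in one place. The Python sketch below does both; treating the window as at most seven whole days is an assumption, and `within_recovery_window` and `recovery_request` are hypothetical helpers, not part of any NCI service.

```python
from datetime import date, timedelta
from typing import Optional

# "within the period of one week"; counting up to 7 whole days is an assumption.
RECOVERY_WINDOW = timedelta(days=7)


def within_recovery_window(removed_on: date, today: Optional[date] = None) -> bool:
    """True if removed_on is no more than one week before today (and not in the future)."""
    today = today or date.today()
    return timedelta(0) <= today - removed_on <= RECOVERY_WINDOW


def recovery_request(name: str, phone: str, path: str, removed_on: date,
                     created_or_modified: str, today: Optional[date] = None) -> str:
    """Assemble the details NCI User Services asks for into plain text."""
    if not within_recovery_window(removed_on, today):
        raise ValueError("removal date is outside the one-week recovery window")
    return "\n".join([
        f"Name: {name}",
        f"Contact number: {phone}",
        f"Full pathname of lost file: {path}",
        f"Date file was removed: {removed_on.isoformat()}",
        f"Created/last modified (best guess): {created_or_modified}",
    ])
```

The resulting text can be pasted straight into the message to NCI User Services.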