The "rose suite-run" command fails with error "[FAIL] ERROR: No hosts currently compatible with this global configuration:"

Check whether you have a running persistent session with the name specified by the environment variable "CYLC_SESSION" or in the file "~/.persistent-sessions/cylc-session".
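
To see both values side by side, something like the following sketch may help. The helper name show_cylc_session is hypothetical; it only prints the two locations named above and does not check whether that session is actually running.

```shell
# Hypothetical helper: print the session name from each of the two places
# mentioned above (environment variable and file).
show_cylc_session() {
    echo "CYLC_SESSION=${CYLC_SESSION:-<unset>}"
    session_file="${HOME}/.persistent-sessions/cylc-session"
    if [ -r "$session_file" ]; then
        echo "cylc-session file: $(cat "$session_file")"
    else
        echo "cylc-session file: <missing>"
    fi
}

show_cylc_session
```

Compare the printed name against the persistent sessions you currently have running.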

My PBS job state shows "submitted" but the "qstat" command does not list the job.

This can happen when the job cannot find the Cylc commands. Check the PBS storage settings in your Cylc suite and make sure they contain "gdata/hr22+gdata/ki32".
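
A rough way to check a suite file for both entries is sketched below. The helper name check_storage is hypothetical, and the file you pass to it (for example a suite.rc in your suite directory) is an assumption about your suite layout. The search covers the whole file, so a match outside the storage setting would also be reported as found.

```shell
# Hypothetical helper: report whether each required storage entry
# appears anywhere in the given suite file.
check_storage() {
    suite_file="$1"
    for proj in gdata/hr22 gdata/ki32; do
        if grep -qF "$proj" "$suite_file"; then
            echo "found: $proj"
        else
            echo "missing: $proj"
        fi
    done
}

# Example call (the path is an assumption; adjust it to your suite):
# check_storage ~/roses/u-cq161/suite.rc
```

Any line reported as missing points to a storage entry that still needs to be added.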

The ssh, scp, or rsync commands defined in my suite are failing.

In most cases, you won't need any of these commands, as the Cylc processes are always running in the same persistent session and get unified access to all your home and project directories. You can comment out these commands in your suite configuration and try again.

If you do need to make these commands work, you can add the following lines to your "~/.ssh/config" file. With this configuration, the commands simply connect to the localhost of the persistent session, i.e. the same session.

    Match exec "echo '%l' | grep -q ''" host gadi,,localhost
        HostName localhost
        IdentityFile ~/.persistent-sessions/%l/user.key
        StrictHostKeyChecking no
        Port 2222

When I start MATE Terminal in ARE VDI I see "ERROR: Directory '...' not found". What should I do?

For example,

ERROR: Directory '/g/data/hh5/public/modules' not found

You will see this error message if your ~/.bashrc file tries to access a directory that you have not declared in the request form of the ARE VDI session.

The simplest fix is to delete the current VDI session and launch a new one with the relevant directory in the "Storage" field. For example, to access /g/data/hh5, add "gdata/hh5" to the Storage field.
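
If you sometimes start sessions without that storage, one possible workaround is to guard the relevant line in ~/.bashrc so it only runs when the directory is visible. The sketch below assumes the line in question is a "module use" call, which may not match your own file, and dir_available is a hypothetical helper name. It avoids touching the directory when it is absent but does not give the session access to it.

```shell
# Hypothetical helper: succeed only if the directory is visible in this session.
dir_available() {
    [ -d "$1" ]
}

# Assumption: ~/.bashrc adds this directory to the module search path.
if dir_available /g/data/hh5/public/modules; then
    module use /g/data/hh5/public/modules
fi
```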

How do I log in to Gadi from a normalbw node?

In the MATE Terminal, run

$ ssh gadi-login-01

to access Gadi login node 1.

There are nine login nodes on Gadi. To use a different one, choose an integer between 1 and 9 and substitute it, zero-padded to two digits, in the ssh command. For example, to use login node 9, run ssh gadi-login-09 inside the terminal in your VDI session.
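
The zero-padded host name can also be built from the node number, as in this small sketch. The function name login_host is hypothetical.

```shell
# Hypothetical helper: turn a node number between 1 and 9 into the
# host name pattern shown above, e.g. 9 -> gadi-login-09.
login_host() {
    case "$1" in
        [1-9]) printf 'gadi-login-%02d\n' "$1" ;;
        *) echo "expected an integer between 1 and 9" >&2; return 1 ;;
    esac
}

# Example: connect to login node 9.
# ssh "$(login_host 9)"
```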

The error message says "disk I/O error", what does it mean?

If your project is running out of storage space, you are very likely to see the disk I/O error. The tricky bit is to confirm where the excessive usage is.

To double check the total usage under a project, for example om02, run

$ nci_account -P om02

This command lists the usage on both /g/data and /scratch. If the number in the Used column is larger than that in Allocation, or iUsed is larger than iAllocation, go to your own project directory and clean up the data.

If you suspect your $HOME directory is full, run

$ quota -s

and compare the numbers under space and limit. If space used is beyond the limit, clean up your home directory.
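
To narrow down which subdirectories hold most of the data, a sketch along these lines may help. The helper name largest_dirs is hypothetical, and it relies on the standard du, sort and head utilities. Sizes are reported in kilobytes, the first line is the total for the directory itself, and scanning a large directory can take a long time.

```shell
# Hypothetical helper: list the largest entries one level below a directory.
# $1 = directory to scan, $2 = number of lines to show (default 10).
largest_dirs() {
    du -k -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example: the ten largest entries under your home directory.
# largest_dirs "$HOME"
```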

The error message says “Authentication failed”, what does it mean?

If checking out a copy of a suite fails with an error message similar to the following

$ rosie co u-cq161
[FAIL] svn checkout -q /home/123/abc123/roses/u-cq161 # return-code=1, stderr=
[FAIL] svn: E170013: Unable to connect to a repository at URL ''
[FAIL] svn: E215004: No more credentials or we tried too many times.
[FAIL] Authentication failed

it suggests that the credential cache is no longer valid. Run mosrs-auth and enter your password, then check out the suite again.

$ mosrs-auth
INFO: You need to enter your MOSRS credentials here so that GPG can cache your password.
Please enter the MOSRS password for testuser:
INFO: Checking your credentials using Subversion. Please wait.
INFO: Successfully accessed Subversion with your credentials.
INFO: Checking your credentials using rosie. Please wait.
INFO: Successfully accessed rosie with your credentials.
$ rosie co u-cq161
[INFO] u-cq161: local copy created at /home/123/abc123/roses/u-cq161

When I run "mosrs-setup" or "mosrs-auth", I see "Saving credentials failed". What do I do now?

Try running

$ gpg-connect-agent killagent /bye
$ mosrs-auth --debug

and see if that fixes the problem. If the problem still occurs, take a copy of the debug messages and contact the NCI Helpdesk with keywords like "UK Met Office Environment" in your problem description.
