When I start MATE Terminal in an ARE VDI session, I see "ERROR: Directory '...' not found". What should I do?

For example,

ERROR: Directory '/g/data/hh5/public/modules' not found

You will see this error message if your ~/.bashrc file tries to access a directory that you have not declared in the request form of the ARE VDI session.

The simplest fix is to delete the current VDI session and launch a new one with the relevant directory in the "Storage" field. For example, to access /g/data/hh5, add "gdata/hh5" to the Storage field.
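If you would rather keep the same ~/.bashrc across sessions with different storage settings, one option is to guard the line that touches the directory. The sketch below assumes the offending line is a "module use" call; the function name use_modules_if_mounted is hypothetical and not part of any NCI tooling.

```shell
# Hypothetical helper for ~/.bashrc (the name is an assumption, not an NCI command).
# It only calls 'module use' when the directory is visible in the current session.
use_modules_if_mounted() {
    dir="$1"
    if [ -d "$dir" ]; then
        # 'module use' adds the directory to the module search path.
        module use "$dir"
    else
        # The directory was not requested in the Storage field, so skip it quietly.
        echo "Skipping $dir: not mounted in this session" >&2
        return 1
    fi
}

# Example, using the directory from the error message above:
# use_modules_if_mounted /g/data/hh5/public/modules
```

With a guard like this, a session launched without gdata/hh5 prints a short notice instead of the module error, but the modules in that directory are still unavailable until the storage is added.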

How do I log in to Gadi from a normalbw node?

In the MATE Terminal, run

$ ssh gadi-login-01

to access Gadi login node 1.

There are nine login nodes on Gadi. To use a different one, pick an integer between 1 and 9 and substitute it into the ssh command. For example, to use login node 9, run ssh gadi-login-09 in the terminal of your VDI session.
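The host names use a two-digit, zero-padded node number. As a minimal sketch, the hypothetical helper below (gadi_login_host is an illustrative name, not an NCI command) builds the host name from a node number, assuming the 1 to 9 range described above.

```shell
# Hypothetical helper: build a login node host name from a node number (1-9).
gadi_login_host() {
    n="$1"
    case "$n" in
        # Pad the single digit to two digits, e.g. 9 becomes 09.
        [1-9]) printf 'gadi-login-%02d\n' "$n" ;;
        *) echo "Node number must be between 1 and 9" >&2; return 1 ;;
    esac
}

# Example: connect to login node 9 from the VDI terminal.
# ssh "$(gadi_login_host 9)"
```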

The error message says "disk I/O error". What does it mean?

If your project is running out of storage space, you are very likely to see the disk I/O error. The tricky part is working out where the excessive usage is.

To check the total usage of a project, for example om02, run

$ nci_account -P om02

This command lists the usage on both /g/data and /scratch. If the number in the Used column is larger than the one in the Allocation column, go to your project directory and clean up the data.
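To narrow down where the space is going inside a directory, a generic approach is to rank its immediate subdirectories by size with du and sort. The sketch below is one way to do that; the function name largest_entries and the example path are assumptions, and du can take a long time on very large directory trees.

```shell
# Hypothetical helper: list the largest entries directly under a directory.
# Usage: largest_entries DIR [COUNT]   (COUNT defaults to 10)
largest_entries() {
    dir="$1"
    count="${2:-10}"
    # du -k reports sizes in KiB; -d 1 limits the report to one level below DIR.
    # sort -rn puts the largest first. The first line is the total for DIR itself.
    du -k -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example (the path is an assumption; use your own project directory):
# largest_entries "/scratch/om02/$USER"
```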

If you suspect your $HOME directory is full, run

$ quota -s

and compare the numbers under space and limit. If the space used exceeds the limit, clean up your home directory.

When I run "cylc stop", the error message says "Connection refused". How do I fix it?

A previous suite run that ended prematurely may leave behind information that is inconsistent with the current session. This confuses Cylc, and you will see this error message when cylc stop tries to stop the suite from the previous session, which it can no longer reach. The full error message looks similar to the one shown below.

$ cylc stop u-cs809
Cannot connect: http://gadi-cpu-bdw-0015.gadi.nci.org.au:1814/set_stop_cleanly?kill_active_tasks=False: HTTPConnectionPool(host='gadi-cpu-bdw-0015.gadi.nci.org.au', port=1814): Max retries exceeded with url: /set_stop_cleanly?kill_active_tasks=False (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x14a52cf4c210>: Failed to establish a new connection: [Errno 111] Connection refused',))

To fix the error above, delete the file ~/cylc-run/u-cs809/.service/contact and start the suite again. For your own suite, replace u-cs809 with your suite name in the path to the contact file.
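The deletion step can be wrapped in a small function that takes the suite name, as sketched below. The name remove_stale_contact is hypothetical, and the sketch assumes the suite from the previous session is no longer running.

```shell
# Hypothetical helper: remove the stale contact file for a given suite.
# Only use this when the suite from the previous session is no longer running.
remove_stale_contact() {
    suite="$1"
    contact="$HOME/cylc-run/$suite/.service/contact"
    if [ -f "$contact" ]; then
        rm -- "$contact"
        echo "Removed $contact"
    else
        echo "No contact file found at $contact"
    fi
}

# Example for the suite shown above:
# remove_stale_contact u-cs809
```

After the contact file is removed, start the suite again as usual.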
