Check Group Memberships

Check your project memberships before starting a new persistent session: once a session is created, the memberships inside it receive no further updates until you "restart" the persistent session.

To confirm that you have the required project memberships before creating a persistent session, run the `groups` command as in the following example.

[abc111@gadi-login-01 ~]$ groups | grep "hr22\|ki32\|access"
access hh5 cws_help rt52 dk92 up99 hr22 ki32 ki32_mosrs ki32_nemo 

In the output, the matching strings are highlighted in red. If any memberships are missing, join the corresponding project first. The example above checks three project memberships (hr22, ki32 and access), but your suite might need more. Please check our prerequisites page to make sure that you have all the required memberships.
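Because `grep` matches substrings, a pattern such as `ki32` also matches `ki32_mosrs`, so a highlighted hit does not by itself prove membership of the exact project. The following is a minimal sketch of an exact-match check. The function name `missing_groups` is hypothetical, and the project list is simply the one used in the example above.

```shell
# Print each required project that is absent from a space-separated group list.
# usage: missing_groups "<group list>" proj1 proj2 ...
missing_groups() {
    have=" $1 "            # pad with spaces so every name is space-delimited
    shift
    for proj in "$@"; do
        case "$have" in
            *" $proj "*) ;;                 # exact name found
            *) printf '%s\n' "$proj" ;;     # not a member
        esac
    done
}

# Check the groups of the current login shell (project names are examples).
missing=$(missing_groups "$(id -Gn)" hr22 ki32 access)
if [ -n "$missing" ]; then
    echo "Missing memberships:" $missing
fi
```

Run it in a fresh login shell, since group changes only appear in new shells.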

Once the membership request(s) are approved, you will receive an email from my.nci.org.au. Allow another 30-60 minutes for the membership(s) to synchronise across all servers. If new Gadi login shells still show no update after two hours, please contact the help desk for further assistance.

Start a New Session 

You can run multiple suites in the same persistent session. As long as the project memberships inside the existing session allow it, prefer reusing that session over creating a new one just to run another suite.

With all the necessary project memberships in place, it is time to create a new persistent session.

You start a new session by giving it a name that follows standard domain naming (DNS) conventions. DNS allows lowercase English letters a-z, digits 0-9, and hyphens. Note that hyphens cannot be used at the start or end of a domain name, consecutive hyphens are typically disallowed, and hyphens cannot simultaneously appear in the third and fourth positions.
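The naming rules above can be checked before a session is started. This is a minimal sketch, not the validation that `persistent-sessions` itself performs. The function name `valid_session_name` is hypothetical, and the sketch takes the strict reading that rejects any consecutive hyphens, which also covers the third-and-fourth-position rule.

```shell
# Return 0 if the name satisfies the rules described above, 1 otherwise.
valid_session_name() {
    case "$1" in
        "") return 1 ;;                                              # empty name
        *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;      # disallowed character
        -*|*-) return 1 ;;                                           # leading or trailing hyphen
        *--*) return 1 ;;                                            # consecutive hyphens
    esac
    return 0
}

# Try a few candidate names.
for name in cylc-test -test my--suite Cylc; do
    if valid_session_name "$name"; then
        printf '%s: ok\n' "$name"
    else
        printf '%s: rejected\n' "$name"
    fi
done
```

The allowed characters are listed explicitly rather than as `a-z` so that the result does not depend on the shell's locale.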

For example, you can create a session as below. The name you choose becomes the first part of the full internal DNS name. Here the session name is cylc-test, and the system builds the full DNS name by automatically adding the username, project code and persistent server being used, which in this example gives `cylc-test.abc111.xy99.ps.gadi.nci.org.au`. Note that the example below also shows the unique "session UUID" (9234dec2-ed8a-8e40-bbc9-85d1), which is required later when looking up the usage of the session and/or killing it.

[abc111@gadi-login-02 ~]$ persistent-sessions start cylc-test
session 9234dec2-ed8a-8e40-bbc9-85d1 running - connect using
  ssh cylc-test.abc111.xy99.ps.gadi.nci.org.au
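Since the UUID and host name are needed later, it can help to capture them when the session is started. The sketch below assumes the output keeps exactly the two-line layout shown above, which is not a documented guarantee. The function names `session_uuid` and `session_host` are hypothetical.

```shell
# Extract the session UUID from text shaped like the start message above.
session_uuid() {
    awk '$1 == "session" && $3 == "running" { print $2; exit }'
}

# Extract the host name from the "ssh <host>" line of the same message.
session_host() {
    awk '$1 == "ssh" { print $2; exit }'
}

# Sample text copied from the example above. On Gadi you would instead
# capture the real message, e.g. out=$(persistent-sessions start cylc-test)
out='session 9234dec2-ed8a-8e40-bbc9-85d1 running - connect using
  ssh cylc-test.abc111.xy99.ps.gadi.nci.org.au'

printf '%s\n' "$out" | session_uuid
printf '%s\n' "$out" | session_host
```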

Once the persistent session is created, you can test password-less login using the full generated internal DNS name, as in the example below. This name is only accessible from within Gadi or the ARE.

[abc111@gadi-login-02 ~]$ ssh cylc-test.abc111.xy99.ps.gadi.nci.org.au
[abc111@cylc-test ~]$

No configuration in the file ~/.ssh/config is required for this password-less login to work. However, your existing configuration may conflict with the setup used by persistent sessions; if the login fails, try `ssh -v` to debug the issue.

To list and kill any existing sessions via the "persistent-sessions" command, see more details here.

Login on

At various stages of the Cylc workflow, you might want to work inside a shell running on the persistent session.

If you want to use utilities that require X windows (e.g. GCylc), you need to use the "ssh -Y" command option at every hop, all the way to the persistent session, to enable X11 forwarding. If you are using the ARE, log in to the persistent session directly:

From the ARE MATE window:
[abc111@gadi-cpu-bdw-001 ~] $ ssh -Y cylc-test.abc111.xy99.ps.gadi.nci.org.au

Alternatively, if you log in to a Gadi login node from your laptop, log in to Gadi first and then to the persistent session, using "ssh -Y" at each step so that X11 forwarding passes all the way through to your laptop:

[laptop]$ ssh -Y abc111@gadi.nci.org.au
[abc111@gadi-login-02 ~]$ ssh -Y cylc-test.abc111.xy99.ps.gadi.nci.org.au

List all Sessions

If you are unsure whether you have a meta-scheduler session currently running (for example, after a quarterly maintenance), check with the following command.

[abc111@gadi-login-01 ~] $ persistent-sessions list
                                UUID  PROJECT       ADDRESS      CPUTIME  MEMORY
cac76067-5ad0-663c-9791-cfd1c53828f6     ab00      10.9.4.4 00:30:00.091  144.9M
18acc940-e61c-7907-c853-9c9b9013b1ab     ab00      10.9.4.3 00:02:26.939  118.8M

Note: as of 18 Mar 2024 the list does not show the domain names of sessions, only their session UUIDs. The `persistent-sessions` interface will likely be updated to show them in the near future (see here for updates).
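For scripting, the listing can be reduced to the UUID and project of each session. This sketch assumes the five-column layout shown above, which may change if the interface is updated. The function name `session_rows` is hypothetical.

```shell
# Print "UUID PROJECT" for each session row, skipping the header line.
session_rows() {
    awk '$1 != "UUID" && NF >= 5 { print $1, $2 }'
}

# Sample text copied from the listing above. On Gadi you would instead run:
#   persistent-sessions list | session_rows
session_rows <<'EOF'
                                UUID  PROJECT       ADDRESS      CPUTIME  MEMORY
cac76067-5ad0-663c-9791-cfd1c53828f6     ab00      10.9.4.4 00:30:00.091  144.9M
18acc940-e61c-7907-c853-9c9b9013b1ab     ab00      10.9.4.3 00:02:26.939  118.8M
EOF
```

An empty result means that only the header was printed, i.e. no session is running.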

Kill a Session

One example of when this is needed: if you update your NCI group memberships and from then on need only a new session in which the new membership(s) take effect, kill the older session, in which the suites can no longer run because the missing permissions block the necessary access.

Killing your meta-scheduler session terminates all tasks currently running inside the session. However, it does not kill any jobs that you have already submitted to Gadi, whether queued or running.

If you wish to kill your meta-scheduler session, do the following.  You can look up the "session UUID" from the return message of the command `persistent-sessions list`.

$ persistent-sessions kill cac76067-5ad0-663c-9791-cfd1c53828f6

"Restart" a Session

If the Cylc meta-scheduler is interrupted by the loss of the underlying persistent session, we recommend starting a new persistent session with the same name before restarting the Cylc workflow, to support the continuity of your previous workflows. This mainly happens during a quarterly system maintenance event, when all existing persistent sessions are terminated.

In general, the name of the last used persistent session is kept inside the file `~/.persistent-sessions/cylc-session`.

$ cat ~/.persistent-sessions/cylc-session 
demo.abc111.xy99.ps.gadi.nci.org.au

On the individual suite level, the information can be found in the file `~/cylc-run/<suiteid>/.service/contact`. If the suite doesn't have this "contact" file, the workflow manager quit gracefully and you should be able to re-run the suite inside any persistent session. In the example below, the suite `u-cs809` has an existing "contact" file and needs to be run inside a persistent session called `demo.abc111.xy99.ps.gadi.nci.org.au`.

$ cat ~/cylc-run/u-cs809/.service/contact | grep CYLC_SUITE_HOST
CYLC_SUITE_HOST=demo.abc111.xy99.ps.gadi.nci.org.au
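The recorded host can be split into the session name and project code needed to recreate the session. This is a minimal sketch that assumes the host always has the form `<name>.<user>.<project>.ps.gadi.nci.org.au` described earlier. The function names are hypothetical, and the script only prints the start command rather than running it.

```shell
# Print the CYLC_SUITE_HOST value stored in a suite contact file.
suite_host() {
    sed -n 's/^CYLC_SUITE_HOST=//p' "$1"
}

# Split <name>.<user>.<project>.ps.gadi.nci.org.au into its parts.
host_session_name() { printf '%s\n' "$1" | cut -d. -f1; }
host_project()      { printf '%s\n' "$1" | cut -d. -f3; }

contact=~/cylc-run/u-cs809/.service/contact   # suite from the example above
if [ -f "$contact" ]; then
    host=$(suite_host "$contact")
    # Show the command to run; review it before executing it yourself.
    echo "persistent-sessions start --project $(host_project "$host") $(host_session_name "$host")"
else
    echo "no contact file: the suite can run inside any persistent session"
fi
```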

To start the session with the same name, namely `demo.abc111.xy99.ps.gadi.nci.org.au`, see the example below.

[abc111@gadi-login-01 ~]$ persistent-sessions list
                                UUID  PROJECT       ADDRESS      CPUTIME  MEMORY
[abc111@gadi-login-02 ~]$ persistent-sessions start --project xy99 demo
session 4da222f6-1764-cead-1e92-5f6f637d7cd5 running - connect using
  ssh demo.abc111.xy99.ps.gadi.nci.org.au

In the above example, the output of `persistent-sessions list` shows that there are no existing sessions. If any session is listed, double check whether you can reuse it. One simple test is to rename the "contact" file before restarting the suite; Cylc will then generate a new contact file. It is recommended to run all your suites inside the same session.
