...


Overview

In general, NCI Nirin Cloud instances continue to run until a user decides to shut them down. However, in some circumstances instances may be shut down administratively: for example, during scheduled maintenance, during emergency maintenance, or as a result of an unexpected outage.

NCI policy is that instances shut down administratively must be restarted by the instance owner. Unfortunately, this can leave some systems running on Nirin Cloud with inconveniently long service outages. To address this, NCI has developed a tool through which Nirin Cloud projects can request that their instances be restarted automatically after an outage.

Prerequisites

This document assumes that you are familiar with the basics of using NCI Nirin Cloud, as documented in the Nirin - Quick Start Guide, and with using the OpenStack client tools for direct API access, as described in Nirin API Access.

Quick start

If you have an instance with unique identifier $uuid and would like it to be restarted automatically following an outage, flag it by adding metadata to the instance as follows:

...

A second metadata key, nci_restart_after, can be used to configure more complex ordering dependencies between instances. It is explained below, along with further details.

Details

The automated restart tool uses metadata that users set on their project's instances. Instance metadata is simply a set of arbitrary key=value pairs associated with an instance, which the user can set. The tool reads a set of NCI-specified metadata keys and restarts instances based on which of those keys are present and on their values.

Metadata Keys

Two keys are recognised by the automated restart tool:

  • nci_restart
  • nci_restart_after

nci_restart

Setting the nci_restart key on an instance causes the instance to be restarted automatically. Note that the value is ignored: setting nci_restart=false will still result in the instance being restarted. To disable automated restart for an instance, remove the key from the instance entirely, as described under Setting Instance Metadata below.
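The following bash function is a hypothetical model of the rule above, written only for illustration: it is not the NCI restart tool's code, and both the name `has_nci_restart` and the input format (one key=value metadata entry per line on standard input) are assumptions.

```bash
# Hypothetical model of the rule above, not the NCI restart tool's code.
# Input on stdin: one metadata entry per line, in key=value form.
# Returns 0 (flagged) if an nci_restart key is present, whatever its value.
has_nci_restart() {
  local key
  while IFS='=' read -r key _; do
    # Only the key name is compared; the value after '=' is discarded.
    if [[ "$key" == "nci_restart" ]]; then
      return 0
    fi
  done
  return 1
}

# Both lines print "flagged": the value "false" does not disable the restart.
printf 'nci_restart=true\n'  | has_nci_restart && echo "flagged"
printf 'nci_restart=false\n' | has_nci_restart && echo "flagged"
```

The only way to make such a check fail is to remove the key, which is why unsetting it, rather than changing its value, is what turns automated restart off.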

nci_restart_after

The nci_restart_after key specifies a restart dependency graph for a set of instances: its value on an instance must be a comma-separated list of the UUIDs of instances, all owned by the same project, that must be running before that instance can be restarted. The automated restart tool uses this information to compute a topological sort of the dependency graph and restarts the instances in that order.
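As a sketch of how such a value might be built and applied with the Unified OpenStack Client, the hypothetical helper `set_restart_after` below joins dependency UUIDs with commas and passes the result to `openstack server set --property`. It assumes bash, an installed client, and loaded project credentials; it is not an NCI-provided tool.

```bash
# Hypothetical helper, not an NCI-provided tool. Assumes the Unified
# OpenStack Client is installed and project credentials are loaded.
# Usage: set_restart_after TARGET_UUID DEP_UUID [DEP_UUID...]
set_restart_after() {
  if (( $# < 2 )); then
    echo "usage: set_restart_after TARGET_UUID DEP_UUID [DEP_UUID...]" >&2
    return 2
  fi
  local target="$1"
  shift
  local deps
  # Inside the subshell, "$*" joins the remaining arguments with the first
  # character of IFS, giving a comma-separated list of dependency UUIDs.
  deps=$(IFS=','; printf '%s' "$*")
  openstack server set --property "nci_restart_after=${deps}" "$target"
}
```

For example, `set_restart_after "$uuid" "$dep1" "$dep2"` makes the instance `$uuid` wait for both `$dep1` and `$dep2`, where `$dep1` and `$dep2` are placeholder variables holding the UUIDs of the instances it depends on.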

Notes on behavior

For nci_restart_after to work, three requirements must be met:

  • every instance in the dependency graph must have either the nci_restart or the nci_restart_after key set;
  • the dependency graph must not contain any cycles;
  • all instances in a given dependency graph must belong to the same project.

If any of these requirements is not met, none of the project's instances will be restarted.
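Because a cycle prevents every restart in the project, it may be worth checking a planned graph before setting any metadata. A minimal local sketch, assuming bash and the standard `tsort` utility: each input line is a `DEPENDENCY DEPENDENT` pair (the dependent instance lists the dependency in its nci_restart_after value), and `tsort` reports any loop on standard error. The function name `check_no_cycles` is hypothetical, and the check covers only the cycle requirement.

```bash
# Minimal local sketch, not an NCI tool. Input on stdin: one
# "DEPENDENCY DEPENDENT" pair per line, meaning DEPENDENT lists DEPENDENCY
# in its nci_restart_after value. tsort reports problems on standard error.
check_no_cycles() {
  local errors
  # Capture tsort's stderr and discard the ordering it prints on stdout.
  if ! errors=$(tsort 2>&1 >/dev/null) || [[ -n "$errors" ]]; then
    echo "dependency graph problem reported by tsort:" >&2
    printf '%s\n' "$errors" >&2
    return 1
  fi
  return 0
}

# Second worked example: test1 before test2 and test3, both before test4.
check_no_cycles <<'EOF' && echo "no cycles"
test1 test2
test1 test3
test2 test4
test3 test4
EOF
```

Instance names are used here only as readable labels; in practice the pairs would hold the UUIDs you intend to put in each nci_restart_after value.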

...

The NCI Cloud Team will monitor for issues while restarting a project's instances and, where possible, will contact the project about them. However, this is a best-effort service and no guarantees are made.

Important Note

The current version provides no tool for users to verify that their configuration is correct. Please take care when defining dependencies and setting metadata keys on your project's instances. If you have questions about your configuration, contact the NCI help desk at help@nci.org.au for assistance.
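In the absence of an official tool, part of a plan can be checked locally before any metadata is set. The hypothetical function `check_deps_flagged` below, which is not an NCI tool, covers the requirement that every instance named in an nci_restart_after value must itself have one of the two keys set. Its input format is an assumption: one line per instance you plan to flag, giving the instance UUID and then its planned nci_restart_after value, or `-` if it will only have nci_restart set. It needs bash 4 or later for associative arrays.

```bash
# Hypothetical pre-flight check, not an NCI tool (bash 4+).
# Input on stdin, one line per instance you plan to flag:
#   INSTANCE_UUID DEPS
# where DEPS is the planned comma-separated nci_restart_after value,
# or "-" if the instance will only have nci_restart set.
check_deps_flagged() {
  local -A deps_of=()
  local -a dep_list=()
  local id deps dep status=0

  # First pass: record every instance that will carry one of the keys.
  while read -r id deps; do
    [[ -n "$id" ]] || continue
    deps_of["$id"]="${deps:--}"
  done

  # Second pass: every dependency must itself be one of those instances.
  for id in "${!deps_of[@]}"; do
    [[ "${deps_of[$id]}" != "-" ]] || continue
    IFS=',' read -ra dep_list <<< "${deps_of[$id]}"
    for dep in "${dep_list[@]}"; do
      [[ -n "$dep" ]] || continue
      if [[ -z "${deps_of[$dep]+set}" ]]; then
        echo "$id depends on $dep, which has neither key set" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

Run it as `check_deps_flagged < plan.txt`, where `plan.txt` is a hypothetical file in the format above; it prints one message per missing dependency and returns non-zero if any are found. It cannot check the same-project requirement, which depends on information held by the cloud.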

Setting Instance Metadata

The OpenStack command line tools are the recommended way to set instance metadata. See the Nirin API Access page for information about installing and running them. This document assumes you are using the Unified OpenStack Client; other OpenStack tools can also be used to manipulate instance metadata, but their use is outside the scope of this document.

...

```bash
$ openstack server unset --property nci_restart --property nci_restart_after instance_uuid
```
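Setting the key is the counterpart of the unset command above. A small sketch, assuming the Unified OpenStack Client is installed and your project credentials are loaded: the hypothetical helper `enable_auto_restart` sets nci_restart and then displays the instance's properties so you can confirm that the key is present.

```bash
# Hypothetical helper; assumes the Unified OpenStack Client is installed
# and your project credentials are loaded.
enable_auto_restart() {
  local uuid="$1"
  # Any value works because the restart tool ignores it; "true" reads clearly.
  openstack server set --property nci_restart=true "$uuid" &&
    openstack server show --column properties "$uuid"
}
```

For example, `enable_auto_restart "$uuid"`. To turn automated restart off again, remove the key with the unset command rather than setting a different value.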

Worked Examples

A project has the following instances:

```
$ openstack server list --column ID --column Name
+--------------------------------------+-----------+
| ID                                   | Name      |
+--------------------------------------+-----------+
| f10b24e4-da9a-4280-9cdf-cf7fe03af3df | test4     |
| c32dbc05-55b6-491e-9e2b-585d76289483 | test3     |
| 1933f3bd-3b69-408d-a0ef-a657890df955 | test2     |
| 402e5c04-ea09-43c7-8bb4-f34e26b0c637 | test1     |
+--------------------------------------+-----------+
```

Instance restart with no dependencies

Instances test1 and test2 need to be restarted, but test3 and test4 do not:

...

In this configuration test1 and test2 will be restarted in parallel, and test3 and test4 will not be restarted.

Instance restart with dependencies

Instance test4 requires services running on test3 and test2, and both test3 and test2 depend on a service running on test1. This creates the following dependency graph:

...

In this configuration test1 will be restarted, test2 and test3 will be restarted in parallel once test1 is active, and test4 will be restarted after test2 and test3 are both active.

Instance restart with a mix of dependent and independent instances

In this case all four instances need to be restarted, but one is independent of the others: test4 depends on test2, which depends on test1, but test3 does not depend on any others:

...

In this configuration test1 and test3 will be restarted in parallel, test2 will be restarted once test1 is active, and test4 will be restarted after test2 is active.
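The restart order described in the last two examples can be reproduced with a small simulation. The bash function below is illustrative only: it is not the NCI tool's implementation, and the name `restart_waves` and its input format (one line per flagged instance: the instance name, then an optional comma-separated dependency list) are assumptions. It groups instances into waves in which every instance has all of its dependencies in earlier waves, so the instances in one wave could be started in parallel. It requires bash 4 or later.

```bash
# Illustrative simulation only, not the NCI restart tool's implementation.
# Input on stdin: one line per flagged instance, "NAME [DEP1,DEP2,...]".
# Output: one line per wave; instances on the same line could start in parallel.
restart_waves() {
  local -A deps_of=() started=()
  local -a pending=() wave=() rest=() dep_list=()
  local id deps dep ready

  while read -r id deps; do
    [[ -n "$id" ]] || continue
    deps_of["$id"]="$deps"
  done

  pending=("${!deps_of[@]}")
  while (( ${#pending[@]} > 0 )); do
    wave=()
    rest=()
    for id in "${pending[@]}"; do
      ready=1
      IFS=',' read -ra dep_list <<< "${deps_of[$id]}"
      for dep in "${dep_list[@]}"; do
        # Any dependency not started in an earlier wave holds this instance back.
        if [[ -n "$dep" && -z "${started[$dep]+set}" ]]; then
          ready=0
          break
        fi
      done
      if (( ready )); then wave+=("$id"); else rest+=("$id"); fi
    done

    if (( ${#wave[@]} == 0 )); then
      echo "cannot order: ${rest[*]} (cycle or unflagged dependency)" >&2
      return 1
    fi

    # Mark the wave as started only once it is complete, so instances in the
    # same wave never satisfy each other's dependencies.
    for id in "${wave[@]}"; do started["$id"]=1; done
    printf '%s\n' "${wave[@]}" | sort | paste -s -d ' ' -
    pending=("${rest[@]}")
  done
}

# Second worked example: test2 and test3 wait for test1; test4 waits for both.
restart_waves <<'EOF'
test1
test2 test1
test3 test1
test4 test2,test3
EOF
# prints:
# test1
# test2 test3
# test4
```

Feeding it the third example instead (test2 after test1, test4 after test2, test3 with no dependencies) prints `test1 test3`, then `test2`, then `test4`, matching the order described above.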

Further Assistance

If you have further questions about NCI's automated restart system, please contact the NCI help desk at help@nci.org.au.

...