...
In general, NCI Nirin Cloud instances continue to run until a dashboard user shuts them down. However, there are circumstances in which instances may be shut down administratively, for example during scheduled maintenance, during emergency maintenance, or because of an unexpected outage.
It is NCI policy that instances shut down administratively must be restarted by the instance owner. Unfortunately, this can cause some systems running on the Nirin Cloud to experience inconveniently long service outages. To address this, NCI has developed a tool through which Nirin Cloud projects can request that their instances be restarted automatically after an outage.
This document assumes that the user is familiar with the basics of using the NCI Nirin Cloud, as documented in the Nirin - Quick Start Guide. The user is also assumed to be familiar with using the OpenStack client tools for direct Nirin API Access.
If you have an instance with unique identifier $uuid, and you would like that instance to be restarted automatically following an outage, you can flag this by adding metadata to the instance as follows:
...
There is another metadata key, nci_restart_after, which can be used to configure more complex ordering dependencies between instances. This is explained, along with additional detail, below.
The automated restart tool makes use of metadata that the user sets on their project's instances. Instance metadata is simply a set of arbitrary key=value pairs that are associated with the instance and can be set by the user. The automated restart tool reads a set of NCI-specified metadata keys and restarts instances based on that metadata.
Two keys are recognised by the automated restart tool:
nci_restart
nci_restart_after
Setting the nci_restart key on an instance will cause the instance to be restarted automatically. Note that the value is ignored: setting nci_restart=false will still result in the instance being restarted. Please see the section on setting and clearing metadata below for details on how to remove the key from an instance, which will disable automated restart for that instance.
The nci_restart_after key can be used to specify a restart dependency graph for a set of instances. The metadata value for an instance must be set to a comma-separated list of instance UUIDs, all owned by the same project, which must be running before the instance can be restarted. The automated restart tool will use this information to calculate a topological sort of the dependency graph, and restart the instances in that order.
For nci_restart_after to function, three requirements must be met: firstly, every instance in the dependency graph must have either the nci_restart or the nci_restart_after key set; secondly, the dependency graph must not contain any cycles; and finally, all the instances in a given dependency graph must be from the same project. If any of these requirements is not met, none of the project's instances will be restarted.
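The ordering idea described above can be illustrated with the standard tsort utility. This is only a sketch of the concept, not the NCI tool's implementation: the function name `restart_order` and its input format (one instance per line, optionally followed by a comma-separated list of prerequisites) are assumptions made for this example. It does not check project ownership or whether the metadata keys are actually set.

```shell
# Print one valid restart order for a dependency map read from stdin.
# Each input line is: <instance> [<comma-separated prerequisites>]
# (this input format is an assumption made for the sketch).
restart_order() {
    while read -r instance prereqs; do
        [ -n "$instance" ] || continue
        # A pair of identical names registers the instance without
        # imposing any ordering.
        echo "$instance $instance"
        # Emit one "prerequisite instance" pair per dependency edge.
        for p in $(printf '%s' "$prereqs" | tr ',' ' '); do
            echo "$p $instance"
        done
    done | tsort
}
```

For example, feeding it the four lines `test1`, `test2 test1`, `test3 test1` and `test4 test2,test3` prints test1 first and test4 last, with test2 and test3 in between in either order. If the input contains a cycle, tsort implementations typically report it on standard error, which is a convenient way to catch the second requirement above before setting any metadata.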
...
The NCI Cloud Team will attempt to monitor for issues while restarting a project's instances, and where possible will attempt to contact the project. However, this will be a best effort attempt, and no guarantees are made.
In the current version there is no tool available to users to verify that their configuration is correct. Please take care while defining dependencies and setting metadata keys on your project, and if you have questions about your configuration feel free to contact the NCI help desk at help@nci.org.au for assistance.
The OpenStack command line tools are the recommended way to set instance metadata. Please see the Nirin API Access page for information about installing and running the OpenStack command line tools. This document assumes you are using the Unified OpenStack Client; other OpenStack tools may be used to manipulate instance metadata, but the details of their use are outside the scope of this document.
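As a minimal sketch, the two keys could be set with the Unified OpenStack Client's `openstack server set --property` command, wrapped here in small helper functions. The function names are hypothetical and chosen only for this example, and the value `true` is a readable placeholder, since the restart tool ignores the value of nci_restart.

```shell
# Flag an instance for automated restart.
# $1 is the instance UUID. The value "true" is a placeholder only:
# the restart tool ignores the value of nci_restart.
nci_flag_restart() {
    openstack server set --property nci_restart=true "$1"
}

# Flag an instance to be restarted only after other instances are running.
# $1 is the instance UUID; $2 is a comma-separated list of the UUIDs
# it depends on.
nci_flag_restart_after() {
    openstack server set --property "nci_restart_after=$2" "$1"
}
```

With these defined, `nci_flag_restart "$uuid"` flags a single instance, and `nci_flag_restart_after "$uuid" "$dep1,$dep2"` records its dependencies, where `$dep1` and `$dep2` stand for the UUIDs of the instances it relies on.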
...
```shell
$ openstack server unset --property nci_restart --property nci_restart_after instance_uuid
```
A project has the following instances:
```shell
$ openstack server list --column ID --column Name
+--------------------------------------+-----------+
| ID                                   | Name      |
+--------------------------------------+-----------+
| f10b24e4-da9a-4280-9cdf-cf7fe03af3df | test4     |
| c32dbc05-55b6-491e-9e2b-585d76289483 | test3     |
| 1933f3bd-3b69-408d-a0ef-a657890df955 | test2     |
| 402e5c04-ea09-43c7-8bb4-f34e26b0c637 | test1     |
+--------------------------------------+-----------+
```
Instances test1 and test2 need to be restarted, but test3 and test4 do not:
...
In this configuration test1 and test2 will be restarted in parallel, and test3 and test4 will not be restarted.
Instance test4 requires services running on test3 and test2, and both test3 and test2 depend on a service running on test1. This creates the following dependency graph:
...
In this configuration test1 will be restarted, test2 and test3 will be restarted in parallel once test1 is active, and test4 will be restarted after test2 and test3 are both active.
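One way this configuration might be expressed with the Unified OpenStack Client is sketched below, using the UUIDs from the instance listing above. The shell variables and the function name `configure_diamond` are hypothetical, introduced only to keep the commands readable, and `true` is a placeholder value because the value of nci_restart is ignored.

```shell
# UUIDs taken from the instance listing above.
test1=402e5c04-ea09-43c7-8bb4-f34e26b0c637
test2=1933f3bd-3b69-408d-a0ef-a657890df955
test3=c32dbc05-55b6-491e-9e2b-585d76289483
test4=f10b24e4-da9a-4280-9cdf-cf7fe03af3df

configure_diamond() {
    # test1 has no prerequisites, so it only needs nci_restart.
    openstack server set --property nci_restart=true "$test1"
    # test2 and test3 each wait for test1.
    openstack server set --property "nci_restart_after=$test1" "$test2"
    openstack server set --property "nci_restart_after=$test1" "$test3"
    # test4 waits for both test2 and test3.
    openstack server set --property "nci_restart_after=$test2,$test3" "$test4"
}
```

Every instance in the graph ends up with one of the two keys set, which is the first of the three requirements for nci_restart_after to function.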
In this case all four instances need to be restarted, but one is independent of the others: test4 depends on test2, which depends on test1, but test3 does not depend on any others:
...
In this configuration test1 and test3 will be restarted in parallel, test2 will be restarted once test1 is active, and test4 will be restarted after test2 is active.
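For projects with more than a handful of instances, it may be easier to keep the dependencies in a small table and apply them in a loop. The following is a sketch under stated assumptions: the function names and the table format (one instance UUID per line, optionally followed by a comma-separated list of prerequisite UUIDs) are hypothetical, and the UUIDs are those from the instance listing above.

```shell
# Apply a restart configuration read from stdin, one instance per line:
#   <instance-uuid> [<comma-separated prerequisite UUIDs>]
apply_restart_config() {
    while read -r instance prereqs; do
        [ -n "$instance" ] || continue
        if [ -n "$prereqs" ]; then
            # The instance has prerequisites: record them.
            openstack server set --property "nci_restart_after=$prereqs" \
                "$instance" </dev/null
        else
            # No prerequisites: flag for a plain restart ("true" is a
            # placeholder, the value is ignored).
            openstack server set --property nci_restart=true \
                "$instance" </dev/null
        fi
    done
}

# The configuration described above: test1 and test3 are independent,
# test2 waits for test1, and test4 waits for test2.
configure_example() {
    apply_restart_config <<'EOF'
402e5c04-ea09-43c7-8bb4-f34e26b0c637
c32dbc05-55b6-491e-9e2b-585d76289483
1933f3bd-3b69-408d-a0ef-a657890df955 402e5c04-ea09-43c7-8bb4-f34e26b0c637
f10b24e4-da9a-4280-9cdf-cf7fe03af3df 1933f3bd-3b69-408d-a0ef-a657890df955
EOF
}
```

Redirecting each command's standard input from /dev/null keeps it from consuming lines of the table that the loop is still reading.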
If you have further questions about NCI's automated restart system, please contact the NCI help desk at help@nci.org.au.
...