Nirin Cloud provides an integrated object storage service. This service stores objects: discrete chunks of data and metadata which are created, uploaded, and accessed as a unit. Objects are grouped inside containers, with a project's object store made up of multiple containers, each holding multiple objects. The combination of the base service URL, the project ID, the container name, and the object name gives a URL by which an individual object can be accessed.
Object storage is best suited to data sets which do not change through their lifetime: object storage is generally optimised for uploading data once and then accessing it many times. However, the architecture is quite flexible, so it can be useful for anything from one-off data transfers and short-term storage of ephemeral data through to long-term storage of archival data.
Nirin Cloud's integrated object storage is implemented using Ceph and its radosgw service, with authentication handled by the Nirin Cloud identity service. Access is thus managed by your NCI user account.
Nirin Cloud's integrated object storage supports two standard access APIs: the OpenStack Swift API, and the AWS S3 API. Both of these are supported by a wide range of data management tools as well as computational and data processing environments and libraries.
In order to use the Nirin Cloud integrated object storage service you must have access to the Nirin Cloud. For most operations you will also need Nirin API Access: while many operations can be done via the OpenStack dashboard, some require the OpenStack command line client.
When using the Nirin Cloud integrated object storage service you will generally need to specify the project ID rather than the project name. This can be found using the `openstack` command line client; the project ID is the value of the `id` field in the output.
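A minimal session sketch of looking up the ID (the project name `my-project` is a placeholder, and the ID shown is not a real value):

```
$ openstack project show my-project -c id -f value
<project-id>
```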
While there are no formal standards for naming object storage entities, there are some common naming conventions that we have summarised here.
|Logical Entity|AWS/S3 Name|Swift Name|
|---|---|---|
|Top-level ownership scope|Account|Project (Tenant)|
|Grouping of objects|Bucket|Container|
|Individual unit of data|Object|Object|
Typically an object will be accessed via a URL along these lines:
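The exact form depends on the endpoint configuration; for a Ceph radosgw Swift endpoint the URL typically follows a pattern like the following (the host name here is a placeholder, not the real Nirin endpoint):

```
https://object-store.example.org/swift/v1/AUTH_<project-id>/<container-name>/<object-name>
```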
A full discussion of the use of object storage is beyond the scope of this document, however a brief summary of the most basic operations is provided here. More detailed discussions of this can be found in various external sources, some of which are linked in the Further Reading section of this page.
The data in an object store is tied to a project/tenant/account, which owns the data and is responsible for managing it. An additional level of ownership is generally available via users, which are members of a project. Access control rules can be configured which limit access to data owned by a project - the details of those rules and what they're capable of depend on the object store implementation.
The fundamental unit of data in an object store is the object, which can most easily be thought of as a single file along with some metadata. Objects are immutable - their data cannot be changed once they are created, though the metadata associated with them can generally be updated. To change an object you must replace it - deleting the old data and replacing it with the new.
Objects exist within containers or buckets - a special entity that contains a list of objects, along with its own set of metadata. Containers sit at the top level of a project's object storage space, and do not nest - they are exactly one level deep, so there's no way to create a nested hierarchy of containers as you would expect on a filesystem. However, most object storage services support a pseudo-hierarchy using "folders" - these are nothing more than a prefix string as part of the object name, e.g. "foo/bar/baz.txt", but they are generally treated as if they represented a true hierarchy. Exactly how "folders" are handled depends on both the client and the object storage service, but in most cases they will be presented as a nested directory tree even though the separators are simply part of the object name.
Objects cannot exist at the top level - they must be nested inside a container. Objects can only exist in a single container - two objects with the same name in different containers are different objects, both logically and physically.
The metadata associated with objects and containers can include access control rules. These rules will generally provide a way to allow or deny a request sent to the service by a user - most often a read or write request on a container or read request on an object, but depending on the implementation details the access control rules can be almost arbitrarily complex. Rules are applied based on the user making the request - either the user they authenticated as when sending the request, or a catch-all public user for unauthenticated requests.
The Nirin Cloud integrated object storage service supports a subset of the Swift and S3 access control mechanisms, with additional constraints which make them very limited for practical use. The NCI Cloud Team does not recommend the use of anything beyond the simplest case of enabling public read access to a container, which can be done via a simple check box for each container in the dashboard's Object Store → Containers tab.
Object uploads are done using a client, or the dashboard (which is simply a web-based client). Each client will do things a little differently, but the general pattern is the same: authenticate to the service, select or create a target container, and then upload one or more files.
Each file uploaded will become a single object.
Objects are accessible via a URL which combines the service endpoint address, the project name or ID, the container name, and the object name.
Any HTTP client can access such a URL. For publicly accessible objects this behaves exactly like any other publicly accessible URL; for access-controlled objects the client will need to authenticate in order for the request to be allowed. The details of authenticating the download request are generally quite complex - dedicated clients, as discussed below, handle all the details of authentication in addition to supporting more sophisticated operations.
The data stored in the Nirin Cloud integrated object storage service is not managed automatically in any way - uploaded data will remain in place unchanged until it is explicitly deleted by the owner. The NCI Cloud Team recommends that projects making heavy use of object storage set up tooling to automatically manage their data, particularly to ensure that stale or no longer valid data is not left in place for long periods.
The data stored in the Nirin Cloud integrated object storage service is not backed up in any way. Although the storage infrastructure it is built on is designed to be robust and reliable, and is maintained as critical infrastructure for the Nirin Cloud, the NCI Cloud Team cannot provide any guarantees as to the long term retention of data.
Object and container versioning is not supported by the Nirin Cloud integrated object storage service.
The Nirin Cloud integrated object storage service is accessible via the OpenStack dashboard at https://cloud.nci.org.au, under the Project → Object Store tab. This is the simplest way to access and manage your project's object storage, and provides basic tools for creating containers, uploading and downloading objects, and enabling/disabling public access to a container.
In addition, the object storage service can be accessed by clients using the Swift and S3 APIs. Access via these APIs requires Nirin API Access; using the S3 API requires that you have the OpenStack command line client installed and configured.
A wide range of clients can be configured to access the Nirin Cloud integrated object storage service using the Swift and S3 API endpoints. The details of configuring and using these clients are outside the scope of this document; however, we will provide information about configuring basic authentication for three common cases: the `openstack` command line client, the `s3cmd` tool, and Rclone. The `openstack` command line client uses the Swift endpoint, the `s3cmd` tool uses the S3 endpoint, and Rclone can be configured to use both endpoints.
The Swift and S3 APIs use different authentication methods.
The Swift API uses the same authentication configuration as other OpenStack services: a username and password, a project name, and a region name are all required, along with the URL of the identity endpoint. These are the same values used to configure access to all OpenStack endpoints, so configuring Nirin API Access will provide you with all the necessary configuration details.
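For example, Swift access for the `openstack` client is typically configured through the standard OpenStack environment variables; a sketch along these lines, where all values shown are placeholders to be replaced with your Nirin API Access details:

```
export OS_AUTH_URL=https://keystone.example.org:5000/v3
export OS_PROJECT_NAME=<project-name>
export OS_USERNAME=<nci-username>
export OS_PASSWORD=<nci-password>
export OS_REGION_NAME=<region>
```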
The S3 API endpoint uses specially generated `ec2` credentials to authenticate the user and map their access to a particular project. These credentials consist of an access key and a secret, and are created using the `openstack` command line client via `openstack ec2 credentials create`. Once generated, credentials can be accessed again via `openstack ec2 credentials list`, and deleted using `openstack ec2 credentials delete`.
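A minimal session sketch of the full credential lifecycle (the access key argument is a placeholder for a value taken from the list output):

```
$ openstack ec2 credentials create
$ openstack ec2 credentials list
$ openstack ec2 credentials delete <access-key>
```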
Anyone who has the access and secret values for your `ec2` credentials will be able to authenticate to the Nirin Cloud integrated object store as your user/project. These credentials are sensitive data and should be given the same treatment as your NCI password - in particular, they should be replaced on a regular basis, with the old credentials being deleted.
The question of which API to use is mostly one of convenience. If part of your workflow uses a tool that only works with S3 then you'll already have the `ec2` credentials needed to configure rclone to use S3; otherwise, the Swift endpoint may be simpler to configure as you don't need the additional step of creating separate credentials. On the other hand, having separate and easily revocable `ec2` credentials allows you to avoid embedding your NCI password in an application configuration, significantly reducing the risk of a security breach. Both APIs provide access to the same data and can be used interchangeably.
The `openstack` command line client provides a number of sub-commands for managing and using object storage. Most relevant sub-commands are in the namespaces "object" and "container"; see `openstack object --help` and `openstack container --help`. In the example below the user creates a container called `my-container`, and uploads then retrieves a file.
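A session sketch of those steps (`data.txt` and `retrieved.txt` are placeholder file names):

```
$ openstack container create my-container
$ openstack object create my-container data.txt
$ openstack object list my-container
$ openstack object save my-container data.txt --file retrieved.txt
```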
Configuration for the `openstack` command line client is documented at Nirin API Access.
The s3cmd tool is an open source S3 client with support for a wide range of operations against the S3 API (not all of which are supported by the Nirin Cloud integrated object store). Some very basic usage examples are:
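A sketch of some basic operations (container and file names are placeholders):

```
$ s3cmd mb s3://my-container
$ s3cmd put data.txt s3://my-container
$ s3cmd ls s3://my-container
$ s3cmd get s3://my-container/data.txt retrieved.txt
```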
`s3cmd` by default uses a `.s3cfg` file in the user's home directory. The file can be generated interactively by running `s3cmd --configure`; however, this is overkill for our use case - a far simpler option is to manually create the `~/.s3cfg` file with contents based on the following template:
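A minimal template along these lines should work; the endpoint host shown is a placeholder, and the real Nirin S3 endpoint address should be substituted:

```
[default]
access_key = <access>
secret_key = <secret>
host_base = object-store.example.org
host_bucket = object-store.example.org
use_https = True
```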
The access and secret values should be taken from the `openstack ec2 credentials list` output.
Rclone is an extremely versatile data movement tool - far too complex and flexible to be meaningfully documented here. Here is a very quick example session:
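A sketch of such a session, using a remote named `testing-s3` with placeholder file and container names:

```
$ rclone mkdir testing-s3:my-container
$ rclone copy data.txt testing-s3:my-container
$ rclone ls testing-s3:my-container
```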
Note that in this case `testing-s3` is a 'remote' defined in rclone's configuration - multiple remotes can be configured, with rclone performing various operations transferring data between them. Rclone remotes configured to access the Nirin Cloud integrated object storage service can use either the Swift or S3 API endpoints.
As with `s3cmd`, rclone supports an interactive configuration mode via `rclone config`; however, it is quicker and easier to manually create either a Swift or S3 entry in your `rclone.conf` file (typically in `~/.config/rclone` on Linux systems).
First, ensure that the `~/.config/rclone` directory exists, and edit the `rclone.conf` file within it. Add an entry based on one of the following templates:
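Templates along these lines should work; the remote names, auth URL, endpoint host, and region are placeholders to be replaced with the real Nirin values:

```
[testing-swift]
type = swift
user = <nci-username>
key = <nci-password>
auth = https://keystone.example.org:5000/v3
tenant = <project-name>
region = <region>

[testing-s3]
type = s3
provider = Ceph
access_key_id = <access>
secret_access_key = <secret>
endpoint = https://object-store.example.org
```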
For the Swift entry, the `user`, `key`, and `tenant` values are the standard Nirin authentication values for your user and project. For the S3 entry, `access_key_id` and `secret_access_key` are the access and secret fields from the `ec2` credentials generated previously for your user/project.
Although only one entry is necessary, if both are configured they can be used interchangeably - to verify this you can do something like the following:
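For instance, assuming Swift and S3 remotes named `testing-swift` and `testing-s3` (placeholder names), an object copied in via one remote should be visible when listing via the other:

```
$ rclone copy data.txt testing-swift:my-container
$ rclone ls testing-s3:my-container
```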