Overview

Nirin Cloud provides an integrated object storage service. This service provides storage for objects: discrete chunks of data and metadata which are created, uploaded and accessed as a unit. Objects are grouped inside containers, and a project's object store is made up of multiple containers, each holding multiple objects. The combination of the base service URL, the project ID, the container name and the object name gives a URL by which an individual object can be accessed.

Object storage is best suited to data sets which do not change through their lifetime: it is generally optimised for uploading data once and then accessing it many times. The architecture is quite flexible, however, so it can also be useful for one-off data transfers, short term storage of ephemeral data, and long term storage of archival data.

Nirin Cloud's integrated object storage is implemented using Ceph and its radosgw service, with authentication handled by the Nirin Cloud identity service. Access is thus managed by your NCI user account.

Nirin Cloud's integrated object storage supports two standard access APIs: the OpenStack Swift API, and the AWS S3 API. Both of these are supported by a wide range of data management tools as well as computational and data processing environments and libraries.

Pre-requisites

In order to be able to use the Nirin Cloud integrated object storage service you must have access to the Nirin Cloud. For most operations you will also need to have Nirin API Access - while many operations can be done via the OpenStack dashboard, some require the use of the OpenStack command line client.

When using the Nirin Cloud integrated object storage service you will generally need to specify the project ID rather than the project name - this can be found using the openstack command line client:

$ openstack project show nci_test
+-------------+----------------------------------+
| Field       | Value                            |
+-------------+----------------------------------+
| description | NCI Test Project                 |
| domain_id   | dcb8d28bfc4840ffa3eb3127b369930b |
| enabled     | True                             |
| id          | 4d2ce112f02f4ebf9fd57336e1a50981 |
| is_domain   | False                            |
| name        | nci_test                         |
| options     | {}                               |
| parent_id   | dcb8d28bfc4840ffa3eb3127b369930b |
| tags        | []                               |
+-------------+----------------------------------+

Here the project ID is 4d2ce112f02f4ebf9fd57336e1a50981.
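
If the project ID needs to be picked out by a script rather than by eye, the table printed by the client can be parsed directly. The following is a minimal sketch, not an official tool: the function name parse_field_table is our own, and it assumes the two-column Field/Value layout shown above, with no "|" characters inside the values. Recent versions of the client can usually also print a single value via their output formatting options; see openstack project show --help.

```python
def parse_field_table(text):
    """Turn the two-column Field/Value table printed by the client into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Data rows start and end with '|'; border rows start with '+'.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not exactly two columns.
        if len(cells) != 2 or cells[0] == "Field":
            continue
        fields[cells[0]] = cells[1]
    return fields


# Abbreviated copy of the output shown above.
sample = """\
+-------------+----------------------------------+
| Field       | Value                            |
+-------------+----------------------------------+
| id          | 4d2ce112f02f4ebf9fd57336e1a50981 |
| name        | nci_test                         |
+-------------+----------------------------------+
"""
project_id = parse_field_table(sample)["id"]
print(project_id)
```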

A Note on Terminology

While there are no formal standards for naming object storage entities, there are some common naming conventions that we have summarised here.

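+----------------------+-------------+----------------+
| Logical Entity       | AWS/S3 Name | Swift Name     |
+----------------------+-------------+----------------+
| Project/organisation | Account     | Tenant/Project |
| Container/collection | Bucket      | Container      |
| Object               | Object      | Object         |
+----------------------+-------------+----------------+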

Typically an object will be accessed via a URL along these lines:

https://{service-endpoint}/{project}/{container}/{object-name}
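
As an illustration of this pattern, the sketch below assembles a URL from its four parts. It is only a sketch: build_object_url is a hypothetical helper, and the endpoint, project and container values are taken from the Swift example elsewhere on this page, where the project component is the project ID prefixed with AUTH_. Percent-encoding is applied so that names containing spaces or other special characters still produce a valid URL.

```python
from urllib.parse import quote


def build_object_url(endpoint, project, container, object_name):
    """Join the four parts into an object URL, percent-encoding where needed."""
    # Container names cannot contain '/', so encode everything in them.
    container_part = quote(container, safe="")
    # Keep '/' in object names so pseudo-folder prefixes stay readable.
    object_part = quote(object_name, safe="/")
    return f"{endpoint.rstrip('/')}/{project}/{container_part}/{object_part}"


url = build_object_url(
    "https://cloud.nci.org.au:8080/swift/v1",
    "AUTH_4d2ce112f02f4ebf9fd57336e1a50981",
    "test-bucket",
    "message.txt",
)
print(url)
```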

Object Storage Basics

A full discussion of the use of object storage is beyond the scope of this document; however, a brief summary of the most basic operations is provided here. More detailed discussions can be found in various external sources, some of which are linked in the Further Reading section of this page.

Concepts

Projects, Tenants, Accounts, Users

The data in an object store is tied to a project/tenant/account, which owns the data and is responsible for managing it. An additional level of ownership is generally available via users, which are members of a project. Access control rules can be configured which limit access to data owned by a project - the details of those rules and what they're capable of depend on the object store implementation.

Organisation of Data

The fundamental unit of data in an object store is the object, which can most easily be thought of as a single file along with some metadata. Objects are immutable - their data cannot be changed once they are created, though the metadata associated with them can generally be updated. To change an object you must replace it - deleting the old data and replacing it with the new.

Objects exist within containers or buckets - a special entity that contains a list of objects, along with its own set of metadata. Containers sit at the top level of a project's object storage space and do not nest: they are exactly one level deep, so there is no way to create a hierarchy of containers as you would with directories on a filesystem. However, most object storage services support a pseudo-hierarchy using "folders" - these are nothing more than a prefix string that forms part of the object name, e.g. the "foo/bar/" in "foo/bar/baz.txt", but they are generally treated as if they represented a true hierarchy. Exactly how "folders" are handled depends on both the client and the object storage service, but in most cases they will be presented as a nested directory tree rather than as part of the object name.
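
To make the "folder" idea concrete, the following sketch shows how a flat list of object names can be presented as one level of a directory tree by splitting on a delimiter. It illustrates the general technique only and is not the code used by any particular client or by the service; list_level and the sample names are made up for the example.

```python
def list_level(object_names, prefix="", delimiter="/"):
    """Return (folders, objects) directly under `prefix` in a flat name list."""
    folders, objects = set(), []
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts as a "folder".
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(name)
    return sorted(folders), sorted(objects)


# All four names live side by side in one container; there is no real tree.
names = ["foo/bar/baz.txt", "foo/bar/qux.txt", "foo/readme.txt", "top.txt"]
print(list_level(names))          # (['foo/'], ['top.txt'])
print(list_level(names, "foo/"))  # (['foo/bar/'], ['foo/readme.txt'])
```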

Objects cannot exist at the top level - they must be nested inside a container. Objects can only exist in a single container - two objects with the same name in different containers are different objects, both logically and physically.

Access Control

The metadata associated with objects and containers can include access control rules. These rules will generally provide a way to allow or deny a request sent to the service by a user - most often a read or write request on a container or read request on an object, but depending on the implementation details the access control rules can be almost arbitrarily complex. Rules are applied based on the user making the request - either the user they authenticated as when sending the request, or a catch-all public user for unauthenticated requests.

The Nirin Cloud integrated object storage service supports a subset of the Swift and S3 access control mechanisms, with additional constraints which make them very limited for practical use. The NCI Cloud Team does not recommend the use of anything beyond the simplest case of enabling public read access to a container, which can be done via a simple check box for each container in the dashboard's Object Store → Containers tab.

Upload and Download

Object uploads are done using a client, or the dashboard (which is simply a web-based client). Each client will do things a little differently, but the general pattern for use is:

  • create a container
  • upload one or more files into the container

Each file uploaded will become a single object.

Objects are accessible via a URL which combines the service endpoint address, the project name or id, the container name, and the object name - a real example:

$ curl https://cloud.nci.org.au:8080/swift/v1/AUTH_4d2ce112f02f4ebf9fd57336e1a50981/test-bucket/message.txt
Hello

Any HTTP client can access the URL. For publicly accessible objects this will behave exactly like any other publicly accessible URL; for access controlled objects the client will need to authenticate in order for the request to be allowed. The details of authenticating the download request are generally quite complex - dedicated clients, as discussed below, handle all the details of authentication in addition to supporting more sophisticated operations.
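
For the public case, a download needs nothing beyond a standard HTTP library. The sketch below uses Python's urllib; fetch_public_object is a hypothetical helper, and the opener parameter exists only so the example can be run without network access. No credentials are sent, so it would not work for access controlled objects.

```python
import io
import urllib.request


def fetch_public_object(url, opener=urllib.request.urlopen, timeout=30):
    """Download a publicly readable object and return its bytes."""
    # No credentials are sent, so this only works for public containers.
    with opener(url, timeout=timeout) as response:
        return response.read()


# Offline demonstration with a stand-in opener; a real call would leave
# `opener` at its default and pass the object's actual URL.
fake_opener = lambda url, timeout: io.BytesIO(b"Hello\n")
data = fetch_public_object("https://example.invalid/container/message.txt", opener=fake_opener)
print(data)
```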

Data Management

The data stored in the Nirin Cloud integrated object storage service is not managed automatically in any way - uploaded data will remain in place unchanged until it is explicitly deleted by the owner. The NCI Cloud Team recommends that projects making heavy use of object storage set up tooling to automatically manage their data, particularly to ensure that stale or no longer valid data is not left in place for long periods.
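
As a starting point for such tooling, the sketch below selects objects that have not been modified for a given number of days. It is deliberately independent of any client: the listing format (object name plus last-modified time), the function find_stale and the 90 day threshold are all assumptions made for the example, and a real tool would obtain the listing from one of the clients described later and review it before deleting anything.

```python
from datetime import datetime, timedelta, timezone


def find_stale(objects, max_age_days, now=None):
    """Return names of objects last modified more than `max_age_days` ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, modified in objects if modified < cutoff]


# Hypothetical listing: (object name, last-modified time in UTC).
listing = [
    ("results/2023-01.csv", datetime(2023, 1, 31, tzinfo=timezone.utc)),
    ("results/2023-05.csv", datetime(2023, 5, 17, tzinfo=timezone.utc)),
]
stale = find_stale(listing, max_age_days=90, now=datetime(2023, 6, 1, tzinfo=timezone.utc))
print(stale)  # ['results/2023-01.csv']
```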

The data stored in the Nirin Cloud integrated object storage service is not backed up in any way. Although the storage infrastructure it is built on is designed to be robust and reliable, and is maintained as critical infrastructure for the Nirin Cloud, the NCI Cloud Team cannot provide any guarantees as to the long term retention of data.

Object and container versioning is not supported by the Nirin Cloud integrated object storage service.

Access

Dashboard Access

The Nirin Cloud integrated object storage service is accessible via the OpenStack dashboard at https://cloud.nci.org.au, under the Project → Object Store tab. This is the simplest way to access and manage your project's object storage, and provides basic tools for creating containers, uploading and downloading objects, and enabling/disabling public access to a container.

Alternative Clients

In addition, the object storage service can be accessed by clients using the Swift and S3 APIs. Access via these APIs requires Nirin API Access; using the S3 API also requires that you have the OpenStack command line client installed and configured, because the credentials used by the S3 endpoint are created with it.

A wide range of clients can be configured to access the Nirin Cloud integrated object service using the Swift and S3 API endpoints. The details of configuring and using these clients are outside the scope of this document; however, we will provide information about configuring basic authentication for three common cases:

  • the openstack command line client
  • Rclone
  • s3cmd

The openstack command line client uses the Swift endpoint, the s3cmd tool uses the S3 endpoint, and Rclone can be configured to use either endpoint.

Authentication Methods

The Swift and S3 APIs use different authentication methods.

The Swift API uses the same authentication configuration as other OpenStack services: a username and password, a project name, and a region name are all required, along with the URL of the identity endpoint. These are the same values used to configure access to all OpenStack endpoints, so configuring Nirin API Access will provide you with all the necessary configuration details.
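
OpenStack clients conventionally read these values from OS_* environment variables. The sketch below checks that the ones corresponding to the list above are set. It is a sanity check under assumptions, not a definitive list: the Nirin API Access documentation is the authority on which variables are needed (a domain setting may also be required, for example), missing_settings is our own helper, and the user name in the example is a placeholder.

```python
import os

# Standard OpenStack client environment variables for password authentication.
REQUIRED = ("OS_AUTH_URL", "OS_USERNAME", "OS_PASSWORD", "OS_PROJECT_NAME", "OS_REGION_NAME")


def missing_settings(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Example environment with the password deliberately left out.
example = {
    "OS_AUTH_URL": "https://cloud.nci.org.au:5000/v3",
    "OS_USERNAME": "abc123",  # placeholder user name
    "OS_PROJECT_NAME": "nci_test",
    "OS_REGION_NAME": "CloudV3",
}
print(missing_settings(example))  # ['OS_PASSWORD']
```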

The S3 API endpoint uses specially generated ec2 credentials to authenticate the user and map their access to a particular project - these credentials consist of an access key and a secret, and are created using the openstack command line client:

$ openstack ec2 credentials create
+------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Field      | Value                                                                                                                                   |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| access     | 556c7a694313448c9da3c761c1e46aa2                                                                                                        |
| links      | {'self': 'https://cloud.nci.org.au:5000/v3/users/51f4fbd226fd4410bfc9cb8e635a563c/credentials/OS-EC2/556c7a694313448c9da3c761c1e46aa2'} |
| project_id | 4d2ce112f02f4ebf9fd57336e1a50981                                                                                                        |
| secret     | ********************************                                                                                                        |
| trust_id   | None                                                                                                                                    |
| user_id    | 51f4fbd226fd4410bfc9cb8e635a563c                                                                                                        |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------+

Once generated, credentials can be accessed again via:

$ openstack ec2 credentials list
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| Access                           | Secret                           | Project ID                       | User ID                          |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| 556c7a694313448c9da3c761c1e46aa2 | ******************************** | 4d2ce112f02f4ebf9fd57336e1a50981 | 51f4fbd226fd4410bfc9cb8e635a563c |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+

and deleted using the access value:

$ openstack ec2 credentials delete 556c7a694313448c9da3c761c1e46aa2

Anyone who has the access and secret values for your ec2 credentials will be able to authenticate to the Nirin Cloud integrated object store as your user/project. These credentials are sensitive data and should be given the same treatment as your NCI password - in particular, they should be replaced on a regular basis, with the old credentials being deleted.

Which API should I use?

The question of which API to use is mostly one of convenience. If part of your workflow uses a tool that only works with S3, you will already have the ec2 credentials needed to configure other clients, such as rclone, to use S3; otherwise, the Swift endpoint may be simpler to configure, as you don't need the additional step of creating separate credentials. On the other hand, having separate and easily revocable ec2 credentials allows you to avoid embedding your NCI password in an application configuration, significantly reducing the risk of a security breach. Both APIs provide access to the same data and can be used interchangeably, so beyond these considerations there is generally no reason to prefer one over the other.

Clients

The openstack client

The openstack command line client provides a number of sub-commands for managing and using object storage. Most relevant sub-commands are in the namespaces "object" and "container"; see openstack object --help and openstack container --help. In the example below the user creates a container called my-container, and uploads then retrieves a file.

Swift use via the openstack command line client
$ openstack container create my-container
+---------------------------------------+--------------+---------------------------------------------------+
| account                               | container    | x-trans-id                                        |
+---------------------------------------+--------------+---------------------------------------------------+
| AUTH_4d2ce112f02f4ebf9fd57336e1a50981 | my-container | tx00000ee2fc96d4de7089b-006465b2cd-62b61ca-nci-dc |
+---------------------------------------+--------------+---------------------------------------------------+
$ echo hello > message.txt 
$ openstack object create my-container message.txt 
+-------------+---------------+----------------------------------+
| object      | container     | etag                             |
+-------------+---------------+----------------------------------+
| message.txt | my-container  | b1946ac92492d2347c6235b4d2611184 |
+-------------+---------------+----------------------------------+
$ openstack object list my-container
+-------------+
| Name        |
+-------------+
| message.txt |
+-------------+
$ openstack object save --file message-downloaded.txt my-container message.txt
$ cat message-downloaded.txt
hello

Configuration for the openstack command line client is documented at Nirin API Access.
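
The etag reported when message.txt was uploaded in the session above can be used to check a download. For a simple, single-part upload like this one the etag is the MD5 digest of the object's content; this should not be assumed for large objects uploaded in segments or parts, where the etag is typically computed differently. A minimal sketch, with file_md5 as a hypothetical helper:

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# `echo hello > message.txt` writes the five letters plus a newline.
content = b"hello\n"
etag = "b1946ac92492d2347c6235b4d2611184"  # from the upload output above
matches = hashlib.md5(content).hexdigest() == etag
print(matches)  # True
```

In practice you would compare file_md5("message-downloaded.txt") against the etag reported by the service.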

s3cmd

The s3cmd tool is an open source S3 client with support for a wide range of operations against the S3 API (not all of which are supported by the Nirin Cloud integrated object store). Some very basic usage examples are:

$ s3cmd mb s3://test-bucket
Bucket 's3://test-bucket/' created
$ s3cmd ls
2023-05-17 04:09  s3://test-bucket
$ echo "Hello" > message.txt
$ cat message.txt 
Hello
$ s3cmd put message.txt s3://test-bucket/
upload: 'message.txt' -> 's3://test-bucket/message.txt'  [1 of 1]
 6 of 6   100% in    0s     7.08 B/s  done
$ s3cmd ls s3://test-bucket
2023-05-17 04:11            6  s3://test-bucket/message.txt
$ s3cmd get s3://test-bucket/message.txt message-downloaded.txt
download: 's3://test-bucket/message.txt' -> 'message-downloaded.txt'  [1 of 1]
 6 of 6   100% in    0s     6.93 B/s  done
$ cat message-downloaded.txt 
Hello

Configuration

By default, s3cmd uses a .s3cfg file in the user's home directory. The file can be generated interactively by running s3cmd --configure; however, this is overkill for our use case - a far simpler option is to manually create the ~/.s3cfg file with contents based on the following template:

[default]
access_key = {access}
secret_key = {secret}
host_base = https://cloud.nci.org.au:8080
host_bucket = https://cloud.nci.org.au:8080

The access and secret values should be taken from the output of openstack ec2 credentials list.
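
If you would rather generate this file than edit it by hand, the sketch below renders the same template and writes it with owner-only permissions, in keeping with the sensitivity of the credentials. It is a sketch under assumptions: render_s3cfg and write_private are hypothetical helpers, and the keys and endpoint simply mirror the template above.

```python
import configparser
import io
import os


def render_s3cfg(access, secret, endpoint="https://cloud.nci.org.au:8080"):
    """Return .s3cfg contents matching the template above."""
    # Interpolation is disabled so a '%' in a secret is written verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "access_key": access,
        "secret_key": secret,
        "host_base": endpoint,
        "host_bucket": endpoint,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def write_private(path, text):
    """Write `text` to `path`, creating the file readable by its owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)


# Placeholder values; substitute your own access and secret.
text = render_s3cfg("{access}", "{secret}")
print(text)
```

A call such as write_private(os.path.expanduser("~/.s3cfg"), text) would then install the file; note that it overwrites any existing configuration.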

Rclone

Rclone is an extremely versatile data movement tool - far too complex and flexible to be meaningfully documented here. Here is a very quick example session:

$ rclone mkdir testing-s3:test-bucket
$ echo "Hello" > message.txt
$ cat message.txt 
Hello
$ rclone copy message.txt testing-s3:test-bucket/
$ rclone ls testing-s3:
        6 test-bucket/message.txt
$ rclone copy testing-s3:test-bucket/message.txt rclone-download
$ cat rclone-download/message.txt 
Hello

Note that in this case testing-s3 is a 'remote' defined in rclone's configuration - multiple remotes can be configured, with rclone performing various operations transferring data between them. Rclone remotes configuring access to the Nirin Cloud integrated object storage service can use either the Swift or S3 API endpoints.

Configuration

As with s3cmd, rclone supports an interactive configuration mode via the rclone config command; however, it is quicker and easier to manually create either a Swift or S3 entry in your rclone.conf file (typically in ~/.config/rclone on Linux systems).

First, ensure that the ~/.config/rclone directory exists, and edit the rclone.conf file:

mkdir -p ~/.config/rclone
vi ~/.config/rclone/rclone.conf

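Add an entry based on one of the following templates. The name in square brackets is the remote name used on the rclone command line; the example sessions on this page use remotes named testing-swift and testing-s3, so name your entries accordingly if you want to run those examples verbatim: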

[nirin-swift]
type = swift
user = {username}
key = {password}
auth = https://cloud.nci.org.au:5000/v3
domain = NCI
tenant = {project}
tenant_domain = NCI
region = CloudV3
storage_url =
auth_version = 3

Here username, password and project are the standard Nirin authentication values.

[nirin-s3]
type = s3
provider = Ceph
env_auth = false
access_key_id = {access}
secret_access_key = {secret}
region =
endpoint = https://cloud.nci.org.au:8080/
location_constraint =
acl =
server_side_encryption =
storage_class =

In this case, access and secret are the access and secret fields from the ec2 credentials generated previously for your user/project.
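
Because rclone.conf is a plain INI-style file, it is easy to check which remotes it defines and which API each one uses. The sketch below is a minimal illustration using Python's configparser; list_remotes is our own helper, and the sample text is an abbreviated copy of the templates above. Each section name is the remote name that appears before the colon on the rclone command line.

```python
import configparser


def list_remotes(conf_text):
    """Map each remote name in rclone.conf text to its backend type."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(conf_text)
    return {name: config[name].get("type", "") for name in config.sections()}


# Abbreviated copy of the two templates above.
sample = """\
[nirin-swift]
type = swift
auth = https://cloud.nci.org.au:5000/v3
storage_url =

[nirin-s3]
type = s3
provider = Ceph
endpoint = https://cloud.nci.org.au:8080/
region =
"""
remotes = list_remotes(sample)
print(remotes)  # {'nirin-swift': 'swift', 'nirin-s3': 's3'}
```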

Although only one entry is necessary, if both are configured they can be used interchangeably - to verify this you can do something like the following:

$ cat message.txt message2.txt 
Hello
Hello again
$ rclone lsd testing-swift:
$ rclone mkdir testing-s3:/test-bucket/
$ rclone lsd testing-swift:
           0 2023-05-17 18:37:50         0 test-bucket
$ rclone copy message.txt testing-swift:/test-bucket/
$ rclone ls testing-s3:
        6 test-bucket/message.txt
$ rclone copy message2.txt testing-s3:/test-bucket/
$ rclone ls testing-swift:
        6 test-bucket/message.txt
       12 test-bucket/message2.txt
$ rclone copy testing-s3:/test-bucket/ rclone-download/
$ ls rclone-download/
message2.txt  message.txt
$ cat rclone-download/message.txt rclone-download/message2.txt 
Hello
Hello again

Further reading
