For this example, we will use the TileDB-CF-Py library to convert an example NetCDF file to TileDB arrays in a Python Jupyter Notebook.
The following libraries will need to be imported for this example:

    import tiledb
    import tiledb.cf
    import xarray as xr
    import numpy as np
    import matplotlib.pyplot as plt
    import os
    import shutil

The following block uses the nci_ipynb package to change the working directory to the location of this notebook. This is particularly useful when working in an ARE JupyterLab session, where the default working directory always starts from the user's home directory.

    import nci_ipynb
    os.chdir(nci_ipynb.dir())

...
This NetCDF file contains a variable called "w" with 4 coordinates: "longitude", "latitude", "level" and "time".

    <xarray.Dataset> Size: 154MB
    Dimensions:    (longitude: 1440, latitude: 721, level: 37, time: 1)
    Coordinates:
      * longitude  (longitude) float32 6kB -180.0 -179.8 -179.5 ... 179.5 179.8
      * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
      * level      (level) int32 148B 1 2 3 5 7 10 20 ... 875 900 925 950 975 1000
      * time       (time) datetime64[ns] 8B 2020-01-01
    Data variables:
        w          (time, level, latitude, longitude) float32 154MB ...
    Attributes:
        Conventions:  CF-1.6
        license:      Licence to use Copernicus Products: https://apps.ecmwf.int/...
        summary:      ERA5 is the fifth generation ECMWF atmospheric reanalysis o...
        title:        ERA5 pressure-levels monthly-averaged vertical_velocity 202...
        history:      2020-11-05 14:53:38 UTC+1100 by era5_replication_tools-1.5....

Next we can define our output directory and the folder where the TileDB arrays will be written to:
...
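As a rough sketch of what this step might look like, the snippet below builds the output location and clears any output left over from an earlier run. The directory `./dataset/tiledb/from_netcdf` is taken from the listing shown further down this page, and the names `netcdf_file` and `uri1` match those used in the conversion call. The input filename is a hypothetical placeholder, not the file used in this example.

```python
import os
import shutil

# Hypothetical input path: replace with the NetCDF file you want to convert.
netcdf_file = "./example.nc"

# Output location, matching the directory listing shown further down this page.
output_dir = os.path.join(".", "dataset", "tiledb")
uri1 = os.path.join(output_dir, "from_netcdf")

# Start from a clean location so that re-running the notebook does not
# collide with arrays left over from an earlier run.
if os.path.exists(uri1):
    shutil.rmtree(uri1)

# Create the parent directory only; the conversion step is expected to
# create uri1 itself (an assumption in this sketch).
os.makedirs(output_dir, exist_ok=True)
```

Removing the old output first is a design choice for notebooks that are re-run often. Skip that step if you want to keep earlier results.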
Now we can convert the above NetCDF file with a single function, 'from_netcdf', provided by the TileDB-CF-Py library:

    tiledb.cf.from_netcdf(netcdf_file, uri1)

Let's check the file structure of the resulting TileDB group:

    !tree $uri1

Note that each coordinate and variable of the original NetCDF file is converted into a separate array. Specifically, the variable "w" is converted into the TileDB array named "array4".

    ./dataset/tiledb/from_netcdf
    ├── __group
    │   └── __1713868510463_1713868510463_bd0772ad9f2541b781d54ca17d7334be_2
    ├── __meta
    │   └── __1713868510497_1713868510497_4b731228555e45c6bf7e89b5b7514905
    ├── __tiledb_group.tdb
    ├── array0
    │   ├── __commits
    │   │   └── __1713868510520_1713868510520_01ac756b068543e68d7790ac1bab5283_19.wrt
    │   ├── __fragment_meta
    │   ├── __fragments
    │   │   └── __1713868510520_1713868510520_01ac756b068543e68d7790ac1bab5283_19
    │   │       ├── __fragment_metadata.tdb
    │   │       └── a0.tdb
    │   ├── __labels
    │   ├── __meta
    │   │   └── __1713868510518_1713868510518_8da9574e9ee94e91be73ee244aaa61d6
    │   └── __schema
    │       └── __1713868510352_1713868510352_4b959e9d22794ef3a529b5f0b0975caa
    ├── array1
    │   ├── __commits
    │   │   └── __1713868510623_1713868510623_780b26db1aa4417f86a60e68a8cf70cb_19.wrt
    │   ├── __fragment_meta
    │   ├── __fragments
    │   │   └── __1713868510623_1713868510623_780b26db1aa4417f86a60e68a8cf70cb_19
    │   │       ├── __fragment_metadata.tdb
    │   │       └── a0.tdb
    │   ├── __labels
    │   ├── __meta
    │   │   └── __1713868510622_1713868510622_c9f5ea9235674c44ab93f89c560d3e1f
    │   └── __schema
    │       └── __1713868510411_1713868510411_0e59d6e79ed1438eb1b9d227821e2afb
    ├── array2
    │   ├── __commits
    │   │   └── __1713868510781_1713868510781_c823542d13f14a25adcc249c4b623341_19.wrt
    │   ├── __fragment_meta
    │   ├── __fragments
    │   │   └── __1713868510781_1713868510781_c823542d13f14a25adcc249c4b623341_19
    │   │       ├── __fragment_metadata.tdb
    │   │       └── a0.tdb
    │   ├── __labels
    │   ├── __meta
    │   │   └── __1713868510780_1713868510780_0f40c0ce66364be78606ec19a50fa029
    │   └── __schema
    │       └── __1713868510425_1713868510425_3dfa3d3d144545d2afcb6ca832fa0546
    ├── array3
    │   ├── __commits
    │   │   └── __1713868510900_1713868510900_a19e8fd0ab3d4cfd86ef3db371d7ec6d_19.wrt
    │   ├── __fragment_meta
    │   ├── __fragments
    │   │   └── __1713868510900_1713868510900_a19e8fd0ab3d4cfd86ef3db371d7ec6d_19
    │   │       ├── __fragment_metadata.tdb
    │   │       └── a0.tdb
    │   ├── __labels
    │   ├── __meta
    │   │   └── __1713868510899_1713868510899_f16404fbdb6349bda5bfab183f477759
    │   └── __schema
    │       └── __1713868510438_1713868510438_cb2c5687f74d4fbab05eb33a21644d0a
    └── array4
        ├── __commits
        │   └── __1713868511500_1713868511500_7579133e0b204675a39e403d70dfad60_19.wrt
        ├── __fragment_meta
        ├── __fragments
        │   └── __1713868511500_1713868511500_7579133e0b204675a39e403d70dfad60_19
        │       ├── __fragment_metadata.tdb
        │       └── a0.tdb
        ├── __labels
        ├── __meta
        │   └── __1713868511065_1713868511065_905859f0d66c4d468c9b55f508642e2a
        └── __schema
            └── __1713868510451_1713868510451_f04050a87cb94785afef1c74ef04a241

    42 directories, 28 files

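If the `tree` command is not available, the same overview can be approximated in plain Python. The sketch below uses only the standard library. It rests on a heuristic drawn from the listing above, not on a documented TileDB rule: a subdirectory is treated as an array if it contains a `__schema` directory. The function name `find_array_dirs` is made up for this example.

```python
from pathlib import Path


def find_array_dirs(group_uri):
    """Return the names of subdirectories that look like TileDB arrays.

    Heuristic (an assumption based on the listing above): an array
    directory is one that contains a `__schema` subdirectory.
    """
    root = Path(group_uri)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "__schema").is_dir()
    )
```

For the layout shown above, `find_array_dirs(uri1)` should return the five names `array0` to `array4`, because `__group`, `__meta` and `__tiledb_group.tdb` have no `__schema` directory.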
Run a sanity check to ensure that we can open one of our new TileDB arrays.
...
The output shows that the TileDB attribute "w" (which is equivalent to the variable in NetCDF) has 4 dimensions:

    ['__tiledb_attr.w.add_offset', '__tiledb_attr.w.long_name', '__tiledb_attr.w.missing_value', '__tiledb_attr.w.scale_factor', '__tiledb_attr.w.standard_name', '__tiledb_attr.w.units']
    ArraySchema(
      domain=Domain(*[
        Dim(name='time', domain=(0, 0), tile=1, dtype='uint64'),
        Dim(name='level', domain=(0, 36), tile=13, dtype='uint64'),
        Dim(name='latitude', domain=(0, 720), tile=241, dtype='uint64'),
        Dim(name='longitude', domain=(0, 1439), tile=480, dtype='uint64'),
      ]),
      attrs=[
        Attr(name='w', dtype='int16', var=False, nullable=False),
      ],
      cell_order='row-major',
      tile_order='row-major',
      sparse=False,
    )
    Vertical velocity

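The schema stores "w" as `int16`, and the metadata keys include `scale_factor`, `add_offset` and `missing_value`. This suggests the values are packed in the usual CF way, so a raw read would return packed integers rather than physical values. Under that assumption, each value is unpacked as `packed * scale_factor + add_offset`. The sketch below illustrates the arithmetic in plain Python. The function name `unpack_cf` and all numbers are invented for illustration; the real factors would come from the array metadata under the keys listed above.

```python
import math


def unpack_cf(packed, scale_factor, add_offset, missing_value=None):
    """Apply CF-style unpacking: value = packed * scale_factor + add_offset.

    Entries equal to `missing_value` are returned as NaN.
    """
    unpacked = []
    for p in packed:
        if missing_value is not None and p == missing_value:
            unpacked.append(math.nan)
        else:
            unpacked.append(p * scale_factor + add_offset)
    return unpacked


# Illustrative numbers only, not the factors stored in this dataset.
example = unpack_cf([0, 2, -32767], scale_factor=0.5, add_offset=1.0,
                    missing_value=-32767)
print(example)  # [1.0, 2.0, nan]
```

For a full array the same formula would normally be applied with NumPy in one vectorised step; the loop here only keeps the example dependency-free.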
We can also use Xarray to open the produced TileDB arrays and carry out further operations, such as making a plot:

    data = xr.open_dataset(w_path, engine="tiledb")
    data

...