For this example, we will use the TileDB-CF-Py library to convert an example NetCDF file to TileDB arrays in a Python Jupyter Notebook.

The following libraries will need to be imported for this example:

Code Block
languagepy
import tiledb
import tiledb.cf
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import os
import shutil

The following block uses the nci_ipynb package to change the working directory to the location of this notebook. This is particularly useful in an ARE JupyterLab session, where the default working directory is always the user's home directory.

Code Block
languagepy
import nci_ipynb
os.chdir(nci_ipynb.dir())

...

This NetCDF file contains a variable called "w" with 4 coordinates, i.e. "longitude", "latitude", "level" and "time".


<xarray.Dataset> Size: 154MB
Dimensions:    (longitude: 1440, latitude: 721, level: 37, time: 1)
Coordinates:
  * longitude  (longitude) float32 6kB -180.0 -179.8 -179.5 ... 179.5 179.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * level      (level) int32 148B 1 2 3 5 7 10 20 ... 875 900 925 950 975 1000
  * time       (time) datetime64[ns] 8B 2020-01-01
Data variables:
    w          (time, level, latitude, longitude) float32 154MB ...
Attributes:
    Conventions:  CF-1.6
    license:      Licence to use Copernicus Products: https://apps.ecmwf.int/...
    summary:      ERA5 is the fifth generation ECMWF atmospheric reanalysis o...
    title:        ERA5 pressure-levels monthly-averaged vertical_velocity 202...
    history:      2020-11-05 14:53:38 UTC+1100 by era5_replication_tools-1.5....

 

Next we can define our output directory and the folder where the TileDB arrays will be written to:

...
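The os and shutil imports at the top suggest some housekeeping of the output location. The following is only a minimal sketch of what that could look like, not the notebook's actual code: the helper name fresh_uri is hypothetical, and the path in the usage note is assumed from the directory listing printed later in this example (./dataset/tiledb/from_netcdf). It creates the parent folder and discards any result left over from an earlier run, since writing into an existing TileDB location may fail.

```python
import os
import shutil


def fresh_uri(output_dir, name):
    """Return output_dir/name, ready to receive new TileDB output.

    Hypothetical helper for illustration only: it creates output_dir if
    needed and deletes a stale copy of the target left by an earlier run.
    """
    os.makedirs(output_dir, exist_ok=True)  # parent folder for all TileDB output
    uri = os.path.join(output_dir, name)
    if os.path.isdir(uri):
        shutil.rmtree(uri)  # remove the result of a previous conversion
    return uri
```

Under these assumptions, fresh_uri("./dataset/tiledb", "from_netcdf") would return the path ./dataset/tiledb/from_netcdf, which could then be used as uri1.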

Now we can convert the above NetCDF file with a single function, 'from_netcdf', provided by the TileDB-CF-Py library:

Code Block
languagepy
# Convert the NetCDF file into a group of TileDB arrays stored at uri1
tiledb.cf.from_netcdf(netcdf_file, uri1)

Let's check the file structure of the TileDB arrays that were produced:

Code Block
languagepy
!tree $uri1

Each coordinate and variable of the original NetCDF file is converted into a separate array. Specifically, the variable "w" is converted into the TileDB array named "array4".

./dataset/tiledb/from_netcdf
├── __group
│   └── __1713868510463_1713868510463_bd0772ad9f2541b781d54ca17d7334be_2
├── __meta
│   └── __1713868510497_1713868510497_4b731228555e45c6bf7e89b5b7514905
├── __tiledb_group.tdb
├── array0
│   ├── __commits
│   │   └── __1713868510520_1713868510520_01ac756b068543e68d7790ac1bab5283_19.wrt
│   ├── __fragment_meta
│   ├── __fragments
│   │   └── __1713868510520_1713868510520_01ac756b068543e68d7790ac1bab5283_19
│   │       ├── __fragment_metadata.tdb
│   │       └── a0.tdb
│   ├── __labels
│   ├── __meta
│   │   └── __1713868510518_1713868510518_8da9574e9ee94e91be73ee244aaa61d6
│   └── __schema
│       └── __1713868510352_1713868510352_4b959e9d22794ef3a529b5f0b0975caa
├── array1
│   ├── __commits
│   │   └── __1713868510623_1713868510623_780b26db1aa4417f86a60e68a8cf70cb_19.wrt
│   ├── __fragment_meta
│   ├── __fragments
│   │   └── __1713868510623_1713868510623_780b26db1aa4417f86a60e68a8cf70cb_19
│   │       ├── __fragment_metadata.tdb
│   │       └── a0.tdb
│   ├── __labels
│   ├── __meta
│   │   └── __1713868510622_1713868510622_c9f5ea9235674c44ab93f89c560d3e1f
│   └── __schema
│       └── __1713868510411_1713868510411_0e59d6e79ed1438eb1b9d227821e2afb
├── array2
│   ├── __commits
│   │   └── __1713868510781_1713868510781_c823542d13f14a25adcc249c4b623341_19.wrt
│   ├── __fragment_meta
│   ├── __fragments
│   │   └── __1713868510781_1713868510781_c823542d13f14a25adcc249c4b623341_19
│   │       ├── __fragment_metadata.tdb
│   │       └── a0.tdb
│   ├── __labels
│   ├── __meta
│   │   └── __1713868510780_1713868510780_0f40c0ce66364be78606ec19a50fa029
│   └── __schema
│       └── __1713868510425_1713868510425_3dfa3d3d144545d2afcb6ca832fa0546
├── array3
│   ├── __commits
│   │   └── __1713868510900_1713868510900_a19e8fd0ab3d4cfd86ef3db371d7ec6d_19.wrt
│   ├── __fragment_meta
│   ├── __fragments
│   │   └── __1713868510900_1713868510900_a19e8fd0ab3d4cfd86ef3db371d7ec6d_19
│   │       ├── __fragment_metadata.tdb
│   │       └── a0.tdb
│   ├── __labels
│   ├── __meta
│   │   └── __1713868510899_1713868510899_f16404fbdb6349bda5bfab183f477759
│   └── __schema
│       └── __1713868510438_1713868510438_cb2c5687f74d4fbab05eb33a21644d0a
└── array4
    ├── __commits
    │   └── __1713868511500_1713868511500_7579133e0b204675a39e403d70dfad60_19.wrt
    ├── __fragment_meta
    ├── __fragments
    │   └── __1713868511500_1713868511500_7579133e0b204675a39e403d70dfad60_19
    │       ├── __fragment_metadata.tdb
    │       └── a0.tdb
    ├── __labels
    ├── __meta
    │   └── __1713868511065_1713868511065_905859f0d66c4d468c9b55f508642e2a
    └── __schema
        └── __1713868510451_1713868510451_f04050a87cb94785afef1c74ef04a241

42 directories, 28 files
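In the listing above, every array folder contains a __schema subfolder, while the group root does not. As a rough sketch, a hypothetical helper (find_array_dirs is not part of TileDB) can use that observation to list the array folders with the standard library alone. It only inspects the filesystem and relies on the on-disk layout shown here, which is an assumption that may not hold for other TileDB versions.

```python
import os


def find_array_dirs(group_path):
    """List sub-folders of group_path that look like TileDB arrays.

    Heuristic taken from the directory listing: an array folder contains
    a `__schema` directory. The TileDB API is not used here.
    """
    found = []
    for entry in sorted(os.listdir(group_path)):
        candidate = os.path.join(group_path, entry)
        if os.path.isdir(os.path.join(candidate, "__schema")):
            found.append(entry)
    return found
```

For the group shown above, this would return the five folder names array0 to array4 and skip __group and __meta.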

Run a sanity check to ensure that we can open one of our new TileDB arrays.

...


The output lists the metadata keys of the TileDB attribute "w" (the equivalent of the NetCDF variable), followed by the array schema, which shows that "w" has 4 dimensions:

['__tiledb_attr.w.add_offset', '__tiledb_attr.w.long_name', '__tiledb_attr.w.missing_value', '__tiledb_attr.w.scale_factor', '__tiledb_attr.w.standard_name', '__tiledb_attr.w.units']
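The scale_factor and add_offset keys indicate that "w" is stored in packed form. Under the CF conventions, a packed value is recovered as packed * scale_factor + add_offset. The sketch below illustrates that arithmetic with a hypothetical helper and made-up numbers. It assumes the metadata follows the CF packing rules and that the missing-value marker is compared against the packed value, and it does not read anything from the TileDB array.

```python
def unpack(raw, scale_factor, add_offset, missing_value=None):
    """Apply CF-style unpacking to one stored (packed) value.

    Returns None for the missing-value marker, otherwise
    raw * scale_factor + add_offset.
    """
    if missing_value is not None and raw == missing_value:
        return None
    return raw * scale_factor + add_offset


# Made-up numbers for illustration; real values would come from the
# '__tiledb_attr.w.*' metadata entries listed above.
print(unpack(4, 0.5, 2.0))                # 4 * 0.5 + 2.0 = 4.0
print(unpack(-32767, 0.5, 2.0, -32767))   # the marker value maps to None
```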

ArraySchema(
  domain=Domain(*[
    Dim(name='time', domain=(0, 0), tile=1, dtype='uint64'),
    Dim(name='level', domain=(0, 36), tile=13, dtype='uint64'),
    Dim(name='latitude', domain=(0, 720), tile=241, dtype='uint64'),
    Dim(name='longitude', domain=(0, 1439), tile=480, dtype='uint64'),
  ]),
  attrs=[
    Attr(name='w', dtype='int16', var=False, nullable=False),
  ],
  cell_order='row-major',
  tile_order='row-major',
  sparse=False,
)
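The tile extents in the schema determine how the dense array is split into tiles. As a small worked example, using only the numbers printed above, the number of tiles along each dimension is the domain length divided by the tile extent, rounded up:

```python
import math

# (name, domain_min, domain_max, tile_extent), copied from the schema above
dims = [
    ("time", 0, 0, 1),
    ("level", 0, 36, 13),
    ("latitude", 0, 720, 241),
    ("longitude", 0, 1439, 480),
]

# Ceiling division in integer arithmetic: tiles needed to cover each domain
tiles_per_dim = {
    name: (hi - lo + 1 + extent - 1) // extent for name, lo, hi, extent in dims
}
total_tiles = math.prod(tiles_per_dim.values())
cells_per_tile = math.prod(extent for _, _, _, extent in dims)

print(tiles_per_dim)    # {'time': 1, 'level': 3, 'latitude': 3, 'longitude': 3}
print(total_tiles)      # 27
print(cells_per_tile)   # 1503840
```

With the int16 attribute, each full tile therefore holds about 3 MB of uncompressed data (1503840 cells of 2 bytes each).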


Vertical velocity 

We can also use Xarray to open the produced TileDB arrays and conduct further operations, such as making a plot.

Code Block
languagepy
data = xr.open_dataset(w_path, engine="tiledb")
data

...