
The Dask-themed notebook tutorials demonstrate how to use Dask on data collections hosted at NCI, as well as on data extracted from external databases.

Notebook availability

NCI filesystem path: /g/data/dk92/notebooks/examples-dask

GitHub: https://github.com/NCI-data-analysis-platform/examples-dask

To preview these notebooks: https://nbviewer.jupyter.org/github/NCI-data-analysis-platform/examples-dask/tree/main/

Filename | Description | Dataset | Data project to join
--- | --- | --- | ---
Dask_01_basics.ipynb | Dask lazy loading; progress bar; reductions | none | none
Dask_02_data_chunks_CMIP6.ipynb | Dask array basics; NetCDF chunks vs Dask chunks; chunking practices | ESGF CMIP6 Replication Data | oi10
Dask_03_fundamentals_Delayed.ipynb | The dask.delayed feature; parallelising a for loop | none | none
Dask_04_delayed_pandas_palioceanography.ipynb | Parallelise sequential code using dask.delayed | CSV files downloaded from a Nature Geoscience paper | none
Dask_05_dataframes_ACTweather.ipynb | Read ACT weather data into a Dask DataFrame; save to Parquet for better performance; compare dask.dataframe with pandas | Weather data downloaded from the BoM website | none
Dask_06_schedulers_ACTweather.ipynb | Introduce Dask schedulers; apply scheduler options to weather station data | Weather data downloaded from the BoM website | none
Dask_07_numpy_temperature.ipynb | Introduce dask.array chunks; parallelise code; compare performance on real data examples | Australian temperature data provided by the BoM | none
Dask_08_xarray_CMIP6.ipynb | Use standard xarray operations on Dask arrays; persist data in memory to speed up I/O; customise workflows and use automatic parallelisation | ESGF CMIP6 Australian Data | fs38
Dask_09_xarray_precipitation.ipynb | Calculate the intra-ensemble range of mean daily temperature and average seasonal precipitation in Australia using historical precipitation data from the CESM2 model within CMIP6 | ESGF CMIP6 Australian Data | fs38
Dask_10_interactive_visualisation_CMIP6.ipynb | Calculate the time and zonal mean of temperature from CMIP6 GFDL models and visualise the data interactively | ESGF CMIP6 Replication Data | oi10
Dask_11_diagnositc_tools.ipynb | Introduce diagnostic tools, such as task graph visualisation and local and distributed diagnostics | ESGF CMIP6 Australian Data | fs38
Dask_12_intensive_calculation_cmip6.ipynb | Explore Coupled Model Intercomparison Project (CMIP6) replication data to demonstrate how Dask handles expensive calculations | ESGF CMIP6 Replication Data | oi10
Dask_13_intensive_calculation_eReef.ipynb | Calculate sea level variability using near-real-time and hindcast hydrodynamic models for the Great Barrier Reef | eReefs | fx3
Dask_14_distributed_dataframes_geochem.ipynb | Persist common intermediate results in memory and use indices to improve calculation efficiency | OZCHEM (Geoscience Australia's national whole-rock geochemical dataset) | dk92
Dask_15_distributed_advanced.ipynb | Introduce distributed futures; persist data in memory; asynchronous computation; debugging approaches; choosing the number of Dask workers | none | none
Dask_16_memory_compute_management.ipynb | Strategies for managing larger-than-memory data: partitioning; saving data to disk; freeing RAM; executing in the background | ESGF CMIP6 Replication Data | oi10
Dask_17_bag.ipynb | Parse JSON objects as dictionaries and apply map, filter and groupby functions | JSON files | none
Dask_18_machine_learning.ipynb | Distributed training; training on larger-than-memory datasets | none | none