The Dask-themed notebook tutorials demonstrate how to use Dask on data collections hosted at NCI, as well as on data extracted from external databases.

Notebook availability

NCI filesystem path: /g/data/dk92/notebooks/examples-dask

GitHub: https://github.com/NCI-data-analysis-platform/examples-dask

To preview these notebooks: https://nbviewer.jupyter.org/github/NCI-data-analysis-platform/examples-dask/tree/main/

| filename | description | dataset | data project to join |
|---|---|---|---|
| Dask_01_basics.ipynb | Dask lazy loading; ProgressBar; reduction | none | none |
| Dask_02_data_chunks_CMIP6.ipynb | Dask array basics; NetCDF chunks vs Dask chunks; chunking practices | ESGF CMIP6 Replication Data | oi10 |
| Dask_03_fundamentals_Delayed.ipynb | Dask.delayed feature; parallelise a for loop | none | none |
| Dask_04_delayed_pandas_palioceanography.ipynb | Parallelise sequential code using Dask delayed | CSV files downloaded from a Nature Geoscience paper | none |
| Dask_05_dataframes_ACTweather.ipynb | Read ACT weather data into a Dask DataFrame; save to Parquet for better performance; comparison between dask.dataframe and Pandas | weather data downloaded from the BoM website | none |
| Dask_06_schedulers_ACTweather.ipynb | Introduce Dask schedulers; apply scheduler options to weather station data | weather data downloaded from the BoM website | none |
| Dask_07_numpy_temperature.ipynb | Introduce Dask.array chunks; parallelise code; performance comparison with real data examples | Australian temperature data provided by the BoM | none |
| Dask_08_xarray_CMIP6.ipynb | Use standard xarray operations on Dask Array; persist data into memory to speed up I/O; customise workflows and automatic parallelisation | ESGF CMIP6 Australian Data | fs38 |
| Dask_09_xarray_precipitation.ipynb | Calculate the intra-ensemble range of mean daily temperature and average seasonal precipitation in Australia using historical precipitation data of the CESM2 model within CMIP6 | ESGF CMIP6 Australian Data | fs38 |
| Dask_10_interactive_visualisation_CMIP6.ipynb | Calculate the time and zonal mean of the temperature of CMIP6 GFDL models and interactively visualise the data | ESGF CMIP6 Replication Data | oi10 |
| Dask_11_diagnositc_tools.ipynb | Introduce a few diagnostic tools, such as task graph visualisation and local and distributed diagnostics | ESGF CMIP6 Australian Data | fs38 |
| Dask_12_intensive_calculation_cmip6.ipynb | Explore some of the Coupled Model Intercomparison Project (CMIP6) replication data to demonstrate how Dask handles expensive calculations | ESGF CMIP6 Replication Data | oi10 |
| Dask_13_intensive_calculation_eReef.ipynb | Calculate sea level variability using near-real-time and hindcast models of hydrodynamics for the Great Barrier Reef | eReefs | fx3 |
| Dask_14_distributed_dataframes_geochem.ipynb | Persist common intermediate results in memory and use indices to improve calculation efficiency | OZCHEM - Geoscience Australia's national whole-rock geochemical dataset | dk92 |
| Dask_15_distributed_advanced.ipynb | Introduce distributed futures; persisting into memory; asynchronous computation; debugging approaches; discussion of how to choose the number of Dask workers | none | none |
| Dask_16_memory_compute_management.ipynb | Strategies for managing larger-than-memory data using partitions; saving data to disk; freeing RAM; executing in the background | ESGF CMIP6 Replication Data | oi10 |
| Dask_17_bag.ipynb | Parse JSON objects as dictionaries and apply map, filter and groupby functions | JSON files | none |
| Dask_18_machine_learning.ipynb | Distributed training; training on larger-than-memory datasets | none | none |
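
As a flavour of what the early notebooks cover (lazy evaluation and parallelising a for loop with Dask delayed), the following is a minimal sketch. It is not taken from the notebooks themselves: the `square` function and the input range are hypothetical placeholders standing in for a more expensive per-item computation.

```python
from dask import delayed


def square(x):
    """Hypothetical stand-in for an expensive per-item computation."""
    return x * x


# Sequential version: each call runs immediately, one after another.
sequential_total = sum(square(i) for i in range(10))

# Delayed version: each call only records a task in a graph; nothing runs yet.
lazy_squares = [delayed(square)(i) for i in range(10)]
lazy_total = delayed(sum)(lazy_squares)

# compute() hands the task graph to a scheduler, which is free to run
# the independent square() tasks in parallel before summing the results.
parallel_total = lazy_total.compute()

print(sequential_total, parallel_total)
```

Both totals should agree; the difference is that the delayed version defers all work until `compute()` is called, which is the behaviour the lazy loading and Dask.delayed notebooks explore in more depth.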