examples-dask

The Dask-themed notebook tutorials demonstrate how to use Dask on data collections hosted at the NCI as well as on data extracted from external databases.

Notebook availability

NCI filesystem path: /g/data/dk92/notebooks/examples-dask

GitHub: https://github.com/NCI-data-analysis-platform/examples-dask

To preview these notebooks: https://nbviewer.jupyter.org/github/NCI-data-analysis-platform/examples-dask/tree/main/

Filename	Description	Dataset	Data project to join
Dask_01_basics.ipynb	Dask lazy loading; ProgressBar; reduction	none	none
Dask_02_data_chunks_CMIP6.ipynb	Dask array basics; NetCDF chunks vs dask chunks; chunking practices	ESGF CMIP6 Replication Data	oi10
Dask_03_fundamentals_Delayed.ipynb	The dask.delayed feature; parallelise a for loop	none	none
Dask_04_delayed_pandas_palioceanography.ipynb	Parallelise sequential code using Dask delayed	CSV files downloaded from a Nature Geoscience paper	none
Dask_05_dataframes_ACTweather.ipynb	Read ACT weather data into a Dask DataFrame; save to Parquet for better performance; compare dask.dataframe with pandas	weather data downloaded from the BoM website	none
Dask_06_schedulers_ACTweather.ipynb	Introduce Dask schedulers; apply scheduler options to weather station data	weather data downloaded from the BoM website	none
Dask_07_numpy_temperature.ipynb	Introduce dask.array chunks; parallelise code; performance comparison with real data examples	Australian temperature data provided by the BoM	none
Dask_08_xarray_CMIP6.ipynb	Use standard xarray operations on Dask Array; persist data into memory to speed up I/O; customise workflows and automatic parallelisation	ESGF CMIP6 Australian Data	fs38
Dask_09_xarray_precipitation.ipynb	Calculate the intra-ensemble range of mean daily temperature and average seasonal precipitation in Australia using historical precipitation data from the CESM2 model within CMIP6	ESGF CMIP6 Australian Data	fs38
Dask_10_interactive_visualisation_CMIP6.ipynb	Calculate time and zonal mean of the temperature of CMIP6 GFDL models and interactively visualise data	ESGF CMIP6 Replication Data	oi10
Dask_11_diagnositc_tools.ipynb	Introduce a few diagnostic tools, such as task graph visualisation and the local and distributed diagnostic tools	ESGF CMIP6 Australian Data	fs38
Dask_12_intensive_calculation_cmip6.ipynb	Explore some of the Coupled Model Intercomparison Project Phase 6 (CMIP6) replication data to demonstrate how Dask handles expensive calculations	ESGF CMIP6 Replication Data	oi10
Dask_13_intensive_calculation_eReef.ipynb	Calculate sea level variability using near-real-time and hindcast models of hydrodynamics for the Great Barrier Reef	eReefs	fx3
Dask_14_distributed_dataframes_geochem.ipynb	Persist common intermediate results in memory and use indices to improve calculation efficiency	OZCHEM - Geoscience Australia's national whole-rock geochemical dataset	dk92
Dask_15_distributed_advanced.ipynb	Introduce distributed futures; persist into memory; asynchronous computation; debugging approaches; discussion of how to choose the number of Dask workers	none	none
Dask_16_memory_compute_management.ipynb	Strategies for managing larger-than-memory data using partitions; saving data to disk; freeing RAM; executing in the background	ESGF CMIP6 Replication Data	oi10
Dask_17_bag.ipynb	Parse JSON objects as dictionaries and apply map, filter and groupby functions	JSON files	none
Dask_18_machine_learning.ipynb	Distributed training; training larger-than-memory datasets	none	none
