The "Introduction to Dask" workshop offers a comprehensive guide to utilising Dask, a parallel computing library for Python designed to scale data analysis from single machines to clusters. Participants will explore Dask's core components, including task graphs, arrays, dataframes, delayed, and futures. The workshop also covers distributed computing with Dask and its integration with machine learning through Dask-ML. A Python virtual environment is provided for hands-on exercises.
If you have any questions regarding this training, please contact training.nci@anu.edu.au.
Parallel Python is also available online on the NCI Teachable website.
Experience with Python.
Experience with bash or similar Unix shells.
A valid NCI account and vp91 membership (instructions will be sent out before the event).
The training session runs on the NCI ARE service. Relevant documentation is available in the ARE User Guide.
Describe what Dask is and when to use it for large data or parallel tasks.
Work with Dask Arrays and DataFrames to handle datasets that don’t fit in memory.
Use dask.delayed to turn regular Python code into parallel tasks.
Run Dask on a laptop or a cluster using the Dask Distributed scheduler.
Monitor their computations using the Dask Dashboard.
Combine Dask with tools they already know, like NumPy and Pandas.
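As a rough illustration of the outcomes listed above, the following minimal sketch shows dask.delayed wrapping ordinary Python functions and a chunked Dask Array. It is not taken from the workshop materials: the function names (square, total), the array shape, and the chunk sizes are assumptions chosen only for the example, and it uses Dask's default local scheduler rather than a cluster.

```python
import dask
import dask.array as da


# Hypothetical helper functions, wrapped so that calling them
# records a task instead of running immediately.
@dask.delayed
def square(x):
    return x * x


@dask.delayed
def total(values):
    return sum(values)


# Building the task graph is lazy: nothing runs yet.
lazy_total = total([square(i) for i in range(5)])

# compute() executes the graph; independent square() tasks may run in parallel.
result = lazy_total.compute()

# A Dask Array is split into chunks, each processed as a separate task.
# Shape and chunk sizes here are arbitrary illustrative values.
x = da.ones((1000, 1000), chunks=(250, 250))
mean_value = x.mean().compute()

print(result, mean_value)
```

The same code can, in principle, run against a cluster by creating a dask.distributed Client before calling compute(); the workshop's own setup on ARE may differ.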
Foundation Topics
Task Graphs
Dask Arrays
Dask DataFrame
Dask Delayed
Dask Futures
Distributed Dask
Machine Learning Topics
Dask ML
Hyperparameter Search
Parallel Prediction
Incremental Learning
Distributed Learning