This Parallel Python workshop teaches cutting-edge techniques for working with big data and processing it in parallel using Python. It is suitable for all participants who want to enhance their data science capabilities.

Python is one of the most widely used programming languages, with applications in almost every data-oriented domain. The Python data science ecosystem is a rich platform for scaling up workflows, enhancing scientific research and improving insight. However, Python can become a performance bottleneck when a workflow involves large datasets or demanding computations. Parallel computing and efficient data handling can overcome this barrier and increase research throughput.

If you have any questions regarding this training, please contact training.nci@anu.edu.au.


Date, Time & Location

Online Workshop

9 am - 12 pm Canberra time

Registration is open now

Parallel Python is also available online on the NCI Teachable website.


Prerequisites
  • Basic experience with Python is required.
  • Some familiarity with array processing in NumPy is helpful but not required, as the course includes a brief refresher.
  • The training session runs on the NCI Open OnDemand (OOD) service. Attendees are encouraged to review the following page for background information: Open OnDemand (OOD) Service


Objectives

The training is designed as a first parallel programming course for scientists. As such, it aims to help attendees:

  • Understand array programming with NumPy
  • Work with large and possibly heterogeneous data using xarray
  • Perform parallel computation using Dask


Learning Outcomes

At the completion of this training session, you will be able to:

  • Use vectorized computation with NumPy
  • Load, annotate and work with data using xarray
  • Serialise large datasets to file using xarray
  • Load data from the cloud using OPeNDAP and xarray
  • Parallelise common workflows and arbitrary code using Dask
  • Combine Dask and xarray for big data processing
  • Combine Dask and GPUs for maximum data throughput
  • Feel confident in your data science skills to tackle your own problems
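
As a small taste of the first outcome above, the sketch below contrasts an element-by-element Python loop with the equivalent vectorized NumPy expression. It is an illustration only, not taken from the course material: the function names and the temperature values are hypothetical and chosen purely for the example.

```python
import numpy as np


def anomaly_loop(values):
    """Subtract the mean from each value using plain Python loops."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def anomaly_vectorized(values):
    """Same calculation expressed as a single NumPy array operation."""
    arr = np.asarray(values, dtype=float)
    # The subtraction is applied to every element at once (broadcasting).
    return arr - arr.mean()


# Hypothetical sample data, used only for illustration.
temps = [14.0, 15.5, 13.0, 17.5]

loop_result = anomaly_loop(temps)
vec_result = anomaly_vectorized(temps)
```

Both functions return the same numbers. On large arrays the vectorized form is usually much faster, because the per-element work runs in compiled code rather than in the Python interpreter.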


Topics Covered
  • Array programming in NumPy
  • Array data structures and hierarchies
  • Loading and saving data efficiently to disk
  • Cloud-native computing
  • Parallel computing with Dask
  • Combining Python packages for enhanced functionality