Dask can distribute data and computation over multiple GPUs, either in a single system or in a multi-node cluster. Dask integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for GPU-accelerated data analytics and machine learning. Please refer to the official Dask site for more details.
From the ARE JupyterLab, you can set up either:
- a Local Dask CUDA cluster (local to the node where JupyterLab is running)
- a "pre-defined" Dask GPU cluster using multiple GPU nodes directly from the ARE when starting the JupyterLab session
Please refer to here for running a Dask CPU cluster on the NCI ARE.
After logging in to the ARE, click the "JupyterLab" button to start a JupyterLab session.
Local Dask GPU Cluster
In the JupyterLab launch form, you can request resources on a single GPU node as below:
(Screenshots: Step 1 and Step 2 of the JupyterLab launch form.)
Wait for the job to start until the "Open JupyterLab" button is highlighted and then click it to open the JupyterLab interface.
In a JupyterLab notebook, you can utilise the requested GPU resources of the local node to start a local Dask GPU cluster on the fly. Note: these resources are the same as those allocated to the JupyterLab session.
The essential lines needed in the Jupyter notebook are:

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
client = Client(cluster)
```
After it is set up, you can check the configuration by printing the client:

```python
print(client)
```

```
<Client: 'tcp://127.0.0.1:40045' processes=4 threads=4, memory=376.57 GiB>
```
This output shows a local Dask CUDA cluster (communicating over the node-local loopback interface 127.0.0.1) running 4 worker processes, one per GPU, matching the resources requested for this JupyterLab session (4 GPUs and 376 GB of CPU memory).
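As a quick check that the cluster does useful work, any Dask computation run while the client is active is scheduled across its workers. This is a minimal sketch; the array size and chunking below are illustrative only, and without an active client the same code simply falls back to Dask's local threaded scheduler:

```python
import dask.array as da

# Build a chunked random array; each chunk becomes a task that a
# worker (one per GPU in the cluster above) can execute.
x = da.random.random((2_000, 2_000), chunks=(500, 500))

# .compute() triggers execution across the cluster's workers.
mean = x.mean().compute()
print(mean)
```

The mean of uniform random values should come out close to 0.5, which is an easy sanity check that the computation actually ran.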
You can check more details of the local Dask CUDA cluster as below:
The output indicates there are 4 workers in a single GPU node, and each worker occupies one GPU device (Tesla V100-SXM2-32GB).
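The same worker details can also be inspected programmatically through the client. This sketch spins up a small CPU `LocalCluster` purely for illustration; with the `LocalCUDACluster` created earlier, the same `scheduler_info()` call applies unchanged:

```python
from dask.distributed import Client, LocalCluster

# Illustrative CPU stand-in for the GPU cluster: 2 workers, 1 thread each.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
client = Client(cluster)

# scheduler_info() lists every worker with its address and thread count.
workers = client.scheduler_info()["workers"]
for addr, info in workers.items():
    print(addr, "-", info["nthreads"], "thread(s)")

client.close()
cluster.close()
```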
Pre-defined ARE Dask cluster
If you need a Dask cluster spanning multiple compute nodes, you can set up a pre-defined Dask GPU cluster via the gadi_jupyterlab module.
You can request multiple GPU nodes as below (2 nodes in this example form):
(Screenshots: Step 1 and Step 2 of the JupyterLab launch form.)
Wait until the "Open JupyterLab" button is highlighted and then click it to open the JupyterLab interface.
In the JupyterLab notebook, you can connect to the pre-defined Dask cluster by adding the following lines:

```python
from dask.distributed import Client
import os

client = Client(scheduler_file=os.environ["DASK_PBS_SCHEDULER"])
print(client)
```
In this example, you will see that the Dask GPU cluster consists of 8 processes (workers) and 8 threads (1 thread per worker).
For more details, you can run "client" directly:
Now you can see the 8 workers come from 2 nodes: each node has 4 workers, and each worker occupies one GPU device (Tesla V100-SXM2-32GB).
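Once connected, the multi-node cluster is driven through the same `Client` interface as the local one, for example with `client.map`/`client.gather`. This sketch uses a small CPU `LocalCluster` for illustration; a client connected through the `DASK_PBS_SCHEDULER` scheduler file is used in exactly the same way:

```python
from dask.distributed import Client, LocalCluster


def square(x):
    return x * x


# Illustrative CPU stand-in for the pre-defined GPU cluster.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
client = Client(cluster)

futures = client.map(square, range(8))  # one future per input element
results = client.gather(futures)        # collect results, in input order
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]

client.close()
cluster.close()
```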
Monitor GPU status
Dask-dashboard
You can monitor GPU status via the Dask dashboard after setting up the LocalCUDACluster or connecting to a pre-defined Dask GPU cluster in your notebook. The dashboard URL is available from the client as client.dashboard_link.
For example, you can click the "Workers" button to view all worker activities as below:
(Screenshots: Step 1 and Step 2 of viewing worker activities in the Dask dashboard.)
For more details on Dask dashboard, please refer to here.
You can also use NVDashboard by clicking the “GPU Dashboards” menu along the left-hand side of your JupyterLab environment.
For more details on NVDashboard, please refer to here.