At the beginning of the notebook
To use the Dask cluster started by the PBS job, add and run the following cell at the beginning of your notebook:

```python
from dask.distributed import Client

# Connect to the scheduler started by the PBS job
client = Client(scheduler_file='scheduler.json')
print(client)
```
The output will show the configuration of the client and Dask cluster. You can check that the number of cores matches what you requested in the job script.
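Beyond reading the printed summary, you can check the core count programmatically by summing the threads each worker reports. A minimal sketch, assuming `client.scheduler_info()` returns its usual per-worker dictionary; `total_threads` is a hypothetical helper, not part of Dask:

```python
def total_threads(scheduler_info):
    # Sum the threads reported by every worker in a
    # Client.scheduler_info()-style dictionary
    return sum(w['nthreads'] for w in scheduler_info['workers'].values())

# Illustrative data only; on a live cluster pass client.scheduler_info()
info = {'workers': {
    'tcp://10.0.0.1:40001': {'nthreads': 48},
    'tcp://10.0.0.2:40002': {'nthreads': 48},
}}
print(total_threads(info))  # 96
```

If the total does not match the `ncpus` you requested, the PBS job may still be queued or some workers may not have connected yet.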
To manage your compute resources adaptively, you can submit PBS jobs that set up a Dask cluster on demand from within your notebook. This lets you scale up your program, and you can submit as many jobs as your workflow requires:
```python
import os

import dask.config
from dask.distributed import Client
from dask_jobqueue import PBSCluster

walltime = '01:00:00'
cores = 96
memory = '160GB'

cluster = PBSCluster(walltime=str(walltime), cores=cores, memory=str(memory),
                     job_extra=['-P <project code>',
                                '-l ncpus=' + str(cores),
                                '-l mem=' + str(memory),
                                '-l storage=gdata/<project code>+gdata/<project code>+gdata/<project code>...'],
                     header_skip=["select"],
                     python=os.environ["DASK_PBS_PYTHON"])
# Submit 2 PBS jobs, each providing the cores and memory specified above
cluster.scale(jobs=2)
client = Client(cluster)
client
```
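Each PBS job in this configuration contributes `cores` worker cores, so the `jobs=` argument to `cluster.scale()` follows from the total core count you want. A small sketch using a hypothetical helper (not part of dask_jobqueue):

```python
import math

def jobs_needed(target_cores, cores_per_job):
    # Round up: reaching the target may require a final, partially
    # used PBS job, which still has to be submitted whole
    return math.ceil(target_cores / cores_per_job)

# e.g. 192 cores in total with 96 cores per job -> cluster.scale(jobs=2)
print(jobs_needed(192, 96))  # 2
```

Remember that each job consumes your project's compute allocation for its full requested walltime, so scale to what the workflow actually needs.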
At the end of the notebook
Add and run the cell below to gracefully stop your job:
```python
!pangeo.end.sh
```