There is a plethora of online material related to RAPIDS. One such example is this blog [https://developer.nvidia.com/blog/accelerating-single-cell-genomic-analysis-using-rapids/], which showcases accelerating single-cell genomic analysis using RAPIDS. The authors of this blog released several notebooks to showcase their work, and their code is expected to run as-is on Gadi with some modifications. For the following example, we will look at the notebook [https://github.com/clara-parabricks/rapids-single-cell-examples/blob/v2021.12.0/notebooks/1M_brain_gpu_analysis_multigpu.ipynb] and show how to modify our RAPIDS environment to run it on Gadi.
Install Missing Packages
Most of the packages imported in this notebook are available in rapids/2022.02. There are only two missing packages, scanpy and anndata. Please follow the instructions on this page under `Work with Other Python Packages` to learn how to install additional packages. On the same page, we also show how to prepare the environment to use RAPIDS.
For this example, we can use pip to install scanpy:
Code Block |
---|
login-node $ python3 -m pip install -v --upgrade-strategy only-if-needed --prefix $INSTALL_DIR scanpy |
where INSTALL_DIR is the variable that defines where scanpy is installed.
As scanpy installs anndata automatically as part of its dependencies, you should see both of these packages available in $INSTALL_DIR/lib/python3.9/site-packages after the command above runs successfully.
Code Block |
---|
login-node $ ls $INSTALL_DIR/lib/python3.9/site-packages
anndata                  natsort-8.1.0.dist-info  scanpy                  sinfo-0.3.4.dist-info        tables
anndata-0.8.0.dist-info  numexpr                  scanpy-1.8.2.dist-info  stdlib_list                  tables-3.7.0.dist-info
natsort                  numexpr-2.8.1.dist-info  sinfo                   stdlib_list-0.8.0.dist-info  tables.libs |
To test scanpy, you can try running the test suite that comes with it:
Code Block |
---|
git clone -b 1.8.2 https://github.com/scverse/scanpy.git
cd scanpy
py.test > ../scanpy.test.log |
Given the scanpy tests take less than 10 minutes to complete, you should be able to run these tests on the login node. For more intensive tests, please run them in a PBS job.
Info |
---|
There are some tests in the scanpy test suite that fail as their RMS values are greater than the expected tolerance. These failed tests are a good demonstration of the risks involved when running applications that are not built from source on Gadi. As the failed tests do not exceed the tolerance limit by a large amount, we will still proceed and use this installation for the following example. |
Download Input Data and Auxiliary Function Code
The example notebook has a short section of code that downloads the input data. On Gadi, it needs to run on either the login nodes or the copyq nodes, as Gadi compute nodes have no access to external networks. Given the dataset in this example is small, we will download it on the login node directly:
Code Block |
---|
login-node $ mkdir -p $WORKDIR
login-node $ wget https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad -P $WORKDIR |
where WORKDIR is the variable that defines the path to the working directory in which you will run the notebook.
The notebook also calls auxiliary functions defined in another file hosted in the same GitHub repository. This file also needs to be downloaded to your working directory:
Code Block |
---|
login-node $ wget https://raw.githubusercontent.com/clara-parabricks/rapids-single-cell-examples/5ca2a69d852c2a7ad843b57209e8fdce450336f7/notebooks/rapids_scanpy_funcs.py -P $WORKDIR |
Please note, Note that we use the v2021.12.0 version notebook. The This file can will be different in for other branches.
Start an Interactive Job on a GPU Node and Initiate the Dask LocalCUDACluster
Once the required Python packages, input data, and auxiliary functions are all available on Gadi, we can run the notebook on a GPU node. To gain a deeper understanding of the inner workings of this notebook, it is recommended to run it interactively.
To submit the interactive job, try:
Code Block |
---|
login-node $ cd $WORKDIR
login-node $ qsub -I -P${PROJECT} -vINSTALL_DIR=$INSTALL_DIR -qgpuvolta -lwalltime=00:30:00,ncpus=24,ngpus=2,mem=180GB,jobfs=200GB,storage=gdata/dk92+gdata/${PROJECT},other=hyperthread,wd |
Please note that this interactive job assumes your default project $PROJECT has enough SU to support a 2-GPU job for half an hour. If this is not the case, please replace $PROJECT with a project code that has sufficient compute resources. For more information on how to look up resource availability in projects, please read the Gadi User Guide.
The interactive job also assumes the directory $INSTALL_DIR is located inside /g/data/${PROJECT} and $WORKDIR inside /scratch/$PROJECT, where PROJECT is the variable that defines the project code supporting this job. If this is not the case, please revise the string passed to the PBS -lstorage directive accordingly. More information on PBS directives can be found here.
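To illustrate how the -lstorage string is composed, here is a minimal sketch: each filesystem the job reads or writes is listed and joined with `+`. The project code `ab12` is a hypothetical example, not a real project.

```python
# Hedged sketch: building the PBS -lstorage string from the list of
# filesystems the job needs. "ab12" stands in for your project code.
project = "ab12"  # hypothetical project code
filesystems = ["gdata/dk92", f"gdata/{project}"]
storage = "+".join(filesystems)
print(storage)  # gdata/dk92+gdata/ab12
```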
Once the job is ready, prepare the environment and initiate the Dask cluster inside python3. Please note that, when editing PYTHONPATH, the variable INSTALL_DIR that was passed through the PBS job submission line is used. If this INSTALL_DIR is not accessible from the job, importing the scanpy and anndata packages will fail.
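The effect of the PYTHONPATH edit can be sketched as follows: the site-packages directory under $INSTALL_DIR is made visible to the interpreter so scanpy and anndata can be imported. The fallback path in this sketch is a hypothetical example, not a real Gadi location.

```python
# Sketch of what editing PYTHONPATH achieves: exposing packages installed
# under $INSTALL_DIR to the Python 3.9 interpreter used by rapids/2022.02.
import os
import sys

install_dir = os.environ.get("INSTALL_DIR", "/g/data/ab12/rapids-extras")  # hypothetical fallback
site_packages = os.path.join(install_dir, "lib", "python3.9", "site-packages")
if site_packages not in sys.path:
    sys.path.insert(0, site_packages)
print(site_packages in sys.path)  # True
```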
...
The first necessary modification is shown above. When initiating the LocalCUDACluster, no arguments are required as long as all workers run on the same compute node. Running LocalCUDACluster() starts a local scheduler ready to connect with the same number of workers as the number of GPUs available inside the job. Since the Gadi gpuvolta queue has no more than 4 GPUs in a single node, this method is only valid for jobs requesting no more than 4 GPUs. To learn how to run tasks using more than 4 GPUs across multiple nodes, follow the instructions in Example 2 on this page.
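The "one worker per GPU" default described above can be illustrated with a short sketch. Note this only mimics the counting logic from CUDA_VISIBLE_DEVICES; the real GPU detection is performed by dask-cuda, not by this code.

```python
# Illustration of LocalCUDACluster()'s default worker count: one Dask
# worker per GPU visible to the job (here mimicked, not detected).
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # what a 2-GPU gpuvolta job exposes
n_workers = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))
print(n_workers)  # 2 workers, one per GPU
```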
Load Input Data using all the Dask CUDA Workers
In the example notebook, the input is ingested by calling the function read_with_filter defined in the file rapids_scanpy_funcs.py as follows:
Code Block |
---|
python3 >>> input_file = "1M_brain_cells_10X.sparse.h5ad"
python3 >>> min_genes_per_cell = 200
python3 >>> max_genes_per_cell = 6000
python3 >>> dask_sparse_arr, genes, query = rapids_scanpy_funcs.read_with_filter(client,
...             input_file,
...             min_genes_per_cell=min_genes_per_cell,
...             max_genes_per_cell=max_genes_per_cell,
...             partial_post_processor=partial_post_processor) |
This function first opens the h5 file to find the total number of cells, then defines the batch of data read by each worker accordingly. The parallel read is initiated by calling dask.array.from_delayed, which in turn launches all the individual reads. Every read fetches a piece of data no larger than the default batch size of 50000. Given the input file contains 1306127 cell records, the first 26 tasks each read 50000 cells while the 27th read fetches the remaining 6127. By tuning the batch size, you can minimise the time spent on data ingestion. Below is a benchmark showing the batch size and the associated normalised read walltime:
batch size | 10000 | 25000 | 50000 | 73000 | 100000 | 165000 | 200000 |
---|---|---|---|---|---|---|---|
normalised read walltime | 1.084 | 0.990 | 1 | 1.006 | 1.050 | 1.038 | 1.114 |
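The batch arithmetic described above can be checked with a short calculation. This is a sketch of the partitioning logic only, not the actual implementation in rapids_scanpy_funcs.py.

```python
# Sketch of how a cell count is split into read tasks of a given batch size.
import math

def read_batches(n_cells, batch_size=50000):
    """Sizes of the individual read tasks for a given batch size."""
    n_tasks = math.ceil(n_cells / batch_size)
    sizes = [batch_size] * (n_tasks - 1)
    sizes.append(n_cells - batch_size * (n_tasks - 1))
    return sizes

sizes = read_batches(1306127)
print(len(sizes))  # 27 read tasks in total
print(sizes[-1])   # the last task fetches the remaining 6127 cells
```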
If data ingestion takes a considerable amount of time in your production jobs, you might want to go through this optimisation carefully. In general, IO operations benefit from bigger batch sizes on Lustre filesystems, as too many small reads and writes can result in very poor performance in terms of both efficiency and robustness. However, when a bigger batch size leads to workload imbalance, the overall read walltime increases as well, because the entire IO has to wait for the last task to finish. For example, from the benchmark above, a batch size of 200K results in 7 read tasks scheduled on 2 workers, and it is very likely that one worker has to wait for the other while it performs the fourth read task.
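The imbalance argument can be quantified with a simple model: assuming reads of equal cost, the ingestion time is proportional to the number of read rounds performed by the busiest worker. This is only an illustrative approximation, not a measurement.

```python
# Rough model of read rounds per worker: the whole ingestion waits for
# the busiest of the 2 Dask CUDA workers in this 2-GPU job.
import math

n_cells = 1306127
n_workers = 2

for batch_size in (50000, 200000):
    n_tasks = math.ceil(n_cells / batch_size)
    rounds = math.ceil(n_tasks / n_workers)  # rounds done by the busiest worker
    print(batch_size, n_tasks, rounds)
```

With a 200K batch size, 7 tasks land on 2 workers, so one worker performs 4 rounds while the other sits idle during the last one.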
Even though read_with_filter does more than IO operations, the trend is clear, as the time spent in IO dominates the overall overhead.
...
The notebook shows that there are two PCA decomposition functions shipped with the `cuML` package:
Code Block |
---|
from cuml.decomposition import PCA from cuml.dask.decomposition import PCA |
Only the second one, from the cuml.dask.decomposition package, runs on multiple GPUs. You can find similar functions in this section of the cuML API references.
Save Plots to Files
The example notebook visualises the clustering results in plots. Given the current interactive job has no display set, the following modification to save the plots to files is necessary. Once the results are saved to a file, you can download this file to your local PC and inspect the results.
Code Block |
---|
python3 >>> sc.pl.tsne(adata, color=["kmeans"],show=False,save="_kmeans.pdf") |
The code above tells the scanpy plotting tool not to show the figure but to save it to a PDF file instead. Once it finishes writing, the image can be found under `figures/tsne_kmeans.pdf` inside your working directory.
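The same headless-plotting idea can be demonstrated with matplotlib directly: selecting a non-interactive backend renders figures straight to files, so no display is required. This is a generic sketch, not scanpy's internal code, and the output filename is a hypothetical example.

```python
# Minimal headless-plotting sketch: the Agg backend never opens a window,
# so figures can be written to files on a node with no display.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [1, 2, 3])
fig.savefig("headless_demo.pdf")  # hypothetical filename
```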