There is a plethora of online material related to RAPIDS. One such example is this blog [https://developer.nvidia.com/blog/accelerating-single-cell-genomic-analysis-using-rapids/], which showcases accelerating single-cell genomic analysis using RAPIDS. The authors of this blog released several notebooks to showcase their work, and their code is expected to run as-is on Gadi with some modifications. For the following example, we will look at the notebook [https://github.com/clara-parabricks/rapids-single-cell-examples/blob/v2021.12.0/notebooks/1M_brain_gpu_analysis_multigpu.ipynb] and show how to modify our RAPIDS environment to run it on Gadi.
Install Missing Packages
Most of the packages imported in this notebook are available in rapids/2022.02. There are only two missing packages, scanpy and anndata. Please follow the instructions on this page under `Work with Other Python Packages` to learn how to install additional packages. On the same page, we also show how to prepare the environment to use RAPIDS.
For this example, we can use pip to install scanpy:
Code Block |
---|
login-node $ python3 -m pip install -v --upgrade-strategy only-if-needed --prefix $INSTALL_DIR scanpy |
where INSTALL_DIR is the variable that defines where scanpy is installed.
As scanpy installs anndata automatically as part of its dependencies, you should see both of these packages available in $INSTALL_DIR/lib/python3.9/site-packages after the command above runs successfully.
Code Block |
---|
login-node $ ls $INSTALL_DIR/lib/python3.9/site-packages
anndata                  natsort-8.1.0.dist-info  scanpy                  sinfo-0.3.4.dist-info        tables
anndata-0.8.0.dist-info  numexpr                  scanpy-1.8.2.dist-info  stdlib_list                  tables-3.7.0.dist-info
natsort                  numexpr-2.8.1.dist-info  sinfo                   stdlib_list-0.8.0.dist-info  tables.libs |
To test scanpy, you can try running the test suite that comes with it:
Code Block |
---|
git clone -b 1.8.2 https://github.com/scverse/scanpy.git
cd scanpy
py.test > ../scanpy.test.log |
Given the scanpy tests take less than 10 minutes to complete, you should be able to run these tests on the login node. For more intensive tests, please run them in a PBS job.
Info |
---|
There are some tests in the scanpy test suite that fail as their RMS values are greater than the expected tolerance. These failed tests are a good demonstration of the risks involved when running applications that are not built from source on Gadi. As the failed tests do not exceed the tolerance limit by a large amount, we will still proceed and use this installation for the following example. |
Download Input Data and Auxiliary Function Code
The example notebook has a short section of code that downloads the input data. On Gadi, it needs to run on either the login nodes or the copyq nodes, as Gadi compute nodes have no access to external networks. Given the dataset in this example is small, we will download it on the login node directly:
Code Block |
---|
login-node $ mkdir -p $WORKDIR
login-node $ wget https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad -P $WORKDIR |
where WORKDIR is the variable that defines the path to the working directory in which you will run the notebook.
The notebook also calls auxiliary functions defined in another file hosted in the same GitHub repository. This file also needs to be downloaded to your working directory:
Code Block |
---|
login-node $ wget https://raw.githubusercontent.com/clara-parabricks/rapids-single-cell-examples/5ca2a69d852c2a7ad843b57209e8fdce450336f7/notebooks/rapids_scanpy_funcs.py -P $WORKDIR |
Please note, Note that we use the v2021.12.0 version notebook. The This file can will be different in for other branches.
Start an Interactive Job on a GPU Node and Initiate the Dask LocalCUDACluster
Once the required Python packages, input data, and auxiliary functions are all available on Gadi, we can run the notebook on a GPU node. To gain a deeper understanding of the inner workings of this notebook, it is recommended to run it interactively.
To submit the interactive job, try:
Code Block |
---|
login-node $ cd $WORKDIR
login-node $ qsub -I -P${PROJECT} -vINSTALL_DIR=$INSTALL_DIR -qgpuvolta -lwalltime=00:30:00,ncpus=24,ngpus=2,mem=180GB,jobfs=200GB,storage=gdata/dk92+gdata/${PROJECT},other=hyperthread,wd |
Please note that this interactive job assumes your default project $PROJECT has enough SU to support a 2-GPU job for half an hour. If this is not the case, please replace $PROJECT with a project code that has sufficient compute resources. For more information on how to look up resource availability in projects, please read the Gadi User Guide.
The interactive job also assumes the directory $INSTALL_DIR is located inside /g/data/${PROJECT} and $WORKDIR inside /scratch/$PROJECT, where PROJECT is the variable that defines the project code supporting this job. If this is not the case, please revise the string passed to the PBS -lstorage directive accordingly. More information on PBS directives can be found here.
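To illustrate how the -lstorage string is composed, here is a minimal sketch: each filesystem the job reads or writes is listed and joined with `+`. The project code `ab12` is a hypothetical example, not a real project.

```python
# Hedged sketch: building the PBS -lstorage string from the list of
# filesystems the job needs. "ab12" stands in for your project code.
project = "ab12"  # hypothetical project code
filesystems = ["gdata/dk92", f"gdata/{project}"]
storage = "+".join(filesystems)
print(storage)  # gdata/dk92+gdata/ab12
```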
Once the job is ready, prepare the environment and initiate the Dask cluster inside python3. Please note that, when editing PYTHONPATH, the variable INSTALL_DIR that was passed through the PBS job submission line is used. If this INSTALL_DIR is not accessible from the job, importing the scanpy and anndata packages will fail.
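The effect of the PYTHONPATH edit can be sketched as follows: the site-packages directory under $INSTALL_DIR is made visible to the interpreter so scanpy and anndata can be imported. The fallback path in this sketch is a hypothetical example, not a real Gadi location.

```python
# Sketch of what editing PYTHONPATH achieves: exposing packages installed
# under $INSTALL_DIR to the Python 3.9 interpreter used by rapids/2022.02.
import os
import sys

install_dir = os.environ.get("INSTALL_DIR", "/g/data/ab12/rapids-extras")  # hypothetical fallback
site_packages = os.path.join(install_dir, "lib", "python3.9", "site-packages")
if site_packages not in sys.path:
    sys.path.insert(0, site_packages)
print(site_packages in sys.path)  # True
```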
...
The first necessary modification is shown above. When initiating the LocalCUDACluster, no arguments are required as long as all workers run on the same compute node. Running LocalCUDACluster() starts a local scheduler ready to connect with the same number of workers as the number of GPUs available inside the job. Since the Gadi gpuvolta queue has no more than 4 GPUs in a single node, this method is only valid for jobs requesting no more than 4 GPUs. To learn how to run tasks using more than 4 GPUs across multiple nodes, follow the instructions in Example 2 on this page.
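The "one worker per GPU" default described above can be illustrated with a short sketch. Note this only mimics the counting logic from CUDA_VISIBLE_DEVICES; the real GPU detection is performed by dask-cuda, not by this code.

```python
# Illustration of LocalCUDACluster()'s default worker count: one Dask
# worker per GPU visible to the job (here mimicked, not detected).
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # what a 2-GPU gpuvolta job exposes
n_workers = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))
print(n_workers)  # 2 workers, one per GPU
```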
Load Input Data using all the Dask CUDA Workers
In the example notebook, the input is ingested by calling the function read_with_filter defined in the file rapids_scanpy_funcs.py as follows:
Code Block |
---|
python3 >>> input_file = "1M_brain_cells_10X.sparse.h5ad"
python3 >>> min_genes_per_cell = 200
python3 >>> max_genes_per_cell = 6000
python3 >>> dask_sparse_arr, genes, query = rapids_scanpy_funcs.read_with_filter(client,
...             input_file,
...             min_genes_per_cell=min_genes_per_cell,
...             max_genes_per_cell=max_genes_per_cell,
...             partial_post_processor=partial_post_processor) |
This function first opens the h5 file to find the total number of cells, then defines the batch of data read by each worker accordingly. The parallel read is initiated by calling dask.array.from_delayed, which in turn launches all the individual reads. Every read fetches a piece of data no larger than the default batch size of 50000. Given the input file contains 1306127 cell records, the first 26 tasks each read 50000 cells while the 27th read fetches the remaining 6127. By tuning the batch size, you can minimise the time spent on data ingestion. Below is a benchmark showing the batch size and the associated normalised read walltime:
batch size | 10000 | 25000 | 50000 | 73000 | 100000 | 165000 | 200000 |
---|---|---|---|---|---|---|---|
normalised read walltime | 1.084 | 0.990 | 1 | 1.006 | 1.050 | 1.038 | 1.114 |
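The batch arithmetic described above can be checked with a short calculation. This is a sketch of the partitioning logic only, not the actual implementation in rapids_scanpy_funcs.py.

```python
# Sketch of how a cell count is split into read tasks of a given batch size.
import math

def read_batches(n_cells, batch_size=50000):
    """Sizes of the individual read tasks for a given batch size."""
    n_tasks = math.ceil(n_cells / batch_size)
    sizes = [batch_size] * (n_tasks - 1)
    sizes.append(n_cells - batch_size * (n_tasks - 1))
    return sizes

sizes = read_batches(1306127)
print(len(sizes))  # 27 read tasks in total
print(sizes[-1])   # the last task fetches the remaining 6127 cells
```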
If data ingestion takes a considerable amount of time in your production jobs, you might want to go through this optimisation carefully. In general, IO operations benefit from bigger batch sizes on Lustre filesystems, as too many small reads and writes can result in very poor performance in terms of both efficiency and robustness. However, when a bigger batch size leads to workload imbalance, the overall read walltime increases as well, because the entire IO has to wait for the last task to finish. For example, from the benchmark above, a batch size of 200K results in 7 read tasks scheduled on 2 workers, and it is very likely that one worker has to wait for the other while it performs the fourth read task.
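The imbalance argument can be quantified with a simple model: assuming reads of equal cost, the ingestion time is proportional to the number of read rounds performed by the busiest worker. This is only an illustrative approximation, not a measurement.

```python
# Rough model of read rounds per worker: the whole ingestion waits for
# the busiest of the 2 Dask CUDA workers in this 2-GPU job.
import math

n_cells = 1306127
n_workers = 2

for batch_size in (50000, 200000):
    n_tasks = math.ceil(n_cells / batch_size)
    rounds = math.ceil(n_tasks / n_workers)  # rounds done by the busiest worker
    print(batch_size, n_tasks, rounds)
```

With a 200K batch size, 7 tasks land on 2 workers, so one worker performs 4 rounds while the other sits idle during the last one.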
Even though read_with_filter does more than IO operations, the trend is clear, as the time spent in IO dominates the overall overhead.
...
The notebook shows that there are two PCA decomposition functions shipped with the `cuML` package:
Code Block |
---|
from cuml.decomposition import PCA from cuml.dask.decomposition import PCA |
Only the second one, from the cuml.dask.decomposition package, runs on multiple GPUs. You can find similar functions in this section of the cuML API references.
Save Plots to Files
The example notebook visualises the clustering results in plots. Given the current interactive job has no display set, the following modification to save the plots to files is necessary. Once the results are saved to a file, you can download this file to your local PC and inspect the results.
Code Block |
---|
python3 >>> sc.pl.tsne(adata, color=["kmeans"],show=False,save="_kmeans.pdf") |
The code above tells the scanpy plotting tool not to show the figure but to save it to a PDF file instead. Once it finishes writing, the image can be found under `figures/tsne_kmeans.pdf` inside your working directory.
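The same headless-plotting idea can be demonstrated with matplotlib directly: selecting a non-interactive backend renders figures straight to files, so no display is required. This is a generic sketch, not scanpy's internal code, and the output filename is a hypothetical example.

```python
# Minimal headless-plotting sketch: the Agg backend never opens a window,
# so figures can be written to files on a node with no display.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [1, 2, 3])
fig.savefig("headless_demo.pdf")  # hypothetical filename
```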