RAPIDS is an open-source project supported by NVIDIA. It aims to upgrade existing Python data science toolchains so that data analytics pipelines run entirely on GPUs and, if necessary, on GPUs across multiple nodes, with minimal code changes. RAPIDS utilises NVIDIA CUDA primitives for low-level compute optimisation, and exposes GPU parallelism and high-bandwidth memory through user-friendly Python interfaces.

We maintain a RAPIDS module at NCI within the project dk92 to support the user community and enable GPU-accelerated data analytics in their research. This RAPIDS module contains out-of-the-box libraries that are ready to run on the GPUs available on Gadi.

The following core RAPIDS libraries are available in our module:

  • cuDF, a pandas-like dataframe manipulation library
  • cuML, a collection of machine learning libraries that implement GPU versions of algorithms available in scikit-learn
  • cuGraph, a NetworkX-like graph analytics library
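As a minimal sketch of what "pandas-like" means in practice, the example below runs the same groupby code against cuDF when it can be imported and falls back to pandas otherwise. The helper names `load_dataframe_backend` and `mean_by_key` and the sample data are hypothetical and only serve the illustration; nothing here is specific to the dk92 module.

```python
import importlib


def load_dataframe_backend():
    """Return cudf if it imports cleanly, otherwise pandas, otherwise None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:
            # cudf can fail to import for reasons other than ImportError,
            # for example when no GPU is visible on the node.
            continue
    return None


def mean_by_key(xdf, rows):
    """Group rows by 'key' and average 'value' using the pandas-style API."""
    df = xdf.DataFrame(rows)
    means = df.groupby("key")["value"].mean().sort_index()
    # cuDF results live in GPU memory; to_pandas() copies them to the host.
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return means.to_dict()


rows = {"key": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]}
xdf = load_dataframe_backend()
result = mean_by_key(xdf, rows) if xdf is not None else None
print(xdf.__name__ if xdf is not None else "no dataframe library found", result)
```

The only backend-specific line is the `to_pandas()` call, which is needed when results have to be handed to a CPU-only library.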

These libraries manage parallelism through the dynamic task scheduler dask-scheduler. After input data is loaded into collection abstractions such as DataFrames and a task graph is constructed and executed, the Dask scheduler coordinates all the dask-cuda-worker processes across multiple nodes to perform the data transfer, communication and compute tasks on GPUs while keeping latency low.
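The scheduler and worker roles described above can be sketched on a single node with `LocalCUDACluster` from the dask-cuda package, which starts a scheduler and one worker per visible GPU in one call. This is an illustration under assumptions: it presumes `dask_cudf` and `dask_cuda` are importable in the loaded environment, the function name `total_on_gpu_cluster` is hypothetical, and a multi-node job would instead connect a `Client` to a separately launched dask-scheduler with dask-cuda-worker processes.

```python
def total_on_gpu_cluster(values, npartitions=2):
    """Sum `values` on a single-node Dask-CUDA cluster.

    Returns None when the GPU libraries cannot be imported.
    """
    try:
        import cudf
        import dask_cudf
        from dask.distributed import Client
        from dask_cuda import LocalCUDACluster
    except Exception:
        return None

    # The cluster owns the scheduler and the GPU workers; the client
    # submits the task graph to the scheduler.
    with LocalCUDACluster() as cluster, Client(cluster):
        gdf = cudf.DataFrame({"x": values})
        # Split the GPU dataframe into partitions that workers process.
        ddf = dask_cudf.from_cudf(gdf, npartitions=npartitions)
        # Building the sum is lazy; compute() executes the task graph.
        return float(ddf["x"].sum().compute())


# The guard matters because worker processes are spawned and re-import
# the main module.
if __name__ == "__main__":
    total = total_on_gpu_cluster([1.0, 2.0, 3.0, 4.0])
    print("GPU stack unavailable" if total is None else f"sum = {total}")
```

Nothing is computed until `compute()` is called; up to that point the code only describes the task graph that the scheduler will hand out to the workers.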

Our RAPIDS module contains not only the libraries maintained in the rapidsai repositories on GitHub, but also libraries commonly used alongside them in existing data analytics pipelines, from popular CPU data analytics libraries such as pandas and scikit-learn to visualisation libraries such as matplotlib, seaborn, and bokeh. It also provides a JupyterLab environment with the Dask dashboard and NVDashboard extensions for monitoring resource utilisation.

Please join the project dk92 before using the RAPIDS module. Note that once your project membership request is submitted, it may take some time for the Lead CI to approve it. Once approved, please allow another 30-60 minutes for the new membership to be synchronised across all systems before you start testing this module.
