Introduction
The NCI AI/ML environment under project dk92 supports machine learning analysis and processing on Gadi GPU resources. The module contains popular machine learning packages, including tensorflow-gpu and pytorch-gpu, together with general-purpose data processing packages such as xarray, dask and ray. You can use multiple GPU nodes to run TensorFlow or PyTorch based projects via parallelisation frameworks such as Horovod and Ray. To access the environment, load the 'NCI-ai-ml' module under project dk92.
AI/ML packages are available on Gadi in two ways: as natively compiled standalone packages under /apps, or through the NCI-ai-ml module described here. The module bundles the major data science and machine learning packages (including PyTorch, TensorFlow and Horovod) and configures them to be interoperable. The standalone packages can be difficult to combine when a workflow uses several of these technologies. Our experiments show that the performance difference between the two methods is less than 5% in most cases.
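Since the module bundles both TensorFlow and PyTorch, a quick sanity check after loading it is to confirm that each framework can see the GPUs allocated to the job. The following is a minimal sketch, not an official NCI script. The helper name `detect_gpus` is hypothetical, and the sketch assumes the frameworks are importable as `torch` and `tensorflow`.

```python
import importlib


def detect_gpus():
    """Return a dict mapping framework name to the number of visible GPUs.

    A value of None means the framework could not be imported in the
    current Python environment (hypothetical helper, for illustration).
    """
    counts = {}

    try:
        torch = importlib.import_module("torch")
        # device_count() reports how many CUDA devices PyTorch can use.
        counts["pytorch"] = torch.cuda.device_count() if torch.cuda.is_available() else 0
    except ImportError:
        counts["pytorch"] = None

    try:
        tf = importlib.import_module("tensorflow")
        # list_physical_devices("GPU") returns one entry per visible GPU.
        counts["tensorflow"] = len(tf.config.list_physical_devices("GPU"))
    except ImportError:
        counts["tensorflow"] = None

    return counts


if __name__ == "__main__":
    for framework, count in detect_gpus().items():
        status = "not installed" if count is None else f"{count} GPU(s) visible"
        print(f"{framework}: {status}")
```

On a login node or a CPU-only job, both counts would be expected to be 0 rather than an error.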
Accessing the module
To access the module, join project dk92. Note that dk92 provides no storage or compute resources; it exists purely to give access to the software. You will need to use the code of your existing NCI compute project for computational resources.
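The split between the software project (dk92) and your own compute project can be illustrated by generating a job script in which the two appear in different places. This is only a sketch under stated assumptions. The function `build_pbs_script`, the project code `ab12`, the command `python3 train.py`, the queue name, the CPU-per-GPU ratio, the storage flag and the modulefiles path are all assumptions for illustration and do not come from this page, so check them against the current Gadi user guide before use.

```python
def build_pbs_script(compute_project, module_version="24.08", ngpus=1,
                     walltime="01:00:00", command="python3 train.py"):
    """Build a PBS job script as a string (hypothetical helper).

    compute_project is your own NCI project code, which is charged for
    the job. dk92 appears only as a storage mount and module location.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -P {compute_project}",        # compute is charged here, not to dk92
        "#PBS -q gpuvolta",                  # assumed GPU queue name
        f"#PBS -l ngpus={ngpus}",
        f"#PBS -l ncpus={12 * ngpus}",       # assumed ratio of 12 CPUs per GPU
        f"#PBS -l walltime={walltime}",
        "#PBS -l storage=gdata/dk92",        # assumed flag to mount the dk92 software
        "",
        "module use /g/data/dk92/apps/Modules/modulefiles",  # assumed path
        f"module load NCI-ai-ml/{module_version}",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "ab12" is a placeholder project code, not a real project.
    print(build_pbs_script("ab12"))
```

The point of the sketch is that dk92 never appears after `-P`: the job is always submitted against your own project.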
Package components
The NCI-ai-ml environment is regularly updated; the most recent version listed below is 24.11. The software packages are provided for Python, and the major ones are listed below by environment version.
Key Packages (by NCI-ai-ml env version) | 23.03 | 23.10 | 24.05 | 24.08 | 24.11 |
---|---|---|---|---|---|
cuda-toolkit | 11.7.0 | 11.8.0 | 11.8.0 | 11.8.0 | 11.8.0 |
dask | 2023.1.0 | 2023.11.0 | 2024.5.2 | 2024.8.0 | 2024.8.1 |
jupyterlab | 3.5.3 | 3.5.3 | 3.5.3 | 3.6.7 | 3.6.7 |
keras | 2.11.0 | 2.13.1 | 2.15.0 | 2.15.0 | 3.6.0 |
ray | 2.3.1 | 2.8.0 | 2.24.0 | 2.34.0 | 2.38.0 |
tensorflow-gpu | 2.11.0 | 2.13.0 | 2.15.1 | 2.15.1 | 2.16.2 |
pytorch | 1.13.1 | 2.0.1 | 2.3.0 | 2.3.1 | 2.5.1 |
pytorch-lightning | 2.0.1 | 2.1.1 | 2.2.2 | 2.4.0 | 2.4.0 |
horovod | 0.27.0 | 0.28.1 | 0.28.1 | 0.28.1 | 0.28.1 |
cupy | 12.0.0 | 12.2.0 | 13.1.0 | 13.2.0 | 13.3.0 |
jax | 0.4.20 | 0.4.28 | 0.4.31 | 0.4.35 | |
onnx | 1.15.0 | 1.14.1 | 1.14.0 | 1.14.0 | |
captum | 0.7.0 |
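
To check which of these package versions are present in the environment you have loaded, one option is to query the installed distribution metadata from Python. The sketch below uses only the standard library. The distribution names in `PACKAGES` are assumptions (for example, the table's "pytorch" is assumed to be installed as `torch` and "tensorflow-gpu" as `tensorflow`), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed distribution names for the key packages in the table above.
PACKAGES = [
    "dask", "jupyterlab", "keras", "ray", "tensorflow", "torch",
    "pytorch-lightning", "horovod", "cupy", "jax", "onnx", "captum",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20s} {version or 'not found'}")
```

A package reported as "not found" may simply be installed under a different distribution name than the one assumed here.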
Parallelisation frameworks
The NCI-ai-ml environment currently supports parallel ML frameworks such as Horovod and Ray, which can distribute TensorFlow or PyTorch workloads across multiple GPU nodes.
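As an illustration of the task-parallel style that Ray offers, the sketch below fans a function out over a list of inputs. It is a minimal local example under stated assumptions, not a multi-node recipe: `parallel_map` and `square` are hypothetical names, `ray.init()` with no arguments starts a local Ray instance only, and the code falls back to a serial loop when Ray is not importable. Connecting to a Ray cluster spanning several Gadi nodes needs additional setup that is not covered here.

```python
def square(x):
    """Toy workload standing in for a real training or preprocessing task."""
    return x * x


def parallel_map(func, values):
    """Apply func to each value, using Ray tasks when Ray is available."""
    try:
        import ray
    except ImportError:
        # Ray is not installed: run serially so the sketch still works.
        return [func(v) for v in values]

    ray.init(ignore_reinit_error=True)  # starts a local Ray instance
    try:
        remote_func = ray.remote(func)                     # wrap as a Ray task
        futures = [remote_func.remote(v) for v in values]  # schedule the tasks
        return ray.get(futures)                            # block for results, in order
    finally:
        ray.shutdown()


if __name__ == "__main__":
    print(parallel_map(square, [1, 2, 3, 4]))
```

`ray.get` returns results in the same order as the submitted futures, so the output lines up with the input list.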
Version history
NCI-ai-ml modules | Platform versions |
---|---|
22.08 | Python 3.9 |
22.11 | Python 3.9 |
23.03 | Python 3.9, Julia 1.6.7 |
23.10 | Python 3.9 |
24.05 | Python 3.10 |
24.08 | Python 3.10 |
24.11 | Python 3.10 |