Project structure of dk92
Project dk92 has been created for data analysis. The project collects software resources and demonstrates best practices for analysing the data collections available at NCI using distributed data analysis tools.
The directory structure is shown below
There are two components:
- apps: A set of modules that are updated at regular intervals. They include the core packages Dask, xarray and Jupyter that are required for data analysis, as well as many other relevant packages.
- notebooks: An extensive set of example Jupyter notebooks that make use of the NCI data collections. The notebook examples are available on both the /g/data filesystem and GitHub: https://github.com/NCI-data-analysis-platform
The notebooks can be obtained either by cloning a repository via the command line or by copying them from the appropriate /g/data location.
The directory tree of example notebooks is shown below
Program environments
The NCI-data-analysis module is regularly updated. Since version 2022.06, it has provided software packages for the Python, R and Julia programming platforms.
Accessing the module
Resources can be accessed by joining project dk92 through mancini. Note that project dk92 provides no storage or compute resources, as it exists purely to give access to the software. You will need to use the code of an existing NCI compute project for computational resources.
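Once the module has been loaded in a session, it can be useful to confirm that the core packages are visible to the Python interpreter. The following is a minimal sketch, not an official NCI check: it assumes the module is already loaded, and the names `CORE_PACKAGES` and `missing_packages` are hypothetical helpers introduced only for illustration.

```python
import importlib.util

# Core packages named in the project description.
# Assumption: each is importable under this top-level name.
CORE_PACKAGES = ("dask", "xarray", "jupyter")


def missing_packages(names):
    """Return the top-level package names that cannot be found
    by the current interpreter, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(CORE_PACKAGES)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All core packages are available.")
```

If any package is reported as not found, the most likely cause is that the module was not loaded in the session running the interpreter.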
Version history
The NCI-data-analysis module is updated with newer program versions over time. Users are encouraged to use the latest module where applicable. Version 2024.01 is based on a Singularity image and supports the Python, R and Julia programming languages.
NCI-data-analysis module | Platform versions |
---|---|
2021.06 | Python 3.8 |
2022.06 | Python 3.9; R 4.1.3; Julia 1.7.2 |
2023.02 | Python 3.9; R 4.2.2; Julia 1.8.2 |
2024.01 | Python 3.10; R 4.3.1; Julia 1.9.4 |
2024.05 | Python 3.10; R 4.3.3; Julia 1.9.4 |
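
The table above can also be used to check that a running interpreter matches the module release that was loaded. The sketch below copies the Python versions from the table; the names `MODULE_PYTHON`, `latest_release` and `matches_release` are hypothetical and not part of the module itself.

```python
import sys

# Python (major, minor) version per module release, copied from the table above.
MODULE_PYTHON = {
    "2021.06": (3, 8),
    "2022.06": (3, 9),
    "2023.02": (3, 9),
    "2024.01": (3, 10),
    "2024.05": (3, 10),
}


def latest_release(releases):
    """Return the most recent release label.

    Labels are zero-padded "YYYY.MM" strings, so plain string
    ordering is also chronological ordering.
    """
    return max(releases)


def matches_release(release, version_info=None):
    """Check whether an interpreter version matches the Python version
    listed for the given release. Defaults to the running interpreter."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == MODULE_PYTHON[release]


if __name__ == "__main__":
    release = latest_release(MODULE_PYTHON)
    print("Latest release in the table:", release)
    print("Running interpreter matches it:", matches_release(release))
```

Only the major and minor version numbers are compared, since the table does not list patch versions for Python.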