You might need to customise your environment for several reasons, including:
...
can install additional Python packages in a custom directory and make them work together with the NCI-data-analysis
...
You can install additional Python packages under your own working directory, or in your /g/data project space if it is permitted. Please note that the NCI-data-analysis module should always be loaded before adding other modules or installing new packages.
Step 1:
...
Load the NCI-data-analysis module in your shell environment
To enable the NCI-data-analysis environment, use the following commands within a job or an interactive session:

    $ module use /g/data/dk92/apps/Modules/modulefiles
    $ module load NCI-data-analysis/22.05

Step 2: Install
...
a Python package if it is NOT available in
...
NCI-data-analysis module
There are multiple ways to install Python packages. For example, you can use pip, the de facto standard package manager for installing and managing software written in Python (see the instructions at https://packaging.python.org/tutorials/installing-packages/#installing-to-the-user-site ). Another popular option is Conda, an open source package, dependency and environment management system for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN and many more. Alternatively, you can build a Python library from its source code.
Please note: additional packages should be installed within your own directory.
Let’s now install the DeepGraph library (the `deepgraph` package on PyPI) using pip.
First, check that DeepGraph is not already available in the NCI-data-analysis module:

    [jpf777@gadi-login-09 scr_fp0]$ python
    Python 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import deepgraph
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'deepgraph'
    >>> exit()

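
The same check can be scripted rather than typed into an interactive session. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_available` is hypothetical and is not something provided by the NCI-data-analysis module.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    # find_spec returns None for a missing top-level module and does not import it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the examples on this page.
    for name in ("deepgraph", "dask"):
        status = "available" if is_available(name) else "NOT available"
        print(f"{name}: {status}")
```

Because nothing is imported, this check is cheap to run before deciding whether an extra installation is needed at all.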
Then use the pip command to install DeepGraph into a custom directory (/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS in the example below). We recommend trying the following flags to build the library. To learn about other pip flags, please refer to the pip documentation.

    [jpf777@gadi-login-09 scr_fp0]$ pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix /scratch/fp0/jpf777/EXTRA_PYTHON_LIBS deepgraph
    Using pip 22.1 from /opt/conda/lib/python3.9/site-packages/pip (python 3.9)
    Collecting deepgraph
      Downloading DeepGraph-0.2.3.tar.gz (149 kB)
         ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.1/149.1 kB 4.9 MB/s eta 0:00:00
      Running command python setup.py egg_info
      /opt/conda/lib/python3.9/site-packages/setuptools/dist.py:767: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
        warnings.warn(
      running egg_info
      creating /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info
      writing /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/PKG-INFO
      writing dependency_links to /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/dependency_links.txt
      writing requirements to /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/requires.txt
      writing top-level names to /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/top_level.txt
      writing manifest file '/scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/SOURCES.txt'
      reading manifest file '/scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/SOURCES.txt'
      adding license file 'LICENSE.txt'
      writing manifest file '/scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/SOURCES.txt'
      Preparing metadata (setup.py) ... done
    ...
      writing requirements to DeepGraph.egg-info/requires.txt
      writing top-level names to DeepGraph.egg-info/top_level.txt
      reading manifest file 'DeepGraph.egg-info/SOURCES.txt'
      adding license file 'LICENSE.txt'
      writing manifest file 'DeepGraph.egg-info/SOURCES.txt'
      Copying DeepGraph.egg-info to /scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages/DeepGraph-0.2.3-py3.9.egg-info
      running install_scripts
      writing list of installed files to '/scratch/fp0/jpf777/tmp/pip-record-6nwb_evm/install-record.txt'
      Running setup.py install for deepgraph ... done
    Successfully installed deepgraph-0.2.3

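
As the install log shows, a `--prefix` install places packages in a version-specific directory under the prefix (here `lib/python3.9/site-packages`). The sketch below derives that directory for the interpreter that is currently running, which can help when setting PYTHONPATH afterwards. It assumes the standard `posix_prefix` layout; the function name `site_packages_for_prefix` is hypothetical, and a Python build with a customised install scheme may place packages elsewhere.

```python
import sysconfig


def site_packages_for_prefix(prefix: str) -> str:
    """Return the pure-Python site-packages directory under an install prefix."""
    # Assumption: the "posix_prefix" scheme, i.e. <prefix>/lib/pythonX.Y/site-packages.
    return sysconfig.get_path("purelib", scheme="posix_prefix", vars={"base": prefix})


if __name__ == "__main__":
    # Prefix taken from the example on this page; substitute your own directory.
    print(site_packages_for_prefix("/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS"))
```

Run it with the same Python that performed the installation, since the `pythonX.Y` component depends on the interpreter version.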
Step 3: Validate new installation
You need to add the custom location to the PYTHONPATH environment variable to use its Python libraries together with the NCI-data-analysis module. As shown below, deepgraph is imported from the custom location, while dask is still imported from the NCI-data-analysis module itself.

    [jpf777@gadi-login-09 scr_fp0]$ export PYTHONPATH=/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages
    [jpf777@gadi-login-09 scr_fp0]$ python
    Python 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import deepgraph
    >>> deepgraph.__file__
    '/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages/deepgraph/__init__.py'
    >>> import dask
    >>> dask.__file__
    '/opt/conda/lib/python3.9/site-packages/dask/__init__.py'
    >>> exit()

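
The inspection of `__file__` above can also be automated. The following sketch reports where a module would be loaded from and whether that location lies under a given directory. The helper names `module_origin` and `is_under` are hypothetical, and the custom directory is the example path used on this page.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


def is_under(path: Optional[str], directory: str) -> bool:
    """Return True if path is located inside directory."""
    if path is None:
        return False
    try:
        # relative_to raises ValueError when path is not inside directory.
        Path(path).resolve().relative_to(Path(directory).resolve())
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    custom_dir = "/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS"  # example path; use your own
    for name in ("deepgraph", "dask"):
        origin = module_origin(name)
        print(f"{name}: {origin} (custom location: {is_under(origin, custom_dir)})")
```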
Step 4: Add PYTHONPATH to your job script
To use the Python packages installed in your own space within a job script, set PYTHONPATH to the directory where you installed them:

    #!/bin/bash
    #PBS -N pangeo_python-test
    #PBS -P <project code>
    #PBS -q normal
    #PBS -l walltime=5:00:00
    #PBS -l ncpus=96
    #PBS -l mem=384GB
    #PBS -l jobfs=100GB
    #PBS -l storage=gdata/dk92+scratch/<project code>+gdata/<project code>
    #PBS -v PYTHONPATH=<path to where you installed your Python packages>

    module use /g/data/dk92/apps/Modules/modulefiles
    module load NCI-data-analysis/22.05
    RUN_YOUR_OWN_PYTHON_SCRIPT

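
Inside a job, a PYTHONPATH entry is only useful if the directory is actually visible on the compute node, which may not be the case if, for example, its filesystem is missing from the storage directive. The following sketch could be run at the start of your own Python script to catch that early. The function name `missing_pythonpath_entries` is hypothetical.

```python
import os


def missing_pythonpath_entries(environ=os.environ):
    """Return the PYTHONPATH entries that do not exist as directories."""
    raw = environ.get("PYTHONPATH", "")
    # PYTHONPATH uses the platform path separator (":" on Linux); skip empty entries.
    entries = [entry for entry in raw.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


if __name__ == "__main__":
    missing = missing_pythonpath_entries()
    if missing:
        print(f"PYTHONPATH entries not found: {missing}")
    else:
        print("All PYTHONPATH entries exist.")
```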
(From Yue's rapids document)
...
NCI-data-analysis |
...
For example, on the login node, to install graph-walker on top of the packages included in rapids/22.02, try the following:
...

    module use /g/data/dk92/apps/Modules/modulefiles/
    module load rapids/22.02
    INSTALL_DIR=/g/data/$PROJECT/.local/envs/rapids22.02_topups
    mkdir -p $INSTALL_DIR
    python3 -m pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix $INSTALL_DIR pybind11==2.9.1
    export PYTHONPATH=$INSTALL_DIR/lib/python3.9/site-packages:$PYTHONPATH
    python3 -m pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix $INSTALL_DIR graph-walker==1.0.6

Note that the package pybind11 is required by graph-walker but is not included in its build process. Therefore, we needed to install pybind11 manually before building graph-walker.
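
Before building a package that depends on a manually installed prerequisite, it can help to confirm that the prerequisite is visible to the current interpreter. This is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and the distribution names are the ones used in the example above.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("pybind11", "graph-walker"):
        version = installed_version(dist)
        print(f"{dist}: {version if version is not None else 'not installed'}")
```

The lookup searches the directories on `sys.path`, so a package installed under a custom prefix is only reported once that location has been added to PYTHONPATH.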
Unfortunately, not every Python package on PyPI provides what is needed to build from source. For example, ray doesn't support it.
In this scenario, we have to drop the `--no-binary :all:` option and allow the installation to pull in binaries and libraries built elsewhere.
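
The two variants of the install command differ only in the `--no-binary :all:` option. The sketch below assembles the pip command line either way, using only the flags shown earlier on this page. The function name `pip_install_command` and the `/tmp/topups` prefix are hypothetical.

```python
import shlex
import sys


def pip_install_command(package, prefix, from_source=True):
    """Build the argument list for a pip install into a custom prefix."""
    cmd = [
        sys.executable, "-m", "pip", "install", "-v",
        "--upgrade-strategy", "only-if-needed",
        "--prefix", prefix,
    ]
    if from_source:
        # Force a build from source; omit this for packages that only ship binaries.
        cmd += ["--no-binary", ":all:"]
    cmd.append(package)
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass the list to
    # subprocess.run(cmd, check=True) to execute one.
    print(shlex.join(pip_install_command("graph-walker==1.0.6", "/tmp/topups")))
    print(shlex.join(pip_install_command("ray", "/tmp/topups", from_source=False)))
```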
...
/22.05
RUN_YOUR_OWN_PYTHON_SCRIPT |