You might need to customise your environment for several reasons, including:

...

You can install additional Python packages in a custom directory, for example under your own working directory or in your /g/data project space if it is permitted, and make them work together with the NCI-data-analysis module. Please note, the NCI-data-analysis module should always be loaded before adding other modules or installing new packages.

Step 1: Load the NCI-data-analysis module in your shell environment

To enable the NCI-data-analysis environment, use the following commands within jobs or within an interactive session:

Code Block
languagebash
$ module use /g/data/dk92/apps/Modules/modulefiles
$ module load NCI-data-analysis/22.05

Step 2: Install a Python package if it is NOT available in the NCI-data-analysis module

There are multiple ways to install Python packages. For example, you could use pip, the de facto standard package manager for installing and managing software packages written in Python (see the instructions at https://packaging.python.org/tutorials/installing-packages/#installing-to-the-user-site ). Another popular option is Conda, an open source package, dependency and environment management system for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN and many more. Alternatively, you can build a Python library from its source code.

Please note: additional packages should be installed within your own directory.

Let’s now install the DeepGraph library (the deepgraph package on PyPI) using pip.

First, check that DeepGraph is not already available in the NCI-data-analysis module:

Code Block
languagebash
[jpf777@gadi-login-09 scr_fp0]$ python
Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import deepgraph
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'deepgraph'
>>> exit()
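The same availability check can be scripted rather than typed at the interpreter prompt. The following is a minimal sketch, assuming you only need to know whether a top-level module can be found on the current import path; the helper name `is_available` is hypothetical and not part of any NCI tooling.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current import path."""
    # find_spec locates a top-level module without actually importing it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report on the two modules used in this walkthrough.
    for name in ("deepgraph", "dask"):
        status = "available" if is_available(name) else "NOT available"
        print(f"{name}: {status}")
```

Run it with the python provided by the loaded module, so that the check reflects that environment rather than the system interpreter.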

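Then use the pip command to install DeepGraph into a custom directory (for example /scratch/fp0/jpf777/EXTRA_PYTHON_LIBS, as below). We recommend trying the following flags to build the library: --no-binary :all: builds every package from source instead of downloading prebuilt wheels, --upgrade-strategy only-if-needed leaves dependencies that are already satisfied untouched, and --prefix sets the installation location. To learn about other pip flags, see the pip documentation.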

Code Block
languagebash
[jpf777@gadi-login-09 scr_fp0]$ pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix /scratch/fp0/jpf777/EXTRA_PYTHON_LIBS deepgraph
Using pip 22.1 from /opt/conda/lib/python3.9/site-packages/pip (python 3.9)
Collecting deepgraph
Downloading DeepGraph-0.2.3.tar.gz (149 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.1/149.1 kB 4.9 MB/s eta 0:00:00
Running command python setup.py egg_info
/opt/conda/lib/python3.9/site-packages/setuptools/dist.py:767: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
warnings.warn(
running egg_info
creating /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info
writing /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/PKG-INFO
writing dependency_links to /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/dependency_links.txt
writing requirements to /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/requires.txt
writing top-level names to /scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/top_level.txt
writing manifest file '/scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/SOURCES.txt'
reading manifest file '/scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/SOURCES.txt'
adding license file 'LICENSE.txt'
writing manifest file '/scratch/fp0/jpf777/tmp/pip-pip-egg-info-0fwljpqv/DeepGraph.egg-info/SOURCES.txt'
Preparing metadata (setup.py) ... done
...
writing requirements to DeepGraph.egg-info/requires.txt
writing top-level names to DeepGraph.egg-info/top_level.txt
reading manifest file 'DeepGraph.egg-info/SOURCES.txt'
adding license file 'LICENSE.txt'
writing manifest file 'DeepGraph.egg-info/SOURCES.txt'
Copying DeepGraph.egg-info to /scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages/DeepGraph-0.2.3-py3.9.egg-info
running install_scripts
writing list of installed files to '/scratch/fp0/jpf777/tmp/pip-record-6nwb_evm/install-record.txt'
Running setup.py install for deepgraph ... done
Successfully installed deepgraph-0.2.3
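If you install several packages this way, it can help to assemble the pip command in one place. The sketch below only builds the argument list with the flags shown above, and derives the site-packages directory that appears in the log. The function names are hypothetical, and the `lib/pythonX.Y/site-packages` layout is assumed from the output above rather than guaranteed on every system.

```python
import sys


def build_pip_command(package, prefix, from_source=True):
    """Build the argv for a pip install into a custom prefix (not executed here)."""
    cmd = [sys.executable, "-m", "pip", "install", "-v"]
    if from_source:
        # Ask pip to build everything from source instead of using wheels.
        cmd += ["--no-binary", ":all:"]
    cmd += ["--upgrade-strategy", "only-if-needed", "--prefix", prefix, package]
    return cmd


def site_packages_under(prefix, version=None):
    """Guess where pip places packages under `prefix` (assumed POSIX layout)."""
    major, minor = version or sys.version_info[:2]
    return f"{prefix}/lib/python{major}.{minor}/site-packages"


if __name__ == "__main__":
    prefix = "/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS"
    print(" ".join(build_pip_command("deepgraph", prefix)))
    print(site_packages_under(prefix, (3, 9)))
```

To actually run the installation, the returned list could be passed to `subprocess.run(cmd, check=True)`.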

Step 3: Validate new installation

You need to add the custom location to the environment variable PYTHONPATH to use its Python libraries together with the NCI-data-analysis module. As shown below, deepgraph comes from the custom location while dask still comes from the NCI-data-analysis module itself.

Code Block
languagebash
[jpf777@gadi-login-09 scr_fp0]$ export PYTHONPATH=/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages
[jpf777@gadi-login-09 scr_fp0]$ python
Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import deepgraph
>>> deepgraph.__file__
'/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages/deepgraph/__init__.py'
>>> import dask
>>> dask.__file__
'/opt/conda/lib/python3.9/site-packages/dask/__init__.py'
>>> exit()
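The origin check above can also be done without importing the packages. This is a minimal sketch, assuming each module is an ordinary package on disk; `module_origin` and `comes_from` are hypothetical helper names.

```python
import importlib.util
import os
from pathlib import Path


def module_origin(module_name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


def comes_from(module_name, directory):
    """Return True if the module's file lives somewhere under `directory`."""
    origin = module_origin(module_name)
    # Built-in and frozen modules have no absolute file path; treat as "no".
    if origin is None or not os.path.isabs(origin):
        return False
    try:
        Path(origin).resolve().relative_to(Path(directory).resolve())
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    custom = "/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS"
    for name in ("deepgraph", "dask"):
        print(name, module_origin(name), comes_from(name, custom))
```

With PYTHONPATH set as above, the expectation is that `comes_from("deepgraph", custom)` is true while `comes_from("dask", custom)` is false.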


Step 4: Add PYTHONPATH to your job script

To use the Python packages installed in your own space within a job, set PYTHONPATH in the job script so that it points to where you installed these packages:

Code Block
languagebash
#!/bin/bash
#PBS -N python-test
#PBS -P <project code>
#PBS -q normal
#PBS -l walltime=5:00:00
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l jobfs=100GB
#PBS -l storage=gdata/dk92+scratch/<project code>+gdata/<project code>
#PBS -v PYTHONPATH=<path to where you installed your Python packages>

module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-data-analysis/22.05
RUN_YOUR_OWN_PYTHON_SCRIPT
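The storage directive in the job script has to cover both the module location under /g/data/dk92 and the directory holding your own packages. As an illustration only, the sketch below derives a `gdata/<project>+scratch/<project>` string from a list of paths. The helper name `storage_flag` is hypothetical, and paths outside /g/data and /scratch are simply ignored in this sketch.

```python
def storage_flag(paths):
    """Derive a PBS-style storage string such as 'gdata/dk92+scratch/fp0'."""
    areas = []
    for path in paths:
        parts = [p for p in path.split("/") if p]
        if parts[:2] == ["g", "data"] and len(parts) >= 3:
            area = f"gdata/{parts[2]}"
        elif parts[:1] == ["scratch"] and len(parts) >= 2:
            area = f"scratch/{parts[1]}"
        else:
            # Not under /g/data or /scratch: skipped in this sketch.
            continue
        if area not in areas:
            areas.append(area)
    return "+".join(areas)


if __name__ == "__main__":
    used = [
        "/g/data/dk92/apps/Modules/modulefiles",
        "/scratch/fp0/jpf777/EXTRA_PYTHON_LIBS/lib/python3.9/site-packages",
    ]
    print("#PBS -l storage=" + storage_flag(used))
```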

(From Yue's rapids document)

...

NCI-data-analysis

...

For example, on the login node, to install graph-walker on top of the packages included in rapids/22.02, try the following:

...

module use /g/data/dk92/apps/Modules/modulefiles/
module load rapids/22.02
INSTALL_DIR=/g/data/$PROJECT/.local/envs/rapids22.02_topups
mkdir -p $INSTALL_DIR
python3 -m pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix $INSTALL_DIR pybind11==2.9.1
export PYTHONPATH=$INSTALL_DIR/lib/python3.9/site-packages:$PYTHONPATH
python3 -m pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix $INSTALL_DIR graph-walker==1.0.6

Note that the package pybind11 is required by graph-walker but is not included in its build process. Therefore, we needed to install pybind11 manually before building graph-walker.
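The `export PYTHONPATH=...:$PYTHONPATH` line above puts the new directory first so that freshly installed packages such as pybind11 are found during the next build. A minimal sketch of the same prepend logic follows, assuming you want to skip empty and duplicate entries; `prepend_path` is a hypothetical helper.

```python
import os


def prepend_path(new_dir, current=""):
    """Return a path string with `new_dir` first, without empty or repeated entries."""
    entries = [new_dir]
    for entry in current.split(os.pathsep):
        # Skip empty entries (e.g. when the variable was unset) and duplicates.
        if entry and entry not in entries:
            entries.append(entry)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    install_dir = "/g/data/PROJECT/.local/envs/rapids22.02_topups"  # placeholder path
    site_packages = f"{install_dir}/lib/python3.9/site-packages"
    print(prepend_path(site_packages, os.environ.get("PYTHONPATH", "")))
```

When PYTHONPATH is unset, this yields just the new directory rather than a value ending in a stray separator.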

Unfortunately, not every Python package publishes a source distribution on PyPI, so building from source with `--no-binary :all:` is not always possible. For example, ray doesn't support it:

$ python3 -m pip install -v --no-binary :all: --upgrade-strategy only-if-needed --prefix $INSTALL_DIR ray
Using pip 22.0.3 from /opt/conda/envs/rapids/lib/python3.9/site-packages/pip (python 3.9)
ERROR: Could not find a version that satisfies the requirement ray (from versions: none)
ERROR: No matching distribution found for ray

In this scenario, we have to drop the `--no-binary :all:` option, and allow the installation to pull in binaries and libraries built elsewhere:

$ python3 -m pip install -v --upgrade-strategy only-if-needed --prefix $INSTALL_DIR ray==1.11.0
$ find $INSTALL_DIR/lib/python3.9/site-packages/ray -type f | grep "\.so" | awk -F"/" '{print $NF}'
_raylet.so
setproctitle.cpython-39-x86_64-linux-gnu.so
_psutil_linux.cpython-39-x86_64-linux-gnu.so
_psutil_posix.cpython-39-x86_64-linux-gnu.so
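The two attempts above (source build first, prebuilt binaries second) can be expressed as a small fallback routine. This is a sketch under stated assumptions, not an NCI-provided tool: `pip_args` and `install_with_fallback` are hypothetical names, and the `run` parameter exists only so the logic can be exercised without invoking pip.

```python
import subprocess
import sys


def pip_args(package, prefix, from_source):
    """Build the pip argv used in this guide, with or without --no-binary."""
    cmd = [sys.executable, "-m", "pip", "install", "-v"]
    if from_source:
        cmd += ["--no-binary", ":all:"]
    return cmd + ["--upgrade-strategy", "only-if-needed", "--prefix", prefix, package]


def install_with_fallback(package, prefix, run=None):
    """Try a source build first; if it fails, retry allowing prebuilt binaries."""
    if run is None:
        # Default runner: execute the command and return its exit status.
        run = lambda cmd: subprocess.run(cmd).returncode
    if run(pip_args(package, prefix, from_source=True)) == 0:
        return "source"
    if run(pip_args(package, prefix, from_source=False)) == 0:
        return "binary"
    raise RuntimeError(f"could not install {package} from source or binary")
```

A call such as `install_with_fallback("ray==1.11.0", install_dir)` would then report whether the package ended up being built from source or installed from binaries.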

...
