
Julia implements distributed-memory parallel computing in its standard library `Distributed`. You can start and manage worker processes when launching julia, or from within a running session.

Within a Single Node

If the job requests no more than a single node, a local cluster is sufficient. Pass the number of workers to the `-p` option when launching Julia:

julia -p $PBS_NCPUS pmap.example.jl

Once Julia starts, all the workers are ready to use.

pmap.example.jl
pmap(x->getpid(),1:nworkers())   # run getpid() on the workers and collect the results

Equivalently, the function `addprocs` starts the same local cluster from within a Julia script, as shown below. Inside an ARE Jupyter notebook, the following code adds $PBS_NCPUS workers to the local cluster and collects their process IDs on the master process.

start.local.cluster.jl
using Distributed
addprocs(parse(Int64,ENV["PBS_NCPUS"]))   # one worker per requested CPU core
pmap(x->getpid(),1:nworkers())            # collect each worker's process ID
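
Once the workers are running, functions can be defined on all of them with `@everywhere` and invoked remotely. The snippet below is a minimal sketch of this pattern; the function `greet` is illustrative, not part of any API.

using Distributed
addprocs(parse(Int64,ENV["PBS_NCPUS"]))

# Define greet() on the master and on every worker
@everywhere greet() = "Hello from process $(myid()) on $(gethostname())"

# Call greet() once on each worker and print the replies
for w in workers()
    println(remotecall_fetch(greet, w))
end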


Across Multiple Nodes

For a multi-node job, start the cluster as shown below. It uses passwordless SSH login to launch Julia worker processes across all nodes available in the job. The example works both through the command-line interface on a Gadi node and through the graphical user interface in an ARE JupyterLab session.

using Distributed
home = ENV["HOME"]
# Unique list of hostnames allocated to the job
nodes = unique(split(read(open(ENV["PBS_NODEFILE"],"r"),String)))
ncpus_per_node = parse(Int64,ENV["PBS_NCI_NCPUS_PER_NODE"])
# (hostname, worker count) pairs in the form expected by addprocs
machines = [(split(node,".gadi.")[1],ncpus_per_node) for node in nodes]
exename = joinpath(ENV["NCI_DATA_ANALYSIS_BASE"],"bin","julia")
addprocs(machines; tunnel=true,
         sshflags=`-o PubkeyAuthentication=yes -o StrictHostKeyChecking=no -o IdentityFile=$home/.ssh/juliakey`,
         exename=exename)
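
After `addprocs` returns, it can be worth verifying that workers were indeed launched on every allocated node. A quick check, run in the same session:

# every allocated node should appear among the reported hostnames
pmap(x->gethostname(),1:nworkers())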

where `$HOME/.ssh/juliakey` is a private key generated by `ssh-keygen` on Gadi whose first line reads "-----BEGIN OPENSSH PRIVATE KEY-----". For the passwordless login to work, the corresponding public key must also be listed in `$HOME/.ssh/authorized_keys`.
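
If the key does not exist yet, it can be created on Gadi as sketched below; this assumes an ed25519 key without a passphrase and that appending to `authorized_keys` is acceptable under your site's policy.

ssh-keygen -t ed25519 -N "" -f $HOME/.ssh/juliakey
cat $HOME/.ssh/juliakey.pub >> $HOME/.ssh/authorized_keys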

To supplement the native parallelism provided by the Julia standard library, we have included additional packages from the JuliaParallel organisation in the module `NCI-data-analysis` to support parallel computing in Julia.

Several examples of how to use them at NCI follow.

MPI.jl
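
The following is a minimal MPI.jl hello-world sketch; the script name `hello.mpi.jl` and the launch line are illustrative, and it assumes MPI.jl is provided by the `NCI-data-analysis` module.

hello.mpi.jl
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)      # rank of this process
nranks = MPI.Comm_size(comm)    # total number of ranks
println("Hello from rank $rank of $nranks on $(gethostname())")
MPI.Finalize()

It could be launched with, for example, mpirun -np $PBS_NCPUS julia hello.mpi.jl.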
