...
The serial Monte Carlo Pi estimation script is shown below:
```python
import math
import random
import time


def sample(num_samples):
    # Draw points uniformly in the square [-1, 1] x [-1, 1] and
    # count how many fall inside the unit circle.
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside


def approximate_pi(num_samples):
    start = time.time()
    num_inside = sample(num_samples)
    # The fraction of points inside the circle approximates pi/4.
    print("pi ~= {}".format((4 * num_inside) / num_samples))
    print("Finished in: {:.2f}s".format(time.time() - start))


approximate_pi(200000000)
```
The above script utilises a single CPU core and takes about 96.31s of elapsed time to complete:
```
pi ~= 3.14156054
Finished in: 96.31s
```
...
We can parallelise the above serial script with `multiprocessing.Pool` so that it utilises the full compute resources of a single node, as shown below (run within a PBS job requesting 1 node):
```python
import math
import random
import time


def sample(num_samples):
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside


def approximate_pi_parallel(num_samples):
    from multiprocessing.pool import Pool
    pool = Pool()
    start = time.time()
    num_inside = 0
    sample_batch_size = 100000
    # Split the total samples into batches and distribute them across the worker processes.
    for result in pool.map(sample, [sample_batch_size for _ in range(num_samples // sample_batch_size)]):
        num_inside += result
    print("pi ~= {}".format((4 * num_inside) / num_samples))
    print("Finished in: {:.2f}s".format(time.time() - start))


approximate_pi_parallel(200000000)
```
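Note that `multiprocessing.Pool()` starts one worker process per available CPU core by default, so on a 48-core Gadi node the 2000 batches of 100000 samples each are spread across 48 workers.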
It takes about 2.78s to run on a single node (a Gadi "normal" queue node).
```
pi ~= 3.14166092
Finished in: 2.78s
```
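For reference, a minimal PBS job script for running the parallel version on a single Gadi node might look like the sketch below; the walltime, memory, Python module version, and the script name `pi_parallel.py` are placeholders to adapt to your own job.

```bash
#!/bin/bash
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=190GB
#PBS -l walltime=00:10:00
#PBS -l wd

# Load a Python environment (module name/version is an assumption).
module load python3/3.9.2

python3 pi_parallel.py
```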
The speedup relative to the serial script is about 34.64, giving a parallel efficiency of about 72% when moving from a serial run to a single-node parallel run on 48 cores.
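These figures follow directly from the two timings; a quick check:

```python
serial_time = 96.31      # seconds, serial run
parallel_time = 2.78     # seconds, single-node parallel run
cores = 48

speedup = serial_time / parallel_time   # ~34.64
efficiency = speedup / cores            # ~0.72, i.e. ~72%
print("speedup = {:.2f}, efficiency = {:.0%}".format(speedup, efficiency))
```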
Distributed computing with Ray
To extend the above multiprocessing script across multiple nodes, you need two steps:
Step 1: Set up a pre-defined Ray cluster
...
Simply replace `from multiprocessing.pool import Pool` in the above script with `from ray.util.multiprocessing.pool import Pool`, as shown below.
You can then connect the Pool to the pre-defined Ray cluster via the argument `ray_address="auto"`.
```python
import math
import random
import time


def sample(num_samples):
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside


def approximate_pi_distributed(num_samples):
    # NOTE: Only the import statement is changed.
    from ray.util.multiprocessing.pool import Pool
    # Connect the Pool to the pre-defined Ray cluster.
    pool = Pool(ray_address="auto")
    start = time.time()
    num_inside = 0
    sample_batch_size = 100000
    for result in pool.map(sample, [sample_batch_size for _ in range(num_samples // sample_batch_size)]):
        num_inside += result
    print("pi ~= {}".format((4 * num_inside) / num_samples))
    print("Finished in: {:.2f}s".format(time.time() - start))


approximate_pi_distributed(200000000)
```
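Before launching the distributed run, it can be useful to confirm that the job is actually attached to the whole cluster. A minimal sketch, assuming the Ray cluster from Step 1 is already up:

```python
import ray

# Attach to the running Ray cluster set up in Step 1.
ray.init(address="auto")

# Print the aggregate resources visible to Ray across all nodes,
# e.g. {'CPU': 96.0, ...} for two 48-core nodes.
print(ray.cluster_resources())
```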
...