...

The serial Monte Carlo Pi estimation script is shown below:

Code Block
languagepy
titleSerial Monte Carlo Pi estimation
import math
import random
import time

def sample(num_samples):
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside

def approximate_pi(num_samples):
    start = time.time()
    num_inside = sample(num_samples)
    
    print("pi ~= {}".format((4*num_inside)/num_samples))
    print("Finished in: {:.2f}s".format(time.time()-start))
approximate_pi(200000000)

The above script utilises 1 CPU core and takes about 96.31s of elapsed time to complete:

pi ~= 3.14156054
Finished in: 96.31s
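The estimator works because the unit circle inscribed in the [-1, 1] x [-1, 1] square covers pi/4 of its area, so the fraction of uniformly sampled points landing inside the circle approximates pi/4. A minimal seeded sanity check of this (with a smaller sample count than the run above) is:

```python
import math
import random

# The inscribed unit circle covers pi/4 of the square's area, so the
# fraction of uniform points inside the circle approximates pi/4.
random.seed(0)  # fixed seed for a reproducible check
n = 100_000
inside = sum(
    math.hypot(random.uniform(-1, 1), random.uniform(-1, 1)) <= 1
    for _ in range(n)
)
print(4 * inside / n)  # approaches math.pi as n grows
```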

...

We can parallelise the above serial script with multiprocessing.Pool so that it utilises the full compute resources of a single node (run within a PBS job requesting 1 node):

Code Block
languagepy
titleParallel on single node using multiprocessing.Pool
import math
import random
import time

def sample(num_samples):
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside

def approximate_pi_parallel(num_samples):
    from multiprocessing.pool import Pool
    pool = Pool()
    
    start = time.time()
    num_inside = 0
    sample_batch_size = 100000
    for result in pool.map(sample, [sample_batch_size for _ in range(num_samples//sample_batch_size)]):
        num_inside += result
        
    print("pi ~= {}".format((4*num_inside)/num_samples))
    print("Finished in: {:.2f}s".format(time.time()-start))
approximate_pi_parallel(200000000)

It takes about 2.78s to complete on a single node (Gadi "normal" queue node):

pi ~= 3.14166092
Finished in: 2.78s

The speedup relative to the serial script is about 34.64, which corresponds to a parallel efficiency of about 72% going from a serial run to single-node parallel computing on 48 cores.
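These figures follow directly from the two measured elapsed times; as a quick check:

```python
# Speedup and parallel efficiency from the measured elapsed times above.
t_serial = 96.31    # serial run, 1 core
t_parallel = 2.78   # multiprocessing.Pool run on one 48-core node
cores = 48

speedup = t_serial / t_parallel   # ratio of elapsed times
efficiency = speedup / cores      # fraction of ideal linear speedup
print(f"speedup = {speedup:.2f}, efficiency = {efficiency:.0%}")
```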

Distributed computing with Ray

To extend the above multiprocessing script across multiple nodes, you need two steps:

Step 1: Set up a pre-defined Ray cluster 

...

Simply replace "from multiprocessing.pool import Pool" in the above script with "from ray.util.multiprocessing.pool import Pool" as shown below.

Then you can connect the Pool to the pre-defined Ray cluster via the argument ray_address="auto".

Code Block
languagepy
titleDistributed across multiple nodes with ray.util.multiprocessing.pool
import math
import random
import time

def sample(num_samples):
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside

def approximate_pi_distributed(num_samples):
    from ray.util.multiprocessing.pool import Pool # NOTE: Only the import statement is changed.
    pool = Pool(ray_address="auto")
        
    start = time.time()
    num_inside = 0
    sample_batch_size = 100000
    for result in pool.map(sample, [sample_batch_size for _ in range(num_samples//sample_batch_size)]):
        num_inside += result
        
    print("pi ~= {}".format((4*num_inside)/num_samples))
    print("Finished in: {:.2f}s".format(time.time()-start))


approximate_pi_distributed(200000000)

...