Question about distributed computing - distributing CPU load

Xbash_Zero
User
Posts: 30
Registered: Monday, 19 September 2022, 22:48

Hi,

I have a rather theoretical question about distributed computing on a cluster. Imagine you have a very compute-intensive program and want to distribute the load across nodes, i.e. other PCs on the network - which libraries are there for that which are as easy as possible to get going?

So far I have tried to do this with MPI for Python, but either only one of the nodes did any computing or I got an error message. I have no idea why it didn't work properly; I fought my way through various tutorials along the way.


Here is the code I tried to run with the load distributed, a simple Monte Carlo algorithm:

Code: Select all

import random
import numpy as np
from mpi4py import MPI
import time

def monte_carlo_pi(n_samples):
    count = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        dist = np.sqrt(x**2 + y**2)
        count += dist <= 1
    return 4 * count / n_samples

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    total_samples = 100_000_000
    samples_per_process = total_samples // size

    comm.Barrier()

    start_time = time.time()
    local_pi_estimate = monte_carlo_pi(samples_per_process)
    end_time = time.time()

    elapsed_time = end_time - start_time
    print(f"Time taken on process {rank}: {elapsed_time} seconds")

    global_pi_estimate = comm.allreduce(local_pi_estimate, op=MPI.SUM) / size
    print(f"Pi estimate: {global_pi_estimate}")


This is the error I get: PMIX ERROR: NOT-FOUND in file ../../../../../src/mca/gds/base/gds_base_fns.c at line 181

Code: Select all

backend-gpu:~/py4mpi$  mpiexec -hostfile hosts.txt -n 16 --display-map python3 monte_carlo_mpi.py --mca plm_base_verbose 30
[server-backend-gpu:139171] mca: base: components_register: registering framework plm components
[server-backend-gpu:139171] mca: base: components_register: found loaded component rsh
[server-backend-gpu:139171] mca: base: components_register: component rsh register function successful
[server-backend-gpu:139171] mca: base: components_register: found loaded component isolated
[server-backend-gpu:139171] mca: base: components_register: component isolated has no register or open function
[server-backend-gpu:139171] mca: base: components_register: found loaded component slurm
[server-backend-gpu:139171] mca: base: components_register: component slurm register function successful
[server-backend-gpu:139171] mca: base: components_open: opening plm components
[server-backend-gpu:139171] mca: base: components_open: found loaded component rsh
[server-backend-gpu:139171] mca: base: components_open: component rsh open function successful
[server-backend-gpu:139171] mca: base: components_open: found loaded component isolated
[server-backend-gpu:139171] mca: base: components_open: component isolated open function successful
[server-backend-gpu:139171] mca: base: components_open: found loaded component slurm
[server-backend-gpu:139171] mca: base: components_open: component slurm open function successful
[server-backend-gpu:139171] mca:base:select: Auto-selecting plm components
[server-backend-gpu:139171] mca:base:select:(  plm) Querying component [rsh]
[server-backend-gpu:139171] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[server-backend-gpu:139171] mca:base:select:(  plm) Querying component [isolated]
[server-backend-gpu:139171] mca:base:select:(  plm) Query of component [isolated] set priority to 0
[server-backend-gpu:139171] mca:base:select:(  plm) Querying component [slurm]
[server-backend-gpu:139171] mca:base:select:(  plm) Selected component [rsh]
[server-backend-gpu:139171] mca: base: close: component isolated closed
[server-backend-gpu:139171] mca: base: close: unloading component isolated
[server-backend-gpu:139171] mca: base: close: component slurm closed
[server-backend-gpu:139171] mca: base: close: unloading component slurm
[server-backend-gpu:139171] [[56867,0],0] plm:rsh: final template argv:
        /usr/bin/ssh <template>  orted -mca ess "env" -mca ess_base_jobid "3726835712" -mca ess_base_vpid "<template>" -mca ess_base_num_procs "3" -mca orte_node_regex "server-backend-gpu,[3:192].168.0.52,[3:192].168.0.24@0(3)" -mca orte_hnp_uri "3726835712.0;tcp://192.168.0.53:47097" -mca plm "rsh" --tree-spawn -mca routed "radix" -mca orte_parent_uri "3726835712.0;tcp://192.168.0.53:47097" -mca plm_base_verbose "30" -mca rmaps_base_display_map "1" -mca pmix "^s1,s2,cray,isolated"
[feldbus:227053] mca: base: components_register: registering framework plm components
[feldbus:227053] mca: base: components_register: found loaded component rsh
[feldbus:227053] mca: base: components_register: component rsh register function successful
[feldbus:227053] mca: base: components_open: opening plm components
[feldbus:227053] mca: base: components_open: found loaded component rsh
[feldbus:227053] mca: base: components_open: component rsh open function successful
[feldbus:227053] mca:base:select: Auto-selecting plm components
[feldbus:227053] mca:base:select:(  plm) Querying component [rsh]
[feldbus:227053] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[feldbus:227053] mca:base:select:(  plm) Selected component [rsh]
[server-backend:09192] mca: base: components_register: registering framework plm components
[server-backend:09192] mca: base: components_register: found loaded component rsh
[server-backend:09192] mca: base: components_register: component rsh register function successful
[server-backend:09192] mca: base: components_open: opening plm components
[server-backend:09192] mca: base: components_open: found loaded component rsh
[server-backend:09192] mca: base: components_open: component rsh open function successful
[server-backend:09192] mca:base:select: Auto-selecting plm components
[server-backend:09192] mca:base:select:(  plm) Querying component [rsh]
[server-backend:09192] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[server-backend:09192] mca:base:select:(  plm) Selected component [rsh]
[server-backend-gpu:139171] [[56867,0],0] complete_setup on job [56867,1]
 Data for JOB [56867,1] offset 0 Total slots allocated 24

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [56867,1] App: 0 Process rank: 0 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 1 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 2 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 3 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 4 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 5 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 6 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 7 Bound: N/A

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [56867,1] App: 0 Process rank: 8 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 9 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 10 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 11 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 12 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 13 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 14 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 15 Bound: N/A

 =============================================================
 Data for JOB [56867,1] offset 0 Total slots allocated 24

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [56867,1] App: 0 Process rank: 0 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 1 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 2 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 3 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 4 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 5 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 6 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 7 Bound: N/A

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [56867,1] App: 0 Process rank: 8 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 9 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 10 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 11 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 12 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 13 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 14 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 15 Bound: UNBOUND

 =============================================================
 Data for JOB [56867,1] offset 0 Total slots allocated 24

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [56867,1] App: 0 Process rank: 0 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 1 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 2 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 3 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 4 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 5 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 6 Bound: UNBOUND
        Process OMPI jobid: [56867,1] App: 0 Process rank: 7 Bound: UNBOUND

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [56867,1] App: 0 Process rank: 8 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 9 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 10 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 11 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 12 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 13 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 14 Bound: N/A
        Process OMPI jobid: [56867,1] App: 0 Process rank: 15 Bound: N/A

 =============================================================
[server-backend-gpu:139171] [[56867,0],0] plm:base:receive update proc state command from [[56867,0],2]
[server-backend-gpu:139171] [[56867,0],0] plm:base:receive got update_proc_state for job [56867,1]
[server-backend-gpu:139171] [[56867,0],0] plm:base:receive update proc state command from [[56867,0],1]
[server-backend-gpu:139171] [[56867,0],0] plm:base:receive got update_proc_state for job [56867,1]
[feldbus:227053] PMIX ERROR: NOT-FOUND in file ../../../../../src/mca/gds/base/gds_base_fns.c at line 181
[feldbus:227053] PMIX ERROR: NOT-FOUND in file ../../../../../../src/mca/common/dstore/dstore_base.c at line 2571
[feldbus:227053] PMIX ERROR: NOT-FOUND in file ../../../src/server/pmix_server.c at line 2462
[server-backend:09192] *** Process received signal ***
[server-backend:09192] Signal: Segmentation fault (11)
[server-backend:09192] Signal code: Address not mapped (1)
[server-backend:09192] Failing at address: (nil)
[server-backend:09192] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f342bbb3520]
[server-backend:09192] [ 1] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_bfrops_base_pack_value+0x4b)[0x7f34292a1fdb]
[server-backend:09192] [ 2] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_bfrops_base_pack_kval+0x8f)[0x7f342929fe3f]
[server-backend:09192] [ 3] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_bfrops_base_pack+0x7f)[0x7f34292a2d6f]
[server-backend:09192] [ 4] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_common_dstor_store+0x2c5)[0x7f342929d515]
[server-backend:09192] [ 5] /lib/x86_64-linux-gnu/libpmix.so.2(+0x9f9dc)[0x7f34292699dc]
[server-backend:09192] [ 6] /lib/x86_64-linux-gnu/libevent_core-2.1.so.7(+0x1dee8)[0x7f342ba2cee8]
[server-backend:09192] [ 7] /lib/x86_64-linux-gnu/libevent_core-2.1.so.7(event_base_loop+0x577)[0x7f342ba2ebf7]
[server-backend:09192] [ 8] /lib/x86_64-linux-gnu/libpmix.so.2(+0x9c406)[0x7f3429266406]
[server-backend:09192] [ 9] /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f342bc05b43]
[server-backend:09192] [10] /lib/x86_64-linux-gnu/libc.so.6(+0x126a00)[0x7f342bc97a00]
[server-backend:09192] *** End of error message ***
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[56867,0],0] on node server-backend-gpu
  Remote daemon: [[56867,0],1] on node 192.168.0.52

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
[feldbus:227053] mca: base: close: component rsh closed
[feldbus:227053] mca: base: close: unloading component rsh
[server-backend-gpu:139171] mca: base: close: component rsh closed
[server-backend-gpu:139171] mca: base: close: unloading component rsh
The hosts.txt:

Code: Select all

192.168.0.52 slots=8
192.168.0.24 slots=8
Hm, to be honest I don't understand what this error message means. If I try to run with fewer slots, it does run, but then only on one node...

Code: Select all

 mpiexec -hostfile hosts.txt -n 4 python3 monte_carlo_mpi.py

Maybe someone can help or show me a different approach - I'd appreciate it. Thanks in advance!

Regards
__deets__
User
Posts: 14545
Registered: Wednesday, 14 October 2015, 14:29

That is quite specialised, so your chances of getting help here are slim; a dedicated MPI community is a better bet. What exactly goes wrong there, somewhere deep in the C code, nobody here can tell you either. There will be tricks to coax more useful output out of the system (one can see, for example, that debug symbol files are missing; with them the stack traces would be much more informative). But where to get those - no idea.
Xbash_Zero
User
Posts: 30
Registered: Monday, 19 September 2022, 22:48

Thanks, I'll have a look around; maybe I can find some more information on this.
Xbash_Zero
User
Posts: 30
Registered: Monday, 19 September 2022, 22:48

A small addendum to my project:

I have tried the cluster computing with another module, 'Dask'. Something about the Py4MPI installation doesn't seem to be quite right; I'll have to take another look at that.


In case anyone is still interested, here is the code and how to install Dask on Linux (this should be done on all nodes and on the master):

Code: Select all

pip install dask[complete] distributed --upgrade


Assuming there are 3 machines or VMs available, the master is started first:

Code: Select all

dask-scheduler


Then the workers are started, which is where the process actually runs, i.e. where the computing happens (use the master's IP):

Code: Select all

dask-worker tcp://192.168.0.xx:8786
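
As a quick sanity check one can connect a client from the master and compare the package versions reported by the scheduler and the workers; a minimal sketch (assuming the scheduler address from above, replace the IP accordingly):

Code: Select all

from dask.distributed import Client

# Hypothetical scheduler address - use the master's IP here.
client = Client("tcp://192.168.0.xx:8786")

# Reports the package versions seen by client, scheduler and workers;
# with check=True a mismatch raises an error instead of only warning later.
print(client.get_versions(check=True))

client.close()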

Now the code can be run, either on a fourth PC or in a new SSH prompt on the master:

Code: Select all

from multiprocessing import Pool
import numpy as np
from dask.distributed import Client
import dask.array as da
import time

# Dask Function
def dask_count_points_in_circle(n):
    x = da.random.uniform(-1, 1, size=n, chunks=n//1)  # Added chunksize for better performance
    y = da.random.uniform(-1, 1, size=n, chunks=n//1)  # Added chunksize for better performance
    return da.sum(x**2 + y**2 <= 1)

# Multiprocessing Function
def mp_count_points_in_circle(n):
    da.random.seed()  # Important: set a different seed for each process
    x = da.random.uniform(-1, 1, size=n).compute()  # convert Dask array to NumPy array for computation
    y = da.random.uniform(-1, 1, size=n).compute()  # convert Dask array to NumPy array for computation
    return np.sum(x**2 + y**2 <= 1)

if __name__ == "__main__":

    num_points = 10**9
    num_tasks = 1000

    points_per_task = num_points // num_tasks

    # Dask computation
    client = Client("tcp://192.168.0.53:8786")

    results = [dask_count_points_in_circle(points_per_task) for _ in range(num_tasks)]

    start_time = time.time()
    futures = client.compute(results)  # compute results
    total_inside = sum(client.gather(futures))  # Gather results and sum them
    end_time = time.time()

    pi_approximation = 4 * total_inside / num_points

    print("Dask Approximation of Pi: ", pi_approximation)
    print("Dask Time taken: ", end_time - start_time, "seconds")

    client.close()

    # Multiprocessing computation
    with Pool(processes=8) as pool:
        start_time = time.time()
        results = pool.imap_unordered(mp_count_points_in_circle, [points_per_task]*num_tasks)
        for result in results:
            total_inside += result
            pi_approximation = 4 * total_inside / num_points
            print("Current Approximation of Pi: ", pi_approximation)
            if pi_approximation > 3.14159:  # Example check for a desired result
                break
        end_time = time.time()

    print("Multiprocessing Approximation of Pi: ", pi_approximation)
    print("Multiprocessing Time taken: ", end_time - start_time, "seconds")

Code: Select all

~/dask$ python3 dask_worker.py
/home/user/.local/lib/python3.10/site-packages/distributed/client.py:1388: VersionMismatchWarning: Mismatched versions found

+---------+----------------+----------------+--------------------------------------+
| Package | Client         | Scheduler      | Workers                              |
+---------+----------------+----------------+--------------------------------------+
| numpy   | 1.23.5         | 1.23.5         | {'1.24.3', '1.23.5'}                 |
| pandas  | 2.0.1          | 2.0.1          | {'2.0.1', None}                      |
| python  | 3.10.6.final.0 | 3.10.6.final.0 | {'3.10.6.final.0', '3.8.10.final.0'} |
+---------+----------------+----------------+--------------------------------------+
  warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
/home/user/.local/lib/python3.10/site-packages/distributed/client.py:3108: UserWarning: Sending large graph of size 14.96 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
Dask Approximation of Pi:  3.141591676
Dask Time taken:  172.01823711395264 seconds
Multiprocessing Approximation of Pi:  3.14473354
Multiprocessing Time taken:  0.21569275856018066 seconds


Not particularly performant if you compare it directly with multiprocessing. There are still other computing options, scatter for example:

Code: Select all

from dask.distributed import Client
import dask.array as da
import time

def dask_count_points_in_circle(n):
    x = da.random.uniform(-1, 1, size=n, chunks=n//0.51)
    y = da.random.uniform(-1, 1, size=n, chunks=n//0.51)
    return da.sum(x**2 + y**2 <= 1)

if __name__ == "__main__":

    num_points = 10**9
    num_tasks = 1000

    points_per_task = num_points // num_tasks

    client = Client("tcp://192.168.0.53:8786")

    data = [dask_count_points_in_circle(points_per_task) for _ in range(num_tasks)]
    scattered_data = client.scatter(data)

    start_time = time.time()
    futures = client.compute(scattered_data)
    results = client.gather(futures)  # Gather results first

    total_inside = sum(r.compute() for r in results)  # Then compute the sum
    end_time = time.time()

    pi_approximation = 4 * total_inside / num_points

    print("Dask Approximation of Pi: ", pi_approximation)
    print("Dask Time taken: ", end_time - start_time, "seconds")

    client.close()
That gives a somewhat better result.

Code: Select all

Dask Approximation of Pi:  3.141608084
Dask Time taken:  62.888280391693115 seconds
You can still tweak the algorithm here and make the chunk size smaller or larger; the chunks are the pieces of work that get distributed to the workers. There are other options available as well, but I haven't got around to testing everything yet.
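
For illustration, a minimal sketch (made-up sizes) of what the chunks argument does - it splits a Dask array into the blocks that become individual tasks for the workers:

Code: Select all

import dask.array as da

n = 1_000_000

# chunks=100_000 splits the million samples into 10 blocks of 100k each;
# every block is one task the scheduler can hand out to a worker.
x = da.random.uniform(-1, 1, size=n, chunks=100_000)

print(x.chunks)     # ((100000, 100000, ..., 100000),)
print(x.numblocks)  # (10,)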

What is also interesting: there is a dashboard, provided by the master, which can then be reached at http://192.168.0.xx:8787/status.


A nice toy to play with; whether you really need it probably depends a lot on the specific use case.

Greets
__blackjack__
User
Posts: 13931
Registered: Saturday, 2 June 2018, 10:21
Location: 127.0.0.1
Contact:

@Xbash_Zero: In the Dask solution quite a bit is not included in the time measurement that is included in the multiprocessing solution. On the other hand, the multiprocessing solution has an early-exit condition that the Dask solution does not have.

`time.time()` should not be used for timing measurements. `time.monotonic()` or the performance-counter functions are suitable for that.

With `dask` the `Client` is also a context manager, so it can be used with ``with``, just like `Pool`.

The comment at the `seed()` call is at least potentially wrong, because it is only true if that function is called exactly once per process, which cannot be known at *that* point in the code.
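
One way to make the "exactly once per process" guarantee explicit is a `Pool` initializer that seeds each worker when it starts; a minimal sketch with plain NumPy (names made up, not part of the code below):

Code: Select all

import os
from multiprocessing import Pool

import numpy as np


def init_worker():
    # Runs exactly once in every worker process right after it starts,
    # so each process gets its own seed derived from its PID.
    np.random.seed(os.getpid() % 2**32)


def count_points_in_circle(n):
    x = np.random.uniform(-1, 1, size=n)
    y = np.random.uniform(-1, 1, size=n)
    return int(np.sum(x**2 + y**2 <= 1))


def main():
    n = 1_000_000
    with Pool(processes=8, initializer=init_worker) as pool:
        total = sum(pool.map(count_points_in_circle, [n] * 8))
    print("Pi estimate:", 4 * total / (8 * n))


if __name__ == "__main__":
    main()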

Untested:

Code: Select all

#!/usr/bin/env python3
import time
from multiprocessing import Pool

import dask.array as da
import numpy as np
from dask.distributed import Client


def dask_count_points_in_circle(n):
    #
    # Added chunksize for better performance.
    #
    x = da.random.uniform(-1, 1, size=n, chunks=n)
    y = da.random.uniform(-1, 1, size=n, chunks=n)
    return da.sum(x**2 + y**2 <= 1)


def multiprocessing_count_points_in_circle(n):
    da.random.seed()  # Important: set a different seed for each call.
    #
    # Convert Dask arrays to NumPy arrays for computation.
    #
    x = da.random.uniform(-1, 1, size=n).compute()
    y = da.random.uniform(-1, 1, size=n).compute()
    return np.sum(x**2 + y**2 <= 1)


def main():
    num_points = 10**9
    num_tasks = 1000
    points_per_task = num_points // num_tasks
    #
    # Dask computation.
    #
    with Client("tcp://192.168.0.53:8786") as client:
        start_time = time.monotonic()
        total_inside = sum(
            client.gather(
                client.compute(
                    [
                        dask_count_points_in_circle(points_per_task)
                        for _ in range(num_tasks)
                    ]
                )
            )
        )
        pi_approximation = 4 * total_inside / num_points
        end_time = time.monotonic()

    print("Dask Approximation of Pi: ", pi_approximation)
    print("Dask Time taken: ", end_time - start_time, "seconds")
    #
    # Multiprocessing computation.
    #
    with Pool(processes=8) as pool:
        start_time = time.monotonic()
        total_inside = sum(
            pool.imap_unordered(
                multiprocessing_count_points_in_circle,
                [points_per_task] * num_tasks,
            )
        )
        pi_approximation = 4 * total_inside / num_points
        end_time = time.monotonic()

    print("Multiprocessing Approximation of Pi: ", pi_approximation)
    print("Multiprocessing Time taken: ", end_time - start_time, "seconds")


if __name__ == "__main__":
    main()
“Java is a DSL to transform big Xml documents into long exception stack traces.”
— Scott Bellware
Xbash_Zero
User
Posts: 30
Registered: Monday, 19 September 2022, 22:48

__blackjack__ wrote: Saturday, 20 May 2023, 15:08 @Xbash_Zero: In the Dask solution quite a bit is not included in the time measurement that is included in the multiprocessing solution. On the other hand, the multiprocessing solution has an early-exit condition that the Dask solution does not have.

`time.time()` should not be used for timing measurements. `time.monotonic()` or the performance-counter functions are suitable for that.

With `dask` the `Client` is also a context manager, so it can be used with ``with``, just like `Pool`.

The comment at the `seed()` call is at least potentially wrong, because it is only true if that function is called exactly once per process, which cannot be known at *that* point in the code.

[...]
Hi Blackjack,

Thank you for the information and the improvements; I will test it and get back to you.

Regards
Xbash_Zero
User
Posts: 30
Registered: Monday, 19 September 2022, 22:48

The test worked. The context manager and the change to the time measurement are a good idea, thanks a lot!


One worker:

Code: Select all

python3 dask_worker3.py
Dask Approximation of Pi:  3.14154684
Dask Time taken:  27.223749997996492 seconds
Multiprocessing Approximation of Pi:  3.141670976
Multiprocessing Time taken:  5.087493767001433 seconds


Two workers:

Code: Select all

user@server-backend-gpu:~/dask$ python3 dask_worker3.py
/home/user/.local/lib/python3.10/site-packages/distributed/client.py:1388: VersionMismatchWarning: Mismatched versions found

+---------+--------+-----------+----------------------+
| Package | Client | Scheduler | Workers              |
+---------+--------+-----------+----------------------+
| numpy   | 1.23.5 | 1.23.5    | {'1.23.5', '1.24.3'} |
+---------+--------+-----------+----------------------+
  warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
Dask Approximation of Pi:  3.14163512
Dask Time taken:  18.255224649008596 seconds
Multiprocessing Approximation of Pi:  3.141627516
Multiprocessing Time taken:  5.069045836004079 seconds
Xbash_Zero
User
Posts: 30
Registered: Monday, 19 September 2022, 22:48

Addendum:

Back to the original post: I found the error. It was simply that the Python versions were different; after an upgrade it worked:
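
For reference, a minimal sketch of how such a mismatch can be spotted: a tiny script (hypothetical file name version_check.py) that gathers the Python version of every rank on rank 0:

Code: Select all

# version_check.py - run with: mpiexec -hostfile hosts.txt -n 24 python3 version_check.py
import sys

from mpi4py import MPI

comm = MPI.COMM_WORLD
info = (comm.Get_rank(), MPI.Get_processor_name(), sys.version.split()[0])

# Collect (rank, hostname, Python version) from all ranks on rank 0 and print them.
all_info = comm.gather(info, root=0)
if comm.Get_rank() == 0:
    for rank, host, version in sorted(all_info):
        print(f"rank {rank:2d} on {host}: Python {version}")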

Code: Select all

mpiexec -hostfile hosts.txt -n 24 --display-map python3 monte_carlo_mpi.py --mca plm_base_verbose 30

Code: Select all

~/py4mpi$ mpiexec -hostfile hosts.txt -n 24 --display-map python3 monte_carlo_mpi.py --mca plm_base_verbose 30
[server-backend-gpu:239971] mca: base: components_register: registering framework plm components
[server-backend-gpu:239971] mca: base: components_register: found loaded component rsh
[server-backend-gpu:239971] mca: base: components_register: component rsh register function successful
[server-backend-gpu:239971] mca: base: components_register: found loaded component isolated
[server-backend-gpu:239971] mca: base: components_register: component isolated has no register or open function
[server-backend-gpu:239971] mca: base: components_register: found loaded component slurm
[server-backend-gpu:239971] mca: base: components_register: component slurm register function successful
[server-backend-gpu:239971] mca: base: components_open: opening plm components
[server-backend-gpu:239971] mca: base: components_open: found loaded component rsh
[server-backend-gpu:239971] mca: base: components_open: component rsh open function successful
[server-backend-gpu:239971] mca: base: components_open: found loaded component isolated
[server-backend-gpu:239971] mca: base: components_open: component isolated open function successful
[server-backend-gpu:239971] mca: base: components_open: found loaded component slurm
[server-backend-gpu:239971] mca: base: components_open: component slurm open function successful
[server-backend-gpu:239971] mca:base:select: Auto-selecting plm components
[server-backend-gpu:239971] mca:base:select:(  plm) Querying component [rsh]
[server-backend-gpu:239971] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[server-backend-gpu:239971] mca:base:select:(  plm) Querying component [isolated]
[server-backend-gpu:239971] mca:base:select:(  plm) Query of component [isolated] set priority to 0
[server-backend-gpu:239971] mca:base:select:(  plm) Querying component [slurm]
[server-backend-gpu:239971] mca:base:select:(  plm) Selected component [rsh]
[server-backend-gpu:239971] mca: base: close: component isolated closed
[server-backend-gpu:239971] mca: base: close: unloading component isolated
[server-backend-gpu:239971] mca: base: close: component slurm closed
[server-backend-gpu:239971] mca: base: close: unloading component slurm
[server-backend-gpu:239971] [[26850,0],0] plm:rsh: final template argv:
        /usr/bin/ssh <template>  orted -mca ess "env" -mca ess_base_jobid "1759641600" -mca ess_base_vpid "<template>" -mca ess_base_num_procs "4" -mca orte_node_regex "server-backend-gpu,[3:192].168.0.52,[3:192].168.0.55,[3:192].168.0.24@0(4)" -mca orte_hnp_uri "1759641600.0;tcp://192.168.0.53:34811" -mca plm "rsh" --tree-spawn -mca routed "radix" -mca orte_parent_uri "1759641600.0;tcp://192.168.0.53:34811" -mca plm_base_verbose "30" -mca rmaps_base_display_map "1" -mca pmix "^s1,s2,cray,isolated"
[worker:01336] mca: base: components_register: registering framework plm components
[worker:01336] mca: base: components_register: found loaded component rsh
[worker:01336] mca: base: components_register: component rsh register function successful
[worker:01336] mca: base: components_open: opening plm components
[worker:01336] mca: base: components_open: found loaded component rsh
[worker:01336] mca: base: components_open: component rsh open function successful
[worker:01336] mca:base:select: Auto-selecting plm components
[worker:01336] mca:base:select:(  plm) Querying component [rsh]
[worker:01336] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[worker:01336] mca:base:select:(  plm) Selected component [rsh]
[server-backend:23312] mca: base: components_register: registering framework plm components
[server-backend:23312] mca: base: components_register: found loaded component rsh
[server-backend:23312] mca: base: components_register: component rsh register function successful
[server-backend:23312] mca: base: components_open: opening plm components
[server-backend:23312] mca: base: components_open: found loaded component rsh
[server-backend:23312] mca: base: components_open: component rsh open function successful
[server-backend:23312] mca:base:select: Auto-selecting plm components
[server-backend:23312] mca:base:select:(  plm) Querying component [rsh]
[server-backend:23312] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[server-backend:23312] mca:base:select:(  plm) Selected component [rsh]
[feldbus:26268] mca: base: components_register: registering framework plm components
[feldbus:26268] mca: base: components_register: found loaded component rsh
[feldbus:26268] mca: base: components_register: component rsh register function successful
[feldbus:26268] mca: base: components_open: opening plm components
[feldbus:26268] mca: base: components_open: found loaded component rsh
[feldbus:26268] mca: base: components_open: component rsh open function successful
[feldbus:26268] mca:base:select: Auto-selecting plm components
[feldbus:26268] mca:base:select:(  plm) Querying component [rsh]
[feldbus:26268] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[feldbus:26268] mca:base:select:(  plm) Selected component [rsh]
[server-backend-gpu:239971] [[26850,0],0] complete_setup on job [26850,1]
 Data for JOB [26850,1] offset 0 Total slots allocated 32

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 0 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 1 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 2 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 3 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 4 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 5 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 6 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 7 Bound: N/A

 Data for node: 192.168.0.55    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 8 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 9 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 10 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 11 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 12 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 13 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 14 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 15 Bound: N/A

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 16 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 17 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 18 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 19 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 20 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 21 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 22 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 23 Bound: N/A

 =============================================================
 Data for JOB [26850,1] offset 0 Total slots allocated 32

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 0 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 1 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 2 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 3 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 4 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 5 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 6 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 7 Bound: N/A

 Data for node: 192.168.0.55    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 8 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 9 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 10 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 11 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 12 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 13 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 14 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 15 Bound: UNBOUND

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 16 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 17 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 18 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 19 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 20 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 21 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 22 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 23 Bound: N/A

 =============================================================
 Data for JOB [26850,1] offset 0 Total slots allocated 32

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 0 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 1 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 2 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 3 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 4 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 5 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 6 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 7 Bound: UNBOUND

 Data for node: 192.168.0.55    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 8 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 9 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 10 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 11 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 12 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 13 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 14 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 15 Bound: N/A

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 16 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 17 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 18 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 19 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 20 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 21 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 22 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 23 Bound: N/A

 =============================================================
 Data for JOB [26850,1] offset 0 Total slots allocated 32

 ========================   JOB MAP   ========================

 Data for node: 192.168.0.52    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 0 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 1 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 2 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 3 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 4 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 5 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 6 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 7 Bound: N/A

 Data for node: 192.168.0.55    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 8 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 9 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 10 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 11 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 12 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 13 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 14 Bound: N/A
        Process OMPI jobid: [26850,1] App: 0 Process rank: 15 Bound: N/A

 Data for node: 192.168.0.24    Num slots: 8    Max slots: 0    Num procs: 8
        Process OMPI jobid: [26850,1] App: 0 Process rank: 16 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 17 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 18 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 19 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 20 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 21 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 22 Bound: UNBOUND
        Process OMPI jobid: [26850,1] App: 0 Process rank: 23 Bound: UNBOUND

 =============================================================
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive update proc state command from [[26850,0],2]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive got update_proc_state for job [26850,1]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive update proc state command from [[26850,0],1]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive got update_proc_state for job [26850,1]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive update proc state command from [[26850,0],3]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive got update_proc_state for job [26850,1]
Time taken on process 12: 3.4206314086914062 seconds
Time taken on process 8: 3.7719266414642334 seconds
Time taken on process 10: 3.7838056087493896 seconds
Time taken on process 14: 3.853585720062256 seconds
Time taken on process 11: 3.9059817790985107 seconds
Time taken on process 9: 3.941044330596924 seconds
Time taken on process 13: 3.9901468753814697 seconds
Time taken on process 15: 4.068371772766113 seconds
Time taken on process 17: 7.137271404266357 seconds
Time taken on process 6: 7.300758123397827 seconds
Time taken on process 1: 7.335713624954224 seconds
Time taken on process 20: 7.370870113372803 seconds
Time taken on process 23: 7.389361619949341 seconds
Time taken on process 21: 7.422905921936035 seconds
Time taken on process 2: 7.422617673873901 seconds
Time taken on process 5: 7.50658917427063 seconds
Time taken on process 7: 7.509302139282227 seconds
Time taken on process 18: 7.511310577392578 seconds
Time taken on process 19: 7.592760324478149 seconds
Time taken on process 16: 7.627291440963745 seconds
Time taken on process 0: 7.7595086097717285 seconds
Time taken on process 3: 7.83817720413208 seconds
Time taken on process 4: 7.867914438247681 seconds
Time taken on process 22: 8.346022129058838 seconds
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
Pi estimate: 3.1412499025999843
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive update proc state command from [[26850,0],2]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive got update_proc_state for job [26850,1]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive update proc state command from [[26850,0],3]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive got update_proc_state for job [26850,1]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive update proc state command from [[26850,0],1]
[server-backend-gpu:239971] [[26850,0],0] plm:base:receive got update_proc_state for job [26850,1]
[worker:01336] mca: base: close: component rsh closed
[worker:01336] mca: base: close: unloading component rsh
[server-backend:23312] mca: base: close: component rsh closed
[server-backend:23312] mca: base: close: unloading component rsh
[feldbus:26268] mca: base: close: component rsh closed
[feldbus:26268] mca: base: close: unloading component rsh
[server-backend-gpu:239971] mca: base: close: component rsh closed
[server-backend-gpu:239971] mca: base: close: unloading component rsh
That Pi only seems to be correct to the 3rd decimal place is, I guess, inherent to the method: the statistical error of a Monte Carlo estimate only shrinks with the square root of the number of samples, so with 100 million samples an error on the order of 1e-4 is to be expected.


The installation is actually pretty easy:

Code: Select all

sudo apt-get install openmpi-bin openmpi-common
sudo apt-get install libopenmpi-dev
pip install mpi4py
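
To verify the installation before running the real job, a minimal sketch (hypothetical file name hello_mpi.py) that just prints each rank and its host:

Code: Select all

# hello_mpi.py - quick local smoke test: mpiexec -n 4 python3 hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()} on {MPI.Get_processor_name()}")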

Then generate an SSH key and copy it to the workers:

Code: Select all

ssh-keygen
ssh-copy-id user@192.168.0.52
ssh user@192.168.0.52
The workers also have to be entered in hosts.txt; the slots represent the number of CPUs:

Code: Select all

192.168.0.52 slots=8
192.168.0.55 slots=8
192.168.0.24 slots=8

The Monte Carlo algorithm for computing Pi: (As far as I understand it, this file has to exist in the same directory on all workers; that is how it was in my tests as well, and MPI seems to be the most performant so far for the Monte Carlo simulation... I will probably run some more tests...)

Code: Select all

import random
import numpy as np
from mpi4py import MPI
import time

def monte_carlo_pi(n_samples):
    count = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        dist = np.sqrt(x**2 + y**2)
        count += dist <= 1
    return 4 * count / n_samples

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    total_samples = 100_000_000  # Increase this for more CPU usage
    samples_per_process = total_samples // size

    comm.Barrier()

    start_time = time.monotonic()
    local_pi_estimate = monte_carlo_pi(samples_per_process)
    end_time = time.monotonic()

    elapsed_time = end_time - start_time
    print(f"Time taken on process {rank}: {elapsed_time} seconds")

    global_pi_estimate = comm.allreduce(local_pi_estimate, op=MPI.SUM) / size
    print(f"Pi estimate: {global_pi_estimate}")