Parallelization of a treecode / Valdarnini, Riccardo. - In: NEW ASTRONOMY. - ISSN 1384-1076. - 8:7(2003), pp. 691-710. [10.1016/S1384-1076(03)00057-5]

Parallelization of a treecode

Valdarnini, Riccardo
2003-01-01

Abstract

I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on distributed-memory parallel machines using the MPI message-passing library. With a constant number of particles per processor, the scalability of the code was tested up to P=128 processors on an IBM SP4 machine. In the large-P limit, the average CPU time per processor needed to solve the gravitational interactions is ∼10% higher than expected from the ideal scaling relation. The processor domains are determined every large timestep by a recursive orthogonal bisection, using a weighting scheme that takes into account the total computational load of the particles within the timestep. The numerical tests show that the load-balancing efficiency L of the code is high (≥90%) up to P=32 and decreases to L∼80% when P=128. In the latter case, some aspects of the code performance are found to be affected by the machine hardware, whereas the proposed weighting scheme can achieve a load balance as high as L∼90% even in the large-P limit.
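
As an illustration of the domain decomposition mentioned in the abstract, the following is a minimal Python sketch of a weighted recursive orthogonal bisection. It is not the paper's implementation: the function names `orb_partition` and `load_balance`, the choice of cutting along the longest axis, and the use of a single per-particle cost estimate as the weight are all assumptions made for this example. The abstract does not give the paper's definition of L, so the sketch uses one common measure, the mean work per domain divided by the maximum work.

```python
import numpy as np

def orb_partition(pos, weights, n_domains):
    """Label each particle with a domain index using weighted ORB.

    pos       : (N, 3) array of particle positions
    weights   : (N,) array of per-particle cost estimates (hypothetical,
                e.g. work done by the particle in the last large timestep)
    n_domains : number of domains; must be a power of two and <= N
    """
    if n_domains < 1 or n_domains & (n_domains - 1):
        raise ValueError("n_domains must be a power of two")
    if len(pos) < n_domains:
        raise ValueError("need at least one particle per domain")
    labels = np.zeros(len(pos), dtype=int)

    def split(idx, first_label, n):
        if n == 1:
            labels[idx] = first_label
            return
        # Assumption: cut along the axis with the largest spatial extent.
        extent = pos[idx].max(axis=0) - pos[idx].min(axis=0)
        axis = int(np.argmax(extent))
        order = idx[np.argsort(pos[idx, axis], kind="stable")]
        # Put the cut where the cumulative weight reaches half the total.
        cum = np.cumsum(weights[order])
        k = int(np.searchsorted(cum, 0.5 * cum[-1])) + 1
        # Keep enough particles on each side for the remaining splits.
        k = min(max(k, n // 2), len(order) - n // 2)
        split(order[:k], first_label, n // 2)
        split(order[k:], first_label + n // 2, n // 2)

    split(np.arange(len(pos)), 0, n_domains)
    return labels

def load_balance(weights, labels, n_domains):
    """Assumed efficiency measure: mean domain work over maximum domain work."""
    work = np.bincount(labels, weights=weights, minlength=n_domains)
    return work.mean() / work.max()

# Usage: particles with uneven (synthetic) costs split over 16 domains.
rng = np.random.default_rng(0)
positions = rng.random((4096, 3))
costs = 1.0 + 9.0 * rng.random(4096)
domains = orb_partition(positions, costs, 16)
print(f"load balance: {load_balance(costs, domains, 16):.3f}")
```

Because each cut is placed by cumulative weight rather than by particle count, domains containing expensive particles receive fewer of them, which is the general idea behind weighting the bisection by computational load.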
Year: 2003
Volume: 8
Issue: 7
Pages: 691-710
https://doi.org/10.1016/S1384-1076(03)00057-5
https://arxiv.org/abs/astro-ph/0303413
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/16668