I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture using the MPI message-passing library. For a configuration with a constant number of particles per processor, the scalability of the code was tested up to P=128 processors on an IBM SP4 machine. In the large P limit, the average CPU time per processor necessary for solving the gravitational interactions is ∼10% higher than that expected from the ideal scaling relation. The processor domains are determined every large timestep according to a recursive orthogonal bisection, using a weighting scheme which takes into account the total particle computational load within the timestep. The results of the numerical tests show that the load balancing efficiency L of the code is high (≥90%) up to P=32, and decreases to L∼80% when P=128. In the latter case it is found that some aspects of the code performance are affected by machine hardware, while the proposed weighting scheme can achieve a load balance as high as L∼90% even in the large P limit.
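The domain decomposition described in the abstract can be illustrated with a minimal serial sketch. It is not the author's implementation: the function names `orb_partition` and `load_balance` are hypothetical, the per-particle weight is assumed to be a stand-in for the work a particle generates within one large timestep (particles with smaller individual timesteps get larger weights), and the efficiency L is assumed here to be the mean domain load divided by the maximum domain load, which the abstract does not define. No MPI is used; the sketch only shows how a weighted recursive orthogonal bisection assigns particles to P domains.

```python
import random


def orb_partition(points, weights, n_domains):
    """Weighted recursive orthogonal bisection (serial sketch).

    points    : list of (x, y, z) tuples
    weights   : assumed per-particle work estimate
    n_domains : number of domains, assumed to be a power of two and
                not larger than the number of particles
    Returns a list of n_domains lists of particle indices.
    """
    def extent(idx, axis):
        coords = [points[i][axis] for i in idx]
        return max(coords) - min(coords)

    def split(idx, n):
        if n == 1:
            return [idx]
        # Cut perpendicular to the axis with the largest spatial extent.
        axis = max(range(3), key=lambda a: extent(idx, a))
        order = sorted(idx, key=lambda i: points[i][axis])
        total = sum(weights[i] for i in order)
        # Place the cut where the cumulative weight reaches half the total.
        acc = 0.0
        cut = len(order) - 1
        for k, i in enumerate(order):
            acc += weights[i]
            if acc >= total / 2.0:
                cut = k + 1
                break
        # Keep both halves non-empty.
        cut = min(max(cut, 1), len(order) - 1)
        return split(order[:cut], n // 2) + split(order[cut:], n // 2)

    return split(list(range(len(points))), n_domains)


def load_balance(domains, weights):
    """Assumed definition: mean domain load divided by maximum domain load."""
    loads = [sum(weights[i] for i in d) for d in domains]
    return (sum(loads) / len(loads)) / max(loads)


# Toy data: random positions, weights mimicking a timestep hierarchy.
rng = random.Random(0)
n_particles = 4096
points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_particles)]
weights = [rng.choice([1, 2, 4, 8]) for _ in range(n_particles)]

domains = orb_partition(points, weights, 8)
print("load balance:", load_balance(domains, weights))
```

Splitting by accumulated weight rather than by particle count is what lets domains holding many short-timestep particles end up with fewer particles overall, so that the estimated work per domain stays close to uniform.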

Parallelization of a treecode / Valdarnini, R. - In: NEW ASTRONOMY. - ISSN 1384-1076. - 8:7(2003), pp. 691-710. [10.1016/S1384-1076(03)00057-5]

Parallelization of a treecode

Valdarnini, Riccardo

2003-01-01

Year: 2003
Journal: NEW ASTRONOMY
Volume: 8
Issue: 7
First page: 691
Last page: 710
DOI: https://dx.doi.org/10.1016/S1384-1076(03)00057-5
URL: https://doi.org/10.1016/S1384-1076(03)00057-5
     https://arxiv.org/abs/astro-ph/0303413
All authors: Valdarnini, Riccardo
Publication type: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/16668
