SISSA Digital Library, Institutional Research Information System (IRIS)

Parallelization of the pre-stack depth migration code / Enayati, Mohammad. - (2026 Mar 27).

Parallelization of the pre-stack depth migration code

Enayati, Mohammad

2026-03-27

Abstract

Pre-stack depth migration (PSDM) is a computationally intensive algorithm widely used in seismic imaging to accurately position subsurface reflectors. Its high computational cost, largely dominated by deeply nested loops and irregular memory access patterns, makes performance optimization essential for large-scale seismic processing. This thesis investigates performance improvements of a PSDM kernel through enhanced vectorization and pure OpenMP-based parallelization. Initially, the code was analyzed to identify opportunities for improving compiler auto-vectorization within the most computationally demanding loops. Loop restructuring, reduction of data dependencies, and improved memory access patterns were applied to increase vectorization efficiency and better utilize modern CPU vector units. However, due to varying loop bounds across iterations, relying solely on SIMD optimization limits both workload balance and overall scalability. In such cases, combining outer-loop parallelism with inner-loop compiler auto-vectorization typically provides better control over workload distribution and vectorization efficiency. Building on this observation, the optimized OpenMP implementation parallelizes higher-level loops while preserving the previously improved vectorized kernels. Special attention was given to workload distribution, scheduling strategies, and memory access patterns in order to minimize thread contention and maximize processor utilization. To evaluate the effectiveness of the proposed approach, both strong scaling and weak scaling studies were conducted to analyze performance behavior as the number of processing cores and problem size vary. In addition to performance analysis, energy consumption considerations were incorporated by monitoring the energy usage of the OpenMP-parallelized implementation, allowing an assessment of the trade-offs between performance gains and energy efficiency on multi-core systems. 
Experimental results demonstrate that the optimized implementation achieves significant performance improvements over the baseline version while maintaining favorable energy characteristics. Overall, the findings highlight that combining improved vectorization with scalable thread-level parallelism and careful performance analysis can substantially accelerate PSDM workloads on modern high-performance computing architectures.
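
The combination described in the abstract, thread-level parallelism on an outer loop with compiler vectorization of the inner loop, can be illustrated with a minimal sketch. This is not the thesis code: the function name `accumulate_rows`, the array layout, and the chunk size of 16 are assumptions chosen only for illustration. Each outer iteration processes a different number of samples, standing in for the varying loop bounds mentioned above, and a dynamic schedule is one possible way to balance that uneven work across threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a migration-style kernel (not the thesis code).
 * Output element i is the sum of row_len[i] consecutive samples starting at
 * samples[row_start[i]], so the inner loop bound varies per outer iteration. */
void accumulate_rows(const float *samples, const size_t *row_start,
                     const size_t *row_len, float *image, long n_rows)
{
    assert(samples != NULL && row_start != NULL &&
           row_len != NULL && image != NULL);

    /* Outer loop: thread-level parallelism. schedule(dynamic, 16) hands out
     * chunks of 16 rows on demand, so threads that draw short rows simply
     * fetch more chunks. The chunk size is an arbitrary illustrative value. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (long i = 0; i < n_rows; ++i) {
        const float *row = samples + row_start[i];
        const size_t n = row_len[i];   /* hoisted so the bound is loop-invariant */
        float acc = 0.0f;

        /* Inner loop: unit-stride access with an explicitly declared
         * reduction, which allows the compiler to vectorize the sum. */
        #pragma omp simd reduction(+:acc)
        for (size_t j = 0; j < n; ++j) {
            acc += row[j];
        }

        image[i] = acc;   /* each thread writes only its own rows */
    }
}
```

With GCC or Clang a sketch like this would typically be built with `-O3 -fopenmp`. Without the OpenMP flag the pragmas are ignored and the function runs serially, computing the same sums up to floating-point rounding.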

Defence date: 27 Mar 2026
SISSA areas: Not assigned
External supervisor(s): Tinivella, Umberta; Davydenkova, Irina
Appears in categories: 8.4 Master thesis in High Performance Computing (HPC)

Files in this item:

File	Size	Format
thesis_Enayati.pdf (open access; type: thesis; licence: not specified)	6.33 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/151811