Parallelization of the pre-stack depth migration code / Enayati, Mohammad. - (2026 Mar 27).
Parallelization of the pre-stack depth migration code
Enayati, Mohammad
2026-03-27
Abstract
Pre-stack depth migration (PSDM) is a computationally intensive algorithm widely used in seismic imaging to accurately position subsurface reflectors. Its high computational cost, largely dominated by deeply nested loops and irregular memory access patterns, makes performance optimization essential for large-scale seismic processing. This thesis investigates performance improvements of a PSDM kernel through enhanced vectorization and pure OpenMP-based parallelization. Initially, the code was analyzed to identify opportunities for improving compiler auto-vectorization within the most computationally demanding loops. Loop restructuring, reduction of data dependencies, and improved memory access patterns were applied to increase vectorization efficiency and better utilize modern CPU vector units. However, due to varying loop bounds across iterations, relying solely on SIMD optimization limits both workload balance and overall scalability. In such cases, combining outer-loop parallelism with inner-loop compiler auto-vectorization typically provides better control over workload distribution and vectorization efficiency. Building on this observation, the optimized OpenMP implementation parallelizes higher-level loops while preserving the previously improved vectorized kernels. Special attention was given to workload distribution, scheduling strategies, and memory access patterns in order to minimize thread contention and maximize processor utilization. To evaluate the effectiveness of the proposed approach, both strong scaling and weak scaling studies were conducted to analyze performance behavior as the number of processing cores and problem size vary. In addition to performance analysis, energy consumption considerations were incorporated by monitoring the energy usage of the OpenMP-parallelized implementation, allowing an assessment of the trade-offs between performance gains and energy efficiency on multi-core systems. 
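The combination of outer-loop threading and inner-loop vectorization described above can be illustrated with a minimal sketch. This is not the thesis code: the function `accumulate_image`, its parameters, the per-column bound array `n_valid`, and the chunk size of 4 are all hypothetical, chosen only to show a loop nest whose inner trip count varies between outer iterations.

```c
#include <stddef.h>

/* Hypothetical stand-in for a migration-style kernel (not the thesis code):
 * column ix accumulates only its first n_valid[ix] samples, so the inner
 * trip count differs from one outer iteration to the next.
 * Assumes image and data do not overlap. */
void accumulate_image(float *image, const float *data, const int *n_valid,
                      int nx, int nz, float weight)
{
    /* Outer loop: thread-level parallelism. Dynamic scheduling hands out
     * chunks of columns on demand, which helps when inner-loop lengths
     * are uneven. The chunk size 4 is an arbitrary illustrative value. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int ix = 0; ix < nx; ++ix) {
        float *restrict img = image + (size_t)ix * nz;
        const float *restrict in = data + (size_t)ix * nz;
        const int n = n_valid[ix];

        /* Inner loop: unit-stride access with no dependence between
         * iterations, so the compiler can vectorize it. */
        #pragma omp simd
        for (int iz = 0; iz < n; ++iz) {
            img[iz] += weight * in[iz];
        }
    }
}
```

Each thread writes only to its own columns, so no synchronization is needed inside the loop nest. Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.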
Experimental results demonstrate that the optimized implementation achieves significant performance improvements over the baseline version while maintaining favorable energy characteristics. Overall, the findings highlight that combining improved vectorization with scalable thread-level parallelism and careful performance analysis can substantially accelerate PSDM workloads on modern high-performance computing architectures.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| thesis_Enayati.pdf | Open access | Thesis | Not specified | 6.33 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


