Performance and Energy Efficiency Analysis of RISC-V Architecture for Direct N-Body Simulations (2026 Mar 27).

Performance and Energy Efficiency Analysis of RISC-V Architecture for Direct N-Body Simulations

2026-03-27

Abstract

With the increasing computational demands of high-energy physics, astrophysics, and gravitational-wave studies, finding computational solutions that balance performance and energy efficiency has become a critical issue. In this thesis, we investigate the applicability of RISC-V architectures to direct N-body simulations, a computationally demanding problem in astrophysics. We first benchmark the performance of a direct N-body code on a dual-socket system based on the RISC-V Sophon SG2042 processor. The force-evaluation kernel, implemented in mixed precision, is parallelized with MPI and OpenMP and optimized with RISC-V Vector (RVV) intrinsics. Comparative experiments are conducted on an AArch64 platform (NVIDIA Grace) using NEON vectorization and on an x86 platform (AMD EPYC 9554) using AVX-512 vectorization, enabling a comprehensive cross-platform analysis. At their respective optimal configurations, RISC-V achieves approximately 4–5% of the x86 throughput and ∼10% of the AArch64 throughput, corresponding to a 15.67× slowdown relative to x86 and 8.86× relative to AArch64. Despite the SG2042's lower thermal design power (120 W), its longer execution time means it consumes 4.79× more total energy than x86 and 2.87× more than AArch64, yielding an energy-delay product (EDP) 25.4× worse than AArch64 and 75.0× worse than x86. We then port the N-body force kernel to the RISC-V-based Tenstorrent Wormhole n300 accelerator using the TT-Metalium programming interface; to the best of our knowledge, this is the first astrophysical application to leverage this class of hardware. We evaluate single-device performance and energy efficiency against a highly optimized AMD EPYC 9124 CPU baseline, parallelized with OpenMP and optimized with AVX-512 intrinsics, obtaining a 2.23× speedup and 1.80× energy savings.
We further investigate three strategies for scaling the code across multiple Wormhole cards and chips using MPI, analyzing their scalability and energy-delay characteristics. The results reveal significant differences in computational and energy efficiency across platforms, highlighting the role of architecture- and software-level optimizations in achieving energy-efficient scientific computing. Our findings provide insight into the potential of RISC-V systems for scalable, low-energy HPC applications in astrophysics and beyond.
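The SG2042 energy-delay figures quoted in the abstract follow directly from its energy and runtime ratios, since EDP is the product of energy and execution time. The worked check below uses only the abstract's numbers; "ref" denotes the comparison platform.

```latex
\begin{aligned}
\mathrm{EDP} &= E \cdot t \\
\frac{\mathrm{EDP}_{\mathrm{SG2042}}}{\mathrm{EDP}_{\mathrm{ref}}}
  &= \frac{E_{\mathrm{SG2042}}}{E_{\mathrm{ref}}} \cdot \frac{t_{\mathrm{SG2042}}}{t_{\mathrm{ref}}} \\
\text{ref = x86 (EPYC 9554):} \quad & 4.79 \times 15.67 = 75.06 \approx 75.0 \\
\text{ref = AArch64 (Grace):} \quad & 2.87 \times 8.86 = 25.43 \approx 25.4
\end{aligned}
```

By the same arithmetic, and assuming the 2.23× speedup and 1.80× energy savings of the Wormhole n300 refer to the same runs, the implied EDP improvement over the EPYC 9124 baseline would be about 2.23 × 1.80 ≈ 4.0×. This value is derived here and is not reported in the abstract.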
Spera, Mario
Files in this item:
thesis_Almerol.pdf (open access)
Type: Thesis
License: Not specified
Size: 2.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/151790