Design and Implementation of a Prediction-Serving System for Runtime and Parallel Performance in Quantum ESPRESSO / Safari, Mandana. - (2025 Dec 16).

Design and Implementation of a Prediction-Serving System for Runtime and Parallel Performance in Quantum ESPRESSO

SAFARI, MANDANA
2025-12-16

Abstract

Ab initio simulations, such as those performed with Quantum ESPRESSO (QE), play a central role in materials science but are often limited by their high computational cost. Predicting the execution time of self-consistent field (SCF) iterations is particularly challenging, because performance depends both on the physical characteristics of the simulated system and on the parallelization parameters and underlying hardware. This thesis investigates the use of machine learning (ML) techniques to predict the time required per SCF iteration directly from QE inputs, pseudopotentials, and computational settings. A complete workflow was designed to process raw benchmarking data into structured datasets, evaluate multiple regression approaches, and integrate the trained models into a web-based prediction-serving system. Among the tested models, Random Forests achieved the highest overall predictive accuracy and also provided interpretable results, revealing that the number of Kohn–Sham states, the total number of cores, and the number of electrons are the most influential factors affecting runtime. Fully Connected Neural Networks showed comparable performance and produced smooth, consistent predictions across a wide range of execution times. Simpler models, such as Kernel Ridge Regression and Linear Regression, provided useful baselines but were less effective at capturing nonlinear dependencies. Beyond model evaluation, a web interface was developed to make runtime prediction available to users in real time: by uploading QE input files and specifying the hardware configuration, users obtain immediate estimates of computational cost, which supports more informed resource allocation and more efficient planning of large-scale simulations. Overall, this work demonstrates how data-driven approaches can complement traditional performance modeling in high-performance computing.
By combining interpretability, predictive accuracy, and real-world deployment, the developed system represents a step toward intelligent, ML-assisted simulation workflows.
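A minimal sketch of the first step of such a workflow may help make it concrete. The sketch turns a pw.x input into the kind of features the abstract reports as most influential. It is written under stated assumptions and is not the thesis's actual parser. The following are all hypothetical:

- the helper names (`parse_system_namelist`, `count_atoms`, `default_nbnd`, `extract_features`);
- the sample input;
- the `z_valence` dictionary, which would normally be read from each UPF pseudopotential;
- the `nodes`/`cores_per_node` hardware description.

The keywords `nat`, `ntyp`, `ecutwfc`, `nbnd`, `occupations`, and `tot_charge` are standard pw.x input variables. When `nbnd` is absent, the code approximates the documented pw.x default for a non-spin-polarized run. For insulators that default is half the electron count. For metals it is 20% more, with at least four extra bands.

```python
import math
import re
from collections import Counter

# Illustrative pw.x input (hypothetical example, not taken from the thesis dataset).
SAMPLE_INPUT = """\
&CONTROL
  calculation = 'scf'
/
&SYSTEM
  ibrav = 2, celldm(1) = 10.2, nat = 2, ntyp = 1,
  ecutwfc = 30.0
/
&ELECTRONS
/
ATOMIC_SPECIES
  Si 28.086 Si.pz-vbc.UPF
ATOMIC_POSITIONS alat
  Si 0.00 0.00 0.00
  Si 0.25 0.25 0.25
K_POINTS automatic
  4 4 4 1 1 1
"""

# Occupation schemes for which pw.x adds extra empty bands by default.
METALLIC_OCCUPATIONS = {"smearing", "tetrahedra", "tetrahedra_opt", "tetrahedra_lin"}


def parse_system_namelist(text):
    """Hypothetical helper: map lowercase &SYSTEM keys to raw string values."""
    match = re.search(r"&system(.*?)^\s*/", text, re.IGNORECASE | re.DOTALL | re.MULTILINE)
    if match is None:
        raise ValueError("&SYSTEM namelist not found")
    pairs = re.findall(r"([A-Za-z_][\w()]*)\s*=\s*([^,\n]+)", match.group(1))
    return {key.lower(): value.strip().strip("'\"") for key, value in pairs}


def count_atoms(text, nat):
    """Count atoms per species label in the first `nat` lines of ATOMIC_POSITIONS."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().upper().startswith("ATOMIC_POSITIONS"):
            block = lines[i + 1 : i + 1 + nat]
            return Counter(row.split()[0] for row in block)
    raise ValueError("ATOMIC_POSITIONS card not found")


def nint(x):
    """Fortran-style NINT for non-negative x (round half away from zero)."""
    return int(math.floor(x + 0.5))


def default_nbnd(nelec, metallic):
    """Approximate pw.x default nbnd for a non-spin-polarized run (assumption)."""
    base = nint(nelec / 2)
    if not metallic:
        return base
    return max(nint(1.2 * nelec / 2), base + 4)


def extract_features(text, z_valence, nodes, cores_per_node):
    """Hypothetical feature vector: electrons, KS states, total cores, plus basic sizes."""
    system = parse_system_namelist(text)
    nat = int(system["nat"])
    atoms = count_atoms(text, nat)
    # Valence charges would normally come from each UPF file's z_valence attribute.
    nelec = sum(n * z_valence[s] for s, n in atoms.items()) - float(system.get("tot_charge", "0"))
    metallic = system.get("occupations", "fixed") in METALLIC_OCCUPATIONS
    nbnd = int(system["nbnd"]) if "nbnd" in system else default_nbnd(nelec, metallic)
    return {
        "nat": nat,
        "ntyp": int(system["ntyp"]),
        "ecutwfc": float(system["ecutwfc"]),
        "n_electrons": nelec,
        "n_ks_states": nbnd,
        "total_cores": nodes * cores_per_node,
    }


features = extract_features(SAMPLE_INPUT, z_valence={"Si": 4.0}, nodes=2, cores_per_node=32)
print(features)
# → {'nat': 2, 'ntyp': 1, 'ecutwfc': 30.0, 'n_electrons': 8.0, 'n_ks_states': 4, 'total_cores': 64}
```

Dictionaries like this could be collected into a table and passed to a regressor, for instance a Random Forest, to predict time per SCF iteration. The parser deliberately ignores much of what a production version would need, including:

- Fortran-style numbers such as `1.d0`;
- spin polarization;
- comments inside namelists;
- alternative card layouts.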
16-Dec-2025
Not assigned
Affinito, Fabio
Baroni, Stefano
de Gironcoli, Stefano Maria
Delugas, Pietro Davide
Bonfà, Pietro
Files in this item:
thesis_Safari.pdf (open access)
Description: Master in HPC
Type: Thesis
License: Not specified
Size: 3.7 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/149950