Design and Implementation of a Prediction-Serving System for Runtime and Parallel Performance in Quantum ESPRESSO / Safari, Mandana. - (2025 Dec 16).
Design and Implementation of a Prediction-Serving System for Runtime and Parallel Performance in Quantum ESPRESSO
SAFARI, MANDANA
2025-12-16
Abstract
Ab initio simulations, such as those performed with Quantum ESPRESSO (QE), play a central role in materials science but are often limited by their high computational cost. Predicting the execution time of self-consistent field (SCF) iterations is particularly challenging, as performance depends on both the physical characteristics of the simulated system and the parallelization parameters of the underlying hardware. This thesis investigates the use of machine learning (ML) techniques to predict the time required per SCF iteration directly from QE inputs, pseudopotentials, and computational settings. A complete workflow was designed to process raw benchmarking data into structured datasets, evaluate multiple regression approaches, and integrate the trained models into a web-based prediction-serving system. Among the tested models, Random Forests achieved the highest overall predictive accuracy and interpretability, revealing that the number of Kohn–Sham states, total cores, and electrons are the most influential factors affecting runtime. Fully Connected Neural Networks showed comparable performance and offered smooth, consistent predictions across a wide range of execution times. Simpler models, such as Kernel Ridge Regression and Linear Regression, provided useful baselines for comparison but were less effective in capturing nonlinear dependencies. Beyond model evaluation, a practical web interface was developed to make runtime prediction accessible to users in real time. By uploading QE input files and specifying hardware configurations, users can obtain immediate predictions of computational cost, supporting more informed resource allocation and efficient planning of large-scale simulations. Overall, this work demonstrates how data-driven approaches can complement traditional performance modeling in high-performance computing. 
By combining interpretability, predictive accuracy, and real-world deployment, the developed system represents a step toward intelligent, ML-assisted simulation workflows.
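As a rough illustration of what predicting "directly from QE inputs ... and computational settings" could involve, the sketch below pulls a few numeric features out of a `pw.x`-style input and combines them with a hardware setting. It is not the thesis's actual pipeline: the function names (`read_namelist_value`, `extract_features`), the chosen feature set, the example input values, and the definition of total cores as MPI ranks times threads per rank are all assumptions made for illustration. The electron count is omitted because it would also require reading the pseudopotential files.

```python
import re

# Hypothetical example input; the values are illustrative, not taken from the thesis.
EXAMPLE_INPUT = """
&CONTROL
  calculation = 'scf'
/
&SYSTEM
  ibrav = 2, celldm(1) = 10.2
  nat = 2, ntyp = 1
  ecutwfc = 30.0
  nbnd = 8
/
&ELECTRONS
  conv_thr = 1.0d-8
/
K_POINTS automatic
  4 4 4 0 0 0
"""


def read_namelist_value(text, key):
    """Return the raw string assigned to `key` in a namelist, or None if absent."""
    pattern = rf"(?<![\w(]){re.escape(key)}\s*=\s*([^,\s/]+)"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None


def to_number(raw, cast):
    """Convert a Fortran-style literal (e.g. 1.0d-8) with `cast`; pass None through."""
    if raw is None:
        return None
    return cast(raw.lower().replace("d", "e"))


def kpoint_grid_size(text):
    """Return n1*n2*n3 for an automatic k-point grid (before symmetry reduction)."""
    match = re.search(
        r"K_POINTS\s*[\{\(]?\s*automatic\s*[\}\)]?\s*\n\s*(\d+)\s+(\d+)\s+(\d+)",
        text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    n1, n2, n3 = (int(g) for g in match.groups())
    return n1 * n2 * n3


def extract_features(text, mpi_ranks, threads_per_rank=1):
    """Build a flat feature dictionary from input text plus hardware settings."""
    return {
        "nat": to_number(read_namelist_value(text, "nat"), int),
        # nbnd is the number of Kohn-Sham states; None if left to the QE default.
        "nbnd": to_number(read_namelist_value(text, "nbnd"), int),
        "ecutwfc": to_number(read_namelist_value(text, "ecutwfc"), float),
        "kpoint_grid_size": kpoint_grid_size(text),
        # Assumption: total cores = MPI ranks x threads per rank.
        "total_cores": mpi_ranks * threads_per_rank,
    }


features = extract_features(EXAMPLE_INPUT, mpi_ranks=4, threads_per_rank=2)
print(features)
```

A dictionary of this kind could then be turned into one row of a tabular dataset for a regression model such as the Random Forest mentioned above; how the thesis actually encodes its features is not stated in this record.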
File: thesis_Safari.pdf (open access)
Description: Master in HPC
Type: Thesis
License: Not specified
Size: 3.7 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


