Design and Implementation of a Prediction-Serving System for Runtime and Parallel Performance in Quantum ESPRESSO / Safari, M. - (2025 Dec 16).

Design and Implementation of a Prediction-Serving System for Runtime and Parallel Performance in Quantum ESPRESSO

SAFARI, MANDANA

2025-12-16

Abstract

Ab initio simulations, such as those performed with Quantum ESPRESSO (QE), play a central role in materials science but are often limited by their high computational cost. Predicting the execution time of self-consistent field (SCF) iterations is particularly challenging, as performance depends on both the physical characteristics of the simulated system and the parallelization parameters of the underlying hardware. This thesis investigates the use of machine learning (ML) techniques to predict the time required per SCF iteration directly from QE inputs, pseudopotentials, and computational settings. A complete workflow was designed to process raw benchmarking data into structured datasets, evaluate multiple regression approaches, and integrate the trained models into a web-based prediction-serving system. Among the tested models, Random Forests achieved the highest overall predictive accuracy and interpretability, revealing that the number of Kohn–Sham states, total cores, and electrons are the most influential factors affecting runtime. Fully Connected Neural Networks showed comparable performance and offered smooth, consistent predictions across a wide range of execution times. Simpler models, such as Kernel Ridge Regression and Linear Regression, provided useful baselines for comparison but were less effective in capturing nonlinear dependencies. Beyond model evaluation, a practical web interface was developed to make runtime prediction accessible to users in real time. By uploading QE input files and specifying hardware configurations, users can obtain immediate predictions of computational cost, supporting more informed resource allocation and efficient planning of large-scale simulations. Overall, this work demonstrates how data-driven approaches can complement traditional performance modeling in high-performance computing. By combining interpretability, predictive accuracy, and real-world deployment, the developed system represents a step toward intelligent, ML-assisted simulation workflows.

Defense date: 16 Dec 2025
SISSA areas: Not assigned
Supervisor(s) affiliated with SISSA: Affinito, Fabio; Baroni, Stefano; de Gironcoli, Stefano Maria; Delugas, Pietro Davide
External supervisor(s): Bonfà, Pietro
Appears in types: 8.4 Master thesis in High Performance Computing (HPC)

Files in this item:

File	Access	Description	Type	License	Size	Format
thesis_Safari.pdf	Open access	Master in HPC	Thesis	Not specified	3.7 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/149950
