
Optimization of postprocessing tools for understanding collective burst mechanism of angular jumps in liquid water (2024 Dec 19).


Abstract

This study aimed to improve the computational efficiency of the automated protocol for detecting angular jumps in water molecules. The protocol analyses dipole moment (DP) vectors, hydrogen-hydrogen (HH) vectors, and topological defects in hydrogen bonding across systems of varying sizes, and it faced performance bottlenecks as system size, trajectory length, and simulation time increased. These bottlenecks led to frequent crashes and prolonged execution times. Initial profiling of the DP and HH extraction tool on the Leonardo supercomputer revealed high memory consumption (117 GB) and long execution times (4397 seconds). To address this, the MDAnalysis package was first used for input processing and benchmarked against the custom input-processing approach. The results showed that converting XYZ trajectory files to a binary format reduces both memory usage and processing time. Among the binary file reading techniques implemented, NumPy’s frombuffer function performed best, giving a speed-up of up to 265-fold over the original method and 224-fold over MDAnalysis. With this optimized input processing adopted, the serial version of the DP and HH extraction protocol ran approximately 3-fold faster than the original. Adding data parallelism through multiprocessing with up to 8 processes improved performance further, achieving a 14-fold increase in speed while keeping memory usage low (≈ 93 MB) for all target analysis tasks. Performance analysis showed that the best parallel execution was achieved with 5 processing elements; beyond that point the speed-up saturated because of resource contention during output processing. Quality checks on the molecular swing evaluation protocol were also completed, incorporating updates to the jumps calculation protocol and the plotting utilities.
Additionally, the tool for calculating topological defects in hydrogen-bonded water molecules was refactored and optimized using Numba’s JIT decorator together with the aforementioned coordinate generator. This reduced the serial execution time from 21 hours to 5 hours, a 4-fold speed-up. Using Joblib for data parallelism over 32 processes further reduced the execution time from 5 hours (18567.44 seconds) to an average of 697.23 seconds, a 25-fold speed-up, with a maximum memory usage of about 6.2 GB. Overall, this work identified the computational bottlenecks in the two major tools of the automated angular jump detection protocol. By presenting optimized algorithms capable of processing large MD trajectories with minimal memory and time requirements, it provides a solid foundation for more efficient post-processing of molecular dynamics simulations.
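
The binary input path mentioned in the abstract (a trajectory converted to binary and read back with NumPy’s frombuffer through a coordinate generator) can be illustrated with a minimal sketch. The file layout assumed here (raw little-endian float64 coordinates, frames stored back to back, no header) and the helper names `write_binary` and `iter_frames` are hypothetical and are not taken from the thesis; the actual tool may use a different layout.

```python
import os
import tempfile

import numpy as np


def write_binary(path, frames):
    """Write an (n_frames, n_atoms, 3) array as raw little-endian float64.

    Assumed layout (hypothetical): no header, frames stored back to back.
    """
    np.asarray(frames, dtype="<f8").tofile(path)


def iter_frames(path, n_atoms, dtype="<f8"):
    """Yield one (n_atoms, 3) coordinate array per frame.

    Only one frame is held in memory at a time, which is what keeps
    the memory footprint independent of trajectory length.
    """
    frame_bytes = n_atoms * 3 * np.dtype(dtype).itemsize
    with open(path, "rb") as fh:
        while True:
            buf = fh.read(frame_bytes)
            if len(buf) < frame_bytes:
                break  # end of file (or a truncated trailing frame)
            # frombuffer reinterprets the bytes without parsing text
            yield np.frombuffer(buf, dtype=dtype).reshape(n_atoms, 3)


# Round-trip demo on synthetic coordinates (4 frames, 6 atoms).
rng = np.random.default_rng(0)
frames = rng.random((4, 6, 3))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traj.bin")
    write_binary(path, frames)
    loaded = np.stack(list(iter_frames(path, n_atoms=6)))

roundtrip_ok = np.array_equal(loaded, frames)
```

Because each frame is yielded independently, a generator of this kind can also feed per-frame work to several worker processes, in the spirit of the data parallelism the abstract describes.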
19 Dec 2024
Laboratorio Interdisciplinare
Girotto, Ivan; Manko, Natalia; Hassanali, Ali
Files in this item:
File: thesis_Ntim_gasu.pdf
Access: open access
Type: Thesis
License: Not specified
Size: 7.74 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/145290