Unsupervised Learning Methods for Molecular Simulation Data / Glielmo, A.; Husic, B. E.; Rodriguez Garcia, A.; Clementi, C.; Noe, F.; Laio, A. - In: CHEMICAL REVIEWS. - ISSN 0009-2665. - 121:16(2021), pp. 9722-9758. [10.1021/acs.chemrev.0c01195]

Unsupervised Learning Methods for Molecular Simulation Data

Glielmo A.; Rodriguez Garcia A.; Laio A.
2021-01-01

Abstract

Unsupervised learning is becoming an essential tool for analyzing the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid-state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the unsupervised learning methods that have been most commonly used to investigate simulation data, and we indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic modeling. We divide our discussion into self-contained sections, each covering a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Year: 2021
Volume: 121
Issue: 16
First page: 9722
Last page: 9758
DOI: 10.1021/acs.chemrev.0c01195
Authors: Glielmo, A.; Husic, B. E.; Rodriguez Garcia, A.; Clementi, C.; Noe, F.; Laio, A.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/125811
Citations
  • PubMed Central 45
  • Scopus 138
  • Web of Science (ISI) 126