Compact atomic descriptors enable accurate predictions via linear models / Zeni, C.; Rossi, K.; Glielmo, A.; De Gironcoli, S. - In: THE JOURNAL OF CHEMICAL PHYSICS. - ISSN 0021-9606. - 154:22(2021), pp. 1-9. [10.1063/5.0052961]

Compact atomic descriptors enable accurate predictions via linear models

Zeni C.; Rossi K.; Glielmo A.; De Gironcoli S.
2021-01-01

Abstract

We probe the accuracy of linear ridge regression employing a three-body local density representation derived from the atomic cluster expansion. We benchmark the accuracy of this framework in the prediction of formation energies and atomic forces in molecules and solids. We find that such a simple regression framework performs on par with state-of-the-art machine learning methods, which are, in most cases, more complex and more computationally demanding. Subsequently, we look for ways to sparsify the descriptor and further improve the computational efficiency of the method. To this end, we use both principal component analysis and least absolute shrinkage and selection operator (LASSO) regression for energy fitting on six single-element datasets. Both methods highlight the possibility of constructing a descriptor that is four times smaller than the original with similar or even improved accuracy. Furthermore, we find that the reduced descriptors share a sizable fraction of their features across the six independent datasets, hinting at the possibility of designing material-agnostic, optimally compressed, and accurate descriptors.
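The workflow summarized in the abstract (ridge regression on a descriptor, followed by compression to a quarter of its size) can be illustrated with a minimal sketch. This is not the authors' implementation: the data below are synthetic random features standing in for the atomic cluster expansion descriptor, and the function names, the regularization strength `alpha`, and the dataset sizes are assumptions chosen only for illustration. Only the principal component analysis route is shown; LASSO is omitted to keep the example dependency-free.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def pca_projection(X, n_components):
    """Return the feature mean and the leading principal directions of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T  # shape: (n_features, n_components)


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 400, 40, 8

# Synthetic stand-in for a redundant descriptor (assumption, not real data):
# 40 correlated features generated from 8 latent variables plus small noise.
latent = rng.normal(size=(n_samples, n_latent))
X = latent @ rng.normal(size=(n_latent, n_features))
X += 0.01 * rng.normal(size=X.shape)
y = latent @ rng.normal(size=n_latent) + 0.01 * rng.normal(size=n_samples)

X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Ridge regression on the full descriptor.
w_full, b_full = ridge_fit(X_train, y_train, alpha=1e-3)
rmse_full = rmse(X_test @ w_full + b_full, y_test)

# Ridge regression on a descriptor compressed to a quarter of its size.
mean, components = pca_projection(X_train, n_features // 4)
Z_train = (X_train - mean) @ components
Z_test = (X_test - mean) @ components
w_small, b_small = ridge_fit(Z_train, y_train, alpha=1e-3)
rmse_small = rmse(Z_test @ w_small + b_small, y_test)

print(f"full descriptor ({n_features} features): RMSE = {rmse_full:.4f}")
print(f"compressed ({components.shape[1]} features): RMSE = {rmse_small:.4f}")
```

In this toy setting the compressed model keeps most of the accuracy because the synthetic features are redundant by construction; whether the same holds for a real descriptor is an empirical question of the kind the paper addresses.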
Year: 2021
Volume: 154
Issue: 22
First page: 1
Last page: 9
Article number: 224112
DOI: 10.1063/5.0052961
arXiv: https://arxiv.org/abs/2105.11231
Authors: Zeni, C.; Rossi, K.; Glielmo, A.; De Gironcoli, S.
Files in this record:
File: JCP-154-224112-2021.pdf
Access: Open Access since 15/06/2022
Description: article, publisher's version
Type: Publisher's version (PDF)
License: Not specified
Size: 4.83 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/126473
Citations
  • PubMed Central 2
  • Scopus 20
  • Web of Science 16