We propose a new scientific application of unsupervised learning techniques to boost our ability to search for new phenomena in data, by detecting discrepancies between two datasets. These could be, for example, a simulated standard-model background, and an observed dataset containing a potential hidden signal of New Physics. We build a statistical test upon a test statistic which measures deviations between two samples, using a Nearest Neighbors approach to estimate the local ratio of the density of points. The test is model-independent and non-parametric, requiring no knowledge of the shape of the underlying distributions, and it does not bin the data, thus retaining full information from the multidimensional feature space. As a proof-of-concept, we apply our method to synthetic Gaussian data, and to a simulated dark matter signal at the Large Hadron Collider. Even in the case where the background can not be simulated accurately enough to claim discovery, the technique is a powerful tool to identify regions of interest for further study.

Guiding new physics searches with unsupervised learning / De Simone, A.; Jacques, T.. - In: THE EUROPEAN PHYSICAL JOURNAL. C, PARTICLES AND FIELDS. - ISSN 1434-6044. - 79:4(2019), pp. 1-15. [10.1140/epjc/s10052-019-6787-3]

Guiding new physics searches with unsupervised learning

De Simone A.
;
Jacques T.
2019-01-01

Abstract

We propose a new scientific application of unsupervised learning techniques to boost our ability to search for new phenomena in data, by detecting discrepancies between two datasets. These could be, for example, a simulated standard-model background, and an observed dataset containing a potential hidden signal of New Physics. We build a statistical test upon a test statistic which measures deviations between two samples, using a Nearest Neighbors approach to estimate the local ratio of the density of points. The test is model-independent and non-parametric, requiring no knowledge of the shape of the underlying distributions, and it does not bin the data, thus retaining full information from the multidimensional feature space. As a proof-of-concept, we apply our method to synthetic Gaussian data, and to a simulated dark matter signal at the Large Hadron Collider. Even in the case where the background can not be simulated accurately enough to claim discovery, the technique is a powerful tool to identify regions of interest for further study.
2019
79
4
1
15
289
http://link.springer-ny.com/link/service/journals/10052/index.htm
De Simone, A.; Jacques, T.
File in questo prodotto:
File Dimensione Formato  
2019_NN2ST.pdf

accesso aperto

Descrizione: Article funded by SCOAP3
Tipologia: Versione Editoriale (PDF)
Licenza: Creative commons
Dimensione 1.16 MB
Formato Adobe PDF
1.16 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11767/92273
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 87
  • ???jsp.display-item.citation.isi??? 83
social impact