SISSA DIGITAL LIBRARY: Institutional Research Information System
For information, contact sdl@sissa.it

Guiding new physics searches with unsupervised learning / De Simone, A.; Jacques, T. - In: THE EUROPEAN PHYSICAL JOURNAL. C, PARTICLES AND FIELDS. - ISSN 1434-6044. - 79:4(2019), pp. 1-15. [10.1140/epjc/s10052-019-6787-3]

Guiding new physics searches with unsupervised learning

De Simone A.; Jacques T.

2019-01-01

Abstract

We propose a new scientific application of unsupervised learning techniques to boost our ability to search for new phenomena in data by detecting discrepancies between two datasets. These could be, for example, a simulated Standard Model background and an observed dataset containing a potential hidden signal of New Physics. We build a statistical test on a test statistic that measures deviations between two samples, using a nearest-neighbors approach to estimate the local ratio of the density of points. The test is model-independent and non-parametric, requiring no knowledge of the shape of the underlying distributions, and it does not bin the data, thus retaining full information from the multidimensional feature space. As a proof of concept, we apply our method to synthetic Gaussian data and to a simulated dark matter signal at the Large Hadron Collider. Even when the background cannot be simulated accurately enough to claim discovery, the technique is a powerful tool for identifying regions of interest for further study.
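To make the idea of a nearest-neighbors two-sample test concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the form of the test statistic, so the sketch assumes a standard k-nearest-neighbor estimator of the Kullback-Leibler divergence between the two samples, and it assumes that the null distribution is obtained with a permutation test. All function names, the choice `k=5`, the sample sizes, and the toy Gaussian data are hypothetical.

```python
import math
import random


def kth_nn_distance(point, sample, k, skip_index=None):
    """Distance from `point` to its k-th nearest neighbour in `sample`."""
    dists = sorted(
        math.dist(point, other)
        for i, other in enumerate(sample)
        if i != skip_index  # lets a point skip itself in its own sample
    )
    return dists[k - 1]


def knn_test_statistic(trial, benchmark, k=5):
    """Assumed statistic: kNN estimate of KL(trial || benchmark).

    The ratio of k-th neighbour distances at each trial point acts as a
    local, binning-free estimate of the density ratio.
    """
    n_t, n_b, dim = len(trial), len(benchmark), len(trial[0])
    total = 0.0
    for j, x in enumerate(trial):
        r_t = kth_nn_distance(x, trial, k, skip_index=j)
        r_b = kth_nn_distance(x, benchmark, k)
        total += math.log(r_b / r_t)
    return dim * total / n_t + math.log(n_b / (n_t - 1))


def permutation_p_value(trial, benchmark, k=5, n_perm=50, seed=0):
    """Estimate the p-value by reshuffling the pooled sample labels."""
    rng = random.Random(seed)
    observed = knn_test_statistic(trial, benchmark, k)
    pooled = list(trial) + list(benchmark)
    n_t = len(trial)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if knn_test_statistic(pooled[:n_t], pooled[n_t:], k) >= observed:
            exceed += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def gaussian_sample(n, mean, rng, dim=2):
    """Toy data: n points from an isotropic unit-variance Gaussian."""
    return [tuple(rng.gauss(mean, 1.0) for _ in range(dim)) for _ in range(n)]


rng = random.Random(1)
benchmark = gaussian_sample(100, 0.0, rng)  # stands in for the background
same = gaussian_sample(100, 0.0, rng)       # trial sample without a signal
shifted = gaussian_sample(100, 1.5, rng)    # trial sample with a shifted mean

ts_same, p_same = permutation_p_value(same, benchmark)
ts_shift, p_shift = permutation_p_value(shifted, benchmark)
print(f"same distribution:    TS = {ts_same:.3f}, p = {p_same:.3f}")
print(f"shifted distribution: TS = {ts_shift:.3f}, p = {p_shift:.3f}")
```

The brute-force neighbor search keeps the sketch dependency-free but scales quadratically with the sample size; a spatial index such as a k-d tree would be the usual replacement for larger datasets. In this sketch, the per-point terms `log(r_b / r_t)` could also be inspected individually to see where in feature space the two samples differ most.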

Year	2019
Journal	THE EUROPEAN PHYSICAL JOURNAL. C, PARTICLES AND FIELDS
Volume	79
Issue	4
First page	1
Last page	15
Article number	289
DOI	https://dx.doi.org/10.1140/epjc/s10052-019-6787-3
URL	http://link.springer-ny.com/link/service/journals/10052/index.htm
All authors	De Simone, A.; Jacques, T.
Appears in categories	1.1 Journal article

Files in this item:

File	Description	Type	License	Access	Size	Format
2019_NN2ST.pdf	Article funded by SCOAP3	Publisher's version (PDF)	Creative Commons	Open access	1.16 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/92273
