SISSA Digital Library: Institutional Research Information System
For information, contact sdl@sissa.it

Modeling the Influence of Data Structure on Learning in Neural Networks: The Hidden Manifold Model / Goldt, S.; Mezard, M.; Krzakala, F.; Zdeborova, L. - In: PHYSICAL REVIEW. X. - ISSN 2160-3308. - 10:4(2020), pp. 1-32. [10.1103/PhysRevX.10.041044]

Modeling the Influence of Data Structure on Learning in Neural Networks: The Hidden Manifold Model

Goldt, S.; Mezard, M.; Krzakala, F.; Zdeborova, L.

2020-01-01

Abstract

Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful, such as images or sequences of speech, are characterized by intricate correlations. Yet, most theoretical work on neural networks does not explicitly model training data or assumes that elements of each data sample are drawn independently from some factorized probability distribution. These approaches are, thus, by construction blind to the correlation structure of real-world datasets and their impact on learning in neural networks. Here, we introduce a generative model for structured datasets that we call the hidden manifold model. The idea is to construct high-dimensional inputs that lie on a lower-dimensional manifold, with labels that depend only on their position within this manifold, akin to a single-layer decoder or generator in a generative adversarial network. We demonstrate that learning of the hidden manifold model is amenable to an analytical treatment by proving a "Gaussian equivalence property" (GEP), and we use the GEP to show how the dynamics of two-layer neural networks trained using one-pass stochastic gradient descent is captured by a set of integro-differential equations that track the performance of the network at all times. This approach permits us to analyze in detail how a neural network learns functions of increasing complexity during training, how its performance depends on its size, and how it is impacted by parameters such as the learning rate or the dimension of the hidden manifold.
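The construction described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' code: the helper names, the Gaussian feature matrix, the tanh nonlinearity, the sign label function, the learning-rate scaling, and all sizes are assumptions chosen for illustration. The first function draws latent coordinates in a low-dimensional space, maps them through a single random layer to obtain high-dimensional inputs, and computes labels from the latent coordinates alone. The second runs one-pass stochastic gradient descent on a two-layer network, visiting each sample exactly once.

```python
import numpy as np


def sample_hidden_manifold(n_samples, n_input, n_latent, rng):
    """Draw a toy structured dataset (hypothetical helper, illustrative only)."""
    # Latent coordinates: the position of each sample on the hidden manifold.
    latent = rng.standard_normal((n_samples, n_latent))
    # Fixed random map from latent space to input space (assumed Gaussian).
    features = rng.standard_normal((n_latent, n_input))
    # Inputs: a single-layer "decoder" applied to the latent coordinates.
    inputs = np.tanh(latent @ features / np.sqrt(n_latent))
    # Labels depend only on the latent coordinates, never on the inputs.
    teacher = rng.standard_normal(n_latent)
    labels = np.sign(latent @ teacher / np.sqrt(n_latent))
    return inputs, labels, latent, features, teacher


def online_sgd(inputs, labels, n_hidden, lr, rng):
    """One-pass SGD on a two-layer tanh network with squared loss (a sketch)."""
    n_input = inputs.shape[1]
    w1 = rng.standard_normal((n_hidden, n_input))
    v = rng.standard_normal(n_hidden) / np.sqrt(n_hidden)
    for x, y in zip(inputs, labels):  # each sample is used exactly once
        hidden = np.tanh(w1 @ x / np.sqrt(n_input))
        err = v @ hidden - y
        # Gradients of 0.5 * err**2, computed before either weight is updated.
        grad_v = err * hidden
        grad_w1 = err * np.outer(v * (1.0 - hidden**2), x) / np.sqrt(n_input)
        v -= lr * grad_v
        w1 -= lr * grad_w1
    return w1, v


rng = np.random.default_rng(0)
X, y, C, F, w = sample_hidden_manifold(n_samples=200, n_input=50, n_latent=5, rng=rng)
W1, v = online_sgd(X, y, n_hidden=3, lr=0.05, rng=rng)
```

In this sketch the inputs have 50 components but only 5 underlying degrees of freedom, so their components are correlated by construction, which is the kind of structure that a factorized input distribution cannot represent.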

Year	2020
Journal	PHYSICAL REVIEW. X
Volume	10
Issue	4
First page	1
Last page	32
Article number	041044
DOI	https://dx.doi.org/10.1103/PhysRevX.10.041044
Full text via DOI	https://doi.org/10.1103/PhysRevX.10.041044
URL	https://arxiv.org/abs/1909.11500
All authors	Goldt, S.; Mezard, M.; Krzakala, F.; Zdeborova, L.
Appears in types	1.1 Journal article

Files in this record:

File	Size	Format
PhysRevX.10.041044.pdf (open access; description: DOAJ OA journal; type: publisher's version (PDF); license: Creative Commons)	1.79 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/117821
