The Gaussian equivalence of generative models for learning with shallow neural networks / Goldt, S.; Loureiro, B.; Reeves, G.; Krzakala, F.; Mezard, M.; Zdeborova, L. - 145 (2021), pp. 426-471. (2nd Mathematical and Scientific Machine Learning Conference, MSML 2021, 16-19 August 2021).

The Gaussian equivalence of generative models for learning with shallow neural networks

Goldt S.;Loureiro B.;Reeves G.;Krzakala F.;Mezard M.;Zdeborova L.

2021-01-01

Abstract

Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models. This is possible due to a Gaussian equivalence stating that the key metrics of interest, such as the training and test errors, can be fully captured by an appropriately chosen Gaussian model. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. First, we establish rigorous conditions for the Gaussian equivalence to hold in the case of single-layer generative models, as well as deterministic rates for convergence in distribution. Second, we leverage this equivalence to derive a closed set of equations describing the generalisation performance of two widely studied machine learning problems: two-layer neural networks trained using one-pass stochastic gradient descent, and full-batch pre-learned features or kernel methods. Finally, we perform experiments demonstrating how our theory applies to deep, pre-trained generative models. These results open a viable path to the theoretical study of machine learning models with realistic data.
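As a rough numerical illustration of the Gaussian equivalence described above, the sketch below draws inputs from a hypothetical single-layer generator x = sign(A c) with Gaussian latent vector c, projects the inputs and the latents onto random weight vectors, and compares the result with what a moment-matched Gaussian model would predict. This is not the authors' experiment: the sign nonlinearity, the dimensions, and the names `generator_projections` and `excess_kurtosis` are assumptions chosen only for illustration.

```python
import numpy as np

def generator_projections(n_samples=10000, latent_dim=200, input_dim=400, seed=0):
    """Project samples of a toy single-layer generator onto random weights.

    Assumed (hypothetical) setup: x = sign(A c) with c ~ N(0, I).
    Returns the input projection lam, the latent projection nu, and the
    covariance Cov(lam, nu) predicted from the generator's first moments.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((input_dim, latent_dim)) / np.sqrt(latent_dim)
    c = rng.standard_normal((n_samples, latent_dim))   # latent vectors
    x = np.sign(c @ A.T)                               # generator outputs

    w = rng.standard_normal(input_dim)                 # weights acting on x
    w_latent = rng.standard_normal(latent_dim)         # weights acting on c
    lam = x @ w / np.sqrt(input_dim)
    nu = c @ w_latent / np.sqrt(latent_dim)

    # For Gaussian c, E[sign(a . c) c] = sqrt(2/pi) * a / ||a||, which fixes
    # the cross-covariance that a matched Gaussian model must reproduce.
    A_hat = A / np.linalg.norm(A, axis=1, keepdims=True)
    cov_pred = np.sqrt(2.0 / np.pi) * (w @ A_hat @ w_latent)
    cov_pred /= np.sqrt(input_dim * latent_dim)
    return lam, nu, cov_pred

def excess_kurtosis(v):
    """Fourth standardized moment minus 3; zero for a Gaussian variable."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

if __name__ == "__main__":
    lam, nu, cov_pred = generator_projections()
    emp_cov = np.mean((lam - lam.mean()) * (nu - nu.mean()))
    print("excess kurtosis of lam:", excess_kurtosis(lam))
    print("empirical Cov(lam, nu):", emp_cov, "predicted:", cov_pred)
```

If the equivalence holds in this toy setting, the excess kurtosis of the input projection should be close to zero and the empirical covariance should agree with the prediction up to sampling noise. A near-Gaussian marginal and a matching covariance are only necessary signs of joint Gaussianity, so this check is a sanity test rather than a verification of the theorem.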

Year: 2021
Volume title: Proceedings of Machine Learning Research
Series: Proceedings of Machine Learning Research
Volume number: 145
First page: 426
Last page: 471
URL: https://arxiv.org/abs/2006.14709
Publisher: ML Research Press
All authors: Goldt, S.; Loureiro, B.; Reeves, G.; Krzakala, F.; Mezard, M.; Zdeborova, L.
Publication type: 4.1 Contribution in Conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/135772
