
Neural networks trained with SGD learn distributions of increasing complexity / Refinetti, Maria; Ingrosso, Alessandro; Goldt, Sebastian. - 202:(2023), pp. 28843-28863. (Paper presented at the International Conference on Machine Learning, held in Honolulu, Hawaii, USA, 23-29 July 2023) [10.1088/1742-5468/ad8bb8].

Neural networks trained with SGD learn distributions of increasing complexity

Refinetti, Maria; Ingrosso, Alessandro; Goldt, Sebastian

2023-01-01

Abstract

The uncanny ability of over-parameterised neural networks to generalise well has been explained using various "simplicity biases". These theories postulate that neural networks avoid overfitting by first fitting simple, linear classifiers before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity biases is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a single neuron trained on synthetic data. We then demonstrate DSB empirically in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.
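
As a toy illustration of the difference between lower-order and higher-order input statistics mentioned in the abstract, the sketch below compares two one-dimensional distributions that share the same mean and variance but differ in their fourth moment. This is not the authors' experiment: the choice of distributions (standard Gaussian versus random signs), the sample size and the helper name `moments` are assumptions made purely for illustration.

```python
import random
import statistics


def moments(samples):
    """Return (mean, variance, fourth central moment) of a list of floats."""
    mu = statistics.fmean(samples)
    var = statistics.fmean((x - mu) ** 2 for x in samples)
    m4 = statistics.fmean((x - mu) ** 4 for x in samples)
    return mu, var, m4


# Fixed seed so the run is reproducible (illustrative choice).
rng = random.Random(0)
n = 100_000

# Two inputs with matching mean (0) and variance (1):
# a standard Gaussian and random signs in {-1, +1}.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

for name, xs in (("gaussian", gaussian), ("signs", signs)):
    mu, var, m4 = moments(xs)
    print(f"{name:8s} mean={mu:+.3f} var={var:.3f} fourth_moment={m4:.3f}")
```

The two samples agree in mean and variance up to sampling noise, while the fourth central moment is close to 3 for the Gaussian and close to 1 for the random signs. A rule that looks only at mean and covariance therefore cannot tell them apart; separating them requires higher-order statistics.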


Year: 2023
Volume title: Proceedings of the 40th International Conference on Machine Learning
Volume number: 202
First page: 28843
Last page: 28863
DOI: https://dx.doi.org/10.1088/1742-5468/ad8bb8
URL: https://arxiv.org/abs/2211.11567
All authors: Refinetti, Maria; Ingrosso, Alessandro; Goldt, Sebastian
Publication type: 4.1 Contribution in Conference proceedings

Files in this item:

File	Size	Format
Refinetti_2025_J._Stat._Mech._2025_024001.pdf (open access; type: publisher's version (PDF); licence: Creative Commons)	2.24 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/137850
