SISSA Digital Library: Institutional Research Information System

An unsupervised tour through the hidden pathways of deep neural networks / Doimo, Diego. - (2022 Dec 15).

An unsupervised tour through the hidden pathways of deep neural networks

Doimo, Diego

2022-12-15

Abstract

The goal of this thesis is to improve our understanding of the internal mechanisms by which deep architectures build meaningful representations and generalize. We focus on characterizing the semantic content of the hidden representations with unsupervised learning tools, some of which were developed by us and are described in this thesis, that harness the low-dimensional structure of the data. Indeed, real-world data typically lie on manifolds that can be topologically complex but are low-dimensional. Chapter 2 introduces Gride, a method that estimates the intrinsic dimension of the data as an explicit function of the scale without decimating the data set. The method is simple and computationally efficient since it relies only on the distances among data points. In chapter 3 we study the evolution of the probability density across the hidden layers of some state-of-the-art deep neural networks. We find that the initial layers generate a unimodal probability density, getting rid of any structure irrelevant to classification. In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks allows reconstructing the semantic relationships among the categories. In chapter 4 we study the problem of generalization in deep neural networks: adding parameters to a network that interpolates its training data will typically improve its generalization performance, at odds with the classical bias-variance trade-off. We show that over-parametrized neural networks learn redundant representations instead of overfitting to spurious correlations, and that redundant neurons appear only if the network is regularized and the training error is zero.

Defense date: 15 Dec 2022
SISSA supervisor(s): Laio, Alessandro
External supervisor(s): Wyart, Matthieu; Saxe, Andrew; Cazzaniga, Alberto; Allegra, Michele
All authors: Doimo, Diego
Item type: 8.1 PhD thesis

Files in this item:

File	Size	Format
Tesi Doimo.pdf (embargoed until 15/12/2025; description: Ph.D. thesis; type: thesis; license: not specified)	5.84 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/130550
