
An unsupervised tour through the hidden pathways of deep neural networks / Doimo, Diego. - (2022 Dec 15).

An unsupervised tour through the hidden pathways of deep neural networks

Doimo, Diego
2022-12-15

Abstract

The goal of this thesis is to improve our understanding of the internal mechanisms by which deep architectures build meaningful representations and generalize. We focus on characterizing the semantic content of hidden representations with unsupervised learning tools that exploit the low-dimensional structure of the data. Some of these tools were developed by us and are described in this thesis. Indeed, real-world data typically lie on manifolds that can be topologically complex but are usually low-dimensional. Chapter 2 introduces Gride, a method that estimates the intrinsic dimension of the data as an explicit function of the scale without decimating the data set. The method is simple and computationally efficient, since it relies only on the distances among data points. In Chapter 3 we study how the probability density evolves across the hidden layers of some state-of-the-art deep neural networks. We find that the initial layers generate a unimodal probability density, discarding any structure irrelevant to classification. In subsequent layers, density peaks arise hierarchically, mirroring the semantic hierarchy of the concepts. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks makes it possible to reconstruct the semantic relationships among the categories. In Chapter 4 we study generalization in deep neural networks: adding parameters to a network that interpolates its training data typically improves its generalization performance, at odds with the classical bias-variance trade-off. We show that over-parametrized neural networks learn redundant representations instead of overfitting to spurious correlations, and that redundant neurons appear only if the network is regularized and the training error is zero.
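Gride, as summarized above, needs only the distances among data points. The following is a minimal NumPy sketch of such an estimator. It assumes that Gride maximizes the likelihood of the ratios μ_i = r_{i,n2} / r_{i,n1}, where r_{i,n1} and r_{i,n2} are the distances from point i to its n1-th and n2-th nearest neighbors. For a locally uniform density on a d-dimensional manifold, these ratios are distributed as f(μ | d) = d (μ^d - 1)^(n2 - n1 - 1) / [B(n2 - n1, n1) μ^(d(n2 - 1) + 1)] for μ > 1, with B the Beta function. For n1 = 1 and n2 = 2 this reduces to the Pareto law d μ^(-d-1) of the TwoNN estimator. The function names, the default n2 = 2 n1 and the bisection bounds are illustrative choices, not the thesis's implementation.

```python
from typing import Optional

import numpy as np


def knn_distances(X: np.ndarray, k: int) -> np.ndarray:
    """Distances from every point to its first k nearest neighbors (brute force, O(N^2) memory)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)      # clip small negative values caused by round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    d2.sort(axis=1)                  # column j now holds the squared (j+1)-th neighbor distance
    return np.sqrt(d2[:, :k])


def gride(X: np.ndarray, n1: int = 1, n2: Optional[int] = None,
          d_min: float = 1e-3, d_max: float = 100.0, tol: float = 1e-8) -> float:
    """Maximum-likelihood ID from mu_i = r_{i,n2} / r_{i,n1} (assumes no duplicate points or distance ties).

    The likelihood used here is an ASSUMED form of the Gride model (see the lead-in).
    """
    if n2 is None:
        n2 = 2 * n1                  # default: double the neighbor rank
    if not 1 <= n1 < n2:
        raise ValueError("need 1 <= n1 < n2")
    r = knn_distances(X, n2)
    log_mu = np.log(r[:, n2 - 1] / r[:, n1 - 1])
    n = len(log_mu)

    def score(d: float) -> float:
        # Derivative of the log-likelihood in d. It is strictly decreasing, so it has one root.
        s = n / d - (n2 - 1) * log_mu.sum()
        if n2 - n1 > 1:
            # log(mu) / (1 - mu^-d), written with expm1 for accuracy when mu is close to 1
            s += (n2 - n1 - 1) * np.sum(log_mu / -np.expm1(-d * log_mu))
        return s

    lo, hi = d_min, d_max
    if score(hi) > 0:                # the maximum lies beyond the search bracket
        return hi
    while hi - lo > tol:             # bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1000 points uniform on a unit square embedded in 5 dimensions (true ID = 2)
    X = np.hstack([rng.uniform(size=(1000, 2)), np.zeros((1000, 3))])
    for n1 in (1, 2, 4, 8):
        print(n1, round(gride(X, n1), 3))
```

Increasing n1 (here 1, 2, 4, 8) makes each ratio depend on larger neighborhoods of the same points. The sketch therefore reads the intrinsic dimension as a function of scale while always using the full data set. Whether the thesis uses exactly this schedule is an assumption. Because the log-likelihood is concave in d, bisection on its derivative finds the unique maximum without an external optimizer.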
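The abstract does not state how the densities in Chapter 3 are estimated. The sketch below is a simplified illustration only. It uses a plain k-nearest-neighbor estimator, rho_i = k / (N * omega_d * r_{i,k}^d), where r_{i,k} is the distance from point i to its k-th neighbor and omega_d is the volume of the unit d-ball. A point counts as a density peak when it is at least as dense as all of its k neighbors. The estimator and the peak rule are assumptions, not the procedure used in the thesis. So are the parameters k and d (d could, for instance, be set from an intrinsic-dimension estimate) and the function names.

```python
import math

import numpy as np


def knn_log_density(X: np.ndarray, k: int, d: float):
    """Log of the k-NN density estimate rho_i = k / (N * omega_d * r_{i,k}**d) (illustrative estimator)."""
    n = len(X)
    # Brute-force squared distances: exactly symmetric, but O(N^2 * D) memory.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff**2, axis=-1)
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]            # indices of the k nearest neighbors
    r_k = np.sqrt(d2[np.arange(n), nbrs[:, -1]])    # distance to the k-th neighbor
    # log volume of the unit d-ball: omega_d = pi^(d/2) / Gamma(d/2 + 1)
    log_omega = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    log_rho = math.log(k) - math.log(n) - log_omega - d * np.log(r_k)
    return log_rho, nbrs


def density_peaks(X: np.ndarray, k: int = 20, d: float = 2.0):
    """Indices of points at least as dense as all of their k neighbors, plus all log-densities."""
    log_rho, nbrs = knn_log_density(X, k, d)
    is_peak = log_rho >= log_rho[nbrs].max(axis=1)  # ties count as peaks
    return np.flatnonzero(is_peak), log_rho


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = np.vstack([
        rng.normal(loc=(-10.0, 0.0), scale=1.0, size=(300, 2)),
        rng.normal(loc=(10.0, 0.0), scale=1.0, size=(300, 2)),
    ])
    peaks, log_rho = density_peaks(blobs, k=20, d=2.0)
    print(len(peaks), "peaks; densest point:", blobs[np.argmax(log_rho)])
```

For any d > 0, log rho_i decreases monotonically in r_{i,k}, so the set of peaks does not depend on d. The value of d only rescales the densities, which matters once peak heights are compared with each other. On two well-separated blobs, the densest point of each blob is always returned as a peak, because all of its k neighbors lie in the same blob.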
Laio, Alessandro
Wyart, Matthieu; Saxe, Andrew; Cazzaniga, Alberto; Allegra, Michele
Files in this item:
Tesi Doimo.pdf

Embargo until 15/12/2025

Description: Ph.D. thesis
Type: Thesis
License: Not specified
Size: 5.84 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/130550