Spectral and deep learning approaches to Hi-C data analysis / Franzini, Stefano. - (2021 Oct 25).

Spectral and deep learning approaches to Hi-C data analysis

Franzini, Stefano
2021-10-25

Abstract

Hi-C matrices describe the genome-wide contact probability between chromatin loci. Comparing Hi-C matrices is important both to assess reproducibility across biological replicates and to find significant differences between non-replicates from different cell types. This analysis faces two challenges: Hi-C matrices tend to be undersampled, and thus noisy, and they contain a variety of multi-scale interaction patterns that must be taken into account. One way to tackle these problems is to extract information from the spectral features of Hi-C maps. In this thesis I will show, by comparing Hi-C maps to random matrices, that most of their spectrum is "aspecific", meaning that its features are the same in all Hi-C maps. The top eigenspaces, on the other hand, present highly non-random features: by isolating them from the full matrix I am able to obtain sharper interaction patterns, effectively enhancing quality at the single-matrix level and improving results in classification tasks. This shows that selecting a small number of degrees of freedom is key to enhancing the signal present in Hi-C matrices. Spectral methods, however, are not the only way of reducing the dimensionality of Hi-C datasets: in the second part of the thesis I propose a variational autoencoder architecture as a way of compressing Hi-C data and identifying the most relevant degrees of freedom. Local interaction patterns in Hi-C maps repeat across cell types and chromosomes. By learning a low-dimensional representation of these local patterns, the variational autoencoder can compress and decompress any Hi-C map. I will show that the reconstruction quality is better than what can be obtained with linear methods, and that classification tasks improve when applied to the low-dimensional representations of Hi-C maps.
Finally, I will show that the autoencoder and the spectral filter described in the first part of the thesis act on the spectra of Hi-C maps in a similar way.
25-Oct-2021
Micheletti, Cristian
Franzini, Stefano
Files in this item:
File: phd_thesis_Franzini.pdf
Access: open access
Description: Doctoral Thesis
Type: Pre-print document
License: Not specified
Size: 21.29 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/125029