Spectral and deep learning approaches to Hi-C data analysis / Franzini, Stefano. - (2021 Oct 25).
Spectral and deep learning approaches to Hi-C data analysis
Franzini, Stefano
2021-10-25
Abstract
Hi-C matrices describe the genome-wide contact probability between chromatin loci. Comparing Hi-C matrices is important both to assess reproducibility across biological replicates and to find significant differences between non-replicates from different cell types; however, this analysis faces two challenges: Hi-C matrices tend to be undersampled, and thus noisy, and they contain a variety of multi-scale interaction patterns that must be taken into account. One way to tackle these problems is to extract information from the spectral features of Hi-C maps. In this thesis I will show, by comparing Hi-C maps to random matrices, that most of their spectrum is "aspecific", meaning that its features are the same in all Hi-C maps. On the other hand, the top eigenspaces present highly non-random features: by isolating them from the full matrix I am able to obtain sharper interaction patterns, effectively enhancing the quality at the single-matrix level and improving results in classification tasks. This shows that selecting a small number of degrees of freedom is key to enhancing the signal present in Hi-C matrices. However, spectral methods are not the only way of reducing the dimensionality of Hi-C datasets: in the second part of the thesis I propose a variational autoencoder architecture as a way of compressing Hi-C data and identifying the most relevant degrees of freedom. Local interaction patterns in Hi-C maps repeat across different cell types and chromosomes. By learning a low-dimensional representation of these local patterns, the variational autoencoder can be used to compress and decompress any Hi-C map. I will show that the reconstruction quality is better than what can be obtained with linear methods, and that classification tasks improve when applied to the low-dimensional representations of Hi-C maps.
Finally, I will show that the autoencoder and the spectral filter described in the first part of the thesis act similarly on the spectra of Hi-C maps.

File | Size | Format | |
---|---|---|---|
phd_thesis_Franzini.pdf (open access; description: Doctoral Thesis; type: pre-print document; license: not specified) | 21.29 MB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.