Investigation of mosaicism sources in Alzheimer’s disease. A focus on nucleotide variants and LINE-1 copy number variants / Leoni, Gabriele. - (2020 Dec 15).

Investigation of mosaicism sources in Alzheimer’s disease. A focus on nucleotide variants and LINE-1 copy number variants.

Leoni, Gabriele
2020-12-15

Abstract

Technological advancements, in the form of SNP arrays and whole-genome sequencing, have provided high-throughput capability for investigating both the quantity and the sequence of cellular DNA. Through their extensive application, several types of variants have been found to generate mosaicism, the phenomenon characterized by the presence of genetically distinct cells within an organism. In humans, mosaicism has been found to be highly pervasive in both healthy and impaired brains, although its roles and potential pathological effects are not yet fully understood. Among the types of somatic variants that contribute to mosaicism, single nucleotide variants and retrotransposon insertions are of particular interest. Single nucleotide variants (SNVs) are mutations that affect single positions of the genome. Although they are predominantly physiological, arising from the intrinsic error rate of the DNA replication process [McCulloch and Kunkel, 2008], they have been found to cause several brain-related diseases, such as malformations of the brain [Gleeson et al., 2000; Rivière et al., 2012] and severe epileptic brain malformation [Lee et al., 2012; Poduri et al., 2012]. Retrotransposons, instead, are a class of repetitive elements that can mobilize within the genome and, as a consequence, increase their copy number. In this way, retrotransposons can shape the human genome by generating structural variants and possibly altering gene function. Among retrotransposons, the LINE-1 (L1) family is the only one thought to be still active in humans, and therefore able to contribute to mosaicism. Mosaicism has also been detected in Alzheimer’s disease (AD), the most common neurodegenerative disorder, which is characterized by the accumulation of plaques composed of amyloid β, neurofibrillary tangles containing Tau, synaptic loss and neuronal death.
A recent publication demonstrated that a pathogenic SNV in the PIN1 gene, which results in loss of function of the protein, can lead to Tau phosphorylation and aggregation, thereby suggesting a possible link between SNVs and the appearance of Tau pathology in AD brains [Park et al., 2019]. Moreover, multiple observations have linked key AD proteins (such as Tau and TDP-43) with the reactivation of retrotransposons [Krug et al., 2017; Saleh et al., 2019]. However, the real impact of SNVs and retrotransposons in AD still remains largely unknown. Motivated by this lack of knowledge, I decided to investigate SNV abundance and retrotransposon copy number variation (CNV), mainly of L1, in AD post-mortem tissue samples. Given the current technological limitations that affect mosaicism detection, the dataset was studied by coupling two different strategies and by developing a new targeted sequencing approach. To call the highest number of SNVs with the highest possible quality, the densest SNP array available to date (i.e. the one with the highest number of distinct SNP probes) was applied to cerebellum, frontal cortex and kidney samples from the same individuals. Although arrays were also used to assess retrotransposon CNV content, they are ineffective at detecting new retrotransposition events. For this reason, and to extend SNV detection to the whole genome, short-read, high-coverage (~100x) whole-genome sequencing was applied as a second strategy, additionally extending the analyses to temporal cortex and hippocampus tissues. This approach also unveiled, for the first time to my knowledge, the presence of somatic multi-nucleotide variants (MNVs) in the brain, a class of variants characterized by two nearby SNVs within the same haplotype.
Finally, although whole-genome sequencing strategies have proven successful in genotyping retrotransposition events, uniquely mapping short reads that originate from repetitive elements to specific genomic regions is currently problematic. Therefore, CNVs, polymorphisms and structural variants overlapping repetitive elements may remain undetected, a problem that is even more pronounced for somatic variants. To improve mapping specificity and resolution, we developed a targeted sequencing approach designed to specifically amplify and sequence both the upstream genomic region and the 5’ region of a subset of ~3000 full-length L1 elements. We named this technology LIFE-seq, from LINE-1 Five prime End sequencing. I contributed by developing an analysis pipeline able to genotype the sequenced loci, testing it on a subset of the AD dataset.
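
The abstract defines MNVs as two nearby SNVs within the same haplotype. A minimal Python sketch of that definition follows; it is an illustration only, not the thesis pipeline. The `SNV` record, its `haplotype` phase label, the `pair_mnvs` helper and the `max_distance` threshold are all hypothetical names and values chosen for the example, since the abstract does not state how "nearby" was defined or how phasing was obtained.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SNV:
    """A single nucleotide variant call (hypothetical record layout)."""
    chrom: str
    pos: int        # 1-based genomic position
    ref: str
    alt: str
    haplotype: int  # assumed phase label, e.g. 0 or 1


def pair_mnvs(snvs: List[SNV], max_distance: int = 2) -> List[Tuple[SNV, SNV]]:
    """Pair consecutive SNVs that share chromosome and haplotype and lie
    within max_distance bp of each other (max_distance is an assumption)."""
    # Sorting by (chrom, haplotype, pos) places SNVs of the same phase
    # next to each other in positional order.
    ordered = sorted(snvs, key=lambda v: (v.chrom, v.haplotype, v.pos))
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        same_phase = (first.chrom == second.chrom
                      and first.haplotype == second.haplotype)
        if same_phase and 0 < second.pos - first.pos <= max_distance:
            pairs.append((first, second))
    return pairs
```

With this sketch, two SNVs at adjacent positions on the same haplotype are reported as one candidate MNV, whereas two adjacent SNVs on different haplotypes are not paired.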
15-Dec-2020
Sanges, Remo
Gustincich, Stefano
Calogero, Raffaele; Tongiorgi, Enrico
Leoni, Gabriele
Files in this item:
PhD.Thesis.Leoni.pdf

Open Access from 01/12/2022

Description: Doctoral thesis
Type: Publisher's version (PDF)
License: All rights reserved
Size: 14.68 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/115969