One of the most intriguing discoveries of recent decades is that “the genome is a work in progress”, constantly gaining and losing chunks of sequence and thereby providing new, potentially favorable combinations for adaptation. The old genetic concept of a static genome prevailed until the 1950s, when it was first suggested that there is much more to DNA than just genes. Indeed, genetic material is dynamic, and the greater part of most organisms’ genomes consists of non-coding DNA, especially fragments derived from elements capable of moving to new locations: Transposable Elements (TEs). TEs are mobile DNA sequences whose remnants occupy nearly half of the mammalian genome and up to 90% of the genome of some plants (SanMiguel et al., 1996). Since Barbara McClintock discovered them in maize in 1951 (McClintock, 1951), extensive efforts have been devoted to understanding the function of these interspersed repeats. Because their activity is largely hidden, however, TEs were long underappreciated and dismissed as ‘junk DNA’. They gained new attention in 1988, when insertions of long interspersed element-1 (LINE-1 or L1) were found to be responsible for haemophilia A (Kazazian et al., 1988). LINE-1 elements are the only active, autonomous TEs in the human genome. Able to create polymorphisms among individuals and genomic mosaicism within populations of cells, they are major sources of Structural Variations (SVs) in humans, and LINE-1-mediated insertions have been reported in 124 cases of genetic disease (Hancks and Kazazian, 2016). In particular, the discovery of LINE-1 mobilization during neurogenesis (Muotri et al., 2005; Coufal et al., 2009) prompted the scientific community to investigate the potential involvement of mobile elements in neuropsychiatric disorders (Bundo et al., 2014; Guffanti et al., 2016; Shpyleva et al., 2017) and neurodegenerative diseases (Li et al., 2012).
LINE-1 activity has now been demonstrated in vitro (Moran et al., 1996) and in vivo (Ostertag et al., 2002), but the actual rate of retrotransposition remains an open question. One of the main reasons for this gap is the lack of reliable methods to detect insertions present in a small minority of cells, or unique to a single cell. The problem is compounded by the technical difficulty of resolving non-reference, chimeric regions of the genome, whether experimentally or computationally. Until very recently, ligation-mediated PCR assays were considered the gold standard for demonstrating and quantifying ongoing retrotransposon activity. Unfortunately, both gains and losses in the number of repeats detected with these techniques can arise through a multitude of mechanisms not directly related to retrotransposition. Among the most common retrotransposition-independent rearrangements are deletions and duplications mediated by non-homologous recombination. In this thesis, I focus on the effects of LINE-1 elements on genome stability. To this end, I describe three different bioinformatics methods for studying the hallmarks of LINE-1-mediated genome instability: direct insertions, post-insertional rearrangements and Double Strand Breaks (DSBs). The growing volume of data produced by Next-Generation Sequencing (NGS) calls for new genomics technologies and bioinformatics pipelines dedicated to retrotransposons, so that the available resources can be fully exploited. A scalable approach such as the Splinkerette Analysis of Mobile Elements (SPAM) method proposed here is therefore of substantial interest for current and future studies of TEs.
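In splinkerette-type assays the PCR product spans the end of the element and the adjacent genomic DNA, so informative reads begin with known element sequence followed by the unknown flank. The abstract does not describe the SPAM pipeline itself; the Python sketch below only illustrates that general idea under stated assumptions. `ANCHOR` is a hypothetical placeholder (not a real LINE-1 sequence), the mismatch, flank-length and window cut-offs are arbitrary, and the mapping of flanks to the genome (done by an external aligner in practice) is assumed to have already produced `(chromosome, position)` pairs.

```python
from collections import defaultdict

# Hypothetical anchor: element sequence expected at the start of each read,
# right after the element-specific primer. Placeholder, NOT a real L1 sequence.
ANCHOR = "TTAGGCATCG"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_anchor(read: str, anchor: str = ANCHOR, max_mismatches: int = 1,
                min_flank: int = 20):
    """Return the genomic flank that follows the anchor, or None.

    Reads whose first len(anchor) bases differ from the anchor by more than
    max_mismatches (off-target products), or whose remaining flank is shorter
    than min_flank (too short to map), are discarded. Cut-offs are assumed.
    """
    head = read[:len(anchor)]
    if len(head) < len(anchor) or hamming(head, anchor) > max_mismatches:
        return None
    flank = read[len(anchor):]
    return flank if len(flank) >= min_flank else None


def cluster_sites(hits, window: int = 100):
    """Merge mapped flank positions into candidate insertion sites.

    hits: iterable of (chrom, pos) for flanks already mapped to the genome.
    Positions on the same chromosome within `window` bp of the previous
    position are chained into one site (single linkage).
    Returns a list of (chrom, first_pos, last_pos, n_reads).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in hits:
        by_chrom[chrom].append(pos)
    sites = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= window:
                count += 1
            else:
                sites.append((chrom, start, prev, count))
                start, count = pos, 1
            prev = pos
        sites.append((chrom, start, prev, count))
    return sites


# Toy reads: the third one lacks the anchor and is discarded.
reads = ["TTAGGCATCG" + "A" * 25, "TTAGGCATCC" + "C" * 25, "GGGGGGGGGG" + "A" * 25]
flanks = [trim_anchor(r) for r in reads]
print(cluster_sites([("chr1", 1000), ("chr1", 1050), ("chr1", 5000), ("chr2", 300)]))
# [('chr1', 1000, 1050, 2), ('chr1', 5000, 5000, 1), ('chr2', 300, 300, 1)]
```

Chaining positions within a fixed window is the simplest grouping choice: it keeps reads from the same insertion together, but it could merge two genuinely distinct insertions lying very close to each other.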
Importantly, SPAM allowed us to target exclusively Full-Length LINE-1 elements (FL-L1) present in the Frontal Cortex (FC) and Kidney (K) of post-mortem tissues from Alzheimer’s Disease (AD) patients and controls (CTRL), and to test whether LINE-1 polymorphisms can be a relevant source of SVs associated with AD risk. This is accomplished by combining a PCR-based enrichment of FL-L1 elements with an ad hoc bioinformatics pipeline. The strength of this integrative method lies in its ability to detect LINE-1 insertion sites with high precision and in its scalability. The methodology is also flexible enough to be applied to different organisms and to different classes of TEs. Using SPAM, we observed for the first time unexpectedly high levels of retrotransposition in the kidney. Alongside SPAM, we performed TaqMan-based Copy Number Variation (CNV) analysis to evaluate the content of potentially active L1s in the different tissues of AD and CTRL individuals. Overall, we show that the content of FL-L1 sequences is significantly lower in AD than in CTRL and that de novo integrations are not associated with the disease, but that FL-L1 polymorphisms can be a relevant source of SVs. We then investigated the mechanism underlying the regulation of Olfactory Receptor (OR) choice in the mouse Olfactory Epithelium (OE) by characterizing SVs specific to the Olfr2 locus. To this end, we combined whole-genome amplification from small numbers of cells with PacBio single-molecule sequencing and complementary high-fidelity paired-end Illumina sequencing. This approach allowed accurate identification of breakpoints in a locus where a very high concentration of repeats, especially LINE elements, offers more opportunities for recombination between retrotransposon fragments. Surprisingly, the analysis revealed hundreds of heterozygous structural variants in the vicinity of the locus, with deletions being the most abundant class.
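Relative L1 content from TaqMan assays is commonly computed with the comparative Ct (2^-ΔΔCt) method: the L1 assay is normalized to a single-copy reference assay, then to a calibrator sample. The abstract does not state which quantification model was used, so this Python sketch is an assumption rather than the thesis's procedure. It averages replicate Ct values and assumes roughly 100% amplification efficiency (one doubling per cycle) for both assays; the Ct values are invented toy numbers.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """ΔCt: mean Ct of the L1 assay minus mean Ct of the reference assay."""
    return mean(target_cts) - mean(reference_cts)


def relative_quantity(sample, calibrator):
    """2^-ΔΔCt amount of L1 target in `sample` relative to `calibrator`.

    Each argument is a (target_cts, reference_cts) pair of replicate Ct lists.
    Assumes ~100% efficiency for both assays (one doubling per cycle).
    """
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** -ddct


# Toy Ct values, for illustration only (not data from the thesis).
calibrator = ([20.0, 20.0, 20.0], [25.0, 25.0, 25.0])
sample = ([21.0, 21.25, 20.75], [25.0, 25.0, 25.0])
print(relative_quantity(sample, calibrator))  # 0.5
```

With the same reference Ct, a sample whose L1 assay crosses the threshold one cycle later than the calibrator gets half the relative quantity; group comparisons (e.g. AD versus CTRL) would then be run on these per-sample values.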
The presence and characteristics of particular genomic features associated with the observed deletions suggest that Microhomology-Mediated End Joining (MMEJ) of Double Strand Breaks (DSBs) is likely the main mechanism underlying the formation of these deletions. Further experiments will tell whether the observed SVs are involved in regulating OR expression. Intrigued by the idea that OR genes may carry somatic SVs, we profiled the distribution of endogenous DSBs in the mouse OE at postnatal day 6 (p6) and 1 month (1m), and in the liver at p6. To this end, we performed a Chromatin ImmunoPrecipitation and Sequencing (ChIP-Seq) analysis of γ-H2AX, an early response marker for DNA DSBs. Little is known about the differential distribution of γ-H2AX across the genome under physiological conditions. Our results show that the γ-H2AX signal is stronger in gene-rich, transcribed regions, where it co-localizes with regulatory sites. These results suggest a potential involvement of DSBs in resolving topological stress and in promoting interactions between regulatory regions. The research described in this thesis aims to enhance our understanding of the role of LINE-1-mediated SVs in health and disease.
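MMEJ typically leaves a recognizable signature: a short stretch of sequence identical at both ends of the deletion, which makes the exact breakpoint ambiguous. The following Python sketch measures that shared sequence from a reference string and deletion coordinates. It is an illustration, not the thesis's pipeline; the 0-based, half-open coordinate convention and the 2 bp cut-off in `is_mmej_candidate` are assumptions, since published studies use different thresholds and the abstract does not specify one.

```python
def microhomology(ref: str, start: int, end: int) -> str:
    """Sequence shared by both breakpoints of the deletion ref[start:end].

    Coordinates are 0-based and half-open. Bases that match across the two
    junctions let the deletion "slide" without changing the resulting allele;
    that ambiguous stretch is returned as the microhomology.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")
    del_len = end - start
    # Slide right: the deleted segment starts like the sequence after it.
    right = 0
    while (end + right < len(ref) and right < del_len
           and ref[start + right] == ref[end + right]):
        right += 1
    # Slide left: the base before the deletion equals the last deleted base.
    # left + right is capped at the deletion length (relevant in tandem repeats).
    left = 0
    while (start - left - 1 >= 0 and left + right < del_len
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1
    return ref[start - left:start + right]


def is_mmej_candidate(mh: str, min_len: int = 2) -> bool:
    """Flag junctions with at least min_len bp of microhomology (assumed cut-off)."""
    return len(mh) >= min_len


# Toy reference: left flank, deleted segment, right flank.
ref = "TTTTT" + "ACTGGGGG" + "ACTCCCCC"
mh = microhomology(ref, 5, 13)
print(mh, is_mmej_candidate(mh))  # ACT True
```

Reporting the full slidable stretch, rather than only the bases at one nominal breakpoint, makes the measurement independent of how a variant caller chose to left- or right-align the deletion.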
|Title:||Detecting LINE-1 mediated structural variants from sequencing data: computational characterization of genomic rearrangements occurring in human post-mortem brains in the pathologic context of Alzheimer’s disease and in mouse olfactory epithelium at physiological conditions|
|Publication date:||24-Nov-2017|
|Appears in type:||8.1 PhD thesis|