Towards a deeper understanding of protein sequence evolution / Rizzato, Francesca. - (2016 Oct 20).

Towards a deeper understanding of protein sequence evolution

Rizzato, Francesca
2016-10-20

Abstract

Most bioinformatic analyses start by building sequence alignments with scoring matrices. An implicit approximation underlying many scoring matrices is that protein sequence evolution can be treated as a sequence of Point Accepted Mutations (PAM) (Dayhoff et al., 1978), in which each substitution happens independently of the history of the sequence, that is, with a probability that depends only on the initial and final amino acids. However, different protein sites evolve at different rates (Echave et al., 2016), and this feature, though included in many phylogenetic reconstruction algorithms, is generally neglected when building or using substitution matrices. Moreover, substitutions at different protein sites are known to be entangled by coevolution (de Juan et al., 2013). This thesis is devoted to analyzing the consequences of neglecting these effects and to developing models of protein sequence evolution capable of incorporating them. We introduce a simple procedure for including among-site rate variability in PAM-like scoring matrices through a mean-field-like framework, and we show that rate variability leads to nontrivial evolution when whole protein sequences are considered. We also propose a procedure for deriving a substitution rate matrix from Single Nucleotide Polymorphisms (SNPs): we first test the statistical compatibility of frequent genetic variants within a species with substitutions accumulated between species, and we then show that the matrix built from SNPs faithfully describes substitution rates for short evolutionary times, provided that rate variability is taken into account. Finally, we present a simple model, inspired by coevolution, capable of predicting both the along-chain correlation of substitutions and the time variability of substitution rates. This model is based on the idea that a mutation at a site enhances the probability of fixing mutations at other protein sites in its spatial proximity, but only for a limited amount of time.
20 Oct 2016
Laio, Alessandro
Rizzato, Francesca
Files in this record:
1963_35249_Thesis.pdf
Open Access from 2017-04-01
Type: Thesis
License: Not specified
Size: 8.79 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/4904