Predicting amino acid substitution probabilities using single nucleotide polymorphisms / Rizzato, Francesca; Rodriguez, Alex; Biarnés, Xevi; Laio, Alessandro. - In: GENETICS. - ISSN 0016-6731. - 207:2(2017), pp. 643-652. [10.1534/genetics.117.300078]

Predicting amino acid substitution probabilities using single nucleotide polymorphisms

Rizzato, Francesca; Rodriguez, Alex; Biarnés, Xevi; Laio, Alessandro
2017-01-01

Abstract

Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. Here we show that single nucleotide polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs. It predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85–100% sequence identity better than any other approach we are aware of, and it gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be used to better align protein sequences of related organisms and to refine estimates of the evolutionary distance between protein variants from related species in phylogenetic trees; in the future, it might also become a useful tool for population analysis.
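To make the idea of a substitution matrix "inferred from the frequency of codon interchanges" more concrete, the following is a minimal sketch and not the authors' procedure. It assumes the SNPs are already available as (reference codon, alternate codon) pairs, translates them with the standard genetic code, and row-normalises the counts of amino acid interchanges. The function name `substitution_probabilities`, the toy input, the symmetric counting, and the exclusion of synonymous and stop-codon changes are all illustrative assumptions; the SNP selection step mentioned in the abstract is not reproduced.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def substitution_probabilities(snp_codon_pairs):
    """Estimate P(b | a) from counts of amino acid interchanges.

    snp_codon_pairs: iterable of (ref_codon, alt_codon) strings (hypothetical
    input format). Returns a dict mapping (a, b) to the fraction of observed
    non-synonymous interchanges involving a that lead to b.
    """
    counts = Counter()
    for ref, alt in snp_codon_pairs:
        a, b = CODON_TABLE[ref], CODON_TABLE[alt]
        if a == b or "*" in (a, b):
            continue  # assumption: ignore synonymous and stop-codon changes
        # Assumption: the direction of the change is unknown, so count both.
        counts[(a, b)] += 1
        counts[(b, a)] += 1

    row_totals = Counter()
    for (a, _), n in counts.items():
        row_totals[a] += n

    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}


# Toy data, invented for illustration only.
toy_snps = [("GAT", "GAA"), ("GAT", "GAA"), ("GAT", "AAT"), ("CTG", "CTT")]
toy_probs = substitution_probabilities(toy_snps)
```

With this toy input, the two Asp/Glu interchanges and the single Asp/Asn interchange give Asp a row that sums to one, while the synonymous Leu change is skipped. A real analysis would need far more data and a principled treatment of codon frequencies and mutation direction.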
http://www.genetics.org/content/genetics/207/2/643.full.pdf


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/67636