Knowledge-based identification of functional domains in proteins

Ponzoni, Luca

The characterization of proteins and enzymes is traditionally organised according to the sequence-structure-function paradigm. The investigation of the inter-relationships between these three properties has motivated the development of several experimental and computational techniques that have made available an unprecedented amount of sequence and structural data. The interest in developing comparative methods for rationalizing such copious information has, of course, grown in parallel. Regarding the structure-function relationship, for instance, the availability of experimentally resolved protein structures and of computer simulations has improved our understanding of the role of proteins' internal dynamics in assisting their functional rearrangements and activity. Several approaches are currently available for elucidating and comparing proteins' internal dynamics. These can capture the relevant collective degrees of freedom that recapitulate the main conformational changes. These collective coordinates have the potential to unveil remote evolutionary relationships between proteins that are otherwise not easily accessible from purely sequence- or structure-based investigations. Starting from this premise, in the first chapter of this thesis I will present a novel and general computational method that can detect large-scale dynamical correlations in proteins by comparing different representative conformers. This is accomplished by applying dimensionality-reduction techniques to inter-amino acid distance fluctuation matrices. As a result, an optimal quasi-rigid domain decomposition of the protein or macromolecular assembly of interest is identified, which facilitates the functionally-oriented interpretation of its internal dynamics. Building on this approach, in the second chapter I will discuss its systematic application to a class of membrane proteins of paramount biochemical interest, namely the class A G protein-coupled receptors.
The comparative analysis of their internal dynamics, as encoded by the quasi-rigid domains, allowed us to identify recurrent patterns in the large-scale dynamics of these receptors. This, in turn, allowed us to single out a number of key functional sites. These were, for the most part, previously known, a fact that both validates the method and lends confidence to the viability of the other, novel sites. Finally, in the last part of the thesis, I focussed on the sequence-structure relationship. In particular, I considered the problem of inferring structural properties of proteins from the analysis of large multiple sequence alignments of homologous sequences. For this purpose, I recast the strategies developed for the extraction of dynamical features in order to identify compact groups of coevolving residues, based only on the knowledge of amino acid variability in aligned primary sequences. Throughout the thesis, several methodological techniques are taken into consideration, mainly based on concepts from graph theory and statistical data analysis (clustering). All these topics are explained in the methodological sections of each chapter.
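
As a rough illustration of the idea of deriving quasi-rigid domains from inter-amino acid distance fluctuations, the following minimal sketch builds a distance fluctuation matrix from a few conformers and splits the residues into two groups by spectral bisection. It is not the method developed in the thesis: the function names (`distance_fluctuations`, `two_domain_split`), the Gaussian similarity kernel, the choice of `sigma`, and the toy two-block data are all assumptions made for this example.

```python
import numpy as np


def distance_fluctuations(conformers):
    """Standard deviation of each pairwise distance across conformers.

    conformers: array of shape (n_conf, n_res, 3), one point per residue.
    Returns an (n_res, n_res) matrix; small entries mean the two residues
    keep a nearly constant distance, i.e. they move quasi-rigidly.
    """
    diff = conformers[:, :, None, :] - conformers[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_conf, n_res, n_res)
    return dist.std(axis=0)


def two_domain_split(fluct, sigma=None):
    """Split residues into two groups by spectral bisection (assumed stand-in).

    Fluctuations are turned into similarities with a Gaussian kernel, and the
    sign of the second eigenvector of the graph Laplacian gives the labels.
    """
    n = fluct.shape[0]
    if sigma is None:
        # Assumed heuristic: mean off-diagonal fluctuation (must be non-zero).
        sigma = fluct[~np.eye(n, dtype=bool)].mean()
    sim = np.exp(-(fluct ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(sim, 0.0)
    laplacian = np.diag(sim.sum(axis=1)) - sim
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)


# Toy data: two rigid blocks of 5 "residues"; the second block is translated
# along x by a different amount in each of three conformers.
rng = np.random.default_rng(0)
block_a = rng.normal(size=(5, 3))
block_b = rng.normal(size=(5, 3)) + np.array([10.0, 0.0, 0.0])
conformers = np.stack([
    np.vstack([block_a, block_b + np.array([shift, 0.0, 0.0])])
    for shift in (0.0, 2.0, 4.0)
])

fluct = distance_fluctuations(conformers)
labels = two_domain_split(fluct)
print(labels)  # residues 0-4 share one label, residues 5-9 the other
```

Which block receives label 0 or 1 is arbitrary, since the sign of an eigenvector is not fixed. A decomposition into more than two domains would need a different clustering step, which this sketch does not attempt.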

Knowledge-based identification of functional domains in proteins / Ponzoni, Luca. - (2016 Oct 19).