First Year Report

From SungsamGong

Title

Functional Residues and Environment Specific Substitution Table

Summary

Identifying the residues responsible for a specific function of a protein can provide clues about the mechanism by which the protein works. Computational approaches to identifying functional residues have emerged as low-cost alternatives to experimental methods, providing fast and large-scale analyses. Moreover, demand for such approaches is increasing as more sequences become available from genome sequencing projects. In this report, computational approaches are briefly reviewed. In particular, I focus on the use of Crescendo, developed in the Blundell group, to identify functional residues in proteins of known structure.

Crescendo is based on the use of Environment Specific Substitution Tables (ESSTs), which define the way that accepted amino acid substitutions are influenced by the local structural environment. I describe how ESSTs can be enhanced by using only amino acids that are not involved in catalytic activity, metal or ligand binding, post-translational modification or protein interactions. I describe the benchmark results obtained with Crescendo for proteins whose functions have been defined experimentally. The new ESSTs increase the z-score at the active site residues by 13% compared with the old ESSTs.

In the process of updating ESSTs, functional residues are identified and stored in FaceView, a database and web server which visualizes functional residues in two or three-dimensional fashion. The use of FaceView is highlighted and its future development is discussed.

Background and Rationale

Identification of Functional Residues

Proteins can be described as the basic action units of function in that most information in a cell is transmitted and finally embodied in proteins. This observation underlines the importance of identification of protein functions but, since the genome sequencing projects provide only sequences of the complete proteome, the functions may still be unknown.

Not all amino acids of a protein contribute to the same degree to function. It is very important to identify functional residues which may be involved in the catalytic function of an enzyme, recognition of DNA, or interactions with other proteins or ligand molecules. By genetically modifying sequences responsible for a function, plants can be engineered to be resistant to droughts and insects. Functional residues can be a target for gene therapy and the design of better drugs. Also, the mechanism of a disease can be unveiled by identifying functional residues of disease causing proteins.

There have been many experimental approaches designed to unveil the role of a protein and identify the amino acids that endow a protein with a specific function. The function of a protein can be identified by disabling or knocking out the target protein of interest (Chang et al., 2006; Kile and Hilton, 2005). In parallel, the function of an amino acid is typically identified by chemically modifying or mutating the target amino acid using site-directed mutagenesis (Foley and Burkart, 2007; Weiss et al., 2000; Williams et al., 1997). Structural biologists try to understand the relationship between the structure and function of a protein and unveil the functional residues by solving three-dimensional structures of proteins using X-ray crystallography and NMR techniques (Broadhurst et al., 2003; Callaghan et al., 2004; Pellegrini et al., 2000). The experimental results are compiled in various databases such as PROSITE (Hulo et al., 2006), SWISS-PROT (Apweiler et al., 2004), CSA (Porter et al., 2004) and PDB (Berman et al., 2000). Brief descriptions of these databases are as follows:

1) PROSITE is a database that contains distinct patterns of sequence motifs.

2) SWISS-PROT is a major repository of protein sequences annotated with submitters' experimental results.

3) CSA, the Catalytic Site Atlas, contains active sites of enzymes taken from the literature.

4) PDB, the Protein Data Bank, is a major repository of three-dimensional protein structures.

Although there have been great endeavours to identify functional residues of proteins, more sequences are becoming available as a result of genome sequencing projects and the number of protein sequences with unknown function continues to grow. To fill this gap, several computational approaches have been introduced. Computational alanine scanning mutagenesis can detect possible hot-spots of a protein in a fast and low-cost manner (Kortemme et al., 2004; Massova and Kollman, 1999). Functional residues can be revealed by network analysis of protein structures (Amitai et al., 2004). By combining evolutionary information from sequences with three-dimensional structures, active sites and ligand binding sites of various proteins can be successfully predicted (Armon et al., 2001; Chelliah et al., 2004; Lichtarge et al., 1996). Lichtarge et al. first introduced the method named Evolutionary Trace (ET) in 1996. They applied their method to SH2, SH3, and DNA binding domains and successfully identified functional residues. Innis et al. (Innis et al., 2000) further applied the ET method to the TGF-beta family and developed an on-line ET server, named TraceSuitII, which is available from http://www-cryst.bioc.cam.ac.uk/~jiye/evoltrace/evoltrace.html. ConSurf (Glaser et al., 2003) is another web server implementing the ET method, but it uses its own scoring scheme. Recently, Crescendo (Chelliah et al., 2004), developed by the Blundell group, has been successful in predicting functional residues because it distinguishes functional constraints from structural constraints, both of which are responsible for the conservation of amino acids. Structural constraints are further divided and represented by Environment Specific Substitution Tables (ESSTs), as described below.

Environment Specific Substitution Table (ESST)

Proteins existing in living organisms can be described as the fittest in the sense that they have been selected through the process of evolution. They are still evolving through spontaneous mutations under selection pressure for fitness in their environments. The rate of mutation or substitution differs among the 20 amino acids in a protein. These different substitution rates were first quantified by Margaret Dayhoff as the PAM (Point Accepted Mutation) matrix in the late 1960s and 1970s (Dayhoff and Eck, 1968). The methodology was further developed by Henikoff et al. (Henikoff and Henikoff, 1992) in the BLOSUM matrices to reflect more divergent relationships between protein sequences, which approaches using PAM failed to detect. BLOSUM62 is now recognized as a de facto standard measure of substitution rates for the 20 amino acids. The degree of conservation, or conversely the rate of substitution, of amino acids is subject to many evolutionary constraints. One of those constraints is the need to retain the protein tertiary structure and the local structural environment of each amino acid.

The Environment Specific Substitution Table is a substitution table that considers structural constraints in the calculation of substitution rate. Overington et al. (Overington et al., 1992; Overington et al., 1990) first introduced Environment Specific Substitution Tables or ESSTs from a set of homologous protein families whose three-dimensional structures were available. The rationale behind ESST is that the acceptance of substitution of an amino acid is subject to its local tertiary environment. The structural environments of amino acids include 1) main-chain conformation and secondary structure, 2) solvent accessibility, and 3) hydrogen bonding between side-chain and main-chain. 64 ESSTs can be derived from a combination of structural features; 4 from secondary structures, 2 from solvent accessibility, and 8 from hydrogen bonds. These combinations of structural features restrict possible substitutions of an amino acid and make distinct patterns of substitution tables. Compared with traditional substitution tables (PAM, BLOSUM) derived from only sequence information, ESST was shown to give more precise and discriminating measures of substitution probabilities (Wako and Blundell, 1994).
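The count of 64 tables follows from multiplying the class counts, 4 × 2 × 8. The sketch below enumerates the combinations. The class labels are illustrative placeholders, and treating the eight hydrogen-bond classes as three independent yes/no features is an assumption, since the text gives only the total.

```python
from itertools import product

# Labels are illustrative placeholders, not identifiers taken from the original tables.
SECONDARY_STRUCTURE = ["helix", "strand", "coil", "positive_phi"]  # 4 classes
SOLVENT_ACCESSIBILITY = ["accessible", "buried"]                   # 2 classes
# Assumption: the 8 hydrogen-bond classes come from 3 binary features (2**3 = 8).
HBOND_FEATURES = ["hbond_1", "hbond_2", "hbond_3"]


def enumerate_environments():
    """Return every (secondary structure, accessibility, hydrogen-bond pattern) combination."""
    hbond_classes = list(product([False, True], repeat=len(HBOND_FEATURES)))
    return list(product(SECONDARY_STRUCTURE, SOLVENT_ACCESSIBILITY, hbond_classes))


environments = enumerate_environments()
print(len(environments))  # 64
```

Each element of `environments` would index one substitution table, so every residue position in an alignment is counted in exactly one of the 64 tables.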

ESSTs were improved and updated by Shi et al. when FUGUE was first introduced in 2001 (Shi et al., 2001). They enhanced the ESSTs by introducing the following features: 1) a clustering scheme to correct sampling bias, 2) a smoothing procedure to correct data sparsity, 3) the use of only high-resolution structures in the alignments as a source of substitution matrices, and 4) reduction of the bias caused by non-structural constraints. The last feature is designed to separate functional constraints from structural constraints when generating ESSTs. Because ESSTs should take into account only structural environments, substitutions at positions where amino acids are conserved for functional reasons were not counted in the calculation of the matrices. Shi et al. took two kinds of functional residues into account to eliminate non-structural constraints that may bias the ESSTs: residues involved in domain-domain interactions and residues interacting with ligands. Those residues were masked in the alignment files and were not taken into account in the substitution counts. ESSTs have been shown to be useful in secondary structure prediction (Wako and Blundell, 1994) and fold recognition (Johnson et al., 1993; Rice and Eisenberg, 1997). ESSTs were also used to find functional residues in Crescendo (Chelliah et al., 2004).


The Scope of This Report

In this report, I describe two topics. The first concerns enhancing the ESSTs, which were last updated in 2001. The method and procedure for updating the ESSTs are explained. To assess the new ESSTs, they were benchmarked against Shi et al.'s ESSTs (Shi et al., 2001). The z-score of Crescendo was used as the benchmark measure, and the score was measured in two regions: 1) functional residues and 2) interacting residues. The second topic is FaceView, a database and server of the functional residues that are used as a source for updating the ESSTs. The database schema and contents of FaceView are described. Finally, I discuss further enhancements of FaceView and future applications of ESSTs as tools for measuring the evolutionary rate and relative ages of proteins.

Material and Method

Updating ESST

Updating Strategy

The updating strategy is to exclude as many functional residues as possible from the substitution counts when calculating new ESSTs. Because those residues may be conserved in the alignments for functional reasons, they should be masked to reduce the bias caused by non-structural constraints. In addition, the updating process was designed to use a relational database to dynamically generate masked alignments by using different combinations of functional residues for masking purposes.
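The masking step can be illustrated with a small sketch. It assumes alignments held as gapped strings, functional residues given as 1-based ungapped positions per sequence, and a hypothetical mask symbol `X`. None of these details is taken from the actual pipeline, which generates the masked alignments from a relational database.

```python
MASK_CHAR = "X"  # hypothetical mask symbol, chosen for illustration only


def mask_alignment(alignment, functional_sites, sources):
    """Mask functional residues in an alignment.

    alignment: dict mapping sequence id -> aligned string ('-' marks a gap)
    functional_sites: dict mapping source name -> {sequence id -> set of
        1-based residue positions counted on the ungapped sequence}
    sources: names of the masking sources to combine
    """
    masked = {}
    for seq_id, aligned in alignment.items():
        # Union of the positions reported by every selected source.
        to_mask = set()
        for source in sources:
            to_mask |= functional_sites.get(source, {}).get(seq_id, set())
        out = []
        residue_index = 0
        for symbol in aligned:
            if symbol == "-":
                out.append(symbol)  # gaps do not advance the residue counter
                continue
            residue_index += 1
            out.append(MASK_CHAR if residue_index in to_mask else symbol)
        masked[seq_id] = "".join(out)
    return masked


alignment = {"seq_a": "AC-DE", "seq_b": "ACGDE"}
sites = {
    "UniProt": {"seq_a": {2}},
    "CSA": {"seq_b": {3}},
    "InterPare": {"seq_a": {4}},
}
print(mask_alignment(alignment, sites, ["UniProt", "CSA"]))
print(mask_alignment(alignment, sites, ["UniProt", "CSA", "InterPare"]))
```

Passing a different list of sources yields a differently masked alignment from the same input, which is the behaviour needed to compare combinations of masking sources.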

Alignment Source and Substitution Calculation

The new ESSTs were rebuilt from the 177 or 371 structure alignments from which the old ESSTs had been derived. The original alignment files were downloaded from http://www-cryst.bioc.cam.ac.uk/~kenji/subst. Those structural alignments came from subsets of HOMSTRAD, a curated database of structure-based alignments for homologous protein families developed by the Blundell group (Mizuguchi et al., 1998; Stebbings and Mizuguchi, 2004). The first subset (SUB177) consists of 177 families comprising a total of 706 structures and the second subset (SUB371) consists of 371 families comprising 1357 structures. Each subset contains only structures with a resolution of 2.5 Å or better. The full requirements for the subsets are explained in Shi's paper (Shi et al., 2001). The program SUBST, written by Dr Kenji Mizuguchi (unpublished results), was used to calculate the substitution tables.


Masking Resources

To maximize masking counts, the following three databases are used as sources of functional residues.

UniProt (http://www.pir.uniprot.org/): UniProt is a manually curated repository of protein sequences (Apweiler et al., 2004). Each sequence entry is annotated with useful information such as descriptions of function, structural or functional domains, secondary structures, post-translational modifications, and variants. These annotations are provided in flat files composed of lines of different types, each marked by a distinct identifier. Among them, the FT lines describe regions or sites of interest in the sequence. Table 3-1 lists the feature identifiers that are regarded as functional residues and used as masking sources. UniProt Knowledgebase release 11.0 was used.

Table 3-1. List of feature identifiers considered to be functional residues in UniProt. The number in the second column is based on UniProt Knowledgebase release 11.0.
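As a rough illustration of how FT lines can be turned into masking positions, the sketch below reads feature lines and collects the sequence positions of selected feature keys. It assumes the column-based flat-file layout in which a feature line carries the key, a start position and an end position, and continuation lines leave the key column blank. The keys in `MASKING_KEYS` are only an illustrative subset, not the list used in this work (see Table 3-1).

```python
# Illustrative subset of feature keys; the list actually used is given in Table 3-1.
MASKING_KEYS = {"ACT_SITE", "BINDING", "METAL", "MOD_RES"}


def parse_ft_lines(flat_file_text, keys=MASKING_KEYS):
    """Return the sorted 1-based sequence positions covered by the selected FT features."""
    positions = set()
    for line in flat_file_text.splitlines():
        # Skip non-FT lines and continuation lines (blank key column).
        if not line.startswith("FT") or len(line) < 6 or line[5] == " ":
            continue
        fields = line.split()
        if len(fields) < 4 or fields[1] not in keys:
            continue
        try:
            start, end = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # uncertain endpoints such as '?' or '<1' are skipped
        positions.update(range(start, end + 1))
    return sorted(positions)


sample = "\n".join([
    "ID   TEST_HUMAN",
    "FT   ACT_SITE     57     57       Charge relay system.",
    "FT   DOMAIN       10    100       Example domain.",
    "FT   METAL       120    122       Zinc.",
    "FT                                continuation of the description.",
])
print(parse_ft_lines(sample))  # [57, 120, 121, 122]
```

The positions returned are in UniProt sequence numbering, so they still need to be mapped onto the PDB residue numbering before an alignment of structures can be masked.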

CSA (http://www.ebi.ac.uk/thornton-srv/databases/CSA/): The Catalytic Site Atlas (CSA) is a database of active sites and catalytic residues of enzymes whose 3D structures are available (Porter et al., 2004). The CSA provides two types of entries: 1) original hand-annotated entries, derived from the primary literature, and 2) homologous entries, found by PSI-BLAST alignment to one of the original entries. Table 3-2 shows the number of PDB entries and residues in CSA version 2.2.4, which was the version used in the analysis at the time of writing.

Table 3-2. Number of entries in CSA.

InterPare (http://interpare.net): Residues involved in domain-domain interactions, which are not captured by the definition of the local environment, should be masked when substitutions are counted from homologous alignments. Domain interaction data were taken from InterPare (Gong et al., 2005), which uses SCOP (Murzin et al., 1995) as its domain definition. At the time of writing, InterPare contains 64985 interacting domain pairs, where two domains are defined as interacting if at least five pairs of residues, one from each domain, lie within 5 Å of each other.
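The interaction criterion (at least five residue pairs within 5 Å) can be written down as a short sketch. How the distance between two residues is measured is not stated here, so the code assumes that a residue pair counts when any atom of one residue lies within the cutoff of any atom of the other. The function names and the data layout are hypothetical.

```python
from math import dist


def count_contact_pairs(domain_a, domain_b, cutoff=5.0):
    """Count residue pairs (one per domain) with any two atoms within the cutoff.

    domain_a, domain_b: dict mapping residue id -> list of (x, y, z) atom coordinates.
    """
    pairs = 0
    for atoms_a in domain_a.values():
        for atoms_b in domain_b.values():
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                pairs += 1
    return pairs


def domains_interact(domain_a, domain_b, cutoff=5.0, min_pairs=5):
    """Apply the criterion quoted in the text: at least min_pairs pairs within the cutoff."""
    return count_contact_pairs(domain_a, domain_b, cutoff) >= min_pairs


# Toy coordinates: residue i of each domain faces its partner at a distance of 4 Å.
domain_a = {i: [(10.0 * i, 0.0, 0.0)] for i in range(5)}
domain_b = {i: [(10.0 * i, 4.0, 0.0)] for i in range(5)}
print(count_contact_pairs(domain_a, domain_b))  # 5
print(domains_interact(domain_a, domain_b))     # True
```

The all-against-all loop is quadratic in the number of atoms, which is acceptable for a sketch but would normally be replaced by a spatial index when scanning a whole structure database.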

Benchmarking New ESSTs

Benchmarking Program

Crescendo (Chelliah et al., 2004) was used to benchmark the new ESSTs. It is a program that predicts functional site residues using ESSTs, which represent only structural constraints. The rationale behind Crescendo is to distinguish functional constraints from structural constraints, each of which gives rise to the conservation of amino acids during evolution. The degree of conservation of an amino acid is quantified as 1) an observed probability based on the alignment to which the query protein sequence belongs and 2) an expected probability calculated using the ESSTs. The overall difference between the two probabilities is converted into a z-score, which can identify extra restraints on evolution, probably functional, that are not described by the ESSTs. Crescendo is therefore a suitable benchmarking tool for evaluating new ESSTs in which more functional residues are masked than in the derivation of the old ESSTs. In addition, the relative contributions of the masking sources (UniProt, CSA, and InterPare) to the performance of the ESSTs can be identified by using matrices generated from different combinations of masking sources.
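The idea of turning the difference between observed and expected probabilities into a z-score can be sketched as follows. This is not Crescendo's scoring function, which is described in Chelliah et al. (2004). The divergence measure (sum of absolute differences) and the normalization over all positions of one protein are assumptions made only to give a concrete example.

```python
from statistics import mean, pstdev


def divergence(observed, expected):
    """Sum of absolute differences between two amino-acid distributions (assumed measure)."""
    residues = set(observed) | set(expected)
    return sum(abs(observed.get(aa, 0.0) - expected.get(aa, 0.0)) for aa in residues)


def z_scores(per_position_observed, per_position_expected):
    """Standardize the per-position divergences across all positions of one protein."""
    raw = [divergence(o, e) for o, e in zip(per_position_observed, per_position_expected)]
    mu, sigma = mean(raw), pstdev(raw)
    if sigma == 0:
        return [0.0 for _ in raw]
    return [(value - mu) / sigma for value in raw]


# Position 0 is fully conserved although the environment alone predicts variation.
observed = [{"A": 1.0}, {"A": 0.5, "G": 0.5}, {"A": 0.5, "G": 0.5}]
expected = [{"A": 0.5, "G": 0.5}] * 3
scores = z_scores(observed, expected)
print(scores.index(max(scores)))  # 0
```

A position that is more conserved than its structural environment predicts receives a high score, which is the signal interpreted as a possible functional restraint.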

Benchmarking Sets

Two benchmark sets, CSA_LIT and CSA_ALL, were prepared and tested. Proteins in both sets are from HOMSTRAD. Active site residues in both sets are from CSA and interacting residues are from InterPare (see 3.1.3). CSA_LIT contains only catalytic sites that CSA annotates from the literature, whereas CSA_ALL also contains sites predicted by homology to the literature entries using PSI-BLAST. To avoid biasing the benchmark with masking effects, the structures used in the masking process were not included in the benchmark sets. As shown in table 3-3, CSA_LIT includes 135 active site residues and 3336 interacting residues, and CSA_ALL includes 473 active site residues and 11264 interacting residues.

Table 3-3. Benchmark sets used to assess the new ESSTs.

1: Active sites of the CSA_LIT set are based on literature information from the Catalytic Site Atlas. CSA_ALL is based on both literature entries and homologous entries found by PSI-BLAST search.

2: Interacting residues are defined as those involved in domain-domain interactions, identified by checking the geometric distance between two adjacent domains.

Database and Web Site for Functional Residues

MySQL was used to store functional residues from UniProt, CSA, and InterPare. The two alignment sets, SUB177 and SUB371, were parsed and stored in relational format using BioPerl (http://www.bioperl.org) and Perl DBI (http://dbi.perl.org/). UniProt sequences were mapped to their corresponding PDB entries by parsing XML files provided by MSD SIFTS (http://www.ebi.ac.uk/msd-srv/docs/sifts/). The web front-end, FaceView, was built on the XAMPP (http://www.apachefriends.org/en/xampp.html) platform. To visualize functional residues, Jmol (http://www.jmol.org) and GBrowse (http://www.gmod.org/wiki/index.php/GBrowse) have been integrated into the web site for 3D and 2D representation, respectively. Figure 3-1 shows the database schema used to generate the new ESSTs. It also serves as the backbone of FaceView.

Figure 3-1. Database schema of the masking process. The FuncRes table contains functional residues from UniProt, CSA, and InterPare. ResidueMap is a mapping table between sequences in UniProt and PDB. SUB177 and SUB371 are the sources of the structural alignments used to calculate the substitution matrices, and they are linked to the MaskRes table, which gives the positions of functional residues in the alignments. This figure was drawn with DBDesigner (http://fabforce.net/dbdesigner4/).
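To make the role of the FuncRes and ResidueMap tables more concrete, the sketch below builds two small tables and selects the PDB residues to mask for a chosen combination of sources. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and every column name is hypothetical, since only the table names are given in the schema figure.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical column layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FuncRes (uniprot_acc TEXT, uniprot_pos INTEGER, source TEXT);
        CREATE TABLE ResidueMap (uniprot_acc TEXT, uniprot_pos INTEGER,
                                 pdb_id TEXT, chain TEXT, pdb_pos INTEGER);
    """)
    return conn


def masked_pdb_residues(conn, sources):
    """Return the PDB residues annotated as functional by any of the given sources."""
    placeholders = ",".join("?" for _ in sources)
    query = f"""
        SELECT DISTINCT m.pdb_id, m.chain, m.pdb_pos
        FROM FuncRes f
        JOIN ResidueMap m
          ON m.uniprot_acc = f.uniprot_acc AND m.uniprot_pos = f.uniprot_pos
        WHERE f.source IN ({placeholders})
        ORDER BY m.pdb_id, m.chain, m.pdb_pos
    """
    return conn.execute(query, list(sources)).fetchall()


conn = build_demo_db()
conn.executemany("INSERT INTO FuncRes VALUES (?, ?, ?)",
                 [("P1", 10, "UniProt"), ("P1", 20, "CSA"), ("P1", 30, "InterPare")])
conn.executemany("INSERT INTO ResidueMap VALUES (?, ?, ?, ?, ?)",
                 [("P1", 10, "1abc", "A", 5), ("P1", 20, "1abc", "A", 15),
                  ("P1", 30, "1abc", "A", 25)])
print(masked_pdb_residues(conn, ["UniProt", "CSA"]))
```

Keeping the source name as a column means that a different masking combination is only a different query parameter, with no need to regenerate the stored annotations.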

Result and Discussion

New ESSTs

The new ESSTs were rebuilt from the 177 or 371 structure alignments from which the old ESSTs had been derived, but more functional residues, identified from different resources, were masked (see 3.1.3). Table 4-1 shows the total number of masked residues in the new ESSTs compared with the old one (SUB177-OLD). In the old ESST, 2048 residues were masked; these are residues involved in domain-domain or protein-ligand interactions. Three types of new ESSTs were built from different combinations of masking resources: UniProt, CSA and InterPare for type A; UniProt and CSA for type B; and UniProt alone for type C. Type A masks nine times more functional residues than the old ESST, and type C, which uses only UniProt as a source of functional residues, masks almost twice as many. The number of residues masked by each type of source is given in table 4-2. Figure 4-1 shows the number of masked residues in the unions and intersections of UniProt, CSA and InterPare as a Venn diagram. InterPare contains around 4 times and 20 times more masking residues than UniProt and CSA, respectively.


Table 4-1. The number of masked residues in the new ESSTs. Three types of new ESST were generated based on different sources of functional residues (A=UniProt + CSA + InterPare, B=UniProt + CSA, C=UniProt, X=no-masking, OLD=old ESST by Shi et al.).

All types of new ESSTs are believed to better reflect local structural environments because non-structural constraints, which are also responsible for conservation at some residues in the alignment, were removed by masking functional residues. The different types of ESSTs were made to identify the relative contribution of each masking source to the performance of the ESSTs. For example, in figure 4-1, 13648 residues for SUB177 and 37496 for SUB371, which come from InterPare, are not masked in matrix B but are masked in matrix A. Hence, the discrepancy between the two results, one from matrix A and the other from matrix B, could reveal the relative effect of masking residues involved in domain-domain interactions on finding functional residues with Crescendo. These effects are explained and discussed in the following sections.
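The comparison between matrix types comes down to set operations on the masked residues. The sketch below, which uses made-up residue identifiers in place of the real data, shows how the residues masked by each type and the residues that distinguish type A from type B would be obtained.

```python
def masked_by_type(uniprot, csa, interpare):
    """Residue sets masked by each ESST type (A, B and C as defined in the text)."""
    return {
        "A": uniprot | csa | interpare,
        "B": uniprot | csa,
        "C": set(uniprot),
    }


def unique_to_type_a(uniprot, csa, interpare):
    """Residues masked in type A but not in type B, i.e. contributed by InterPare alone."""
    return interpare - (uniprot | csa)


# Made-up residue identifiers for illustration only.
uniprot = {1, 2, 3}
csa = {3, 4}
interpare = {2, 5, 6}
print(sorted(unique_to_type_a(uniprot, csa, interpare)))  # [5, 6]
```

Any difference in benchmark performance between types A and B can then be attributed to the residues returned by `unique_to_type_a`.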

Table 4-2. Number of masked residues by UniProt, CSA, and InterPare. The different types of sources are described in section 3.1.3.

Figure 4-1. The number of masked residues from the different resources, represented as a Venn diagram. Numbers in blue are for SUB177 and numbers in red are for SUB371.

Benchmark Results of New ESSTs

Six new ESSTs (see table 4-1) were benchmarked against the old ESST (SUB177-OLD) and two non-masked ESSTs (SUB177-X, SUB371-X). The average z-scores of Crescendo for both active sites and interacting residues were compared across the different ESSTs. The benchmarking sets are described in 3.2.2. All z-score values for the different ESSTs can be found in the appendix.

Effects of new ESSTs on Active Site Residues

First, the average z-scores for all the residues in the benchmark sets were compared before the active sites were examined. This measures the background z-score over all residues and ranks the scores across the nine ESSTs. Figure 4-2 shows the average z-score for all the residues in the two benchmark sets, CSA_LIT and CSA_ALL (see 3.2.2 for benchmark sets). As seen in figure 4-2, residue masking (new and old ESSTs) makes very little difference to the z-score compared with not masking (SUB177-X, SUB371-X). Except for SUB371-B and SUB371-C, the new ESSTs do not increase the average z-score compared with the old ESST. Hence, over all residues, there is no significant correlation between the number of masked residues and the z-score in the benchmark sets.

Figure 4-2. Average z-score for all residues. The actual values are in the appendix. Histogram A is for the CSA_LIT set and B is for CSA_ALL. See table 3-3 for benchmark sets.

Figure 4-3 shows the average z-score for the active site residues in CSA_LIT and CSA_ALL. The histograms clearly show that all the new ESSTs increase the z-score at the active site residues compared with the old and non-masking ESSTs. As shown in panel A of figure 4-3, which is for the CSA_LIT set, SUB371-B increases the z-score by 49% relative to the non-masked ESST (SUB177-X) and by 16% relative to the old ESST (SUB177-OLD). For CSA_ALL, SUB371-A shows increases of 32% and 13% compared with the non-masking ESST and the old ESST, respectively. Taken together with figure 4-2, it is clear that the increase in z-score does not take place for all residues but only for the active site residues.
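The percentages quoted here are relative increases over a baseline z-score. A minimal sketch of that arithmetic is given below. The numerical values are hypothetical placeholders, not the measured z-scores, which are listed in the appendix.

```python
def percent_increase(new, baseline):
    """Relative increase of `new` over `baseline`, in percent (baseline assumed positive)."""
    if baseline <= 0:
        raise ValueError("baseline z-score must be positive for a relative increase")
    return 100.0 * (new - baseline) / baseline


# Hypothetical average z-scores, for illustration only.
baseline_z = 1.00  # e.g. a non-masked ESST
new_z = 1.49       # e.g. a masked ESST
print(round(percent_increase(new_z, baseline_z), 1))  # 49.0
```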

Figure 4-3 also reveals the relative contribution of each masking source to the sensitivity of identifying active site residues. For both data sets, type B matrices give higher z-scores than type C. Because type C does not mask catalytic sites from CSA (see figure 4-1 and table 4-1), it can be reasoned that the 656 and 1341 residues uniquely masked by type B (SUB177-B and SUB371-B, respectively) are responsible for the increase in z-score at the active sites. Although type C masks the smallest number of functional residues among the three types of ESST, it outperforms the old matrix (SUB177-OLD) and the two non-masking matrices (SUB177-X and SUB371-X). However, there are no significant differences in z-score between types B and A, which differ only in the masking of interacting residues.

Figure 4-3. The average z-score for active site residues. Active sites are from CSA (see table 3-3 for details). Histogram A is for CSA_LIT and B is for CSA_ALL. a) and b) represent the extent of increase in z-score by SUB371-B compared with SUB177-X and SUB177-OLD, respectively. c) and d) show the increase in z-score by SUB371-A over SUB177-X and SUB177-OLD, respectively.

Effects of New ESSTs on Interacting Residues

Figure 4-4 shows the average z-score for the interacting residues in CSA_LIT and CSA_ALL. Interacting residues are from InterPare (see 3.2.2 for details). In the case of CSA_LIT, SUB371-A increases the z-score by 9.4% and 5.4% compared with the non-masked ESST (SUB177-X) and the old ESST (SUB177-OLD), respectively. For CSA_ALL, SUB371-A shows increases of 6.6% and 3.1% in z-score compared with the non-masking ESST and the old ESST, respectively. In terms of the relative contribution of masking to the sensitivity of identifying interacting residues, type A shows a slightly higher z-score than type B, the exception being SUB371-B in panel A of figure 4-4. Thus it can be reasoned that the 13648 and 39496 interacting residues (see figure 4-1) are responsible for the increase in z-score, because they are masked only in type A and not in type B.

From the results of 4.2.1 and 4.2.2, it can be concluded that the effect of masking is most apparent when the ESSTs are used to find the same class of functional residues that was masked in their derivation.

Figure 4-4. Average z-score for interacting residues. Interacting residues are from InterPare. Figure A is for CSA_LIT, and B is for CSA_ALL (see table 3-3 for benchmark sets). a) and b) represent the extent of increase in z-score by SUB371-A compared with SUB177-X and SUB177-OLD, respectively. c) and d) show the extent of increase in z-score over SUB177-X and SUB177-OLD, respectively.

FaceView: a Web Server for Visualizing Functional Residues

To visualize the functional residues that are masked in the substitution counts, FaceView (http://malory.bioc.cam.ac.uk/FaceView) has been implemented. FaceView is a web server designed to present amino acids that are important for a protein's function and structure in both two and three dimensions. At the time of writing, FaceView can provide 1) domain information from SCOP and Pfam, 2) interacting interfaces between adjacent SCOP domains, 3) residues responsible for Single Amino Acid Polymorphisms (SAPs), 4) mutated or polymorphic residues, and 5) functional residues annotated in UniProt and CSA (as shown in tables 3-1 and 3-2). FaceView is unique in that polymorphic and mutated residues can be shown in conjunction with amino acids annotated as important for the function and structural stability of a protein. In addition, genetic variations, which are responsible for amino acid polymorphisms, can be interpreted and navigated in the structural and functional context of a protein.

Figure 4-5 shows several screen-shots from the FaceView web interface. Two searching interfaces are provided. The first one is for PDB id, and the second is for keyword search. The keyword search-engine returns full descriptions of proteins matched by a queried keyword with several external IDs from various biological databases. They are 1) HUGO gene (Povey et al., 2001), 2) UniProt (Apweiler et al., 2004), 3) Ensembl (Hubbard et al., 2007), 4) PDB (Berman et al., 2000), 5) Entrez (Maglott et al., 2007), 6) PubMed (http://www.pubmed.gov), 7) Refseq (Pruitt et al., 2007), and 8) OMIM (McKusick, 2007).

Figure 4-5. Screenshots of the FaceView web interface. a) Main page of FaceView highlighting the two search interfaces. The red box is the query interface for a PDB id and the blue box is for a keyword search. b) Search result for the keyword 'kinase'. c) Search result for the PDB id '1apz'. Interface regions between two adjacent domains of '1apz' are represented as a space-fill model. Two active sites from UniProt are coloured in CPK. The 3D molecular viewer is implemented with Jmol. d) 2D representation for the PDB id '1apz'. Blue and yellow lines represent Pfam and SCOP domain regions, respectively. SNPs and SwissProt variants are also annotated. IDs for Pfam, SCOP, SNP, and SwissProt variants in the figure are linked to the original web sites.

Future Direction

Further Benchmark of New ESSTs

The new ESSTs were benchmarked with Crescendo to check for improvement in identifying functional site residues. However, ESSTs were first developed for fold or homology recognition by means of sequence-structure comparisons. Hence, the new ESSTs also need to be benchmarked for any improvement in sequence-structure alignment. FUGUE (Shi et al., 2001), which was developed by Blundell's group, is a program that aligns a query sequence against structure templates. Shi et al. observed no significant improvement in alignment accuracy between non-masking and masking ESSTs and attributed this to the small number of masked residues in the substitution counts (2.4% of total substitution counts). The alignment accuracy is expected to increase with the updated ESSTs because they mask nine times more functional residues than the old ESSTs, in the case of SUB177-A (see table 4-1).

Other Applications of ESST; a Proxy for Evolution Rate

It is known that evolution rates vary among proteins over several orders of magnitude. Many factors influence the rate of protein evolution to different degrees, and different explanations have been advanced for this. Kimura and Ohta proposed that functionally important proteins evolve much more slowly (Kimura, 1968; Ohta, 1973). In the same context, highly dispensable genes show more tolerance to deleterious mutations than less dispensable genes (Batada et al., 2006; Pal et al., 2003; Wall et al., 2005). It has also been proposed that the proportion of functional residues is negatively related to the rate of evolution, which is known as the functional density hypothesis (Zuckerkandl, 1976a; Zuckerkandl, 1976b). Fraser et al. (Fraser et al., 2002) proposed that proteins with many interaction partners evolve somewhat more slowly, but the correlation between hub proteins and evolution rate remains controversial (Bloom and Adami, 2003; Jordan et al., 2003). Recently, Drummond et al. proposed that expression level is the single dominant factor that explains the evolutionary rate of proteins (Drummond et al., 2005; Drummond et al., 2006), and Choi et al. addressed the relation between transcriptional properties, essentiality and the evolutionary rate of proteins (Choi et al., 2007). The rationale behind the relationship between expression level and evolution rate is that highly expressed proteins evolve much more slowly because they must accumulate fewer mutations than relatively rare proteins in order to avoid the high cost of misfolding.

The rate of protein evolution can be resolved into the rate of a mutation of each amino acid in a protein. Conventional substitution matrices such as PAM (Dayhoff and Eck, 1968) and BLOSUM (Henikoff and Henikoff, 1992) are designed to measure the rate of mutation based on the homologous sequences of families. Additionally, ESSTs can reflect the degree of conservation in the context of local structural factors. Hence, I plan to devise a new method that can reflect evolution rate of proteins based on ESST. Also, the relation between a protein’s evolution rate and various structural factors will be addressed.


Future Development of FaceView

FaceView now provides functional residues of proteins in 2D and 3D. In the near future, FaceView will include the following information on the web site: 1) surface and interior residues, 2) ligand-binding sites, 3) nucleic-acid binding sites, 4) functional residues predicted by Crescendo, and 5) hot-spot residues predicted by programs such as SDM (Topham et al., 1997), FoldX (Schymkowitz et al., 2005), and PoPMuSiC (Gilis and Rooman, 2000). The new ESSTs will also be downloadable from FaceView if there is significant improvement in the FUGUE benchmark.


References

Amitai, G., Shemesh, A., Sitbon, E., Shklar, M., Netanely, D., Venger, I., and Pietrokovski, S. (2004). Network analysis of protein structures identifies functional residues. J Mol Biol 344, 1135-1146.

Apweiler, R., Bairoch, A., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2004). UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 32, D115-119.

Armon, A., Graur, D., and Ben-Tal, N. (2001). ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 307, 447-463.

Batada, N. N., Hurst, L. D., and Tyers, M. (2006). Evolutionary and physiological importance of hub proteins. PLoS Comput Biol 2, e88.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res 28, 235-242.

Bloom, J. D., and Adami, C. (2003). Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol Biol 3, 21.

Broadhurst, R. W., Nietlispach, D., Wheatcroft, M. P., Leadlay, P. F., and Weissman, K. J. (2003). The structure of docking domains in modular polyketide synthases. Chem Biol 10, 723-731.

Callaghan, A. J., Aurikko, J. P., Ilag, L. L., Gunter Grossmann, J., Chandran, V., Kuhnel, K., Poljak, L., Carpousis, A. J., Robinson, C. V., Symmons, M. F., and Luisi, B. F. (2004). Studies of the RNA degradosome-organizing domain of the Escherichia coli ribonuclease RNase E. J Mol Biol 340, 965-979.

Chang, K., Elledge, S. J., and Hannon, G. J. (2006). Lessons from Nature: microRNA-based shRNA libraries. Nat Methods 3, 707-714.

Chelliah, V., Chen, L., Blundell, T. L., and Lovell, S. C. (2004). Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 342, 1487-1504.

Choi, J. K., Kim, S. C., Seo, J., Kim, S., and Bhak, J. (2007). Impact of transcriptional properties on essentiality and evolutionary rate. Genetics 175, 199-206.

Dayhoff, M. O., and Eck, R. V. (1968). Atlas of Protein Sequence and Structure. Natl Biomed Res Found 3, 33.

Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O., and Arnold, F. H. (2005). Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 102, 14338-14343.

Drummond, D. A., Raval, A., and Wilke, C. O. (2006). A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23, 327-337.

Foley, T. L., and Burkart, M. D. (2007). Site-specific protein modification: advances and applications. Curr Opin Chem Biol 11, 12-19.

Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C., and Feldman, M. W. (2002). Evolutionary rate in the protein interaction network. Science 296, 750-752.

Gilis, D., and Rooman, M. (2000). PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins. Protein Eng 13, 849-856.

Glaser, F., Pupko, T., Paz, I., Bell, R. E., Bechor-Shental, D., Martz, E., and Ben-Tal, N. (2003). ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19, 163-164.

Gong, S., Park, C., Choi, H., Ko, J., Jang, I., Lee, J., Bolser, D. M., Oh, D., Kim, D. S., and Bhak, J. (2005). A protein domain interaction interface database: InterPare. BMC Bioinformatics 6, 207.

Henikoff, S., and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89, 10915-10919.

Hubbard, T. J., Aken, B. L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., et al. (2007). Ensembl 2007. Nucleic Acids Res 35, D610-617.

Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P. S., Pagni, M., and Sigrist, C. J. (2006). The PROSITE database. Nucleic Acids Res 34, D227-230.

Innis, C. A., Shi, J., and Blundell, T. L. (2000). Evolutionary trace analysis of TGF-beta and related growth factors: implications for site-directed mutagenesis. Protein Eng 13, 839-847.

Johnson, M. S., Overington, J. P., and Blundell, T. L. (1993). Alignment and searching for common protein folds using a data bank of structural templates. J Mol Biol 231, 735-752.

Jordan, I. K., Wolf, Y. I., and Koonin, E. V. (2003). No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol 3, 1.

Kile, B. T., and Hilton, D. J. (2005). The art and design of genetic screens: mouse. Nat Rev Genet 6, 557-567.

Kimura, M. (1968). Evolutionary rate at the molecular level. Nature 217, 624-626.

Kortemme, T., Kim, D. E., and Baker, D. (2004). Computational alanine scanning of protein-protein interfaces. Sci STKE 2004, pl2.

Lichtarge, O., Bourne, H. R., and Cohen, F. E. (1996). An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257, 342-358.

Maglott, D., Ostell, J., Pruitt, K. D., and Tatusova, T. (2007). Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 35, D26-31.

Massova, I., and Kollman, P. (1999). Computational alanine scanning to probe protein-protein interactions: a novel approach to evaluate binding free energies. J Am Chem Soc 121, 8133-8143.

McKusick, V. A. (2007). Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet 80, 588-604.

Mizuguchi, K., Deane, C. M., Blundell, T. L., and Overington, J. P. (1998). HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci 7, 2469-2471.

Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247, 536-540.

Ohta, T. (1973). Slightly deleterious mutant substitutions in evolution. Nature 246, 96-98.

Overington, J., Donnelly, D., Johnson, M. S., Sali, A., and Blundell, T. L. (1992). Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci 1, 216-226.

Overington, J., Johnson, M. S., Sali, A., and Blundell, T. L. (1990). Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc Biol Sci 241, 132-145.

Pal, C., Papp, B., and Hurst, L. D. (2003). Genomic function: Rate of evolution and gene dispensability. Nature 421, 496-497; discussion 497-498.

Pellegrini, L., Burke, D. F., von Delft, F., Mulloy, B., and Blundell, T. L. (2000). Crystal structure of fibroblast growth factor receptor ectodomain bound to ligand and heparin. Nature 407, 1029-1034.

Porter, C. T., Bartlett, G. J., and Thornton, J. M. (2004). The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 32, D129-133.

Povey, S., Lovering, R., Bruford, E., Wright, M., Lush, M., and Wain, H. (2001). The HUGO Gene Nomenclature Committee (HGNC). Hum Genet 109, 678-680.

Pruitt, K. D., Tatusova, T., and Maglott, D. R. (2007). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35, D61-65.

Rice, D. W., and Eisenberg, D. (1997). A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J Mol Biol 267, 1026-1038.

Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F., and Serrano, L. (2005). The FoldX web server: an online force field. Nucleic Acids Res 33, W382-388.

Shi, J., Blundell, T. L., and Mizuguchi, K. (2001). FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 310, 243-257.

Stebbings, L. A., and Mizuguchi, K. (2004). HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database. Nucleic Acids Res 32, D203-207.

Topham, C. M., Srinivasan, N., and Blundell, T. L. (1997). Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Eng 10, 7-21.

Wako, H., and Blundell, T. L. (1994). Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. II. Secondary structures. J Mol Biol 238, 693-708.

Wall, D. P., Hirsh, A. E., Fraser, H. B., Kumm, J., Giaever, G., Eisen, M. B., and Feldman, M. W. (2005). Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A 102, 5483-5488.

Weiss, G. A., Watanabe, C. K., Zhong, A., Goddard, A., and Sidhu, S. S. (2000). Rapid mapping of protein functional epitopes by combinatorial alanine scanning. Proc Natl Acad Sci U S A 97, 8950-8954.

Williams, M. G., Wilsher, J., Nugent, P., Mills, A., Dhanaraj, V., Fabry, M., Sedlacek, J., Uusitalo, J. M., Penttila, M. E., Pitts, J. E., and Blundell, T. L. (1997). Mutagenesis, biochemical characterization and X-ray structural analysis of point mutants of bovine chymosin. Protein Eng 10, 991-997.

Zuckerkandl, E. (1976a). Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J Mol Evol 7, 167-183.

Zuckerkandl, E. (1976b). Evolutionary processes and evolutionary noise at the molecular level. II. A selectionist model for random fixations in proteins. J Mol Evol 7, 269-311.


 

