
From fold predictions to function predictions: Automation of functional site conservation analysis for functional genome predictions

Published online by Cambridge University Press: 01 May 1999

BAOHONG ZHANG
Affiliation:
The Scripps Research Institute, La Jolla, California 92037 Present address: Stanford University, Department of Mathematics, Bldg. 380, Stanford, California 94305.
LESZEK RYCHLEWSKI
Affiliation:
The Scripps Research Institute, La Jolla, California 92037 Present address: San Diego Supercomputer Center, UCSD, 9500 Gilman Drive, La Jolla, California 92093.
KRZYSZTOF PAWŁOWSKI
Affiliation:
The Scripps Research Institute, La Jolla, California 92037 Present address: The Burnham Institute, 10901 N. Torrey Pines Rd., La Jolla, California 92037.
JACQUELYN S. FETROW
Affiliation:
The Scripps Research Institute, La Jolla, California 92037
JEFFREY SKOLNICK
Affiliation:
The Scripps Research Institute, La Jolla, California 92037
ADAM GODZIK
Affiliation:
The Scripps Research Institute, La Jolla, California 92037 Present address: The Burnham Institute, 10901 N. Torrey Pines Rd., La Jolla, California 92037.

Abstract

A database of functional sites for proteins with known structures, SITE, is constructed and used in conjunction with a simple pattern-matching program, SiteMatch, to evaluate possible function conservation in a recently constructed database of fold predictions for Escherichia coli proteins (Rychlewski L et al., 1999, Protein Sci 8:614–624). In this and other prediction databases, fold predictions are based on algorithms that can recognize weak sequence similarities and putatively assign new proteins to already characterized protein families. It is not clear whether such sequence similarities arise from distant homology or from a general similarity of physicochemical features along the sequence. Leaving aside the important question of the nature of the relations within fold superfamilies, it is possible to assess function conservation by examining the conservation pattern of crucial functional residues. SITE consists of a multilevel function description based on structure annotations and structure analyses. In particular, active site residues, ligand-binding residues, and patterns of hydrophobic residues on the protein surface are used to describe different functional features. SiteMatch is designed to check the conservation of residues involved in protein activity in alignments generated by any alignment method. Here, this procedure is used to study the conservation of functional features in alignments between protein sequences from the E. coli genome and their optimal structural templates. The optimal templates and the alignments were taken from the database of genomic structural predictions described in a previous publication (Rychlewski L et al., 1999, Protein Sci 8:614–624). An automated assessment of function conservation is used to analyze the relation between fold similarity and function similarity for a large number of fold predictions. For instance, it is shown that identifying low-significance predictions with a high level of functional residue conservation can extend the sensitivity of fold prediction methods. Over 100 new fold/function predictions in this class were obtained in the E. coli genome. At the same time, about 30% of our previous fold predictions are not confirmed as function predictions, further highlighting the problem of function divergence in fold superfamilies.
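The abstract does not describe how SiteMatch is implemented. As a rough illustration of the idea of checking functional residues through an alignment, here is a minimal Python sketch under several assumptions: a template's functional site is represented by hypothetical `SiteResidue` entries (1-based template positions plus the residue types accepted as conserved), the alignment is a pair of gapped strings with '-' for gaps, and the fold-prediction score and both cutoffs passed to the hypothetical `classify` function are placeholders, with a higher score assumed to mean a more significant prediction. The sketch computes the fraction of site residues conserved and sorts a prediction into the categories the abstract discusses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResidue:
    """One functional residue of a template (hypothetical representation)."""
    position: int  # 1-based residue number in the ungapped template sequence
    allowed: str   # residue types counted as conserved, e.g. "H" or "DE"


def site_conservation(template_aln: str, query_aln: str,
                      site: list[SiteResidue]) -> float:
    """Fraction of template site residues whose aligned query residue is accepted.

    Both strings are rows of one pairwise alignment; '-' marks a gap.
    A site residue aligned to a gap counts as not conserved.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    if not site:
        raise ValueError("site must contain at least one residue")
    wanted = {r.position: r.allowed.upper() for r in site}
    conserved = 0
    t_pos = 0  # current 1-based position in the ungapped template
    for t, q in zip(template_aln, query_aln):
        if t == "-":
            continue  # insertion in the query; no template residue here
        t_pos += 1
        allowed = wanted.get(t_pos)
        if allowed is not None and q != "-" and q.upper() in allowed:
            conserved += 1
    return conserved / len(wanted)


def classify(score: float, conservation: float,
             score_cutoff: float, conservation_cutoff: float) -> str:
    """Combine fold-prediction significance with functional-site conservation.

    Cutoffs are hypothetical; a higher score is assumed to be more significant.
    """
    significant = score >= score_cutoff
    site_ok = conservation >= conservation_cutoff
    if significant and site_ok:
        return "fold and function supported"
    if significant:
        return "fold only, function not confirmed"
    if site_ok:
        return "low-significance fold rescued by conserved site"
    return "not supported"


if __name__ == "__main__":
    # Toy site: template residues 2 (H), 4 (D or E) and 6 (E).
    site = [SiteResidue(2, "H"), SiteResidue(4, "DE"), SiteResidue(6, "E")]
    frac = site_conservation("MH-GDSE", "MHAGD-Q", site)
    print(round(frac, 3))  # 0.667 (H and D conserved, E replaced by Q)
    # Hypothetical cutoffs, chosen only for this toy example.
    print(classify(5.0, frac, score_cutoff=9.5, conservation_cutoff=0.6))
```

Treating a site residue aligned to a gap as not conserved is a deliberately strict choice in this sketch; a looser variant could skip such positions or weight residues by the kind of functional feature they represent.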

Type
Research Article
Copyright
1999 The Protein Society
