Use of residue pairs in protein sequence–sequence and sequence–structure alignments

Published online by Cambridge University Press:  01 August 2000

JONGSUN JUNG
Affiliation:
Laboratory of Molecular Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bldg. 37, Rm. 4B15, 37 Convent Drive MSC 4255, Bethesda, Maryland 20892
BYUNGKOOK LEE
Affiliation:
Laboratory of Molecular Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bldg. 37, Rm. 4B15, 37 Convent Drive MSC 4255, Bethesda, Maryland 20892

Abstract

Two new sets of scoring matrices are introduced: H2 for protein sequence comparison and T2 for protein sequence–structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. Each set comprises four matrices, one for each k value from 1 to 4. These matrices were set up using a large number of structurally homologous protein pairs, with little sequence homology within each pair, that were recently generated using the structure comparison program SHEBA.
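
The counting idea behind an H2-type matrix can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `count_pair_alignments`, the gap character, and the rule for handling gaps (a pair is skipped when either residue is aligned to a gap, and the separation k is measured along the first protein only) are assumptions made for illustration. The abstract does not describe how raw counts are normalized into scores, so the sketch stops at the counts.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the aligned rows


def count_pair_alignments(aligned_a, aligned_b, k):
    """Count how often a residue pair k apart in protein A is aligned
    with a given residue pair in protein B.

    aligned_a, aligned_b: the two rows of one pairwise alignment
    (equal length, gaps written as '-').
    Returns a Counter keyed by ((a1, a2), (b1, b2)).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")

    # For every residue of protein A (gap columns in A are dropped),
    # record what it is aligned with in protein B (possibly a gap).
    partners = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != GAP]

    counts = Counter()
    for i in range(len(partners) - k):
        a1, b1 = partners[i]
        a2, b2 = partners[i + k]
        # Assumption: skip pairs in which either residue has no partner.
        if b1 == GAP or b2 == GAP:
            continue
        counts[((a1, a2), (b1, b2))] += 1
    return counts


# Toy alignment; real input would be the structure-based alignments.
example = count_pair_alignments("AC-DE", "ACGD-", k=1)
```

Summing such counters over many structurally aligned protein pairs, separately for k = 1 to 4, would give four raw frequency tables; replacing the residues of protein B with structural-state labels would give the analogous T2-type counts.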

The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which each of a set of 107 test sequences was aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested: the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single-residue-type substitution matrix; the BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used the Blosum62 and H2 matrices; and PASSC, which used the Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.

Type
Research Article
Copyright
© 2000 The Protein Society
