We describe the results of a procedure for maximizing
the number of sequences that can be reliably linked to
a protein of known three-dimensional structure. Unlike
other methods, which try to increase sensitivity through
the use of fold-recognition software, we use only conventional
sequence alignment tools, but apply them in a manner that
significantly increases the number of relationships detected.
We analyzed 11 genomes and found that, depending on the
genome, between 23% and 32% of the ORFs had significant
matches to proteins of known structure. In all cases, the
aligned region spanned either >100 residues or
>50% of the shorter sequence. Slightly higher percentages
could be attained if smaller motifs were also included.
This coverage is significantly higher than that reported for most
previous methods, even those that have a fold-recognition component.
We survey the biochemical and structural characteristics
of the most frequently occurring proteins, and discuss
the extent to which alignment methods can realistically
assign function to gene products.
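As an illustration only, the length criterion stated above (an aligned region of >100 residues, or >50% of the shorter sequence) can be sketched as a simple filter, together with the per-genome percentage of ORFs that pass it. The function names, the input layout, and the assumption that statistical significance has already been established upstream by the alignment tool are hypothetical; the abstract does not specify how these steps are implemented.

```python
def is_reliable_link(aligned_len, query_len, target_len,
                     min_residues=100, min_fraction=0.5):
    """Hypothetical helper: True if a (statistically significant) alignment
    spans more than min_residues residues, or more than min_fraction of the
    shorter of the two sequences."""
    shorter = min(query_len, target_len)
    return aligned_len > min_residues or aligned_len > min_fraction * shorter


def percent_orfs_linked(orf_hits):
    """Hypothetical helper: orf_hits maps an ORF id to a list of
    (aligned_len, query_len, target_len) tuples for hits that are assumed
    to be already significant. Returns the percentage of ORFs with at least
    one hit passing the length criterion."""
    if not orf_hits:
        return 0.0
    linked = sum(1 for hits in orf_hits.values()
                 if any(is_reliable_link(*hit) for hit in hits))
    return 100.0 * linked / len(orf_hits)


# Toy input (made-up numbers, not data from this study).
orf_hits = {
    "orf1": [(120, 300, 250)],  # 120 > 100 residues: passes
    "orf2": [(60, 110, 400)],   # 60 > 0.5 * 110 = 55: passes
    "orf3": [(40, 200, 180)],   # 40 <= 100 and 40 <= 90: fails
    "orf4": [],                 # no significant hit
}
print(percent_orfs_linked(orf_hits))  # 50.0
```

Strict inequalities are used to match the ">" in the stated criterion; an alignment of exactly 100 residues against a 300-residue shorter sequence would therefore not count.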