Sequence comparison, particularly when combined with the systematic collection, curation, and search of databases containing biomolecular sequences, has become essential in modern molecular biology. Commenting on the (then) near-completion of the effort to sequence the entire yeast genome (now finished), Stephen Oliver says:
In a short time it will be hard to realize how we managed without the sequence data. Biology will never be the same again. [478]
One fact explains the importance of molecular sequence data and sequence comparison in biology.
The first fact of biological sequence analysis
In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity usually implies significant functional or structural similarity.
Evolution reuses, builds on, duplicates, and modifies “successful” structures (proteins, exons, DNA regulatory sequences, morphological features, enzymatic pathways, etc.). Life is based on a repertoire of structured and interrelated molecular building blocks that are shared and passed around. The same and related molecular structures and mechanisms show up repeatedly in the genome of a single species and across a very wide spectrum of divergent species. “Duplication with modification” [127, 128, 129, 130] is the central paradigm of protein evolution, wherein new proteins and/or new biological functions are fashioned from earlier ones. Doolittle emphasizes this point as follows:
The vast majority of extant proteins are the result of a continuous series of genetic duplications and subsequent modifications.