ABSTRACT
This paper describes the BIOSTATION, a generalized document preparation system developed to guide the interactive editing of biological sequences by taking their semantics into account. It also focuses on the use of a document preparation system as the mediator for a larger application.
Introduction
The BIOSTATION is a generalized document preparation system, developed for the CRBM** and in use since May 1985, that guides the interactive editing of biological sequences by taking their semantics into account. These semantics are extracted at editing time from the document itself by an integrated expert system, and are used to express its structure. This paper also focuses on the use of a document preparation system as the mediator for a larger application.
In this approach, genetic sequences are treated as generalized documents. This choice makes it possible to associate convenient, and therefore more legible, visual representations with the abstract aspects of the semantics of biological sequences.
We first explain how semantic information on the sequences is obtained and used to guide editing. The architecture of the BIOSTATION is then presented in a second section.
Problem statement
The genetic information that allows living cells to synthesize proteins is stored in genes. Genes are linear strings built from four types of molecules (adenine, thymine, guanine, cytosine) called nucleotides. Readers who are not biologists can refer to [Hélène 84]. The strings studied can be up to 30,000 nucleotides long.
Biologists can analyse a gene to make its formula explicit as a word over (A, T, G, C), and operations can be performed on the gene (in vitro or in vivo) to modify it by inserting or deleting parts at precise positions.
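To make the string view of a gene concrete, the following minimal sketch models a sequence as a word over (A, T, G, C) and implements insertion and deletion at a given position. It is only an illustration and not the BIOSTATION's actual implementation: the names ALPHABET, validate, insert and delete, the 0-based positions, and the sample sequence are all assumptions introduced here.

```python
# Illustrative sketch only: every name below is hypothetical and
# none of it is taken from the BIOSTATION itself.

ALPHABET = frozenset("ATGC")  # the four nucleotide symbols


def validate(sequence: str) -> str:
    """Return the sequence unchanged if it is a word over (A, T, G, C)."""
    bad = set(sequence) - ALPHABET
    if bad:
        raise ValueError(f"invalid nucleotide symbols: {sorted(bad)}")
    return sequence


def insert(sequence: str, position: int, fragment: str) -> str:
    """Insert fragment so that it starts at the given 0-based position."""
    validate(fragment)
    if not 0 <= position <= len(sequence):
        raise IndexError("position outside the sequence")
    return sequence[:position] + fragment + sequence[position:]


def delete(sequence: str, position: int, length: int) -> str:
    """Delete `length` symbols starting at the given 0-based position."""
    if length < 0 or not 0 <= position <= len(sequence) - length:
        raise IndexError("deleted span outside the sequence")
    return sequence[:position] + sequence[position + length:]


gene = validate("ATGGCCTAA")        # made-up sample sequence
edited = insert(gene, 3, "TTT")     # insert three symbols after "ATG"
restored = delete(edited, 3, 3)     # remove them again
```

Python strings are immutable, so each operation returns a new sequence and leaves the original untouched. This is convenient for a sketch, although an interactive editor working on strings of tens of thousands of symbols would probably prefer a structure that supports cheaper local edits.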