Analyzing a single data set using multiple RNA informatics
programs often requires a file format conversion between each
pair of programs, significantly hampering productivity. To
facilitate the interoperation of these programs, we propose
a syntax to exchange basic RNA molecular information. This RNAML
syntax allows for the storage and the exchange of information
about RNA sequence and secondary and tertiary structures. The
syntax permits the description of higher level information about
the data including, but not restricted to, base pairs, base
triples, and pseudoknots. A class-oriented approach allows us
to represent data common to a given set of RNA molecules, such
as a sequence alignment and a consensus secondary structure.
The syntax also includes documentation about experiments and
computations, as well as references to journals and external
databases. The chief challenge in creating such a syntax was to
determine the appropriate scope of usage and to ensure
extensibility as new needs arise. The syntax complies with
the eXtensible Markup Language (XML) recommendations, a widely
accepted standard for syntax specifications. In addition to
the various generic packages that exist to read and interpret
XML formats, we developed an XML processor and included it in
the open-source MC-Core library for the computational
manipulation of nucleic acid and protein structures.
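
As an illustration only, the sketch below shows how a generic XML package can read sequence and base-pair information from an XML document of this kind. It uses Python's standard xml.etree.ElementTree module. The element and attribute names (rna-data, molecule, sequence, base-pair, five-prime, three-prime) and the helper functions are hypothetical, chosen for this example; they are not taken from the RNAML specification, and the MC-Core processor is not used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical RNAML-like document. All element and attribute names
# are illustrative assumptions, not the actual RNAML syntax.
DOC = """
<rna-data>
  <molecule id="m1">
    <sequence>GGGAAACCC</sequence>
    <base-pair five-prime="1" three-prime="9"/>
    <base-pair five-prime="2" three-prime="8"/>
    <base-pair five-prime="3" three-prime="7"/>
  </molecule>
</rna-data>
"""


def read_molecules(xml_text):
    """Return {molecule id: (sequence, [(i, j), ...])} with 1-based positions."""
    root = ET.fromstring(xml_text)
    molecules = {}
    for mol in root.findall("molecule"):
        seq = mol.findtext("sequence", default="").strip()
        pairs = [
            (int(bp.get("five-prime")), int(bp.get("three-prime")))
            for bp in mol.findall("base-pair")
        ]
        molecules[mol.get("id")] = (seq, pairs)
    return molecules


def paired_bases(seq, pairs):
    """Translate 1-based position pairs into nucleotide pairs such as 'G-C'."""
    return [f"{seq[i - 1]}-{seq[j - 1]}" for i, j in pairs]


if __name__ == "__main__":
    for mol_id, (seq, pairs) in read_molecules(DOC).items():
        print(mol_id, seq, paired_bases(seq, pairs))
```

Because the document is plain XML, any conforming parser can extract the same information, which is the interoperability argument made above; a real application would follow the element names defined by the RNAML syntax itself.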