XML/SGML comments and marked sections

SGML and XML comments are typically intended for the human reader of a document, so programs are not, in general, interested in them. The exception is a program whose output is itself intended to be read by humans, where it is appropriate to copy the comments from the input document to the output. An important example is converting an SGML or XML document to another form of SGML or XML document (for example, from one DTD to another), or enhancing a document with further elements, data content, and attributes. Copying comments to the output matters most in a "near identity transformation", where the converted document is identical to the source document except for the few parts or aspects that were changed.
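As an illustration of a near identity transformation that keeps comments, here is a minimal sketch in Python rather than OmniMark. It uses the standard library's `xml.dom.minidom`, which keeps comment nodes in the tree it builds. The sample document, the `status` attribute, and the `mark_final` function are all hypothetical names chosen for this example.

```python
from xml.dom import minidom

# Hypothetical input: one comment, one attribute that will be changed.
SOURCE = '<doc status="draft"><!-- reviewed by editor --><p>Hello</p></doc>'


def mark_final(xml_text: str) -> str:
    """Return the document with the root's status attribute set to 'final'.

    Everything else, including comments, is copied through unchanged.
    """
    dom = minidom.parseString(xml_text)
    root = dom.documentElement
    root.setAttribute("status", "final")
    # Serialize only the root element, so no XML declaration is added.
    return root.toxml()


result = mark_final(SOURCE)
print(result)
```

Only the attribute value changes; the comment reaches the output because the parser kept it in the tree instead of discarding it.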

Marked sections are used for a variety of purposes:

  • INCLUDE and IGNORE marked sections typically determine which parts of the text of a document instance the SGML or XML parser passes on for processing, and which parts it ignores (and does not pass on to the processing programs). In XML, these two keywords are available only as conditional sections in external DTD declarations, not in the document instance.
  • CDATA and RCDATA marked sections indicate that part of the text of an SGML or XML document is just text. Everything in them is treated as text, apart from the markup that ends the marked section and, in RCDATA marked sections, entity references. RCDATA marked sections exist only in SGML; XML provides CDATA sections only.

Most programs that process SGML or XML documents are interested in the text and the element structure of the documents, not in how the SGML or XML parser decided what is what. For example, most processing programs are quite content that IGNORE marked sections are ignored. However, as with SGML and XML comments, programs whose output is an SGML or XML document, especially one that closely matches the input document, will often want to preserve the marked section information, including the text inside IGNORE marked sections (treating it, in effect, as an SGML or XML comment).

OmniMark allows you to identify and process SGML and XML comments and marked sections, including the text of comments and IGNORE marked sections. You can select which types of marked section are to be specially processed, and whether or not SGML and XML comments are to be processed.

Depending on the type of marked section, either the marked section and the text it contains are processed by a single OmniMark rule or, as in the case of INCLUDE marked sections, the start and end of the marked section are processed by separate rules.
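The idea of handling comments and marked sections with dedicated rules can be illustrated outside OmniMark. The following sketch is Python, not OmniMark syntax, and uses the standard `xml.parsers.expat` module. It registers one callback per construct and copies an XML document to its output with the comment and the CDATA section intact. Because expat is an XML parser, only CDATA sections are covered. It reports the start and end of a CDATA section as separate events, which loosely parallels the separate start and end rules described above for INCLUDE marked sections. The function name and the sample document are assumptions made for the example.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input containing a comment and a CDATA section.
SOURCE = "<doc><!-- keep me --><p><![CDATA[if a < b && c]]></p></doc>"


def copy_preserving_markup(xml_text: str) -> str:
    """Copy an XML document, re-emitting comments and CDATA sections."""
    out = []
    state = {"in_cdata": False}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        out.append(f"<{name}{attr_text}>")

    def on_end(name):
        out.append(f"</{name}>")

    def on_text(data):
        # Inside CDATA the text is copied verbatim; elsewhere it is escaped.
        out.append(data if state["in_cdata"] else escape(data))

    def on_comment(data):
        out.append(f"<!--{data}-->")

    def on_cdata_start():
        state["in_cdata"] = True
        out.append("<![CDATA[")

    def on_cdata_end():
        state["in_cdata"] = False
        out.append("]]>")

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.CommentHandler = on_comment
    parser.StartCdataSectionHandler = on_cdata_start
    parser.EndCdataSectionHandler = on_cdata_end
    parser.Parse(xml_text, True)
    return "".join(out)


copied = copy_preserving_markup(SOURCE)
print(copied)
```

If the comment and CDATA handlers were left unregistered, the comment would disappear and the CDATA text would come out as ordinary escaped character data, which is the default behaviour most processing programs are content with.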
