XML documents may contain CDATA marked sections. SGML documents may contain CDATA, RCDATA, IGNORE, and INCLUDE marked sections. OmniMark provides markup rules for handling all these types of marked sections.
CDATA and RCDATA marked sections serve to protect text from being misinterpreted as markup (start tags, end tags, entity references or declarations). These marked sections affect how the data is parsed by the SGML parser, but they do not usually affect the way that OmniMark processes the resulting data content.
marked-section cdata
and marked-section rcdata
rules can be used to identify content that was wrapped in a CDATA or RCDATA marked section.
It is very important to understand that the presence or absence of marked-section
rules does not affect how marked sections are treated by the SGML parser. They only determine how the SGML parser presents the resulting text to OmniMark.
A similar set of statements applies to CDATA and RCDATA marked sections as applies to IGNORE marked sections. The major difference is that the "default" processing for CDATA and RCDATA marked sections is to treat their text content as data content, and not to discard it.
If there are no marked-section cdata or marked-section rcdata rules, then OmniMark treats the text resulting from these marked sections as if the text resulted from ordinary data content. In other words, OmniMark does not detect the boundaries between the text originating from inside the marked section and the text originating from outside the marked section.
Only one marked-section cdata rule may be selected for a CDATA marked section. That is, either there must be only one marked-section cdata rule or, if there is more than one such rule, each must have a condition. Similarly, only one marked-section rcdata rule may be selected for an RCDATA marked section. It is an error for more than one marked-section cdata or marked-section rcdata rule to be selected for a CDATA or an RCDATA marked section.
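As a minimal sketch of the selection requirement above, the following pair of rules gives each marked-section cdata rule a condition, written so that the two conditions can never both hold for the same marked section. The element name "example" and the bracketed output are hypothetical. The fragment assumes it is part of a program that parses the SGML document.

```omnimark
; Selected only for CDATA marked sections whose containing
; element is the (hypothetical) "example" element.
marked-section cdata when element is example
   output "[verbatim: %c]"

; Selected for every other CDATA marked section.
marked-section cdata when element isnt example
   output "%c"
```

Because the two conditions are mutually exclusive, exactly one rule is selected for any CDATA marked section.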
The %c operator captures the text of a CDATA or RCDATA marked section. Either %c or suppress must be used exactly once in a marked-section cdata or marked-section rcdata rule. All modifiers supported by %c can be used on a %c operator in a marked-section cdata or marked-section rcdata rule.
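The two ways of satisfying the "exactly once" requirement can be sketched as follows. This is an illustrative fragment, not a recommended treatment of either kind of marked section, and it assumes the rules belong to a program that parses the SGML document.

```omnimark
; Uses %c once: the text of the CDATA marked section is
; written to the current output unchanged.
marked-section cdata
   output "%c"

; Uses suppress once: the text of the RCDATA marked section
; is discarded.
marked-section rcdata
   suppress
```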
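Only marked sections in the document instance are available for processing by an OmniMark program. Marked sections in the DTD are always ignored, whether or not there is any marked-section rule in the OmniMark program.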
The setting of the sgml-out action determines what happens to record ends in the text of a CDATA or RCDATA marked section.
IGNORE marked sections appear to an OmniMark program in the same way as SGML comments do, except that they are processed using a marked-section ignore
rule rather than an sgml-comment
rule.
OmniMark programmers should note that, in keeping with the provisions of clause 10.4.1 of the SGML standard (ISO 8879:1986), all pairs of "<![" and "]]>" within an IGNORE marked section are matched and treated as text. This means that any marked sections nested within an IGNORE marked section, including the opening and closing delimiters, are treated as part of the text of the IGNORE marked section.
The text of an IGNORE marked section consists of all the characters between the DSO delimiter following the status keyword specification, and the marked section end (that is, between the "[" following the keyword IGNORE and the "]]>"). The text does not include the surrounding delimiters, but does include any record ends or white space within the marked section.
Any SGML comment in the header of an IGNORE marked section is processed prior to the processing of the IGNORE marked section.
Only marked sections in the document instance are available for processing by an OmniMark program. Marked sections in the DTD are always ignored, whether or not there is any marked-section
rule in the OmniMark program.
The setting of the sgml-out
action determines what happens to record ends in the text of an IGNORE marked section.
The presence of marked-section ignore
rules affects how translate
rules match text in and around an IGNORE marked section.
SGML comments and ignore
, cdata
, and rcdata
marked sections are all processed similarly. However, include
marked sections require quite a different approach. Instead of having one rule to process an include
marked section, OmniMark provides two: one for processing the start of a marked section and one for the end. This split is necessary because, unlike other types of marked sections, an include
marked section can start in the context of one element and end in another, and so can overlap the hierarchical structure that ties the components of a parsed SGML document together.
This kind of overlapping cannot happen with ignore
, cdata
, and rcdata
marked sections because they inhibit the recognition of other markup, including start and end tags, within their text. An important consequence of this is that the whole of the text of an ignore
, cdata
, or rcdata
marked section is processed with one set of output streams (as used by the output
action and as available using the #current-output
stream set) and inherits the stream destinations and stream modifiers from the element
or data-content
rule that processes the surrounding content.
The contents of an include marked section can be part of one or more elements, and the element and data-content rules for those elements may each specify different output destinations and stream modifiers. To avoid the complexity and confusion that could result from trying to "merge" the specifications of the rules for include marked sections with those of the applicable element and data-content rules, include marked section rules apply only to the start and end of an include marked section. They have no direct influence on the processing of the marked section's content. The two rules are the marked-section include-start and marked-section include-end rules.
The OmniMark program can influence the processing of the content of an include
marked section by setting global variables and testing them in element
and data-content
rules, so that those rules can detect when they occur in an include
marked section.
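A minimal sketch of this technique, assuming a global switch named in-include and an element named p (both hypothetical), might look like the following. The fragment assumes it is part of a program that parses the SGML document.

```omnimark
; Records whether an INCLUDE marked section is currently open.
global switch in-include initial {false}

; Fires at the start of an INCLUDE marked section.
marked-section include-start
   activate in-include

; Fires at the end of an INCLUDE marked section.
marked-section include-end
   deactivate in-include

; Hypothetical element rule: flags paragraphs that begin
; inside an INCLUDE marked section.
element p
   do when active in-include
      output "[included] "
   done
   output "%c%n"
```

A single switch does not track nesting: if INCLUDE marked sections can be nested, the first include-end would deactivate the switch while an outer section is still open, so a counter would be a better fit in that case.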
This is an example of an INCLUDE marked section overlapping the element structure of a document:
<title>Part of the title.
<![INCLUDE[More of the title.
<p>The first paragraph.
<p>Part of the second paragraph.
]]> More of the second paragraph.