do markup-parse

control structure

do markup-parse markup-source-expression


You can use do markup-parse to process a markup source, which may be produced either by one of OmniMark's internal parsers or by an external parser. do markup-parse provides a general interface for processing markup streams.

The markup source argument for do markup-parse can be generated by external parsers of all kinds, and is not limited to parsed XML or SGML. There are two external parser libraries shipped with OmniMark: Xerces from the Apache project, and the RTF external parser.

do markup-parse operates in the same way as do sgml-parse and do xml-parse, except that it takes an already parsed markup source argument instead of the normal parameters of a do xml-parse or do sgml-parse invocation. In the example below, the markup source is the result of the external parser function xerces.xml and scan is a parameter of that function. Consult the OMXERCES library documentation for complete details on the parameters of the xerces.xml function.

  import "omxerces.xmd" prefixed by xerces.

  process
     do markup-parse xerces.xml scan file "my.xml"
        output "%c"
     done

  element #implied
     output "%c"

Processing and validation

Following the invocation of do markup-parse, OmniMark markup rules are fired, just as they are when an internal parser is invoked with do sgml-parse. Since the string source data type is a subtype of markup source, a string source value can also be used as the argument of do markup-parse. In this case, only a single data-content rule is fired. If the argument is a more general markup source that contains markup events as well as data content, do markup-parse will fire the corresponding markup rule for each event.
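
As a minimal sketch of the string source case, the program below passes a literal string to do markup-parse. The string carries no markup events, so the only rule expected to fire is the data-content rule. The literal text and the square brackets are illustrative assumptions, not part of any library.

  process
     ; A string source is also a markup source, so it can be parsed directly.
     do markup-parse "Hello, world"
        output "%c"
     done

  data-content
     ; Fired once for the data content of the string.
     output "[%c]"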

The markup rule fired by a markup event has access only to the information made available by the parser that originally produced the event, so you may not get the same information that you would have received from an internal parser. Note in particular that the XML and SGML error numbers reported by OmniMark's internal parsers are a product of those parsers, and will not be returned by an external parser. The error number reported by a markup-error rule fired by an external parser will always be "0297". You will need to parse the error text to discover the specific error reported by the parser.
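
The following sketch shows one way a markup-error rule might separate external parser errors from internal ones and then inspect the error text. It assumes the error number and message text are read through the built-in #error-code and #message shelves. The message fragment "well-formed" is a hypothetical example: the actual wording depends on the external parser, so check the messages your parser produces before relying on a pattern.

  markup-error
     do when #error-code = "0297"
        ; Error raised by an external parser: the number carries no detail,
        ; so scan the message text instead.
        do scan #message
           match any** "well-formed"   ; hypothetical message fragment
              put #error "Well-formedness problem: " || #message || "%n"
           else
              put #error "External parser error: " || #message || "%n"
        done
     else
        ; Error raised by one of OmniMark's internal parsers.
        put #error "Error " || #error-code || ": " || #message || "%n"
     done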