do xml-parse - OmniMark Keyword


Syntax

  do xml-parse document (with id-checking Boolean-expression)? (creating xml-dtds key keyname)?
     scan (input source | (input input-function-call))
     action+
  done

  do xml-parse instance (with document-element element-name)?
     with (xml-dtds key keyname | current xml-dtds)
     (with id-checking Boolean-expression)?
     scan (input source | (input input-function-call))
     action+
  done

Purpose

You can invoke the markup processor with its XML parser using do xml-parse. To invoke the markup processor with its SGML parser, use do sgml-parse instead. do xml-parse begins a code block, ending with done, in which you must do the following:

Identify the type of data to be processed, document or instance.
Identify the source of this data, a stream or an input function.
Perform any processing that should take place at the start of the data.
Perform exactly one parse continuation operator (%c or suppress) to initiate processing of the data by markup rules.
Perform any processing that should take place at the end of the data.
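
As a rough sketch of where each of these steps falls in a program, the following assumes a file named "my-xml.xml" and a catch-all element rule. The output strings are illustrative only:

  process
     ; the data is a complete document, read from a file
     do xml-parse document scan file "my-xml.xml"
        ; processing at the start of the data
        output "Parsing starts%n"
        ; the single parse continuation operator: markup rules fire here
        output "%c"
        ; processing at the end of the data
        output "%nParsing ends%n"
     done

  ; catch-all rule so that every element in the data is handled
  element #implied
     output "%c"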

The simplest use of do xml-parse is to process a complete XML document:

  do xml-parse document scan file "my-xml.xml"
     output "%c"
  done

This assumes that the file "my-xml.xml" contains a complete XML document. Often, however, the DTD and the instance you want to process are in two different files. The simplest way to handle this is to join the two sources:

  do xml-parse document scan file "my-dtd.dtd" || file "my-xml.xml"
     output "%c"
  done

But suppose you have 20 instances to process, all of which use the same DTD. It is wasteful to parse the same DTD 20 times. To avoid doing this you can pre-compile the DTD and place it on the built-in shelf xml-dtds:

  do xml-parse document
     creating xml-dtds key "my-dtd"
     scan file "my-dtd.dtd"
     suppress
  done

You can then process each instance in turn. The following code assumes you have placed the file names of the instances on a shelf called "my-instances":

  repeat over my-instances
     do xml-parse instance
        with xml-dtds key "my-dtd"
        scan file my-instances
        output "%c"
     done
  again
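
Putting the two steps together, a complete program might look like the sketch below. The shelf declaration, the instance file names, and the catch-all element rule are assumptions added for illustration:

  global stream my-instances variable

  process
     ; hypothetical instance file names
     set new my-instances to "instance-1.xml"
     set new my-instances to "instance-2.xml"

     ; compile the DTD once and store it on the xml-dtds shelf
     do xml-parse document
        creating xml-dtds key "my-dtd"
        scan file "my-dtd.dtd"
        suppress
     done

     ; parse each instance against the stored DTD
     repeat over my-instances
        do xml-parse instance
           with xml-dtds key "my-dtd"
           scan file my-instances
           output "%c"
        done
     again

  ; catch-all rule so that every element in the instances is handled
  element #implied
     output "%c"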

In some cases you may wish to parse a partial instance, that is, a piece of data comprising an element from a DTD which is not the doctype element of that DTD. In this case you can specify the element to be used as the effective doctype for parsing the data:

  do xml-parse instance
     with xml-dtds key "my-dtd"
     with document-element "lamb"
     scan file "partinst.xml"
     output "%c"
  done

The element's start and end tags can be present, or they can be omitted if the element allows. XML comments, processing instructions, and even marked sections can precede and follow the element's start and end tags, but anything else (particularly other elements, data, entity references, or USEMAP declarations) is an error.

By default, OmniMark checks all XML IDREF attributes to make sure they reference a valid ID. This checking may not be appropriate in processing a partial instance. It also takes time. You can turn this checking on and off using with id-checking followed by a Boolean expression. The following code will parse the specified document without checking IDREFs:

  do xml-parse document
     with id-checking false
     scan file "my-xml.xml"
     output "%c"
  done
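
Because with id-checking takes a Boolean expression, the choice can also be made at run time. The sketch below assumes a hypothetical global switch named check-ids and places the modifier before scan, as in the syntax summary:

  ; hypothetical switch controlling whether IDREFs are checked
  global switch check-ids initial {true}

  process
     do xml-parse document
        with id-checking check-ids
        scan file "my-xml.xml"
        output "%c"
     done

  ; catch-all rule so that every element in the document is handled
  element #implied
     output "%c"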

When parsing a document, markup rules are fired as follows (if specified in your code):

xml-declaration-end
dtd-start
dtd-end
prolog-end
general markup rules
epilog-start

When parsing an instance, only general markup rules are fired.
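
One way to observe this ordering is to have each rule announce itself. The following is a minimal sketch that assumes the two files used earlier; it omits xml-declaration-end and uses a catch-all element rule to stand in for the general markup rules:

  process
     do xml-parse document scan file "my-dtd.dtd" || file "my-xml.xml"
        output "%c"
     done

  dtd-start
     output "dtd-start%n"

  dtd-end
     output "dtd-end%n"

  prolog-end
     output "prolog-end%n"

  ; %q is the name of the element being processed
  element #implied
     output "element %q%n%c"

  epilog-start
     output "epilog-start%n"

Parsing an instance with the same rules should produce only the lines written by the element rule.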

If there are errors in the XML declaration or prolog (DTD), processing of the content stops and execution resumes with the actions that follow the parse continuation operator in the body of the do xml-parse. However, the amount of input read in this situation is undefined: OmniMark may consume the entire input source, stop reading the input immediately, or do something in between.