DTD handling

When OmniMark's built-in SGML or XML parser is invoked, it normally parses and compiles the document type definition (DTD) before it begins parsing and validating the markup of the document instance. This happens automatically unless well-formed XML parsing is requested.

In some situations this default behaviour is not desirable. When parsing many markup documents that refer to the same DTD, for example, there is no need to re-parse the DTD for every document. To help with such situations, OmniMark provides access to the DTD in its compiled form through the built-in shelf #current-dtd.

Compiling a DTD

In order to compile a DTD for reuse, one can capture the value of #current-dtd during one parse and store it in any shelf of type dtd. The built-in shelves sgml-dtds and xml-dtds can serve this purpose.

     do sgml-parse document scan file "input-with-dtd-1.sgml"
        put #suppress #content
        set new sgml-dtds{"dtd-1"} to #current-dtd
     done

The same effect can be achieved by specifying the creating clause of the do sgml-parse or do xml-parse action. The parser will terminate at the end of the SGML document prolog, create a compiled DTD, and store it in the specified keyed item of the sgml-dtds or xml-dtds shelf.

     do sgml-parse document creating sgml-dtds{"dtd-1"} scan file "input-with-dtd-1.sgml"
        put #suppress #content
     done
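
For XML input, the same pattern might look like the sketch below. It assumes that do xml-parse accepts the creating clause in the same position as do sgml-parse does above; the file name input-with-dtd-1.xml is hypothetical.

     ; Sketch only: compile the DTD of an XML document and keep it under the key "dtd-1".
     do xml-parse document creating xml-dtds{"dtd-1"} scan file "input-with-dtd-1.xml"
        put #suppress #content
     done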

Using a compiled DTD

Once a compiled DTD is obtained, it can be assigned to #current-dtd in another parse, where it will be used to parse and validate the input. The input document does not need to contain a DTD of its own; if it does, that DTD will be parsed but not used for instance validation.

     do sgml-parse document scan file "instance-of-dtd-1.sgml"
        set #current-dtd to sgml-dtds{"dtd-1"}
        output "%c"
     done
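
Putting the two steps together, a program that compiles a DTD once and reuses it for many documents might be sketched as follows. This is an illustration rather than a definitive implementation: it assumes that the documents to process are named on the command line and therefore appear in the built-in shelf #args, and it reuses the file name and key from the examples above.

     process
        ; Compile the DTD once and store it under the key "dtd-1".
        do sgml-parse document creating sgml-dtds{"dtd-1"} scan file "input-with-dtd-1.sgml"
           put #suppress #content
        done

        ; Parse every document named on the command line against the stored DTD.
        repeat over #args
           do sgml-parse document scan file #args
              set #current-dtd to sgml-dtds{"dtd-1"}
              output "%c"
           done
        again

Only the first parse pays the cost of compiling the DTD; each later parse starts from the stored copy.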

The clauses with sgml-dtds and with xml-dtds of the do sgml-parse and do xml-parse actions, respectively, can be used for the same purpose.
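
As a rough sketch, the with sgml-dtds form might be written as shown below. The use of the instance keyword and the exact position of the clause are assumptions here and should be checked against the reference entry for do sgml-parse.

     ; Sketch only: parse an instance against the DTD stored under "dtd-1".
     do sgml-parse instance with sgml-dtds{"dtd-1"} scan file "instance-of-dtd-1.sgml"
        output "%c"
     done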

Selecting the appropriate compiled DTD

The DTD to which a document instance conforms is not always known in advance. When processing a collection of documents of diverse types, it may be necessary to set #current-dtd for each document after its parsing has started. This is only allowed up to the point at which dtd-end is reached, or until the end of an active document-type-declaration rule. The following example rule selects the appropriate DTD for each input document at the moment the DTD is referenced, based on its system identifier. The first time a DTD is encountered, it is compiled and stored. On each subsequent reference to the same DTD, the stored copy is reused.

  external-text-entity #dtd when entity is system
     do when sgml-dtds has key "%eq"
        set #current-dtd to sgml-dtds{"%eq"}
        output "<!element %g(#doctype) - o empty>"  ; a dummy DTD that will be ignored
     else
        set new sgml-dtds{"%eq"} to sgml-dtd cast #current-dtd
        output file "%eq"
     done

The rule above emits a lone <!element %g(#doctype) - o empty> declaration in order to satisfy the SGML parser, which requires a valid DTD. Since this DTD will be replaced by the one assigned to #current-dtd, its content does not matter as long as it is valid and declares an element whose name matches #doctype.