XML/SGML to TeX-like languages

This example shows how a simple document can be translated into a TeX-like markup language. Note the translate rule, which is used together with the break-width and replacement-break declarations. This combination is a standard way of processing data content without regard to where the lines were originally broken.

Old part numbers are printed in lowercase only. New part numbers are printed in uppercase only. The title is neither broken nor stripped. Paragraph text is stripped, and this stripping is inherited by the "part" and "old-part" elements.

A typical document may look like this:

  <!doctype doc [
  <!element doc        o o (title, para+)>
  <!element title      o o (#pcdata)>
  <!element para       - o (#pcdata|part|old-part)*>
  <!element part       - - (#pcdata)>
  <!element old-part   - - (#pcdata)>
  ]>
  Acme Llama and Haggis Supply Parts catalog,
  Fall, 1973
  <para>
  Our new stock includes three new Peruvian llamas (ask for
  <part/lL-33-864/). We have also located a new haggis supplier in
  Singapore (<part/gG-33-865/), and are no longer carrying
  <old-part/Yh5-33-863A/, as our supplier in the Maldives is no longer in
  business. This change should handle some of your requests.
  <para>
  As usual,
  we at Acme are looking forward to meeting your needs this fall.

The output, which can be sent to a formatter, appears as follows:

  \title{Acme Llama and Haggis Supply Parts catalog, Fall, 1973}
  
      Our new stock includes three new
  Peruvian llamas (ask for
  \part{LL-33-864}). We have also located
  a new haggis supplier in Singapore
  (\part{GG-33-865}), and are no longer
  carrying \part{yh5-33-863a}, as our
  supplier in the Maldives is no longer in
  business. This change should handle some
  of your requests.
  
      As usual, we at Acme are looking
  forward to meeting your needs this fall.

The formatter instruction \part appears in lowercase despite the u modifier on the %c format item in the "part" element rule. This is because "\part" is literal text in the format string, not data content copied by %c.
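
The same distinction can be illustrated outside OmniMark. The following Python sketch is only an analogy, not OmniMark code: the function names part_rule and old_part_rule are hypothetical, and str.upper() and str.lower() stand in for the u and l modifiers. Case folding is applied to the copied content only, so the literal "\part{" is emitted exactly as written.

```python
# Analogy only: plain Python standing in for the "part" and "old-part" rules.
# part_rule and old_part_rule are hypothetical names used for illustration.

def part_rule(content: str) -> str:
    # Literal text is emitted as written; only the content is uppercased,
    # which mirrors "\part{%uc}".
    return "\\part{" + content.upper() + "}"


def old_part_rule(content: str) -> str:
    # Same literal text, but the content is lowercased, mirroring "\part{%lc}".
    return "\\part{" + content.lower() + "}"


print(part_rule("lL-33-864"))        # \part{LL-33-864}
print(old_part_rule("Yh5-33-863A"))  # \part{yh5-33-863a}
```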

The modifiers available on %c (such as u, l, and s) do not usually apply to text explicitly output by the OmniMark program. The exceptions are %sn, %st, and %s_: the s modifier in a %n, %t, or %_ format item marks that item as strippable. The s modifier in a %c causes all the strippable white space to be eliminated.

The rules and declarations for this example are as follows:

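  element "doc"
    output "%c"
  
  ; Separate consecutive paragraphs with a blank line, then indent by four
  ; spaces and output the content with strippable white space removed.
  element "para"
    output "%n" when previous is "para"
    output "%_%_%_%_%sc%n"
  
  ; New part numbers: content in uppercase.
  element "part"
    output "\part{%uc}"
  
  ; Old part numbers: content in lowercase.
  element "old-part"
    output "\part{%lc}"
  
  ; The title is neither broken nor stripped.
  element "title"
    output "\title{%hc}%n%n"
  
  ; Each newline in the data content becomes a possible break point
  ; followed by a strippable space.
  translate "%n"
    output "%/%s_"
  
  ; Output lines are at most 40 characters wide.
  break-width 40
  
  ; When a line is broken, a space is replaced by a newline.
  replacement-break "%_" "%n"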
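
The combined effect of the translate rule and the two declarations can be approximated outside OmniMark. The Python sketch below is an analogy under simplifying assumptions, not a description of how OmniMark is implemented: the function name rebreak and its parameters are hypothetical, original line ends are treated as ordinary white space, and lines are filled greedily, with a space replaced by a newline whenever the next word would exceed the width.

```python
# Analogy only: greedy re-breaking of text that ignores the original line ends.
# rebreak is a hypothetical helper, not part of OmniMark.

def rebreak(text: str, width: int = 40, indent: str = "") -> str:
    # split() treats original newlines like any other white space,
    # much as the translate rule turns "%n" into a space.
    words = text.split()
    lines = []
    current = indent
    for word in words:
        if current == indent:
            candidate = current + word        # first word on the first line
        else:
            candidate = current + " " + word  # later words are joined by a space
        if len(candidate) > width and current != indent:
            # The space before this word becomes a newline, as with
            # replacement-break "%_" "%n".
            lines.append(current)
            current = word
        else:
            current = candidate
    lines.append(current)
    return "\n".join(lines)


para = "As usual,\nwe at Acme are looking forward to meeting your needs this fall."
print(rebreak(para, width=40, indent="    "))
# Prints:
#     As usual, we at Acme are looking
# forward to meeting your needs this fall.
```

Applied to the second paragraph of the sample document with a four-space indent, the sketch gives the same two lines as the output shown earlier.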