XML to HTML conversions - OmniMark Sample

XML to HTML conversions

------

Creating HTML from XML

Conversion of XML information for web consumption is one of the most common uses of OmniMark. Some fundamental techniques include:

basic formatting;
explicit hypertext generation (such as handling XREF-style markup);
implicit hypertext generation (based on content);
linking based on structure (creating a table of contents);
linking based on implied structure (creating a secondary table of contents); and
chunking (the splitting of your XML data across a number of HTML files).

You can design your OmniMark program to process input regardless of whether it is contained in flat files (DTDs and instances), or is derived from more complex storage mechanisms (having the XML information split across multiple files, or split across database fields).

Sample programs

The following sections describe a series of sample programs designed to translate a "book" from XML into HTML in phases. Each sample builds on the previous one, so that the final example contains a complete solution that implements all the described types of formatting and linking. The sample programs include:

format.xom. This program does simple formatting in HTML.
explink.xom. This program does the formatting, but also handles explicit links (for example, via <xref>).
implink.xom. This program creates links based on content information (implicit linking on part names).
expstruc.xom. This program makes a table of contents (links on explicit structure).
impstruc.xom. This program makes a second table of contents based on audience (links on implied structure).
chunk.xom. This program "chunks" the input into separate files.

Each of the sample programs can be run as follows:

    omnimark -s [sample-program].xom mart.xml -of [sample-output-file].htm

Basic formatting

The basic mapping from XML to HTML is straightforward to describe in OmniMark. This sample uses a book metaphor for the incoming information, and assumes that the HTML will be created in the same order as the XML is stored. A series of element rules is all that is needed to perform the basic formatting.

Here is a simple example of the OmniMark element rules used to process the book's glossary:

  element glossary
     output "%n%n<h2>Glossary</h2>%n%n" _
                "<ul>%n%c" _
                "</ul>%n"

  element term
     output "<li>%c:"

  element defn
     output " %c%n"

The respective element rules are "fired" as OmniMark processes the XML data. The rules fire in a nested fashion, just as the elements are nested in the XML document. So, the following XML data:

  <glossary>
  <term>trigger</term><defn>The thing that makes the thing shoot.</defn>
  <term>laser doohicky</term><defn>The part that shoots.</defn>
  <term>battery chamber</term><defn>The thing that holds the batteries.</defn>
  <term>handle</term><defn>The thing you hold.</defn>
  </glossary>

will be converted by OmniMark to the following HTML:

  <h2>Glossary</h2>

  <ul>
  <li>trigger: The thing that makes the thing shoot.
  <li>laser doohicky: The part that shoots.
  <li>battery chamber: The thing that holds the batteries.
  <li>handle: The thing you hold.

  </ul>
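
The nested firing of rules can be pictured with a small language-neutral sketch. It is written in Python rather than OmniMark, and it is only an illustration of the idea, not how OmniMark is implemented. The names RULES, render, and SAMPLE are hypothetical, and the input is assumed to be well-formed XML with explicit end tags.

```python
import xml.etree.ElementTree as ET

# One template per element name; "%c" marks where the element's
# content (its text plus the output of its children) is placed.
RULES = {
    "glossary": "\n\n<h2>Glossary</h2>\n\n<ul>\n%c</ul>\n",
    "term": "<li>%c:",
    "defn": " %c\n",
}


def render(elem):
    """Render one element by substituting its content into its rule."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child))      # child rules "fire" inside the parent
        parts.append(child.tail or "")
    content = "".join(parts)
    template = RULES.get(elem.tag, "%c")  # no rule: pass the content through
    return template.replace("%c", content)


SAMPLE = (
    "<glossary>"
    "<term>trigger</term><defn>The thing that makes the thing shoot.</defn>"
    "<term>handle</term><defn>The thing you hold.</defn>"
    "</glossary>"
)

html = render(ET.fromstring(SAMPLE))
print(html)
```

Each template is applied only after the content of its element has been rendered, which mirrors the way "%c" in an element rule stands for the processed content of that element.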

Sample Files

XML to HTML conversions: basic formatting
XML to HTML conversions: basic formatting HTML output

Explicit hypertext

Explicit links in XML are "hardcoded", and are directly translatable to HTML hypertext. Note, however, that explicit links in your XML can cause link management problems later on: hardcoded linking enforces fixed navigation paths and gives intelligent applications no meta-information from which to do more interesting work.

The following OmniMark rules handle the explicit linking by translating directly to the HTML counterpart. The OmniMark program also uses referents in order to easily move around chapter title information. (Hardcoded references to a chapter have no knowledge of the title of the chapter they point to; OmniMark lets you insert that information regardless of whether the link is a forward or backward reference.)

  element xref
    output '<a href="#exp-%v(idref)">'
         || referent "exp-%v(idref)"
         || '</a>%c'

  element chapter
    increment chap-no
    output '<a name="exp-%v(id)"></a>%n%c'

  element title when parent is chapter
    local stream chap-title
    set chap-title to "Chapter %d(chap-no), %c"
    output "<h2>%g(chap-title)</h2>%n"
    using attribute id of parent
      set referent "exp-%v(id)" to chap-title
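
The way a referent lets a link carry a title that has not been read yet can be sketched as a placeholder that is filled in once all input has been processed. The following Python sketch is an analogy only, not OmniMark code or its implementation. The class ReferentBuffer, its methods, and the chapter id "c2" are hypothetical.

```python
class ReferentBuffer:
    """Collects output; named placeholders are filled in at the end."""

    def __init__(self):
        self._pieces = []   # str for literal text, ("ref", name) for a placeholder
        self._values = {}   # placeholder name -> final text

    def write(self, text):
        self._pieces.append(text)

    def write_referent(self, name):
        self._pieces.append(("ref", name))

    def set_referent(self, name, value):
        self._values[name] = value

    def resolve(self):
        """Return the final text; raises KeyError for an unset placeholder."""
        out = []
        for piece in self._pieces:
            if isinstance(piece, tuple):
                out.append(self._values[piece[1]])
            else:
                out.append(piece)
        return "".join(out)


buf = ReferentBuffer()

# A forward reference: the link is written before the chapter is seen.
buf.write('See <a href="#exp-c2">')
buf.write_referent("exp-c2")
buf.write("</a>.\n")

# Later, the chapter title is read and the placeholder gets its value.
buf.write('<a name="exp-c2"></a>\n<h2>Chapter 2, Safety</h2>\n')
buf.set_referent("exp-c2", "Chapter 2, Safety")

result = buf.resolve()
print(result)
```

Because the value is substituted only when the output is resolved, the same approach works whether the link precedes or follows the chapter it points to.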

Sample Files

XML to HTML conversions: explicit hypertext
XML to HTML conversions: explicit hypertext HTML

Implicit hypertext

Implicit linking allows the final hypertext links to be created "in the eyes of the application." In other words, OmniMark lets you easily define how links are created from content, rather than having the linking hardcoded in the source data.

Here is an example of OmniMark code that links references to part names with glossary entries. Because the information set (a book), and the implied requirements of the final deliverable, were so simple, referents weren't necessary. However, referents can be used when you want to write a link before you have read the information you need to create the link.

  element term
    local stream term
    set term to "%c"                                     ; content linking
    output '<li><a name="content-%g(term)"></a>'        ; anchor is closed, inside the list item
        || "%g(term):"

  element part
    local stream part
    set part to "%c"
    output '<a href="#content-%g(part)">'
        || part
        || '</a>'

Although this sample uses only one input file and one output file, this type of linking can also be used in a multiple-object input and multiple-object output environment.
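
The idea of deriving both ends of a link from content can be sketched in a few lines. This is a Python illustration, not OmniMark code. The function names are hypothetical, and the whitespace and case normalization in content_key is an added assumption (the sample rules above use the content as-is), included because a term such as "laser doohicky" would otherwise put a space in the anchor name.

```python
import re


def content_key(text):
    """Derive a link key from element content (hypothetical normalization)."""
    return "content-" + re.sub(r"\s+", "-", text.strip().lower())


def glossary_item(term):
    """Anchor a glossary term under a key derived from its own text."""
    return '<li><a name="%s"></a>%s:' % (content_key(term), term)


def part_link(part):
    """Link a part name to the glossary entry with the same text."""
    return '<a href="#%s">%s</a>' % (content_key(part), part)


print(glossary_item("laser doohicky"))
print(part_link("laser doohicky"))
```

No identifier appears in the source data: the link and its target agree only because both keys are computed from the same text.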

Sample Files

XML to HTML conversions: implicit hypertext
XML to HTML conversions: implicit hypertext HTML

Linking based on structure

This sample describes how to create a table of contents for the chapters of a book. The first step is to declare a global stream and open it as a referent:

  global stream toc

  process-start
    open toc as referent "toc"

Then, at the point you'd like the table of contents to appear, output the referent (even though its value hasn't yet been assigned):

  element author
    output "by: %c%n"

    ; This is where the table of contents will go:
    output "<hr><h2>Table of Contents</h2>%n"
        || '<ul>%n'
        || referent "toc"
        || '</ul>'

Then, whenever a chapter title is encountered, append the corresponding link to the table of contents referent:

  element title when parent is chapter
    local stream chap-title
    set chap-title to "Chapter %d(chap-no), %c"
    output "<h2>%g(chap-title)</h2>%n"
    using attribute id of parent
      do
        set referent "exp-%v(id)" to chap-title
        put toc '<li><a href="#exp-%v(id)">%g(chap-title)</a>%n'
      done
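
The effect of the steps above can be sketched as two outputs that grow in parallel and are combined at the end. This is a Python illustration of the outcome, not OmniMark code. The chapter ids and titles are made-up sample data, and toc_items merely plays the role of the "toc" referent.

```python
# Hypothetical (id, title) pairs standing in for the parsed chapters.
chapters = [("c1", "Getting Started"), ("c2", "Safety")]

toc_items = []   # plays the role of the "toc" referent
body = []        # the chapter output itself

for number, (chap_id, title) in enumerate(chapters, start=1):
    chap_title = "Chapter %d, %s" % (number, title)
    # Chapter output: an anchor followed by the heading.
    body.append('<a name="exp-%s"></a>\n<h2>%s</h2>\n' % (chap_id, chap_title))
    # At the same moment, append one entry to the table of contents.
    toc_items.append('<li><a href="#exp-%s">%s</a>\n' % (chap_id, chap_title))

# The table of contents appears first, although it was completed last.
page = (
    "<hr><h2>Table of Contents</h2>\n<ul>\n"
    + "".join(toc_items)
    + "</ul>\n"
    + "".join(body)
)
print(page)
```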

Sample Files

XML to HTML conversions: HTML linking based on structure
XML to HTML conversions: linking based on structure

Linking based on implied structure

Linking based on implied structure in this type of conversion is fairly uncomplicated, especially when dealing with a book metaphor for the XML input file. This sample describes how to create a secondary table of contents (implied by the intended audience for each book chapter).

The first step is to specify where you want the secondary table of contents to appear by writing a referent to that location (even though the value hasn't been determined yet):

  element author
    output "by: %c%n"

    ; This is where we want the table of contents to go:
    output "<hr><h2>Table of Contents</h2>%n"
        || '<ul>%n'
        || referent "toc"
        || '</ul>'

    ; This is where the contents by audience should go:
    output "<hr><h2>Contents by Audience</h2>%n"
        || '<ul>%n'
        || referent "cba"
        || '</ul>'

Then, for each new audience, you would write data (specific to that audience) to the original referent (the content by audience listing). For each chapter encountered, you would write the necessary linking information to the referent associated with that chapter's audience. OmniMark takes care of putting the appropriate information in the right place.

  element audience
    local stream aud
    set aud to "%c"
    output "<b>For:</b> <i>%g(aud)</i>%n"

    ; Handle contents by audience and give it a starting value.
    do when referents hasnt key "audtype-%g(aud)"
      set referent "audtype-%g(aud)" to "<li>%g(aud): "

      ; Put a place for this list of audience references in the
      ; contents by audience referent.
      put cba referent "audtype-%g(aud)"
    else
      ; A referent for this audience already exists, so just append
      ; the desired delimiter (a semicolon in this case).
      set referent "audtype-%g(aud)" to referents ^ "audtype-%g(aud)" || "; "
    done

    ; Next, write the necessary linking information to the end of
    ; that audience's referent.
    using attribute id of parent
      set referent "audtype-%g(aud)"
        to referents ^ "audtype-%g(aud)"
        || '<a href="#exp-%v(id)">%g(chap-title)</a>%n'
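
The grouping that these rules perform can be sketched as one list of links per audience, created the first time that audience is seen. This is a Python illustration of the result, not OmniMark code. The chapter data and audience names are made up for the example, and by_audience stands in for the per-audience "audtype-..." referents.

```python
# Hypothetical (id, title, audience) triples standing in for the chapters.
chapters = [
    ("c1", "Chapter 1, Setup", "users"),
    ("c2", "Chapter 2, Repair", "technicians"),
    ("c3", "Chapter 3, Safety", "users"),
]

# audience -> list of links; a dict keeps first-seen order, so each
# audience is listed at the point where it first appeared.
by_audience = {}
for chap_id, chap_title, audience in chapters:
    link = '<a href="#exp-%s">%s</a>' % (chap_id, chap_title)
    by_audience.setdefault(audience, []).append(link)

# One list item per audience, with links separated by semicolons.
cba = "".join(
    "<li>%s: %s\n" % (audience, "; ".join(links))
    for audience, links in by_audience.items()
)
print(cba)
```

Joining the links with "; " gives the same result as appending the delimiter before every link except the first one for an audience.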

Sample Files

XML to HTML conversions: implied structure HTML
XML to HTML conversions: linking based on implied structure

Chunking of XML

In many XML implementations, a single XML document can be extremely large. This is commonly the case when a book metaphor is used to model and represent information. When delivering to the Web, however, you usually won't want to deliver the paper equivalent of a 500-page document as a single HTML file (even when it makes sense to author it that way internally).

To cope with the relatively slow speeds of the Internet, you need to "chunk" the original information into bite-sized pieces for downloading to the final client application. Different applications might require different levels of chunking (chapter, section, volume, etc.), depending on the tradeoffs between connection speed and the usability of the chunks.

The following sample code creates a new file for every chapter encountered. It also adds forward and backward linking to improve navigation. The "chunking" is done easily: whenever a new chapter is encountered, a new file is created for that chapter.

  element chapter
     increment chap-no
     set file "chap-%d(chap-no).htm" with referents-allowed to '%c'

For chunking out the chapters, this is the only change necessary. All remaining rules that correspond to elements contained within chapters don't need to be changed: their output will, by default, go into the new file.

A similar method would be used for a book glossary; no changes would be necessary to the rules that correspond with a glossary's subelements:

  element glossary
     set file "gloss.htm" to "<h2>Glossary</h2>%n%n" _
      "<ul>%n%c" _
      "</ul>%n"

The linking mechanisms also need to change slightly, so that links point to locations in the new files (and not just to locations within a single file).
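
The change to the links can be sketched as follows. This is a Python illustration, not OmniMark code. The file naming follows the "chap-N.htm" pattern used above, while the function names and the "Previous" and "Next" link labels are hypothetical.

```python
def chapter_file(number):
    """File name for a chapter, following the chap-N.htm pattern."""
    return "chap-%d.htm" % number


def xref_href(target_chapter, chap_id):
    """A cross-reference now names the target file as well as the anchor."""
    return "%s#exp-%s" % (chapter_file(target_chapter), chap_id)


def nav_links(number, total):
    """Backward and forward links for chapter `number` of `total`."""
    links = []
    if number > 1:
        links.append('<a href="%s">Previous</a>' % chapter_file(number - 1))
    if number < total:
        links.append('<a href="%s">Next</a>' % chapter_file(number + 1))
    return " | ".join(links)


print(xref_href(3, "c3"))
print(nav_links(2, 4))
```

The first and last chapters get only one navigation link each, because there is no file before the first chunk or after the last one.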

Sample Files

XML to HTML conversions: chunking
XML to HTML conversions: chunking HTML output

Final sample HTML output files

The six sample XML conversion programs produce the following HTML files.

Sample Files

HTML output files: chapter 1
HTML output files: chapter 2
HTML output files: chapter 3
HTML output files: chapter 4
HTML output files: glossary

------


Generated: April 21, 1999 at 2:01:38 pm
If you have any comments about this section of the documentation, send email to [email protected]