XML to HTML conversions
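Conversion of XML information for web consumption is one of the most common uses of OmniMark. The fundamental techniques described here are basic formatting, explicit and implicit linking, generating a table of contents, linking based on implied structure, and chunking output into multiple files.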
You can design your OmniMark program to process input regardless of whether it is contained in flat files (DTDs and instances), or is derived from more complex storage mechanisms (having the XML information split across multiple files, or split across database fields).
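The following sections describe a series of sample programs designed to translate a "book" from XML into HTML in phases. Each sample builds on the previous one, so that the final example contains a complete solution that implements all the described types of formatting and linking. The sample programs cover, in order: basic formatting, explicit linking, implicit linking, creating a table of contents, linking based on implied structure, and chunking the output into multiple files.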
Each of the sample programs can be run as follows:
    omnimark -s [sample-program].xom mart.xml -of [sample-output-file].htm
Basic formatting
Describing the basic mapping from XML to HTML in OmniMark is straightforward. This sample uses a book metaphor for the incoming information. (It also assumes the HTML information will be created in the same order as the XML information is stored.) A series of element rules is all that is necessary to perform the basic formatting.
Here is a simple example of the OmniMark element rules used to process the book's glossary:
    element glossary
       output "%n%n<h2>Glossary</h2>%n%n"
           _ "<ul>%n%c"
           _ "</ul>%n"

    element term
       output "<li>%c:"

    element defn
       output " %c%n"
The respective element rules are "fired" as OmniMark processes the XML data. The rules fire in a nested fashion, just as the elements are nested in the XML document. So, the following XML data:
    <glossary>
    <term>trigger</term><defn>The thing that makes the thing shoot.</defn>
    <term>laser doohicky</term><defn>The part that shoots.</defn>
    <term>battery chamber</term><defn>The thing that holds the batteries.</defn>
    <term>handle</term><defn>The thing you hold.</defn>
    </glossary>
will be converted by OmniMark to the following HTML:
    <h2>Glossary</h2>

    <ul>
    <li>trigger: The thing that makes the thing shoot.
    <li>laser doohicky: The part that shoots.
    <li>battery chamber: The thing that holds the batteries.
    <li>handle: The thing you hold.
    </ul>
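For readers more used to general-purpose languages, the nested firing of element rules can be approximated in Python. This is only an analogy, not OmniMark: the `RULES` table and the `fire` function are hypothetical names, `{c}` stands in for `%c`, and the input is a shortened glossary with explicit end tags so that a standard XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: one output template per element name.
# "{c}" plays the role of OmniMark's %c (the processed element content).
RULES = {
    "glossary": "\n\n<h2>Glossary</h2>\n\n<ul>\n{c}</ul>\n",
    "term": "<li>{c}:",
    "defn": " {c}\n",
}

def fire(elem):
    """Apply the rule for elem; child rules fire while building {c}."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(fire(child))
        parts.append(child.tail or "")
    return RULES[elem.tag].format(c="".join(parts))

xml_data = (
    "<glossary>"
    "<term>trigger</term><defn>The thing that makes the thing shoot.</defn>"
    "<term>handle</term><defn>The thing you hold.</defn>"
    "</glossary>"
)
html = fire(ET.fromstring(xml_data))
print(html)
```

The rule for `glossary` cannot finish until the rules for its children have produced their output, which is the same nesting behavior described above.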
Explicit linking
Explicit links in XML are "hardcoded", and are directly translatable to HTML hypertext. However, note that using explicit links in your XML can lead to link management problems later on. Hardcoded linking enforces fixed navigation paths and provides no metadata that an intelligent application could use to build richer links.
The following OmniMark rules handle the explicit linking by translating directly to the HTML counterpart. The OmniMark program also uses referents in order to easily move around chapter title information. (Hardcoded references to a chapter have no knowledge of the title of the chapter they point to; OmniMark lets you insert that information regardless of whether the link is a forward or backward reference.)
    element xref
       output '<a href="#exp-%v(idref)">'
           || referent "exp-%v(idref)"
           || '</a>%c'

    element chapter
       increment chap-no
       output '<a name="exp-%v(id)"></a>%n%c'

    element title when parent is chapter
       local stream chap-title
       set chap-title to "Chapter %d(chap-no), %c"
       output "<h2>%g(chap-title)</h2>%n"
       using attribute id of parent
          set referent "exp-%v(id)" to chap-title
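Conceptually, a referent is a placeholder that is written into the output immediately and resolved once its value is known. The Python sketch below illustrates that idea only; the `Output` class and its method names are hypothetical and do not mirror how OmniMark implements referents, and the chapter title is made-up sample data.

```python
class Output:
    """Buffers output pieces; referents are resolved when rendering."""

    def __init__(self):
        self.pieces = []      # literal strings or ("ref", name) tuples
        self.referents = {}   # referent name -> final value

    def write(self, text):
        self.pieces.append(text)

    def write_referent(self, name):
        # Emit a placeholder; its value may be set later (forward reference).
        self.pieces.append(("ref", name))

    def set_referent(self, name, value):
        self.referents[name] = value

    def render(self):
        return "".join(
            self.referents[p[1]] if isinstance(p, tuple) else p
            for p in self.pieces
        )

out = Output()
# A cross-reference to chapter "c2" appears before chapter 2 has been read.
out.write('<a href="#exp-c2">')
out.write_referent("exp-c2")
out.write("</a>\n")
# Later, the chapter title is encountered and supplies the value.
out.write('<a name="exp-c2"></a>\n<h2>Chapter 2, Safety</h2>\n')
out.set_referent("exp-c2", "Chapter 2, Safety")
result = out.render()
```

Because resolution happens at render time, it makes no difference whether the reference comes before or after the chapter it points to.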
Implicit linking
Implicit linking allows the final hypertext links to be created "in the eyes of the application." In other words, OmniMark lets you easily define how you want the links created, based on content rather than on linking hardcoded in the source data.
Here is an example of OmniMark code that links references to part names with glossary entries. Because the information set (a book) and the implied requirements of the final deliverable are so simple, referents aren't necessary here. However, referents can be used when you want to write a link before you have read the information you need to create the link.
    element term
       local stream term
       set term to "%c"
       ; content linking
       output '<a name="content-%g(term)">'
           || "<li>%g(term):"

    element part
       local stream part
       set part to "%c"
       output '<a href="#content-%g(part)">'
           || part
           || '</a>'
Although this sample uses only one input file and one output file, this type of linking can also be used in an environment with multiple input and output objects.
Creating a table of contents
This sample describes how to create a table of contents for the chapters of a book. The first step is to define a referent and open it as a stream:
    global stream toc

    process-start
       open toc as referent "toc"
Then, at the point you'd like the table of contents to appear, output the referent (even though its value hasn't yet been assigned):
    element author
       output "by: %c%n"
       ; This is where the table of contents will go:
       output "<hr><h2>Table of Contents</h2>%n"
           || '<ul>%n'
           || referent "toc"
           || '</ul>'
Then append the relevant link to the table of contents referent whenever a chapter title is encountered:
    element title when parent is chapter
       local stream chap-title
       set chap-title to "Chapter %d(chap-no), %c"
       output "<h2>%g(chap-title)</h2>%n"
       using attribute id of parent
       do
          set referent "exp-%v(id)" to chap-title
          put toc '<li><a href="#exp-%v(id)">%g(chap-title)</a>%n'
       done
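The pattern in these three steps (reserve a slot, keep appending to it, fill it at the end) can be sketched in Python as follows. This is an illustration of the idea rather than a translation of the OmniMark program: `TOC_SLOT`, `body`, and `toc` are hypothetical names, and the chapter ids and titles are made-up sample data.

```python
# Made-up sample data: (chapter id, chapter title).
chapters = [("c1", "Getting Started"), ("c2", "Safety")]

body = []            # main output, in document order
toc = []             # entries appended as each chapter title is seen
TOC_SLOT = object()  # marks where the table of contents is spliced in

# The slot is written near the top, before any chapter has been read.
body.append("<hr><h2>Table of Contents</h2>\n<ul>\n")
body.append(TOC_SLOT)
body.append("</ul>\n")

for number, (chap_id, title) in enumerate(chapters, start=1):
    chap_title = f"Chapter {number}, {title}"
    body.append(f'<a name="exp-{chap_id}"></a>\n<h2>{chap_title}</h2>\n')
    toc.append(f'<li><a href="#exp-{chap_id}">{chap_title}</a>\n')

# Resolve the slot once all chapters have contributed their entries.
html = "".join("".join(toc) if piece is TOC_SLOT else piece for piece in body)
```

The table of contents ends up ahead of the chapters in `html` even though its entries were collected while the chapters were being processed.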
Linking based on implied structure
Linking based on implied structure in this type of conversion is fairly uncomplicated, especially when dealing with a book metaphor for the XML input file. This sample describes how to create a secondary table of contents (implied by the intended audience for each book chapter).
The first step is to specify where you want the secondary table of contents to appear by writing a referent to that location (even though the value hasn't been determined yet):
    element author
       output "by: %c%n"
       ; This is where we want the table of contents to go:
       output "<hr><h2>Table of Contents</h2>%n"
           || '<ul>%n'
           || referent "toc"
           || '</ul>'
       ; This is where the contents by audience should go:
       output "<hr><h2>Contents by Audience</h2>%n"
           || '<ul>%n'
           || referent "cba"
           || '</ul>'
Then, for each new audience, you would write data (specific to that audience) to the original referent (the content by audience listing). For each chapter encountered, you would write the necessary linking information to the referent associated with that chapter's audience. OmniMark takes care of putting the appropriate information in the right place.
    element audience
       local stream aud
       set aud to "%c"
       output "<b>For:</b> <i>%g(aud)</i>%n"

       ; Handle contents by audience and give it a starting value:
       do when referents hasnt key "audtype-%g(aud)"
          set referent "audtype-%g(aud)" to "<li>%g(aud): "
          ; Put a place for this list of audience references in the
          ; contents by audience referent. Or, if a referent for this
          ; audience exists, just append the desired delimiter
          ; (a semicolon in this case).
          put cba referent "audtype-%g(aud)"
       else
          set referent "audtype-%g(aud)"
             to referents ^ "audtype-%g(aud)" || "; "
       done

       ; Next write the necessary linking information to the end of
       ; that audience's referent:
       using attribute id of parent
          set referent "audtype-%g(aud)"
             to referents ^ "audtype-%g(aud)"
             || '<a href="#exp-%v(id)">%g(chap-title)</a>%n'
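The grouping logic behind the contents-by-audience listing (one entry per audience, created on first sight and extended with a semicolon delimiter afterwards) can be approximated in Python. This sketch is an analogy only: `by_audience` and `cba` are illustrative names, the chapter data is made up, and the whitespace in the result differs slightly from what the OmniMark rules emit.

```python
# Made-up sample data: (chapter id, chapter title, audience).
chapters = [
    ("c1", "Chapter 1, Getting Started", "users"),
    ("c2", "Chapter 2, Maintenance", "technicians"),
    ("c3", "Chapter 3, Safety", "users"),
]

# One list of links per audience; dicts keep first-seen order.
by_audience = {}
for chap_id, chap_title, audience in chapters:
    link = f'<a href="#exp-{chap_id}">{chap_title}</a>'
    by_audience.setdefault(audience, []).append(link)

# One <li> per audience, with that audience's links joined by "; ".
cba = "".join(
    f"<li>{audience}: " + "; ".join(links) + "\n"
    for audience, links in by_audience.items()
)
```

Here the per-audience lists play the role of the `audtype-` referents, and `cba` corresponds to the content written where the `cba` referent was output.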
Chunking output into multiple files
In many XML implementations, a single XML document can be extremely large. This is commonly the case when a book metaphor is used to model and represent information. When delivering to the Web, however, you usually won't want to deliver the paper equivalent of a 500-page document as a single HTML file (even when it makes sense to author it that way internally).
To handle the relatively slow speeds of the Internet, you need to "chunk" the original information into bite-sized pieces for downloading to the final client application. Different applications might require different levels of chunking (chapter, section, volume, etc.), depending on the tradeoffs between connection speed and relative usability of the chunks.
The following sample code creates a new file for every chapter encountered. It also adds forward and backward linking to improve navigation. The "chunking" is done easily: whenever a new chapter is encountered, a new file is created for that chapter.
    element chapter
       increment chap-no
       set file "chap-%d(chap-no).htm" with referents-allowed to '%c'
For chunking out the chapters, this is the only change necessary. All remaining rules that correspond to elements contained within chapters don't need to be changed: their output will, by default, go into the new file.
A similar method would be used for a book glossary; no changes would be necessary to the rules that correspond with a glossary's subelements:
    element glossary
       set file "gloss.htm" to "<h2>Glossary</h2>%n%n"
                              _ "<ul>%n%c"
                              _ "</ul>%n"
The linking mechanisms also need to change slightly so that links point to locations in the new files, and not just to locations within a single file.
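As a rough illustration of chunking with forward and backward navigation, the Python sketch below builds one HTML string per chapter and links each to its neighbors. It is not the sample program: `chunk_chapters` is a hypothetical helper, the "Previous" and "Next" link labels are assumptions, and only the `chap-N.htm` naming scheme is taken from the rule shown above.

```python
def chunk_chapters(chapters):
    """Return {filename: html}, one entry per chapter, with prev/next links."""
    files = {}
    total = len(chapters)
    for number, (title, body) in enumerate(chapters, start=1):
        nav = []
        if number > 1:
            nav.append(f'<a href="chap-{number - 1}.htm">Previous</a>')
        if number < total:
            nav.append(f'<a href="chap-{number + 1}.htm">Next</a>')
        files[f"chap-{number}.htm"] = (
            f"<h2>Chapter {number}, {title}</h2>\n{body}\n"
            + " | ".join(nav) + "\n"
        )
    return files

# Made-up sample data: (chapter title, already-converted chapter body).
files = chunk_chapters([
    ("Getting Started", "<p>...</p>"),
    ("Safety", "<p>...</p>"),
])
```

Links that previously used a bare fragment such as `#exp-...` would similarly need the target file name in front of the fragment once chapters live in separate files.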
Final sample HTML output files
The six sample XML conversion programs produce the following HTML files.