Introduction
This is a small but complex example of how OmniMark behaves when executing a context-translation.
Context-translation provides a powerful mechanism for translating generalized text documents to XML/SGML. Context-translation can also be used to add other types of structure to a document, beyond XML/SGML.
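In a context-translation, element rules and find rules operate concurrently and asynchronously. Each type of rule operates in one of two "processors": find rules run in the pattern processor and element rules run in the markup processor. The walkthrough below calls these the input processor and the output processor, respectively.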
OmniMark selects rules and executes actions in one processor at a time.
The following example is designed to illustrate the interactions between the find rules and the element rules; it is a complete OmniMark program and can be executed. As will be illustrated, OmniMark can switch between processors (also called domains) several times during the execution of the actions of a single rule. From the point of view of the OmniMark programmer, the input and output processors are executing concurrently. The asynchronicity is determined by the information requirements and information-generating capabilities of the markup parser.
The object of this OmniMark program is to move generalized text information "across" the markup parser through buffers: from the find rules to the element rules and vice versa. It is not a practical example in and of itself, but it forms the model for existing commercial OmniMark applications.
context-translate                                 ; 1
                                                  ; 2
global stream main                                ; 3
global stream find-state                          ; 4
                                                  ; 5
element main                                      ; 6
   output find-state                              ; 7
   set main to "Processing element 'main'%n"      ; 8
   output "%c"                                    ; 9
                                                  ; 10
element alpha                                     ; 11
   output "%g(find-state)"                        ; 12
   suppress                                       ; 13
                                                  ; 14
element beta                                      ; 15
   suppress                                       ; 16
                                                  ; 17
find "\doc"                                       ; 18
   set find-state to "Processing find 'doc'%n"    ; 19
   output "<main>"                                ; 20
   set find-state to "%g(main)"                   ; 21
   output "<beta>"                                ; 22
                                                  ; 23
find-start                                        ; 24
   output "<!doctype main ["                      ; 25
   output "<!element main - o (alpha, beta)>"     ; 26
   output "<!element alpha - o empty>"            ; 27
   output "<!element beta - o empty>]>"           ; 28
                                                  ; 29
markup-error                                      ; 30
Given a document with the content "\doc", OmniMark will execute this program as follows:
Line 1: The compiler sets this program up as a context-translation. OmniMark begins in the output processor. It first tries to select document-start rules. As there are none, it then attempts to obtain information from the internal markup parser. The markup parser has not received any information from the input processor, so it then attempts to obtain information from OmniMark in the input processor.
Lines 24 through 28: The find-start rule is the first rule to be selected. OmniMark executes each of its actions in the order they are written. These actions output a small Document Type Declaration to the #markup-parser stream.
The #markup-parser stream is directed into the internal markup parser. The markup parser parses the complete DTD and is then ready for a document instance.
The markup parser asks OmniMark for more information. There are no more find-start rules, so OmniMark begins to read the document (assume a document opened on the command line). The first (and, in this example, only) text read from the document is "\doc". OmniMark compares the text to the patterns in its available find rules.
Line 18: The only find rule is selected because its pattern matches the text of the document. OmniMark begins to execute the actions in the find rule in the order written.
Line 19: OmniMark attaches the stream find-state to a buffer and sets the buffer's contents to "Processing find 'doc'%n". The set action accomplishes this by:
set opens the find-state stream, attaching it to a buffer. find, find-start, find-end, and markup-error rules may write to this stream.
set then writes "Processing find 'doc'%n" to the find-state stream.
set closes the find-state stream.
Line 20: OmniMark writes <main> to the #markup-parser stream.
The markup parser recognizes <main> as a valid start tag and passes it to OmniMark in the output processor.
Line 6: OmniMark examines the element rules available and selects the main element rule. OmniMark begins to execute the actions in the rule in the order they are written.
Line 7: OmniMark writes the contents of the buffer attached to the find-state stream to the #main-output stream. The #main-output stream "belongs" to the output processor, and find, find-start, and find-end rules may not write to this stream.
The find-state buffer contains information placed in it by the find rule on Line 19. The text Processing find 'doc' is written to the #main-output stream.
Line 8: OmniMark attaches the stream main to a buffer and sets the buffer's contents to "Processing element 'main'%n". The set action accomplishes this by:
set opens the main stream, attaching it to a buffer. find, find-start, and find-end rules are not permitted to write to this stream.
set then writes "Processing element 'main'%n" to the main stream.
set closes the main stream.
Line 9: OmniMark uses a %c operator to process the content of the main element.
OmniMark attempts to obtain more information from the markup parser. The markup parser has no more information to give, so it attempts to obtain information from OmniMark in the input processor. OmniMark continues processing actions in the find rule where it left off previously.
Line 21: OmniMark attaches the stream find-state to a buffer and sets its contents. The previous contents of the buffer are lost permanently. The new contents are copied from the stream main created on Line 8.
Line 22: OmniMark writes <beta> to the #markup-parser stream.
The markup parser now has information. The text <beta> is recognized as a start tag. However, it is out of order: the markup parser expects to see a start tag for alpha prior to beta. The markup parser issues an error message to OmniMark.
Line 30: OmniMark selects the markup-error rule. It has no actions, so OmniMark suppresses the error message.
The markup parser then fabricates an alpha start tag and queues the beta start tag. The markup parser passes the fabricated alpha start tag to OmniMark in the output processor.
OmniMark examines its element rules.
Line 11: OmniMark selects the alpha element rule and begins to execute its actions in the order they are written.
Line 12: OmniMark writes the contents of the find-state buffer. The contents of the find-state stream were created on Line 21. The text Processing element 'main' is written to the #main-output stream.
Line 13: OmniMark suppresses any further output from the alpha element and processes its content.
OmniMark attempts to obtain more information from the markup parser. The markup parser is holding on to the beta start tag, which it provides to OmniMark in the output processor.
Lines 15 and 16: OmniMark selects the beta element rule and executes the suppress operator. The suppress operator temporarily removes all streams from the current output set and causes OmniMark to process the content of the beta element. OmniMark attempts to obtain further information from the markup parser.
The markup parser has no more information, so it attempts to obtain information from OmniMark in the input processor. OmniMark reads more of the input document and immediately receives an end of file indication, which it passes straight on to the markup parser.
The markup parser expects a main end tag or an end of file indication. It accepts the end of file as a valid input and fabricates a main end tag which it passes to OmniMark in the output processor ahead of the end of file indication.
Line 9: OmniMark resumes the execution of the main element rule immediately after the %c operator.
Line 10: There are no more actions in this rule and there are no suspended rules. OmniMark attempts to obtain more information from the markup parser.
The markup parser returns the end of file indication which causes OmniMark to cease executing this program.
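The order of events above can be replayed with a small simulation. The following Python sketch is not OmniMark and does not model the real markup parser; it only mimics the hand-offs described in the walkthrough, under the assumption that the find rule can be treated as a generator that pauses each time it writes markup. All names (run_example, next_tag, queued, trace) are hypothetical, and the numbers recorded in trace are the program line numbers from the listing.

```python
from collections import deque


def run_example():
    """Replay the walkthrough for an input document containing only "\\doc"."""
    buffers = {"find-state": "", "main": ""}   # the two global streams
    main_output = []                           # stands in for #main-output
    trace = []                                 # program line numbers, in execution order

    def find_doc():
        # The find rule; each yield hands markup to the parser (a processor switch).
        trace.append(19)
        buffers["find-state"] = "Processing find 'doc'\n"
        trace.append(20)
        yield "<main>"
        trace.append(21)
        buffers["find-state"] = buffers["main"]   # earlier contents are lost
        trace.append(22)
        yield "<beta>"

    source = find_doc()
    queued = deque()   # tags the simulated parser is holding back

    def next_tag():
        # Parser side: use a held tag first, otherwise resume the find rule.
        if queued:
            return queued.popleft()
        return next(source, None)   # None models end of input

    def element_alpha():
        trace.append(12)
        main_output.append(buffers["find-state"])
        trace.append(13)   # suppress: nothing further is written

    def element_beta():
        trace.append(16)   # suppress: nothing reaches main_output

    def element_main():
        trace.append(7)
        main_output.append(buffers["find-state"])
        trace.append(8)
        buffers["main"] = "Processing element 'main'\n"
        trace.append(9)    # "%c": process content against the model (alpha, beta)
        for name, rule in (("alpha", element_alpha), ("beta", element_beta)):
            tag = next_tag()
            if tag != f"<{name}>" and tag is not None:
                # Out-of-order tag: hold it; the expected start tag is fabricated.
                queued.appendleft(tag)
            rule()

    if next_tag() == "<main>":
        element_main()
    return "".join(main_output), trace
```

Calling run_example() returns the text written to the simulated main output together with the order in which the program lines ran. The find rule's lines 21 and 22 run in the middle of the main element rule, which is the interleaving the walkthrough describes.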
To completely understand the order of events requires some knowledge of exactly how the markup parser processes XML/SGML. Fundamentally, domain switching occurs when the markup parser receives a complete markup token, such as a complete start or end tag, complete processing instructions, complete external entity references, or data.
Each put or output action that writes into the #markup-parser stream (#sgml is an older name for the same stream) is acted upon by the markup parser immediately after the action ends. The parser determines whether it has sufficient information to return to the output processor or whether it must obtain further information from the input processor.
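The idea that the parser acts after every write, but only hands over complete tokens, can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden toy, not the markup parser's actual algorithm: it recognizes only tags of the form <...> and runs of data, and ignores declarations, processing instructions, and entity references. The class name MarkupBuffer and its write method are hypothetical.

```python
class MarkupBuffer:
    """Toy model: collect writes and release only complete tokens."""

    def __init__(self):
        self.pending = ""   # text received but not yet released as a token

    def write(self, text):
        """Accept one write and return the tokens completed by it."""
        self.pending += text
        tokens = []
        while self.pending:
            if self.pending.startswith("<"):
                end = self.pending.find(">")
                if end == -1:
                    break   # incomplete tag: wait for more input
                tokens.append(self.pending[:end + 1])
                self.pending = self.pending[end + 1:]
            else:
                start = self.pending.find("<")
                if start == -1:
                    tokens.append(self.pending)   # all remaining text is data
                    self.pending = ""
                else:
                    tokens.append(self.pending[:start])
                    self.pending = self.pending[start:]
        return tokens
```

In this model, an empty result from write corresponds to the parser going back to the input processor for more information, and a non-empty result corresponds to a switch to the output processor.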
Related Concepts
Context-translations: using XML/SGML as an intermediate form