Prerequisite Concepts | Related Topics
Markup processing control
Markup processing tasks range from the trivial to the very complex. OmniMark has a number of features that the programmer can use to control how markup is processed. This topic explores some of those features, from the simplest element rules to more complex examples that use groups and markup sink functions.
 OmniMark is a rule-based language, and the main ingredient of many OmniMark programs is element rules. The body
      of every element rule in your program must take care of processing the element's content. The simplest way to
      accomplish this is to delegate the processing of content to other markup rules using %c:
  element "simple"
     output "%c"
    
 The rule above reproduces the output and effects of whatever rules its content fires, so wrapping a part of
      the input in element <simple> makes no difference to the output.
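 To try the rule above, it needs a parse to drive it. The following is a minimal sketch of a complete program, not
      part of the original examples: the process rule, the inline sample input, and the element name doc are all assumptions made for illustration.

```omnimark
; Hypothetical driver: parses a literal well-formed XML string.
process
   do xml-parse scan "<doc>one <simple>two</simple> three</doc>"
      output "%c"
   done

; Hypothetical root element; it only delegates to its content.
element "doc"
   output "%c"

; The rule under discussion: it adds nothing of its own.
element "simple"
   output "%c"
```

 Since the sketch defines no data-content or translate rules, character data passes through unchanged, so removing
      the <simple> tags from the sample input should not change what the program writes.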
    
 Alternatively, the rule could perform some actions before and after the processing of its content. For example,
      it could output something:
  element "parenthesized"
     output "("
     output "%c"
     output ")"
    
 The effect of having element <parenthesized> in the input is to wrap the result of
      processing the element content with parentheses. You can also redirect the content processing output to another
      destination, or discard it completely:
  element "redirect"
     using output as file (attribute "filename")
        output "%c"
  
  element "discard"
     suppress
    
 None of the rules above alters the result of content processing; they merely add to it or change its
      destination. The following rule subjects the output of %c to another round of processing:
  element "indent"
     repeat scan "%c"
     match any-text+ => line
        output "  " || line
     match "%n"
        output "%n"
     again
    
 The indent rule indents each line produced from its content. To accomplish this, it alters the result
      of %c but not the way this result gets produced. This is good: the rule fulfills its purpose with a
      localized code change. If you tried to accomplish the same effect in a single pass, you would have to modify every
      place where a line could be emitted within an <indent> element.
    
 The previous rule is an example of post-processing of content. It invokes other rules to process its content
        using %c, and then scans through their output. An alternative approach is to pre-process the content
        before invoking other rules by using #content instead of %c. Here are a few examples:
  element "redirect-content"
     using output as alternative-content-processor ()
        output #content
  
  element "distribute-content"
     using output as alternative-content-processor () & relaxng.validator against my-schema
        output #content
  
  element "really-discard"
     put #suppress #content
  
  element "half-marked-up"
     do markup-parse up-translate-content (#content)
        output "%c"
     done
      
 The first rule above, redirect-content, does not invoke any markup rules itself. Instead it sends
        its entire content off to alternative-content-processor, a markup sink function which may be
        imported from another module, to process it in any way it pleases.
      
 The rule distribute-content is similar, but it sends its content to two destinations in parallel:
        alternative-content-processor, which processes it, and the relaxng.validator function, which
        validates it at the same time.
      
 The really-discard rule is similar to the rule discard you have seen earlier, but where
        the latter discarded the output of content processing, really-discard discards the content processing
        itself. By directing its #content to #suppress, this rule avoids invoking any rules that would
        process its markup.
      
 Finally, the rule half-marked-up performs a pre-processing of its content through the function
        up-translate-content. For example, if the content of element <half-marked-up> was
  This is one paragraph.

  This is <em>another</em> paragraph, as you can tell by the blank line preceding it.
 up-translate-content could convert this input to appear as
  <para>This is one paragraph.</para>

  <para>This is <em>another</em> paragraph, as you can tell by the blank line preceding it.</para>
 After this pre-processing step, the rule half-marked-up applies do markup-parse and invokes
        regular content processing with %c. Notice that both the original element <em> and
        the newly introduced element <para> can be processed by the regular element rules,
        as if they were both present in the content from the beginning. The function up-translate-content could
        be defined in a different module as follows:
  export markup source function
     up-translate-content (value markup source m)
  as
     do xml-parse scan "<up-translated>"
                    || wrap-implicit-paragraphs (split-data-content (m, #current-output))
                    || "</up-translated>"
        output "%c"
     done
  
  element "up-translated"
     output #content
      
 This function in turn relies on two others: split-data-content, which separates the plain text from the
        markup events and sends the events directly to the output of up-translate-content, and wrap-implicit-paragraphs, which inserts XML tags into the plain text.
  define string source function
     split-data-content (value markup source m,
                         value markup sink   events)
  as
     repeat
        output m take any*
        exit
  
      catch #markup-start event
        signal to events rethrow
      catch #markup-point event
        signal to events rethrow
      catch #markup-end event
        signal to events rethrow
     again
  
  
  define string source function
     wrap-implicit-paragraphs (value string source s)
  as
     repeat scan s
     match lookahead any-text
        output "<para>" || s take (any ** lookahead ("%n%n" | value-end)) || "</para>"
     match "%n"
        output "%n"
     again
    
Dividing the processing of your content into multiple steps is usually the best way to extend your program, because it leaves the existing rules untouched and lets you reuse the common processing code. Still, sometimes neither post-processing nor pre-processing of content is enough, and you need to alter the very way content is processed. The easiest way to achieve this is with groups.
 If you have an element whose content is completely different from the rest of your input, you will probably
        want to process it using a completely different set of rules from the regular one. To do this, simply put your
        %c into a using group scope:
  element "foreign"
     using group "process foreign elements"
        output "%c"
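 The rule above assumes that a group named "process foreign elements" is declared elsewhere in the program. A
        minimal sketch of such a declaration follows. The element name item and the rule bodies are hypothetical and
        only illustrate where the group's rules would go.

```omnimark
; Rules that follow this declaration belong to the named group
; and fire only while that group is active.
group "process foreign elements"

; Hypothetical rule for an element that occurs only in foreign content.
element "item"
   output "[%c]"

; Catch-all for any other element inside foreign content.
element #implied
   output "%c"
```

 With a declaration along these lines, elements inside <foreign> are handled by the rules of this group for as long
        as the using group scope is in effect.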
      
 If, on the other hand, the content model of your element is not completely unique, you may want to use both
        the common rules and the special ones:
  element "half-foreign"
     using group "process foreign elements" & #group
        output "%c"
      
 Keep in mind that for every element instance in your content, only a single element rule can fire:
        either a rule from your group "process foreign elements" or one of the common rules. That means you
        cannot have an unguarded element #implied rule in both groups, for example. But what if you actually
        want both rules to run, because each of them performs useful actions? One solution is to merge the body of the
        common rule into the other rule. If you would rather avoid the code duplication, you can apply the technique
        used by the distribute-content rule and send your content to be processed by both groups. You just
        need to define two markup sink functions that invoke the proper rules:
  define markup sink function
     common-content-processor (value string sink destination)
  as
     do markup-parse #current-input
        put destination "%c"
     done
  
  define markup sink function
     foreign-content-processor (value string sink destination)
  as
     using group "process foreign elements"
     do markup-parse #current-input
        put destination "%c"
     done
  
  element "distribute-half-foreign"
     using output as foreign-content-processor (#current-output)
                   & common-content-processor (#current-output)
        output #content
      
 Now that the content is processed by two groups of rules independently, each group is allowed to have an
        element #implied rule, and they can (and must) both fire for each element in the content.
    
 The reason #current-output is passed as an argument to the two content-processor functions is
        to let them write their output into it. A problem arises, however, if both functions produce output for the same part of the
        content, because the two outputs are then merged together. For example, if neither group contains any data-content or translate rule, the content of the input <distribute-half-foreign>Hello, World!</distribute-half-foreign> would be duplicated and the output would be Hello, World!Hello, World!.
      
 If you do need both outputs, instead of merging them as they come you may want to order them properly in your
        output by temporarily buffering one and outputting it after the other:
  element "distribute-half-foreign"
     local stream common-output
  
     open common-output as buffer
     using output as foreign-content-processor (#current-output)
                   & common-content-processor (common-output)
        output #content
     close common-output
     output common-output
      
 Alternatively, instead of storing the output of content processing you can use a markup-buffer to
        store your content before processing it. This lets you control both the order of your outputs and the order of
        processing:
  import "ommarkuputilities.xmd" unprefixed
  
  element "distribute-half-foreign"
     local markup-buffer my-content
  
     using output as foreign-content-processor (#current-output) & markup sink my-content
        output #content
     using output as common-content-processor (#current-output)
        output my-content
      
 Since the content is no longer processed in parallel, there is no need to use the & operator.
        You can write this rule to the same effect without relying on the markup sink functions to wrap the rule
        invocations:
  import "ommarkuputilities.xmd" unprefixed
  
  element "distribute-half-foreign"
     local markup-buffer my-content
  
     using output as my-content
        output #content
  
     using group "process foreign elements"
     do markup-parse my-content
        output "%c"
     done
  
     do markup-parse my-content
        output "%c"
     done
Copyright © Stilo International plc, 1988-2010.