Pattern and markup processors
The OmniMark language has extensive information processing capability built in. This functionality is packaged into two processors, the pattern processor and the markup processor. In other languages you would have to code this functionality yourself.
The pattern processor provides pattern-matching functionality, and operates on a stream of bytes (which may be text or binary data). The pattern processor works with find rules, detecting an event whenever a pattern defined by a find rule occurs in the input stream. It then fires that rule.
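Here is a minimal sketch of a find rule. It assumes an OmniMark version that accepts a string literal as the source for submit, and it relies on the default behavior of passing input that no find rule matches through to the current output. The program replaces each occurrence of "lamb" with "sheep":

```omnimark
; Send a literal string to the pattern processor.
process
   submit "Mary had a little lamb%n"

; Fires each time the pattern "lamb" occurs in the submitted input.
find "lamb"
   output "sheep"
```

All other text is passed through unchanged, so the program should print "Mary had a little sheep".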
The markup processor provides markup recognition for markup languages created using XML or SGML. The markup processor works with markup rules, detecting an event each time an element or other structure defined by markup occurs in the input stream. It then fires the markup rule associated with the event detected.
Note that the markup processor detects and reports the elements defined by markup, not the text of the markup itself. Thus, in the string "Mary had a little <animal>lamb</animal>", the markup processor will fire the markup rule element animal and will report that the data content of the element is "lamb". It will not report that it found the markup strings "<animal>...</animal>". In XML, you can safely assume that it did in fact encounter that markup. In SGML, however, tags may be shortened and in some cases omitted entirely, so several different combinations of markup can represent the same element and cause the same rule to fire.
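Here is a minimal sketch of a markup rule for the example above. It assumes an OmniMark version whose XML parser accepts a well-formed document with no DOCTYPE declaration. The rule works with the element's content, not its tags:

```omnimark
; Send a small well-formed XML document to the markup processor.
process
   do xml-parse document
      scan "<animal>lamb</animal>"
      output "%c"
   done

; Fires once for the animal element.
element animal
   ; %c outputs the parsed content of the element, here "lamb"
   output "Found an animal: %c%n"
```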
If you wanted to process the text of the markup, as opposed to the structure defined by the markup, you would use the pattern processor. In fact, you can write your own markup processor using the pattern processor. This is useful for converting markup that is not compatible with XML or SGML into XML or SGML form.
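The following sketch shows the idea on a small scale. The curly-brace tag syntax is a hypothetical non-XML markup invented for illustration, and the pattern variable tagname is likewise a hypothetical name. The find rules rewrite each tag into its XML form:

```omnimark
; Hypothetical input using {b}...{/b} tags instead of XML tags.
process
   submit "A {b}bold{/b} word%n"

; End tags such as {/b} become </b>.
find "{/" letter+ => tagname "}"
   output "</" || tagname || ">"

; Start tags such as {b} become <b>.
find "{" letter+ => tagname "}"
   output "<" || tagname || ">"
```

A start tag must have a letter immediately after "{", so the two rules never match the same text and their order does not matter here. Text outside the braces is passed through unchanged.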
How do you use the pattern and markup processors? Simply direct input to the appropriate processor. You direct input to the pattern processor with submit. For example, the following code fragment sends a file called names.txt to the pattern processor.
submit file "names.txt"
The OmniMark actions do sgml-parse and do xml-parse direct input to the markup processor. For example, the following code fragment sends a file called myfile.xml to the XML parser of the markup processor.
do xml-parse document
   scan file "myfile.xml"
   output "%c"
done
You can also direct input to the pattern or markup processors using OmniMark's aided translation types.
Once you direct input into one of the processors, OmniMark processes the entire input, firing rules as the corresponding events occur. If, in responding to an event, you perform an action that submits new input to one of the processors, processing of the current input is suspended and the new input is processed. When the new input has been fully processed, OmniMark resumes processing the original input.
This feature has many uses. For instance, you could read a list of names of files containing XML markup, and open and process each file in turn:
process-start
   ; read the list of file names
   submit file "names.txt"

find [any except white-space]+ => filename
   ; filename is a pattern variable, so it is used unquoted
   do xml-parse document
      scan file filename
      output "%c"
   done

find white-space+
   ; absorb leftover white space in names.txt

element ...
(Note that the find rule for file names is pretty rudimentary. It's fine if you know the structure of the data you're reading, but don't take this as a general method for identifying file names!)
Related Syntax: do sgml-parse, do xml-parse, submit