Scanning is OmniMark's principal data processing mechanism. It is a powerful mechanism that provides simple solutions to most data processing problems. As an OmniMark programmer, you must learn to think of data processing problems first and foremost as scanning problems. Problems that you would solve with substring operations in other languages are generally handled by scanning in OmniMark.
Scanning is the process of progressing systematically through an input source and testing its data against one or more patterns. Whenever a pattern matches, a corresponding set of actions is executed.
OmniMark provides three scanning constructs: submit and find rules, repeat scan, and do scan.
It also provides three scanning operators:
If you need to build a data structure from a data source, you do it by scanning the data and assigning data
captured by patterns to elements of your data structure:
    process
       local string files variable
       repeat scan file "filelist.txt"
          match any-text+ => file-name "%n"
             set new files to file-name
       again
       repeat over files
          ; process the file
       again
But in most cases it is not necessary to build data structures. You can process your data directly as part of
the scanning operation:
    process
       repeat scan file "filelist.txt"
          match any-text+ => file-name "%n"
             ; process the file file-name
       again
Because you can easily associate any processing code with a pattern matching event (the firing of a find rule or a match alternative), you can process most data directly as it streams. You can output the result of your processing as part of responding to the event, knowing it will all be collected by the current output scope and streamed to the proper destination.
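As a minimal sketch of processing data as it streams, the program below responds to a single pattern and writes its replacement straight to the current output. The input string and the replacement text are illustrative assumptions. The sketch relies on the fact that input submitted to find rules and not matched by any of them passes through to the current output unchanged.

```omnimark
; Illustrative sketch: replace one word while the rest of the input streams through.
process
   submit "Mary had a little lamb"

; Fires only on the text "little"; its output takes the place of the matched text.
find "little"
   output "big"
```

No intermediate data structure is built here: the replacement is emitted at the moment the pattern fires, and the unmatched text around it flows to the same destination.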
If one string of data could cause more than one pattern to match, the pattern that occurs first in the scanning construct will fire, and the one that occurs later will not. This allows you to put more specific find rules or match alternatives before more general ones and have the general ones fire only if the specific ones do not. The following two programs produce different output because of the order of their find rules:
    global integer word-count

    process
       submit "Mary had a little lamb"
       output "d" % word-count || "%n"

    find "had"
       output "*"

    find letter+
       increment word-count

    find any
The program above prints "*4". The program below changes the order of the find rules and produces a different output.
    global integer word-count

    process
       submit "Mary had a little lamb"
       output "d" % word-count || "%n"

    find letter+
       increment word-count

    find "had"
       output "*"

    find any
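This program prints "5". Because the letter+ rule now comes first, it matches "had" along with the other four words, so the find "had" rule never fires.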
You must always place more specific rules before more general rules that can match the same data, or the more specific rules will never fire.
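The same ordering principle applies to match alternatives. The following is a sketch only, rewriting the first find rule program above as a repeat scan. The use of a local counter and of a match any alternative with no actions are assumptions made for illustration.

```omnimark
; Illustrative sketch: specific alternative first, general alternatives after it.
process
   local integer word-count

   repeat scan "Mary had a little lamb"
      match "had"
         ; Specific alternative: tried before letter+, so it gets "had".
         output "*"
      match letter+
         ; General alternative: counts every other word.
         increment word-count
      match any
         ; Consumes spaces and any other single character; no actions.
   again

   output "d" % word-count || "%n"
```

With the alternatives in this order, the scan should produce the same "*4" as the first find rule program. Moving match letter+ above match "had" would leave the "had" alternative unreachable, just as in the second program.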