Repetition and optionality
Repetition and optionality are indicated in OmniMark patterns with the same occurrence indicators used in SGML.
? : follows an optional pattern (one that may occur or may be omitted)
* : follows an optional pattern that may be repeated (the pattern need not occur at all, can occur once, or can occur several times)
+ : follows a required pattern that may be repeated (the pattern must occur at least once, and can occur several times)
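The behavior of the three indicators can be modeled in a few lines of Python. This is a sketch, not OmniMark code: it assumes each pattern is a single-character test and, following the description in this section, that repetition is greedy and never hands characters back. The function names are hypothetical.

```python
# A minimal Python model (not OmniMark) of the occurrence indicators ?, * and +.
# Assumption: a "pattern" here is a single-character test, and repetition is
# greedy: it takes as many characters as it can and never gives any back.
from typing import Callable, Optional

CharTest = Callable[[str], bool]


def optional(test: CharTest, text: str, pos: int) -> int:
    """pattern? : consume one matching character if present; never fails."""
    if pos < len(text) and test(text[pos]):
        return pos + 1
    return pos


def zero_or_more(test: CharTest, text: str, pos: int) -> int:
    """pattern* : consume as many matching characters as possible; never fails."""
    while pos < len(text) and test(text[pos]):
        pos += 1
    return pos


def one_or_more(test: CharTest, text: str, pos: int) -> Optional[int]:
    """pattern+ : like pattern*, but fails (returns None) if nothing matched."""
    end = zero_or_more(test, text, pos)
    return end if end > pos else None


is_digit: CharTest = str.isdigit

print(optional(is_digit, "42x", 0))      # 1
print(zero_or_more(is_digit, "x42", 0))  # 0
print(one_or_more(is_digit, "x42", 0))   # None
print(one_or_more(is_digit, "42x", 0))   # 2
```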
For example, letter+ represents one or more letters. Since OmniMark always matches the longest possible sequence of characters described by a pattern, this simple pattern can be used to match words.
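To see why letter+ picks out whole words, here is a rough Python model of applying that one pattern across an input. It is an illustration under assumptions, not OmniMark's implementation: letter is taken to mean an ASCII letter, characters the pattern does not match are simply skipped, and find_words is a hypothetical name.

```python
# Rough Python model (not OmniMark) of `find letter+` applied to a whole input.
# Assumptions: `letter` means an ASCII letter; unmatched characters are skipped.
import string
from typing import List

LETTERS = set(string.ascii_letters)


def find_words(text: str) -> List[str]:
    words = []
    pos = 0
    while pos < len(text):
        end = pos
        # letter+ takes the longest run of letters starting at pos
        while end < len(text) and text[end] in LETTERS:
            end += 1
        if end > pos:
            words.append(text[pos:end])
            pos = end
        else:
            pos += 1  # no letter here; move on one character
    return words


print(find_words("Hello, big world!"))  # ['Hello', 'big', 'world']
```

Because the run of letters is always taken in full, the pattern never stops partway through a word such as "Hello".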
A second example: suppose a word processor's instructions consist of words preceded by a backslash. The following pattern recognizes a sequence of such instructions:
("\" letter+)*
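A Python model of the pattern ("\" letter+)* follows. It is a hedged sketch, not OmniMark: letter is assumed to mean an ASCII letter, each repetition must match a backslash followed by at least one letter, and a repetition that cannot be completed is assumed to consume nothing. match_instructions is a hypothetical name.

```python
# Python model (not OmniMark) of the pattern ("\" letter+)*
# Returns the end position of the longest run of backslash-word instructions
# starting at pos, or pos itself if there are none (* never fails).
import string

LETTERS = set(string.ascii_letters)


def match_instructions(text: str, pos: int = 0) -> int:
    while pos < len(text) and text[pos] == "\\":
        end = pos + 1
        while end < len(text) and text[end] in LETTERS:
            end += 1
        if end == pos + 1:
            break  # a backslash not followed by a letter ends the repetition
        pos = end
    return pos


source = r"\bold\italic Some text"
end = match_instructions(source)
print(source[:end])  # \bold\italic
```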
Repetition must be used cautiously. It is fairly easy for a programmer to inadvertently write a pattern that matches more input than is intended. For instance, the pattern in the following rule header matches the remainder of the document:
find any+
OmniMark does not take any following subpattern into account when matching a repeated subpattern: the repetition consumes as much input as it can. Thus, a find rule beginning with the following rule header can never be selected:
find any* "!"
The any* subpattern matches all unprocessed input and, of course, no exclamation point can occur after the end of the document. The desired effect can be achieved by one of the following alternatives:
find [any except "!"]* "!"
find ([any except "!"]* "!")+
The first alternative matches input up to and including the first exclamation point, and the second matches input up to and including the last exclamation point.
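The difference between the three rule headers can be reproduced with a small Python model of non-backtracking repetition. The functions below are hypothetical stand-ins for the OmniMark patterns named in their comments, not OmniMark itself; each returns the end of the match at the start of the input, or None if there is no match.

```python
# Python model (not OmniMark) of non-backtracking repetition applied to
# the three rule headers discussed above.
from typing import Optional


def any_star_bang(text: str) -> Optional[int]:
    # any* "!" : any* consumes every remaining character and gives none back,
    # so "!" is then tried at the end of the input, where it cannot match.
    pos = len(text)
    if pos < len(text) and text[pos] == "!":
        return pos + 1
    return None


def upto_first_bang(text: str) -> Optional[int]:
    # [any except "!"]* "!"
    pos = 0
    while pos < len(text) and text[pos] != "!":
        pos += 1
    return pos + 1 if pos < len(text) else None


def upto_last_bang(text: str) -> Optional[int]:
    # ([any except "!"]* "!")+ : repeat the pattern above while it still matches
    end = None
    pos = 0
    while True:
        step = upto_first_bang(text[pos:])
        if step is None:
            return end
        pos += step
        end = pos


text = "Wait! Stop! ok"
print(any_star_bang(text))           # None
print(text[:upto_first_bang(text)])  # Wait!
print(text[:upto_last_bang(text)])   # Wait! Stop!
```

Each step of upto_last_bang consumes at least the "!" it matched, so the loop always terminates.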
Programmers who are used to line-oriented pattern matching in other languages must remember that any matches newline characters as well. To confine matching to a single line, use any-text instead of any.
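The following Python sketch contrasts any+ with any-text+. It is a model rather than OmniMark code, and it assumes any-text matches every character except a newline ("\n"); match_run, is_any and is_any_text are hypothetical names.

```python
# Python model (not OmniMark) of any+ versus any-text+.
# Assumption: any-text matches every character except "\n".
from typing import Callable


def is_any(ch: str) -> bool:
    return True  # any: every character, newlines included


def is_any_text(ch: str) -> bool:
    return ch != "\n"  # any-text: stops at the end of the line


def match_run(text: str, accepts: Callable[[str], bool]) -> int:
    # Number of characters matched from the start (0 means a + pattern fails).
    pos = 0
    while pos < len(text) and accepts(text[pos]):
        pos += 1
    return pos


text = "first line\nsecond line\n"
print(repr(text[:match_run(text, is_any)]))       # 'first line\nsecond line\n'
print(repr(text[:match_run(text, is_any_text)]))  # 'first line'
```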
Prerequisite Concepts: Pattern matching