The keyword lookahead in a compound pattern precedes data that is to be recognized but not consumed by the pattern-matching process. For example, the pattern
digit+ lookahead blank* "+"
matches a string of digits that is followed by optional spaces and tabs and then a plus sign. However, only the digits are selected. The white space characters, if any, and plus sign remain in the source and can be selected by other patterns.
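For readers more familiar with regular expressions, the behavior described above can be sketched with a lookahead assertion in Python's re module. This is only an analogy, not OmniMark code: the function name select_digits is hypothetical, and the character classes [0-9] and [ \t] are assumed stand-ins for digit and blank.

```python
import re

# Assumed regex analogue of the pattern: digit+ lookahead blank* "+"
# (?=...) checks what follows without consuming it.
DIGITS_BEFORE_PLUS = re.compile(r"[0-9]+(?=[ \t]*\+)")

def select_digits(source):
    """Return (selected, remaining) or None if the pattern does not match."""
    m = DIGITS_BEFORE_PLUS.match(source)
    if m is None:
        return None
    # Only the digits are selected; the white space and "+" stay in the source.
    return m.group(0), source[m.end():]
```

Calling select_digits("123  + 4") selects "123" and leaves "  + 4" available to later patterns, while select_digits("123 - 4") selects nothing.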
lookahead can also be used to verify that selected data is not followed by input matching a given pattern. For example,
digit+ lookahead not letter
selects a string of digits as long as the digits are not immediately followed by a letter. Note that only one letter needs to be found for the lookahead test to fail, so there is no need to put a "+" following letter in the example above.
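A rough regular-expression counterpart of this negative test is sketched below in Python. It is an analogy under stated assumptions rather than OmniMark code: the function name digits_not_before_letter is hypothetical, and [0-9] and [A-Za-z] are assumed stand-ins for digit and letter. One difference matters: a regex engine may backtrack and accept a shorter run of digits, so the sketch also excludes digits in the negative assertion to keep the whole run together.

```python
import re

# Assumed regex analogue of the pattern: digit+ lookahead not letter
# Digits are included in the negative set so that backtracking cannot
# succeed on a shorter prefix such as "12" in "123abc".
DIGITS_NOT_BEFORE_LETTER = re.compile(r"[0-9]+(?![0-9A-Za-z])")

def digits_not_before_letter(source):
    """Return the selected digits, or None if a letter follows them."""
    m = DIGITS_NOT_BEFORE_LETTER.match(source)
    return m.group(0) if m else None
```

A single following letter is enough to make the test fail, which is why the letter class needs no repetition operator here either.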
Positive and negative lookahead can be combined in one pattern. For example, in data files for the TeX formatter, instructions (called "control sequences") consist of a backslash followed by letters. The control sequence to end a paragraph is \par. However, standard control sequences such as \parskip or \parindent, as well as programmer-defined macro names, can begin with the same string.
Suppose paragraphs consist only of letters, punctuation, and space characters. In other words, suppose that no control sequences occur within a paragraph. The following pattern matches paragraph text terminated by the \par control sequence; it fails to match input terminated by another control sequence beginning with the characters \par:
[letter | ".,!?" | blank]+ lookahead "\par" not letter
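The combination of a positive and a negative test can be illustrated with nested assertions in a Python regular expression. This is a sketch by analogy, not OmniMark code: the function name paragraph_text is hypothetical, and the class [A-Za-z.,!? \t] is an assumed stand-in for the letter, punctuation, and blank set used above.

```python
import re

# Assumed regex analogue: paragraph text, then a lookahead for \par
# that is itself not followed by a letter.
PARAGRAPH = re.compile(r"[A-Za-z.,!? \t]+(?=\\par(?![A-Za-z]))")

def paragraph_text(source):
    """Return the paragraph text before a terminating \\par, or None."""
    m = PARAGRAPH.match(source)
    return m.group(0) if m else None
```

With this sketch, text followed by \par is selected and the control sequence itself is left unconsumed, while text followed by \parskip or \parindent is rejected.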
Recall that any pattern can be enclosed in parentheses and used as a subpattern. lookahead patterns can be used in this way. For example,
((lookahead not "xyz") any)+
matches any input string that does not contain the sequence "xyz" as a substring. Note that both sets of parentheses are necessary. Without the inner set, any becomes part of the lookahead pattern. Without the outer set, the lookahead is not repeated as successive characters are selected.
The above example works in the following manner, beginning at the current point in the file, the data content, or the data being scanned:
If the next three characters are "xyz", the lookahead pattern fails, and the pattern terminates.
Otherwise, the lookahead pattern succeeds. The current position is not advanced.
If no input remains, any will fail, and the whole pattern terminates.
Otherwise, any matches the next character, and the test is repeated at the new position.
If the pattern any has matched at least one character, then the pattern succeeds. Otherwise, it fails.
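The step-by-step behavior of the repeated lookahead can be sketched as an explicit loop in Python, alongside a regular expression that is assumed to behave the same way. Neither is OmniMark code: the names match_without_xyz and TEMPERED are hypothetical and serve only to illustrate the sequence of tests.

```python
import re

def match_without_xyz(source, start=0, stop="xyz"):
    """Return the text selected from `start`, or None if nothing is selected."""
    pos = start
    while True:
        # lookahead not "xyz": fails if the stop sequence comes next.
        if source.startswith(stop, pos):
            break
        # any: fails when no input remains.
        if pos >= len(source):
            break
        # any matched one character; repeat the test at the new position.
        pos += 1
    # The pattern succeeds only if at least one character was matched.
    return source[start:pos] if pos > start else None

# Assumed regex counterpart of ((lookahead not "xyz") any)+
TEMPERED = re.compile(r"(?:(?!xyz).)+", re.DOTALL)
```

In the regular expression, the non-capturing group plays the role of the outer parentheses, so the negative assertion is re-tested before every character, just as in the loop.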