Looking ahead

In a compound pattern, the keyword lookahead precedes a pattern that must match the upcoming input but is not consumed by the pattern-matching process. For example, the pattern

  digit+ lookahead blank* "+"

matches a string of digits that is followed by optional spaces and tabs and then a plus sign. However, only the digits are selected. Any spaces or tabs and the plus sign remain in the input, where other patterns can select them.
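As a rough illustration outside OmniMark, Python's re module offers a comparable positive lookahead assertion, (?=...). The sketch below is a Python analogue of the pattern above, not OmniMark code; the names DIGITS_BEFORE_PLUS and select_digits are hypothetical.

```python
import re

# Python analogue (not OmniMark) of:  digit+ lookahead blank* "+"
# [0-9]+ selects the digits; (?=...) checks that optional blanks
# (spaces or tabs) and a plus sign follow, without consuming them.
DIGITS_BEFORE_PLUS = re.compile(r"[0-9]+(?=[ \t]*\+)")


def select_digits(text: str):
    """Return (selected, remaining) if text starts with a match, else None."""
    m = DIGITS_BEFORE_PLUS.match(text)
    if m is None:
        return None
    # Only the digits were consumed; the blanks and "+" are still unread.
    return m.group(0), text[m.end():]


print(select_digits("42 \t+ 7"))  # ('42', ' \t+ 7')
print(select_digits("42 - 7"))    # None
```

The remaining text still begins with the blanks and the plus sign, which is the "recognized but not consumed" behavior described above.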

lookahead can also be used to verify that selected data is not followed by input matching a given pattern. For example,

  digit+ lookahead ! letter

selects a string of digits as long as the digits are not immediately followed by a letter. (The "!" operator is the symbol for the keyword not.) Because a single letter is enough to make the lookahead test fail, there is no need to put a "+" after letter in the example above.
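Python's negative lookahead, (?!...), plays a similar role. One difference is worth noting in this sketch: Python's regex engine backtracks, so a direct translation would give up the last digit of "123abc" and select "12". Excluding digits in the lookahead keeps the selection to the complete digit run, as described above. This is a Python analogue, not OmniMark; DIGITS_NOT_BEFORE_LETTER and digits_not_before_letter are hypothetical names.

```python
import re

# Python analogue (not OmniMark) of:  digit+ lookahead ! letter
# (?!...) succeeds only if the following text does NOT match.
# Digits are excluded too, so backtracking cannot shorten the digit run.
DIGITS_NOT_BEFORE_LETTER = re.compile(r"[0-9]+(?![0-9A-Za-z])")


def digits_not_before_letter(text: str):
    m = DIGITS_NOT_BEFORE_LETTER.match(text)
    return m.group(0) if m else None


print(digits_not_before_letter("123 apples"))  # 123
print(digits_not_before_letter("123abc"))      # None
print(digits_not_before_letter("2x"))          # None: one letter is enough

# The direct translation backtracks and selects a shorter digit run:
naive = re.match(r"[0-9]+(?![A-Za-z])", "123abc")
print(naive.group(0))  # 12
```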

Positive and negative lookahead can be combined in one pattern. For example, in data files for the TeX formatter, instructions (called "control sequences") consist of a backslash followed by letters.

The control sequence to end a paragraph is \par. However, standard control sequences such as \parskip or \parindent as well as programmer-defined macro names can begin with the same string.

Suppose paragraphs consist only of letters, the punctuation marks ".,!?", and blanks (spaces and tabs); in other words, no control sequences occur within a paragraph. The following pattern matches paragraph text terminated by the \par control sequence; it fails to match input terminated by another control sequence that begins with the characters \par:

  [letter | ".,!?" | blank]+ lookahead "\par" ! letter
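The same combination of positive and negative lookahead can be sketched in Python, nesting (?![A-Za-z]) inside (?=...). This is an analogue under the simplifying assumption that letters are ASCII A-Z and a-z; PARAGRAPH and paragraph_text are hypothetical names, not part of OmniMark.

```python
import re

# Python analogue (not OmniMark) of:
#   [letter | ".,!?" | blank]+ lookahead "\par" ! letter
# The lookahead requires a literal "\par" that is not followed by a
# letter (end of input also qualifies); none of it is consumed.
PARAGRAPH = re.compile(r"[A-Za-z.,!? \t]+(?=\\par(?![A-Za-z]))")


def paragraph_text(text: str):
    m = PARAGRAPH.match(text)
    return m.group(0) if m else None


print(paragraph_text(r"Hello, world!\par Next"))     # Hello, world!
print(paragraph_text(r"Hello, world!\par"))          # Hello, world!
print(paragraph_text(r"Hello, world!\parskip=6pt"))  # None
```

Because the backslash is not in the character class, the match always stops just before a control sequence; the lookahead then decides whether that control sequence is exactly \par.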

Recall that any pattern can be enclosed in parentheses and used as a subpattern. lookahead patterns can be used in this way. For example,

  ((lookahead ! "xyz") any)+

matches input up to, but not including, the first occurrence of "xyz" (or up to the end of the input), so the selected text never contains "xyz" as a substring. Note that both sets of parentheses are necessary. Without the inner set, any becomes part of the lookahead pattern. Without the outer set, the lookahead is not repeated as successive characters are selected.
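In Python the equivalent idiom is a non-capturing group that pairs a negative lookahead with "." and is repeated with "+". The group plays the part of the outer parentheses, so the lookahead is tested again before every character. This is a Python analogue, not OmniMark; NO_XYZ and without_xyz are hypothetical names, and re.DOTALL is used so that "." matches every character, including newlines.

```python
import re

# Python analogue (not OmniMark) of:  ((lookahead ! "xyz") any)+
# (?!xyz) is checked before EVERY character that "." consumes.
NO_XYZ = re.compile(r"(?:(?!xyz).)+", re.DOTALL)


def without_xyz(text: str):
    m = NO_XYZ.match(text)
    return m.group(0) if m else None


print(without_xyz("abcxyzdef"))  # abc
print(without_xyz("abxy"))       # abxy: fewer than three characters left is fine
print(without_xyz("xyzabc"))     # None: nothing selected, so the pattern fails
```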

The example above works as follows, starting at the current point in the input (the file, the data content, or the data being scanned):

  1. If the next three characters are "xyz", then the lookahead pattern fails, and the pattern terminates.
  2. If the next three characters do not match "xyz", or if fewer than three characters remain, then the lookahead pattern succeeds. The current position is not advanced.
  3. If there are no more characters, then the pattern any will fail, and the whole pattern terminates.
  4. Otherwise, the pattern any matches the next character.
  5. The current point in the input is advanced a single character.
  6. The "+" indicator causes the above steps to be repeated.

If the pattern any has matched at least one character, then the pattern succeeds. Otherwise, it fails.
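The steps above can be traced with an explicit loop. This hypothetical Python function, match_without_xyz, is only a sketch of the described procedure, not how OmniMark implements pattern matching; each comment names the step it mirrors.

```python
def match_without_xyz(text: str, pos: int = 0):
    """Step-by-step sketch of ((lookahead ! "xyz") any)+.

    Returns the selected text, or None if the pattern fails.
    """
    start = pos
    while True:
        # Steps 1-2: lookahead ! "xyz". startswith() is False when fewer
        # than three characters remain, so the lookahead succeeds then.
        # The position is not advanced by this test.
        if text.startswith("xyz", pos):
            break
        # Step 3: no characters left, so "any" fails.
        if pos >= len(text):
            break
        # Steps 4-5: "any" matches one character; advance by one.
        pos += 1
        # Step 6: "+" repeats the steps (the while loop).
    # The pattern succeeds only if "any" matched at least one character.
    return text[start:pos] if pos > start else None


print(match_without_xyz("abcxyzdef"))  # abc
print(match_without_xyz("xyzabc"))     # None
```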

Prerequisite Concepts
     Pattern matching

Related Syntax
     lookahead, lookahead not

----


Generated: April 21, 1999 at 2:00:49 pm
If you have any comments about this section of the documentation, send email to [email protected]

Copyright © OmniMark Technologies Corporation, 1988-1999.