Pattern matching functions

A pattern matching function is a switch-returning function that is called from within a pattern and takes part in the pattern matching process by scanning #current-input. The calling pattern uses the function's return value to determine whether the part of the pattern handled by the function succeeded or failed.

Here is a very simple pattern matching function that matches text up to and including a specified string:

  define switch function 
     upto-and-including (value string p)
  as
     return #current-input matches any ** p
  
  
  process
     submit "Mary had a little lamb."
      
  
  find "Mary" upto-and-including ("little") => t
     output t
      
  
  find any

Here the function upto-and-including uses the matches operator to determine whether the input data, represented by #current-input, contains the terminating string. If it does, matches consumes the data up to and including that string and the function returns true, allowing the pattern that called the function to continue.

If the input data does not contain the terminating string, matches returns false, the function returns false, and the pattern that called the function fails.
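
The failure case can be seen by asking for a string that never appears in the input. The following is a minimal sketch that reuses upto-and-including unchanged; the argument "big" is an assumption chosen purely for illustration:

  define switch function 
     upto-and-including (value string p)
  as
     return #current-input matches any ** p


  process
     submit "Mary had a little lamb."


  ; "big" is an illustrative argument that does not occur in the input.
  find "Mary" upto-and-including ("big") => t
     output t


  find any

Because "big" does not occur in the input, the function should return false, so the first find rule never fires and the find any rule consumes the input without producing any output.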

As the code above shows, data matched by a pattern matching function can be captured in a pattern variable in the usual way. One of the limits of conventional pattern variables is that they cannot be used to build a shelf of values from a repeated pattern. Pattern matching functions offer a way around this limitation:

  global string patterns variable
  
  define switch function 
     digit-catcher (modifiable string digits)
  as
     do scan #current-input
     match digit+ => d
        set new digits to d
        return true
  
     else
        return false
     done
  
  
  process
      submit "(1)(2)(3)(4)"
      
  
  find ("(" digit-catcher (patterns) ")")+
     repeat over patterns as p
        output p || "%n"
     again

Pattern matching functions are particularly useful in nested pattern matching. The following code uses a pattern matching function to handle nested parentheses:

  define switch function 
     between-parentheses () 
  as
     repeat scan #current-input
     match [any \ "()"]+
        ; Keep going.
  
     match "(" between-parentheses () ")"
        ; We've recursed in.
  
     match value-end
         return false
     again
  
     return true
  
      
  process
     submit "(1((2)(3))478(954)"
      
  
  find "(" between-parentheses () => t ")"
     output t || "%n"
  
          
  find any

The function between-parentheses matches data between parentheses. If it encounters an opening parenthesis, it calls itself recursively, so that parenthesized content nested to any depth will be matched. If it encounters a closing parenthesis that is not balanced by a preceding opening parenthesis, the character will not match, the repeat scan will exit, and the function will return true.

Note that we do not actively match the closing parenthesis. Rather, the closing parenthesis is the only thing we do not match. This is a common and useful technique in many kinds of balancing operations. Find everything but the closing delimiter, and allow the repeat scan to exit. This allows the closing delimiter to be matched in the outer pattern, which is good for two reasons. First, it makes the pattern easier to read. Second, it allows you to capture the content of the structure without its delimiters (as we do here).

If the function reaches the end of the input without seeing the closing parenthesis, it returns false. If this occurs in a recursive call, value-end will then be matched by each instance of the function as it unwinds.

Interestingly enough, this function can be written in a slightly more compact fashion:

  define switch function 
     between-parentheses () 
  as
     repeat scan #current-input
     match [any \ "()"]+
        ; Keep going.
  
     match "(" between-parentheses () ")"
        ; We've recursed in.
     again
  
     return true
  
  
  process
     submit "(1((2)(3))478(954)"
      
  
  find "(" between-parentheses () => t ")"
     output t || "%n"
  
          
  find any

This form never returns false. It does, however, work almost identically to the original function. Unless a balancing closing parenthesis is encountered, the function will read to the end of the data, just like the previous version. It then returns true, rather than false, just as if it had ended with the closing delimiter. But the pattern that called the function will now fail because it will not be able to match the closing parenthesis.
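
To see the success case, the compact form can be run against balanced input. This is a sketch only; the input string "(a(b)c)" is an assumption introduced for illustration and does not come from the examples above:

  define switch function 
     between-parentheses () 
  as
     repeat scan #current-input
     match [any \ "()"]+
        ; Ordinary content: keep going.

     match "(" between-parentheses () ")"
        ; A nested group, matched by the recursive call.
     again

     ; The scan stopped at an unmatched ")" or at the end of the input.
     return true


  process
     ; Illustrative balanced input.
     submit "(a(b)c)"


  find "(" between-parentheses () => t ")"
     output t || "%n"


  find any

Traced by hand, the recursive call consumes b, the outer call stops at the final closing parenthesis, and that parenthesis is then matched by the find rule itself. The pattern variable t should therefore hold a(b)c, without the outer delimiters.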

You can also use pattern matching functions to process the matched data, though it is important to remember that the code in a pattern matching function is called and executed before the pattern as a whole is complete. This means the function could execute even though the pattern as a whole fails. Thus the function could be called and executed again in a subsequent attempt to match the same data. As a consequence, pattern matching functions with side-effects can lead to unexpected program behaviour, and should be avoided. For example, the program

  define switch function
     greeting ()
  as
     put #main-output "*"
  
     return #current-input matches "Hello, World!"
  
  
  process
     submit "Hello, World!%n"
  
  
  find greeting ()
     output "Salut, Monde!"

outputs

  *Salut, Monde!*
  *

In this example, the first asterisk is output when greeting succeeds in matching "Hello, World!". The second asterisk is emitted when the pattern matching function is called in an attempt to match the newline (%n) that follows. In this case, the function fails to match, but the asterisk has been output nonetheless. Finally, the third asterisk is output in the same way when the function is called in an attempt to match at the end of the input.
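
One way to avoid the problem, offered here as a sketch rather than as the only approach, is to keep the function free of side effects and emit the marker from the rule body, which runs only once the whole pattern has matched. The placement of the asterisk in the output string is an illustrative choice:

  define switch function
     greeting ()
  as
     ; No output here: the function only reports whether the text matched.
     return #current-input matches "Hello, World!"


  process
     submit "Hello, World!%n"


  find greeting ()
     ; The asterisk is emitted only after the pattern has succeeded.
     output "*Salut, Monde!"

With this arrangement the asterisk should appear exactly once, immediately before Salut, Monde!, no matter how many times greeting is called and fails on the remaining input.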
