Input scopes

In OmniMark, consuming and processing input data is separate from selecting the source of that input, so the input variable does not have to be lexically in scope for you to consume it. Instead, any string source or markup source can be placed in an input execution scope. While an input source is in the current input scope, all pattern-matching and parsing statements and rules consume it, regardless of the lexical scope in which they occur.

A new input scope is created by every submit, every variant of scan, and every matches test. Each of these establishes the current input for the execution scopes contained within it, and each also initiates scanning of that source. To change the current input scope without initiating scanning, use "using input as".
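As a minimal sketch of establishing an input scope without scanning it, the following program places a string in the current input scope and lets a function consume part of it. The function "leading-number" and the sample string are hypothetical, chosen only for illustration. The function reads from #current-input, the built-in name for whatever source is currently in scope.

```omnimark
; Hypothetical helper: reads leading digits from whatever input is current.
define integer function leading-number as
    local integer found initial {0}
    do scan #current-input
    match digit+ => number
        set found to number
    done
    return found

process
    ; Establish the string as the current input without scanning it.
    using input as "2024-05"
    do
        output "Year: " || "d" % leading-number || "%n"
        do scan #current-input
        match "-"
            output "Month: " || "d" % leading-number || "%n"
        done
    done
```

The function never names the string it reads. Each call simply continues from wherever the current input was left by the code that ran before it.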

Once an input scope is established, it is in effect for the execution scope of the submit, scan, or using input as that established it. Within that scope, you can initiate a new scan of the current source using #current-input. This allows you to perform efficient one-pass scanning of nested structures by initiating a new scan for each level of nesting, without the need to capture the whole structure and re-scan it.

The following code demonstrates this with the function "sum-of-csv", which calculates the sum of a series of comma-separated values found in the current input. This function could be called anywhere there is a current input scope, and it will consume a series of comma-separated numeric values from the current input scope and return the sum. It will exit as soon as it encounters data that does not fit the pattern it is looking for, leaving the current input scope intact, but with the comma-separated-value data consumed.

  define integer function sum-of-csv as
      local integer sum initial {0}
      ; Each iteration consumes one value and the comma that follows it, if any.
      repeat scan #current-input
          match white-space* 
                digit+ => number
                white-space* ","?
              set sum to sum + number
      again
      return sum

  process
      repeat scan "Results: (12,34,65, 92 , 75 )"
          match "Results:" white-space* "("
              output "Total: " || "d" % sum-of-csv
          match ")"
      again

Note the difference between this code and the more common programming practice represented by the following program:

  define integer function sum 
      read-only integer numbers
  as
      local integer total initial {0}
      repeat over numbers
          set total to total + numbers
      again
      return total

  process
      local integer numbers variable
      repeat scan "Results: (12,34,65, 92 , 75 )"
      match "Results:" white-space* "(" [digit or space or ","]* => csv ")"
          ; The captured text is scanned a second time to extract the values.
          repeat scan csv
          match digit+ => num
              set new numbers to num
          match any
          again
          output "Total: " || "d" % sum numbers
      again

These two pieces of code differ in two ways. First, in the second, more conventional program, the outer level of code is responsible for identifying the whole nested structure, both its beginning and its end. This has a kind of symmetry about it, but the symmetry is misleading. The task of recognizing the beginning of a nested structure takes place outside the nested structure. (You find the door marked "IN" when you are outside; you find the door marked "OUT" when you are inside.) The task of recognizing the end of a nested structure should take place inside the nested structure. In our first example, the function that handles the comma-separated values is responsible for figuring out when the comma-separated values end. It does this very easily by exiting the repeat scan as soon as it sees a character that does not fit the pattern it is looking for.

The second difference between the two programs is that the second program has to scan the csv data twice: once when it is trying to find it in the data stream, and again when it is analyzed in the second repeat scan. The first program processes the csv data and finds the end of the structure in one pass.
