Rules

OmniMark is a rule-based programming language. This means that an OmniMark program consists of a set of rules. Each rule is fired when certain conditions are satisfied. The actions associated with the rule are then performed.

You can also think of OmniMark as an event-based language. If you have ever programmed for a graphical user interface such as Windows, the Mac, or Motif, you are used to event-based programming. In these environments, the operating system captures user actions such as keystrokes or mouse clicks, or hardware actions such as data arriving on a serial port, and sends a message to the current program indicating what event has occurred. Programs consist of a collection of responses to events. The order in which program code is invoked depends on the order in which events occur.

OmniMark programs are written the same way, as a collection of responses to events. The difference is that the events an OmniMark program responds to are not user events or hardware events, but data events. Data events occur in streams of data. OmniMark is a streaming language, so the management of streams is built into its core. It shields you from the details of stream handling just as good GUI programming languages shield you from the details of user input handling and window management.

What is a data event? Quite simply, a data event is something significant occurring in a stream of data. In a typical GUI environment, the operating system and its associated hardware decide what counts as an event. There is a defined set of events, and programs simply have to respond to those events that interest them. Who decides what counts as an event in a stream of data? You do.

This is where the rule-based aspect of OmniMark comes into play. An OmniMark program consists of rules that define data events, and actions that take place when data events occur.

A rule is made up of two parts—a rule header and a rule body.

In the rule header you define the event which, when encountered in an input stream, will cause a certain action or set of actions to be executed. Rule headers are made up of two things: an event definition (usually a keyword followed by a name or pattern), and an optional condition which must be satisfied before the actions in the rule body are executed.

The rule body contains any number of local declarations and actions that are to be executed when the event in the rule header is encountered.

If the event defined in a rule header is encountered in an input stream, and any condition attached to that event evaluates to true, then the rule fires and its actions are executed.
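
The firing logic described above can be sketched outside OmniMark. The following Python stand-in is only an analogy, not a description of how OmniMark is implemented: the `Rule` class, its field names, and the `run` scanner are all hypothetical, and a regular expression plays the part of the event definition.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical model of a rule: header (pattern + condition) and body (action)."""
    pattern: re.Pattern                 # event definition from the rule header
    condition: Callable[[str], bool]    # optional condition from the rule header
    action: Callable[[str], None]       # actions from the rule body


def run(rules, data):
    """Scan data left to right, firing the first rule whose pattern matches
    at the current position and whose condition holds."""
    pos = 0
    while pos < len(data):
        for rule in rules:
            m = rule.pattern.match(data, pos)
            if m and m.end() > pos and rule.condition(m.group()):
                rule.action(m.group())  # the rule fires: run its body
                pos = m.end()
                break
        else:
            pos += 1                    # no rule fired here; move on one character


fired = []
rules = [
    # Event: one or more letters. Condition: the matched text is longer than 3.
    Rule(re.compile(r"[A-Za-z]+"), lambda text: len(text) > 3, fired.append),
]
run(rules, "Mary had a little lamb")
print(fired)  # ['Mary', 'little', 'lamb']
```

The point of the sketch is the separation of concerns: the pattern and condition decide whether the rule fires, and the action runs only once both are satisfied.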

OmniMark provides the following classes of rules:

  • process rules, used to initiate and control processing
  • find rules, used to scan data
  • markup rules, used to process data parsed by OmniMark's integrated XML and SGML parsers or an external parser.

Rules in action

Suppose you wanted to count the words in the text "Mary had a little lamb". You would write an OmniMark rule that defines the occurrence of a word as an event:

  find letter+
          

This is an OmniMark find rule. A find rule attempts to match a pattern in a data stream, and if the pattern matches completely, the rule detects an event. This rule matches letters. The + sign after the keyword letter stands for "one or more", so the rule goes on matching letters until it comes to something that is not a letter, such as punctuation or a space. Having run out of letters, it checks whether the pattern requires anything else. Since it does not, the pattern is complete and the rule fires. Any actions in the rule body are then executed.

This rule fires once for every word in the data, so all that remains is to increment an integer each time the rule fires. A complete program to count the words in "Mary had a little lamb" looks like this:

  global integer wordcount initial { 0 }
  
  process
     submit "Mary had a little lamb"
     output "d" % wordcount || "%n"
  
  
  find letter+
     increment wordcount
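
For readers more familiar with general-purpose languages, the same count can be approximated in Python. This is a rough analogy only: the function name `count_words` is hypothetical, and the regular expression `[A-Za-z]+` is assumed as a stand-in for OmniMark's letter+ pattern, which it may not match exactly for non-ASCII letters.

```python
import re


def count_words(text: str) -> int:
    """Count runs of one or more letters, mirroring the find rule above."""
    wordcount = 0
    for _ in re.finditer(r"[A-Za-z]+", text):
        wordcount += 1  # plays the role of "increment wordcount"
    return wordcount


print(count_words("Mary had a little lamb"))  # prints 5
```

Note the difference in shape: the Python version drives the loop explicitly, whereas the OmniMark program only declares what a word is and what to do when one is found.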
          

Nested execution model

Many programming languages encourage nested code, with functions calling functions calling functions. This helps modularize functionality in a regular programming language. It also makes the execution path rigid and makes it difficult to react to complex sequences of events. OmniMark code is very flat. While you can define and use functions, they are used only within OmniMark's principal execution unit, the rule, and cannot contain rules themselves. All OmniMark rules exist at the base level of the program. In OmniMark you tend to find not nested code, but nested execution.

In processing complex markup, with many nested elements, rules are invoked at each level as appropriate. If you are seven layers of markup deep, seven rules are in mid-execution. This means that you do not have to maintain complex state tables or parse trees. The current execution state of the OmniMark program itself maintains the current parse state for you and makes it easily addressable.
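
The idea that the execution state mirrors the parse state can be illustrated with a small Python sketch. It is an analogy under stated assumptions, not OmniMark code: the `handle` function is a hypothetical stand-in for a markup rule, the `active` list simply records which handlers are in mid-execution, and the sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

active = []    # handlers currently in mid-execution, outermost first
deepest = []   # snapshot of `active` at the deepest point reached


def handle(element):
    """Hypothetical stand-in for a markup rule: it remains in mid-execution
    while the content of its element is being processed."""
    active.append(element.tag)
    if len(active) > len(deepest):
        deepest[:] = active        # remember the deepest nesting seen so far
    for child in element:          # processing content invokes nested handlers
        handle(child)
    active.pop()                   # this handler has finished


doc = ET.fromstring(
    "<book><chapter><section><p>text</p></section></chapter></book>"
)
handle(doc)
print(deepest)  # ['book', 'chapter', 'section', 'p']
```

At the deepest point, four handlers are active for four levels of markup, and the list of active handlers is itself a description of where processing is in the document. No separate state table or parse tree is kept.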

Since you cannot tell in advance the order in which the execution of rules may be nested, nesting the rules themselves would make no sense. Hence the simplicity and flatness of a typical OmniMark program.

Nevertheless, you can and should encapsulate common functionality in your OmniMark programs. OmniMark provides several facilities to do this, including functions, groups, macros, include files, and modules.

Prerequisite Concepts