Beginner's guide to OmniMark

------

About OmniMark

OmniMark is a streaming programming language. As a starting point, you can think of OmniMark as either a rule-based language or an event-based language. If you have ever programmed for a graphical user interface such as Windows, the Mac, or Motif, you are used to event-based programming. In these environments, the operating system captures user actions such as keystrokes or mouse clicks, or hardware actions such as data arriving on a serial port, and sends a message to the current program indicating what event has occurred. Programs consist of a collection of responses to events. The order in which program code is invoked depends on the order in which events occur.

OmniMark programs are written the same way, as a collection of responses to events. The difference is that the events an OmniMark program responds to are not user events or hardware events, but data events. Data events occur in streams of data. Because OmniMark is a streaming language, the management of streams is built into the heart of the language. OmniMark shields you from the details of stream handling just as good GUI programming languages shield you from the details of user input handling and window management.

What is a data event? Quite simply, a data event is something significant occurring in a stream of data. In a typical GUI environment, it is the operating system and its associated hardware that decides what is an event. There is a defined set of events, and programs simply have to respond to those events that interest them. Who decides what is an event in a stream of data? You do.

This is where the rule-based aspect of OmniMark comes into play. An OmniMark program consists of rules that define data events, and actions that take place when data events occur. Suppose you wanted to count the words in the text "Mary had a little lamb". You would write an OmniMark rule that defined the occurrence of a word as an event:

  find letter+

This is an OmniMark find rule. Find rules attempt to match patterns that occur in a data stream, and if they match something completely, they detect an event. This rule matches letters. The "+" sign after the keyword "letter" stands for "one or more", so this rule will go on matching letters until it comes to something that is not a letter, such as punctuation or a space. Having run out of letters, it will see if it needs to match anything else. Since it doesn't, the pattern is complete and the rule is fired. Any actions following the rule are then executed. This rule will fire once for every word in the data, so all that remains to do is increment a counter each time the rule is fired.

If you are used to other languages such as C or Visual Basic, you are probably thinking that there is something odd about the find rule above. Sure, it finds words, but what does it find them in? Where is the reference to the file or variable that contains the data?

Because it deals primarily with events happening in data, OmniMark maintains a current input. Rules automatically apply to the current input, so you don't have to specify what each rule applies to.

Similarly, OmniMark has a built-in output. All output goes to current output. If you need to change where output goes, you change the destination of current output.

Of course, you are not restricted to using only a single input or output. You can define and use a variety of inputs and outputs, as well as variables. But in OmniMark you generally do not have to concern yourself with opening files, reading the content into variables, and stepping through the content as you would in other languages. In OmniMark, you just name the desired input source and let the data flow; thus, a complete program to count the words in "Mary had a little lamb" looks like this:

  global counter wordcount initial {0}

  process
     submit "Mary had a little lamb"
     output "%d(wordcount)%n"

  find letter+
     increment wordcount

(In the output statement, "%d" is a format item used to convert the value of the counter "wordcount" to a string and "%n" is a newline.)

To try this program, copy it to a file named "test.xom" and type the following on the command line:

  omnimark -s test.xom

The example above introduces a new kind of rule, the process rule. The process rule, as you would expect, is fired when processing begins. Our program consists of one global variable declaration and two rules. Note that, in this program, it doesn't matter in what order the rules appear since each fires only when a specific event occurs. Thus we could just as easily write the program:

  global counter wordcount initial {0}

  find letter+
     increment wordcount

  process
     submit "Mary had a little lamb"
     output "%d(wordcount)%n"

This program runs just the same as the first. This is not to say that the order of rules never matters in an OmniMark program. If one event could cause more than one find rule to fire, the rule that occurs first will fire, and the one that occurs later will not. This allows you to put more specific rules before more general rules and have the general rules fire only if the specific rule does not. The following two programs produce different output:

  global counter wordcount initial {0}

  process
     submit "Mary had a little lamb"
     output "%d(wordcount)%n"

  find "had"
     output "*"

  find letter+
    increment wordcount

  find any

The program above prints "*4". The program below changes the order of the find rules and produces a different output.

  global counter wordcount initial {0}

  process
     submit "Mary had a little lamb"
     output "%d(wordcount)%n"

  find letter+
     increment wordcount

  find "had"
     output "*"

  find any

This program prints "5".

Why did we add "find any" as a new rule in both these programs? Actually, it fixes an error in all the earlier versions of our word counting program. The rule "find letter+" matches words. But what about the spaces between the words? What was happening to them? If you actually ran the first program, you might have noticed that it printed its result indented by four spaces. Those are the unmatched spaces from our input. Any input that is not matched by a find rule goes right through to output; "find any" at the end of a set of find rules soaks up any unmatched input. Of course, if you use "find any" it must always be the last find rule.
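
Putting this together, a corrected version of the first word-counting program might look like the following sketch. It simply adds the "find any" rule after "find letter+"; nothing else is changed:

```omnimark
global counter wordcount initial {0}

process
   submit "Mary had a little lamb"
   output "%d(wordcount)%n"

; count each run of letters as one word
find letter+
   increment wordcount

; absorb spaces and anything else the rule above did not match
find any
```

With the spaces absorbed by "find any", this version should print "5" with no leading spaces.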

We said that, in OmniMark, you define data events. This is not always true. Sometimes the data itself contains the definition of the event. Documents written in formal markup languages based on SGML and XML contain tags which break the document up into a set of elements. In such a document, the occurrence of such an element constitutes a data event. Because OmniMark has built-in parsers for XML and SGML, you don't need to worry about how elements are recognized; you just need to write rules to process them when they occur. These are called markup rules.

Pattern and markup processors

The OmniMark language has extensive information processing capability built in. This functionality is packaged into two processors, the pattern processor and the markup processor. In other languages you would have to code this functionality yourself.

The pattern processor provides pattern-matching functionality, and operates on a stream of bytes (which may be text or binary data). The pattern processor works with find rules, detecting an event whenever a pattern defined by a find rule occurs in the input stream. It then fires that rule.

The markup processor provides markup recognition for markup languages created using XML or SGML. The markup processor works with markup rules, detecting an event each time an element or other structure defined by markup occurs in the input stream. It then fires the markup rule associated with the event detected.

Note that the markup processor detects and reports the elements defined by markup, not the text of the markup itself. Thus, in the string "Mary had a little <animal>lamb</animal>" the markup processor will fire the markup rule element animal and will report that the data content of the element is "lamb". It will not report that it found the markup strings "<animal>...</animal>". In XML, you can safely assume that it did in fact encounter that markup, but in SGML, which allows for shortened forms of tags and for complete omission of tags in some cases, several different combinations of markup can represent the same element and cause the same rule to fire.

If you wanted to process the text of the markup, as opposed to the structure defined by the markup, you would use the pattern processor. In fact, you can write your own markup processor using the pattern processor. This is useful for converting markup that is not compatible with XML or SGML into XML or SGML form.

How do you use the pattern and markup processors? Simply direct input to the appropriate processor. You direct input to the pattern processor with submit. For example, the following code fragment sends a file called names.txt to the pattern processor.

  submit file "names.txt"

The OmniMark actions do sgml-parse and do xml-parse direct input to the markup processor. For example, the following code fragment sends a file called myfile.xml to the XML parser of the markup processor.

  do xml-parse document
     scan file "myfile.xml"
     output "%c"
  done

You can also direct input to the pattern or markup processors using OmniMark's aided translation types.

Once you direct input into one of the processors, OmniMark processes the entire input, firing rules as they occur. If, in responding to an event, you perform an action that submits new input to one of the processors, the current input is suspended, and the new input is processed. When the new input has finished processing, OmniMark resumes processing the original input.

This feature has many uses. For instance, you could read a list of names of files containing XML markup, and open and process each file in turn:

  process-start
     submit file "names.txt"

  find [any except white-space]+ => filename
     do xml-parse document
        scan file filename
        output "%c"
     done

  find white-space+
  ;absorb leftover white space in names.txt

  element ...

(Note that the find rule for file names is pretty rudimentary. It's fine if you know the structure of the data you're reading, but don't take this as a general method for identifying file names!)

Event handling

In OmniMark, data events occur in the processing of the current input. What kind of event occurs depends on which processor, the pattern processor or the markup processor, is presently acting on the current input. Events occurring in the pattern processor cause find rules to fire. Events occurring in the markup processor cause markup rules to fire.

To handle an event, you attach actions to the appropriate find or markup rule. Processing continues until the input is exhausted. During the processing of an event, however, you may do something that starts processing a new piece of input. In this case, processing of the original input is suspended while the new input is processed. This is where things begin to get interesting, because, by default, this new input will be processed by exactly the same rules as were processing the original input. Consider the following program:

  process
     submit "Mary had a little lamb"

  find "had"
     submit "Joe bought a big lamb"

  find "lamb"
     output "sheep"

This program will output "Mary Joe bought a big sheep a little sheep". The submit in the rule find "had" suspended processing of the originally submitted data. The find rule for "lamb" fired once for the "lamb" in the second input. When processing of the second submit finished, the first resumed and the "lamb" rule fired once again for the "lamb" in the first input. Can you see why writing code like the following is not a good idea? (The second submit now reads "Joe had a big lamb".) If you feel compelled to try it, save your other work first:

  process
     submit "Mary had a little lamb"

  find "had"
     submit "Joe had a big lamb"

  find "lamb"
     output "sheep"

When pattern processing, your input is processed completely, unless you decide to pause and begin processing something else in the middle. When processing markup, however, you will have to deal with every level of nesting in the markup. Every element has content, which may include other elements. When an element occurs you have to deal with three things: the start of the element, its content, and its end. To give you the opportunity to decide how and when to deal with the element content, markup processing stalls at each element, and you need to explicitly get it going again.

How do you continue parsing? The OmniMark keyword %c causes parsing to continue. You may think of it as equivalent to "continue xml-parse" or "continue sgml-parse". (Since you will always want to continue parsing in the middle of the output of the current element, the parse continuation operator takes a form ("%c") that can easily be dropped into a text string.) A do xml-parse or do sgml-parse simply sets up the parser in the appropriate initial state to process the input it is receiving. To actually start parsing you must output the parse continuation operator ("%c"):

  do xml-parse document
     scan file "fred.xml"
     output "Beginning %c End"
  done

Every markup rule must output the parsing continuation operator ("%c") or its alternative, suppress, which continues parsing, but suppresses output of the parsed content. Without them, your program would stall permanently.

For obvious reasons, you cannot use %c or suppress more than once in a markup rule. Don't fall into the trap of thinking of %c as standing for the content of an element. It does not. It is an instruction to continue parsing that content. Any output that appears in the place where %c occurs in an output string is created by the rules which fire as a result of parsing element content.
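
As a minimal sketch of the two ways a markup rule can continue parsing, assume a document with elements named "title" and "comment". These names are hypothetical and chosen only for illustration. The first rule continues parsing with %c and wraps whatever output the content produces; the second continues parsing with suppress and discards that output:

```omnimark
; "title" and "comment" are hypothetical element names

; continue parsing; output produced by the content lands where %c appears
element title
   output "Title: %c%n"

; continue parsing, but throw away anything the content would output
element comment
   suppress
```

Each rule continues parsing exactly once, either with %c or with suppress.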

Can you mix pattern processing and markup processing in the same program? Certainly you can. Consider the following code:

  element fred
     submit "Element fred contains: %c That's all."

The result of parsing the content of element fred will be inserted into the string at the position %c occurs. The entire string will then be submitted to the find rules.
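
To see the two processors working together, here is a small sketch of a complete program built around the rule above. The one-element document and its inline DTD are assumptions made up for this illustration; they are not part of any real application:

```omnimark
; A hypothetical one-element document, supplied as a string for brevity
process
   do xml-parse document
      scan "<!DOCTYPE fred [<!ELEMENT fred (#PCDATA)>]><fred>Mary had a little lamb</fred>"
      output "%c"
   done

; markup rule: hand the parsed content on to the find rules
element fred
   submit "Element fred contains: %c That's all."

; find rule: acts on the text submitted by the markup rule
find "lamb"
   output "sheep"
```

The markup processor fires the element rule, and the find rule for "lamb" then gets its chance to act on the text that the element rule submits.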

Input and output

Stream processing is at the heart of OmniMark. As a streaming language, OmniMark handles input and output at the core of the language. OmniMark maintains a current input and all recognition of data events is performed on the current input. Similarly, OmniMark maintains a current output at all times, and all output is directed to the current output. This greatly simplifies processing, since you never need to specify what data an action operates on or what destination output goes to. You simply set the current input and output to the appropriate streams and go.

OmniMark defines a number of default streams that correspond to the standard input, standard output, and standard error streams of the underlying operating system.

(Windows programmers may not be familiar with these concepts, which apply in command-line environments. Standard input is where input comes from by default: in most cases, the keyboard. Standard output is where output of a program goes by default, usually to the screen. Standard error is where error messages go by default, usually to the screen. Some operating systems provide sophisticated facilities for manipulating standard input, output, and error. A Windows DOS box provides standard input, output, and error. Under Windows, OmniMark runs in a DOS box.)

In OmniMark, #process-input is a stream bound to standard input, #process-output is a stream bound to standard output, and #error is a stream bound to standard error.

#main-input is a stream that is bound to standard input unless you specify an input file or files on the command line, in which case, #main-input is bound to those files. Similarly, #main-output is bound to standard output unless you specify an output file on the command line, in which case, #main-output is bound to that file.

By default, in aided translation type programs OmniMark's current input is #main-input and its current output is #main-output. Current input and current output may change many times in the course of a program, but #main-input and #main-output never change. Despite its name, however, output only goes to #main-output when it is attached to the current output. As you might expect, OmniMark's current output can be referred to as #current-output. In normal programs #current-input is unattached by default.

Of course, you don't have to deal with #main-input and #main-output at all if you don't want to. You can always explicitly assign current input and current output to files from within your code. Since OmniMark has no direct user interface functions, however, the command line is the principal way to pass input and output file names into a batch style OmniMark program.

Server style OmniMark programs communicate with a variety of clients over TCP/IP networks and can receive file names and other instructions from the client. Since OmniMark servers run in the background, there is often no point in dealing with any local input and output streams. If you are writing a server you may wish to disable all the default input and output streams with declare no-default-io.

To explicitly assign current input to a file use submit, do sgml-parse, or do xml-parse with the file modifier:

  submit file "mary.txt"

  do xml-parse document
     scan file "mary.xml"
     output "%c"
  done

You can even assign multiple files to be processed sequentially:

  submit file "mary.txt" || file "lamb.txt"

  do sgml-parse document
     scan file "sgmldec.sgm" || file "rhymes.dtd" || file "mary.xml"
     output "%c"
  done

All other OmniMark actions that initiate text or markup processing, such as do scan, also accept files in just the same way. If you can process a string, you can process a file by replacing the stream variable or literal string with file and the name of the file. In all cases, doing so sets the current input to the named file.
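
As a hedged illustration of do scan applied to a file, the following sketch reports the first word of a file. The file name "mary.txt" is just the illustrative name used earlier, and "first-word" is a pattern variable name invented for this example:

```omnimark
process
   ; scan the file instead of a string or stream variable
   do scan file "mary.txt"
      match letter+ => first-word
         output "First word: %x(first-word)%n"
      else
         output "The file does not begin with a letter.%n"
   done
```

The only difference from scanning a string is the file keyword in front of the name.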

Output

All output from an OmniMark program goes to OmniMark's built-in current output. You do not usually need to explicitly state where you want output to go; you just output it and it goes to current output. When you do state a destination for output, you are, in effect, resetting current output to the named destination and outputting to current output.

You can change the destination of current output within a rule:

  element lamb
     local stream mary
     open mary as file "mary.txt"
     using output as mary
     do
        output "ba ba ba %c"
     done

This rule temporarily changes the current output to the file mary.txt. Any output that occurs in the do...done block following using output as goes to the new destination. Once the block is finished, current output reverts to its original destination. Note, however, that the output statement contains the parse continuation operator (%c). Is the new output destination in effect for all the processing that occurs as part of the parsing of the lamb element? Yes it is. Output that is generated by any rules that fire as a result of parsing the lamb element will go to the file "mary.txt".

To understand how this works, consider an XML file, "mary.xml", that contains a valid DTD and the following markup:

  <line>
  <person>Mary</person> had a little <person>lamb</person>
  </line>

And consider the following OmniMark program:

  global stream words
  global stream people

  process
     open words as file "words.txt"
     open people as file "people.txt"
     using output as words
     do xml-parse document
        scan file "mary.xml"
        output "%c"
     done

  element line
     output "%c"

  element person
     using output as people
     do
        output "%c "
     done

Running this program will leave you with two files: words.txt will contain " had a little ", and people.txt will contain "Mary lamb ". Look closely at this code to make sure you understand which output destination is in effect in the "line" element rule. Make sure you understand why the output ended up in the files it did. If you are comfortable with this, you know most of what you need to know about how OmniMark handles output.

The last rule of the program above can be shortened slightly by using put as a shorthand for the using output as block:

  element person
     put people "%c "

OmniMark's current output is a powerful mechanism for simplifying code by eliminating the need to always state where output is going. Once the destination of the current output is set, all output goes to that destination unless you explicitly send it elsewhere. Current output has the additional feature of being able to have more than one destination at a time:

  global stream my-file
  global stream my-buffer

  process
     open my-file as file "myfile.txt"
     output-to my-file
     submit "Mary had a little lamb"

  find "had"
     open my-buffer as buffer
     using output as my-buffer and #current-output
     do
        output "I've been had!"
     done

This code will place "I've been had!" in both the file myfile.txt and in the variable my-buffer. #current-output stands for all the current destinations of current output, so you can use it to add a new destination to all those currently active (even if you don't know what they are).

Variables

In OmniMark there are three kinds of variables, each of which can hold a different type of data. Stream variables are used to store string values, counter variables to store numeric values, and switch variables to store Boolean (true/false) values.

Variables can be either global or local. The difference between these is that global variables exist (and can be used) everywhere within a program, and local variables only exist (and can be used) within the rule or function where they are declared.

Since variables cannot be used until they are declared, global variable declarations usually appear at the top of an OmniMark program, and local variables appear at the beginning of the rule or function in which they are to be used. The "scope" of a variable (global or local) must be indicated in the variable declaration.

A variable declaration that creates a global counter variable named "count1" looks like this:

  global counter count1

Once declared, the variable "count1" can be used to store any positive or negative integer value.

To create a local stream variable named "quotation", you would use the variable declaration:

  local stream quotation

To store a string in a stream variable, you can use the set keyword. For example:

  set quotation to "Is this a dagger I see before me?"

Counter variable values can be set and changed the same way as stream variables using the set action, but counter variables can also be manipulated using the increment and decrement actions. For example, to increase the value of the count1 variable by 1, you need only say:

  increment count1

It is possible to increment or decrement the value of a counter variable by the value of another counter variable. For example, you could decrement the value of "count1" by the value of "count2" with the following code:

  decrement count1 by count2

The following is a program that makes use of a global switch variable to decide which output action should be executed:

  global switch question

  process
     set question to true

  process
     do when question  ;checks if question is true
        output "to be"
     else
        output "not to be"
     done

Note that the output of this program will always be "to be".

It is possible to declare a variable with an initial value:

  global counter count2 initial {3}
  global stream quotation2 initial {"A horse!"}
  global switch status2 initial {true}

You can set a variable to the value of another variable. For example, the process rule in the following program will set the value of the global counter variable "var1" to the value of the local counter variable "var2":

  global counter var1

  process
     local counter var2
     set var2 to 8
     set var1 to var2

  process
     output "%d(var1)"

When counter and switch variables are created in OmniMark, they have default values of "0" and "false", respectively. Stream variables are somewhat different, however, because the default state of a stream is "unattached". Unless a value has been stored in a stream variable, there is no value in that variable for you to access. In database terms, the default value of a stream variable is "null".
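
One way to guard against reading an unattached stream is to test it first. The following is a minimal sketch using the "is attached" test; the variable name "t" is arbitrary:

```omnimark
process
   local stream t

   ; nothing has been stored in t yet
   do when t is attached
      output "t has a value%n"
   else
      output "t is unattached%n"
   done

   set t to "A horse!"

   ; after the set, t is attached and can be read
   do when t is attached
      output "t now holds: %g(t)%n"
   done
```

The first test should take the else branch, and the second should print the stored string.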

For all intents and purposes, there is no practical limit on the number of variables that can be declared in an OmniMark program, nor is there a practical limit on the size of the numbers or strings that can be stored within them.

I/O and variables

Like most languages, OmniMark has actions that assign values to variables (set) and actions that read data from and write data to files (open, put, close). Unlike most languages, OmniMark lets you perform file operations with the variable assignment actions, and change the values of stream variables with the file actions. For example, you can place a simple value in a file with the set action:

  set file "mary.txt" to "Mary had a little lamb"

And you can use open, put, and close to set the value of a variable:

  local stream Mary
  open Mary as buffer
  put Mary "Mary had a little lamb"
  close Mary

How is this magic performed? Quite easily, in fact: the variable assignment syntax (set) is simply a shorthand version of the file operation syntax. That is, set Mary to "Mary had a little lamb" is equivalent to:

  open Mary as buffer
  put Mary "Mary had a little lamb"
  close Mary

The virtue of using the longer syntax is that you can put off closing the stream until later and write to it many times. This is much easier and more efficient than building up a string by a series of concatenations. So you can replace code like this:

  set Mary to "Mary had a little lamb"
  set Mary to Mary || "Its fleece was white as snow"
  set Mary to Mary || "And every where that Mary went"
  set Mary to Mary || "The lamb was sure to go"

with code like this:

  open Mary as buffer
  put Mary "Mary had a little lamb"
  put Mary "Its fleece was white as snow"
  put Mary "And every where that Mary went"
  put Mary "The lamb was sure to go"
  close Mary

You can also make your variable the temporary current output so that everything sent to output goes into that variable:

  open Mary as buffer
  using output as Mary
  do
     output "Mary had a little lamb"
     output "Its fleece was white as snow"
     output "And everywhere that Mary went"
     output "The lamb was sure to go"
  done

This is an enormously powerful feature of OmniMark. It enables you to choose the type of data assignment mechanism appropriate to the scale of operation you want to perform. You can use set for any kind of small scale assignment, whether to a file or a variable, without any of the bother of opening files or buffers. For large-scale operations, you can use file type operations with any file or variable and perform multiple updates without the need to specify the destination, or even worry about the kind of destination involved. Choosing the method appropriate to the scale of operation you are performing will greatly simplify your code.

A stream must be closed before it can be read or output:

  open Mary as buffer
  put Mary "Mary had a little lamb"
  close Mary
  output Mary

You can use the action reopen to reopen a closed stream with its original content. However, if you use open to open the stream again, the existing content is lost:

  open Mary1 as buffer
  open Mary2 as buffer

  put Mary1 "Mary had a little lamb"
  put Mary2 "Mary had a little lamb"

  close Mary1
  close Mary2

  reopen Mary1
  open Mary2 as buffer

  put Mary1 "Its fleece was white as snow"
  put Mary2 "Its fleece was white as snow"

This code will leave the stream Mary1 containing both lines, but Mary2 will contain only "Its fleece was white as snow".

Can you mix the two methods? Yes, but remember that "set" is a shorthand for the sequence "open...put...close" which means that a stream is always closed after a set. This means that you cannot write to it without reopening it. Also remember that a stream is always opened, not reopened, in a set, so the previous content is always lost. In practice, it is not a good idea to mix the two methods. They are really appropriate for different scales of operation. Pick the one that is appropriate to what you have to do and stick to it.
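
If you do need to add to a stream after a set, the following sketch shows the extra step involved. It assumes a local stream named "Mary", as in the examples above:

```omnimark
process
   local stream Mary

   ; set opens, writes, and closes the stream in one step
   set Mary to "Mary had a little lamb%n"

   ; reopen keeps the existing content so that more can be appended
   reopen Mary
   put Mary "Its fleece was white as snow%n"
   close Mary

   output Mary
```

Because reopen is used rather than open or a second set, both lines should appear in the output.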

Arrays

Most programming languages allow programmers to store and manipulate values in arrays, associative arrays, queues, and stacks. Instead, OmniMark provides a data container called a "shelf" which can be used to accomplish all of the tasks normally carried out by these various structures in other programming languages. Like arrays, shelves can be indexed by numeric values that reflect the position of the elements they contain or, like associative arrays, these elements can be given names (keys) and then indexed by those keys.

A shelf is a data structure that is used to store one or more values of a certain type. Stream shelves can be used to store one or more string values, counter shelves to store one or more numeric values, and switch shelves to store one or more Boolean values.

A global stream shelf declaration that creates a shelf of variable size named quotations would look like this:

  global stream quotations variable

A local counter shelf declaration that creates a counter shelf named "count1" that can contain three (and only three) numeric values would look like this:

  local counter count1 size 3

If you want to create a shelf whose initial values differ from the default values, add the initial keyword to the declaration, followed by the values you want on the shelf, enclosed in curly braces. For example:

  global counter count2 size 4 initial {1, 2, 3, 4}

This declaration creates a global counter shelf named "count2" that can hold four values with initial values of "1", "2", "3", and "4". You could also create a variable-sized shelf that contains a number of initial values, as follows:

  global counter count3 variable initial {1, 2, 3, 4}

The only difference between these two shelves (other than their names) is that while "count2" is a fixed-size shelf holding four values, "count3" begins with four values and can be expanded or contracted to hold as many as required. If you're not sure how many values you will need to store on a shelf, it's best to declare it with a variable size.
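
As a small sketch of what "variable" buys you, the following program adds a fifth item to "count3" and then prints every item. It uses the repeat over loop, which is not otherwise covered in this guide, to visit each item in turn:

```omnimark
global counter count3 variable initial {1, 2, 3, 4}

process
   ; a variable-size shelf can grow beyond its initial items
   set new count3 to 5

   ; visit each item on the shelf in order
   repeat over count3
      output "%d(count3)%n"
   again
```

This should print the numbers 1 through 5, one per line. The same "set new" action is not available for the fixed-size shelf "count2".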

Additionally, shelves of a particular size can be created without having to assign initial values to the shelf items. This is accomplished by using the initial-size keyword:

  global counter count4 variable initial-size 4

This shelf declaration creates a counter shelf named "count4" that starts with space for four items and can be expanded or contracted as required.

To store the string "Now is the winter of our discontent" in the stream shelf "quotations", you would use the following action:

  set quotations to "Now is the winter of our discontent"

This raises the question "where on the shelf was this value stored?" Unless you explicitly specify which item on a shelf you want a value stored in, a value will be stored in the current item. A shelf is basically an ordered list of items ranging from 1 to n. The default behavior of a shelf is that all new items are added after n. If you use set to store a different value on the shelf without specifying a different item, it will simply replace the value of item n on the shelf.

To change this default behavior, you can use either of two shelf indexing methods. The first index is based upon the position number of a value on a shelf. For example, the following code sets a value in the third position of the "quotation" shelf:

  set quotation item 3 to "Words, words, words."

The second index is based upon names or "keys" that are assigned to each value on a shelf. To set the key of the current item on a shelf, you would use the following code:

  set key of quotation to "Richard III"

To set the key of a particular item on a shelf, you would use the same code, but adding a position index:

  set key of quotation item 3 to "Hamlet"

Using the key index of a shelf is much like using the position index, except that instead of the item keyword, you use the key keyword:

  set quotation key "Hamlet" to "To be or not to be?"

It is possible to set a key on a shelf item when it is created. This is accomplished by setting the key in the same action in which the new item is created. For example, to create a new item on the "quotes" shelf that has a value of "Alas, poor Yorick." with the key "Hamlet", you would use the action:

  set new quotes key "Hamlet" to "Alas, poor Yorick."

Up to this point, every time we have created a new item on a shelf, it has been added at the lastmost position of the shelf. If you want to create a new item somewhere else on a shelf, this can be accomplished by using the before or after keywords in the same action used to create the new item. For example, if you want to create a new item that will exist immediately before the second item on a shelf, you would use the following action:

  set new quotes before item 2 to "A horse!"

This would create a new item containing the value "A horse!" between the first and second items on the "quotes" shelf. Since the item numbers are based on shelf position, this new item would become item 2, and the item that was number 2 would become number 3. If the values had assigned keys, of course, these keys would not change.

If you wanted to create a new item on a shelf just after an item that had the key "MacBeth", you would use the action:

  set new quotes after key "MacBeth" to "A horse!"

To illustrate all of this, the following program creates a global stream shelf, and sets the first item on that shelf to a value. Following that, the program gives that first item a key. Then three other items are created: one at the "default" end of the shelf, another before the second item on the shelf, and the third after a value with a set key.

  global stream quotes variable

  process
     set quotes to "To be or not to be?"
     set key of quotes item 1 to "Hamlet"
     set new quotes key "MacBeth" to "Is this a dagger?"
     set new quotes key "Richard III" before item 2 to "Now is the winter of our discontent."
     set new quotes key "Romeo" after key "Richard III" to "Hark, what light through yonder window breaks?"

     repeat over quotes
        output key of quotes || " - "
        output "%g(quotes)%n"
     again

This program will have the following output:

  Hamlet - To be or not to be?
  Richard III - Now is the winter of our discontent.
  Romeo - Hark, what light through yonder window breaks?
  MacBeth - Is this a dagger?
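
The same item and key indexers can be used to read values back from a shelf. The following is a minimal sketch rather than part of the program above. It assumes a shelf declared with initial-size 0, so that every item is created by set new, and it uses the has key test to guard against a key that is not on the shelf.

  global stream quotes variable initial-size 0

  process
     set new quotes key "Hamlet" to "To be or not to be?"
     set new quotes key "MacBeth" to "Is this a dagger?"

     ; read by position
     output quotes item 1
     output "%n"

     ; read by key, but only if the key exists
     do when quotes has key "MacBeth"
        output quotes key "MacBeth"
        output "%n"
     else
        output "No quotation from MacBeth%n"
     done

This should output "To be or not to be?" on the first line and "Is this a dagger?" on the second.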

Stacks and queues

OmniMark shelves, in addition to having all of the characteristics of arrays and associative arrays, also have the properties of stacks and queues.

A stack is a type of data container which operates under the basic "FILO" (First In Last Out) principle. When you add two items to a stack, for example, you have to remove the second item before you can access the first. The default behavior of the currently selected item on a shelf makes it easy to create stack-like shelves in OmniMark. Quite simply, unless you explicitly state that an action should be performed on a different shelf item, it is carried out on the currently selected item, which by default is the lastmost item on the shelf.

For example, the following program illustrates how OmniMark shelves act like stacks:

  global counter count1 variable

  process
          local counter value1 initial {2}
          output "The stack now contains %d(count1)%n"
          repeat
              output "Pushing %d(value1) on to the stack."
              set new count1 to value1

              output " The stack now contains"
              repeat over count1
                  output " %d(count1)"
              again
              output "%n"

              increment value1
              exit when value1 = 10
          again

  ; Pop all of the items off a stack
  process
      repeat
          exit when number of count1 = 0

          output "Popping %d(count1) from the stack."
          remove count1

          output " The stack now contains"
          repeat over count1
              output " %d(count1)"
          again
          output "%n"
      again

You will notice that this program simply adds and removes items from the "count1" shelf at the default item.

A queue is like a stack except it operates under the "FIFO" (First In First Out) principle. If you add two items to a queue, you have to remove the first item before you can access the second. To create a queue-like shelf in OmniMark, you need only specify that all actions are performed on the first item on the shelf (as opposed to the default lastmost item). Any new items should still be added to the shelf at the default lastmost position.

The following program illustrates an OmniMark shelf acting like a queue:

  global counter count1 variable

  process
          local counter value1 initial {2}
          output "The queue now contains %d(count1)%n"
          repeat
              output "Pushing %d(value1) on to the queue."
              set new count1 to value1

              output " The queue now contains"
              repeat over count1
                  output " %d(count1)"
              again
              output "%n"

              increment value1
              exit when value1 = 10
          again

  ; Pop all of the items off a queue
  process
      repeat
          exit when number of count1 = 0
          using count1 item 1
          output "Popping %d(count1) from the queue."
          remove count1 item 1            ; this line specifies that the first item
                                          ; on the shelf should be removed
                                          ; rather than the default lastmost
                                          ; item

          output " The queue now contains"
          repeat over count1
              output " %d(count1)"
          again
          output "%n"
      again

The only real difference between the programs is that the first program removed items from the shelf which were at the default lastmost position, while the second removed items from the shelf which existed at position "1" on the shelf.

Referents

Referents are variables that can be output before their final values have been assigned. With referents you are able to stick "placeholder" variables in your output and then later assign or change their values. These "placeholder" variables are particularly useful in creating hypertext links and cross-references, but they can be used for numerous other tasks. The following program illustrates the "placeholder" quality of referents:

  process
     local stream foo
     set foo to "Mary%n"
     set referent "bar" to "Mary%n"
     output foo
     output referent "bar"
     set foo to "lamb%n"
     set referent "bar" to "lamb%n"

The output of this program is:

  Mary
  lamb

The output from the stream "foo" did not change after the stream was output, but the output from the referent "bar" did. The final values of both variables changed to "lamb", but only the output of the referent reflects this change.

Notice that while the stream "foo" had to be declared before it was used, the referent "bar" did not. All you need to do to create and use a referent is give it a name and set it to a value. For example, the following code creates a referent named "ref1" and sets it to an initial value of "mary joe":

  set referent "ref1" to "mary joe"

Another simple example of the use of referents is in outputting page numbers that include "of n" values, for example, "page 1 of 8". Until a document has been completely processed, there is no way to know for certain how many pages there are going to be. With referents, however, you can simply stick a placeholder where the page numbers will be in the output and, after the document has been completely processed and the number of pages determined, the final values can be plugged into the referents.

The following is a short program that will output a referent when it finds one or more digits in the input file:

  global counter num initial {0}

  find digit+
     increment num
     output referent "%d(num)ref"

  process
     local counter num2
     submit file "test1.txt"
     repeat
        exit when num2 > num
        set referent "%d(num2)ref" to "Play %d(num2) of %d(num)%t"
        increment num2
     again

An appropriate plain-text input file for this program would be:

  1 Hamlet
  2 Richard III
  3 Macbeth
  4 Romeo and Juliet
  5 King Lear

If this input file were processed by the program shown above, the output would be:

  Play 1 of 5	Hamlet
  Play 2 of 5	Richard III
  Play 3 of 5	Macbeth
  Play 4 of 5	Romeo and Juliet
  Play 5 of 5	King Lear

What has happened here is that OmniMark matched a digit, output a referent as a placeholder, and let any following text (the title of the play) fall through to the output. This was repeated for each digit encountered. The loop in the process rule then set the final value of each referent, and these values were resolved in the output when the program ended.

Conditional constructs

When you want a program to do one of several possible things in different situations, use a conditional construct. OmniMark provides three different forms of conditional construct, each based upon the basic "do...done" block.

It is important to note that almost anything in OmniMark can be made conditional simply by adding a when keyword followed by a test. For example, any rule can have conditions added to it:

  find "cat" when count1 = 4
     output "I found a cat%n"

This rule would only output "I found a cat" if "cat" is found in the input data and the value of count1 is equal to 4.

The simplest of the conditional constructs is the "do when...done" block. This allows you to have an OmniMark program perform various actions based on the results of one or more tests.

  do when count1 = 4
     output "Yes, the value of count1 is four%n"
  done

If you want the program to do one thing when a certain condition is true and another if it is false, you can add an else option.

  do when words matches uc
     output "%lg(words)%n"
  else
     output words || "%n"
  done

You can have a "do when" block perform a set of actions when a variable has any one of several values, by adding those conditions to the header using the or keyword. For example:

  do when count1 = 1 or count1 = 5
     output "count1 is one or five%n"
  else
     output "the value of count1 is not one or five%n"
  done

"Do when" blocks can be much more complex than this, however, since "else when" phrases are also allowed.

  do when count1 = 4
     output "Yes, the value of count1 is four%n"
  else when count1 = 5
     output "The value of count1 is five%n"
  else when count1 = 6
     output "The value of count1 is six%n"
  else
     output "The value of count1 is not 4, 5, or 6%n"
  done

Another form of conditional construct is the "do select...done" construct:

  do select count1
     case 1 to 5
        output "count1 is within the first range%n"
     case 6 to 10
        output "count1 is within the second range%n"
  done

The program won't do anything if the value of count1 is less than 1 or greater than 10, however, because there is no alternative that will be executed in these situations. This is quite easily rectified, by adding an "else" phrase to the construct:

  do select count1
     case 1 to 5
        output "count1 is within the first range%n"
     case 6 to 10
        output "count1 is within the second range%n"
     else
        output "count1 is out of range%n"
  done

Note that while "else" phrases can be used within a "do select" construct, "else when" phrases cannot.

If you want the program to do something when a variable is equal to a particular value, you have to specify that within another "case" phrase. For example:

  do select count1
     case 1 to 4
        output "count1 is in the first range%n"
     case 5
        output "count1 is equal to 5%n"
     case 6 to 10
        output "count1 is in the second range%n"
     else
        output "count1 is out of range%n"
  done

The final form of conditional construct is the "do scan". "Do scan" constructs are used to examine a piece of input data for certain patterns. If one of the patterns is discovered in the input data, a set of specified actions is performed. For example, the following program retrieves the name of the current day and scans it. Depending on which pattern is found, the program will output one of several possible phrases.

  global stream day

  process
     set day to date "=W"
     do scan day
        match "Monday"
           output "I don't like Mondays.%n"
        match "Friday"
           output "I love Fridays!!!%n"
        else
           output "At least it's not Monday.%n"
     done

"Do scan" constructs can be used to scan input data in the form of files, streams, or the values of stream variables (as above).

Looping constructs

To have an OmniMark program perform an action or set of actions repeatedly, you will need to create a looping construct of some sort. OmniMark provides three types of looping construct: repeat, repeat over, and repeat scan.

The simplest is a repeat...again. This form of loop will simply repeat the execution of the actions it contains, until an explicit exit action is encountered in the loop.

  process
     local counter count1 initial {1}
     repeat
        output "count1 is %d(count1)%n"
        increment count1
        exit when count1 = 4
     again

This repeat...again will execute the output action until the counter "count1" equals 4 at which point the exit action will execute and the loop will terminate, resulting in the following output:

  count1 is 1
  count1 is 2
  count1 is 3

The second type of looping construct is a repeat over...again. This type of loop is used to iterate over a shelf and perform a set of actions on each item that exists on that shelf. For example, the following program will output the values of each item contained on the stream shelf "names":

  global stream names variable initial {"Bob", "Doug", "Andy", "Greg"}

  process
     repeat over names
        output names || "%n"
     again

repeat over loops can be used to iterate over any type of shelf, and the loop is terminated after the last item on the shelf has been processed.
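
Inside a repeat over loop, the position of the current item is also available. The following minimal sketch numbers each name as it is output. It assumes that the #item counter, which also appears in the macro example later in this guide, holds the position of the item currently being processed.

  global stream names variable initial {"Bob", "Doug", "Andy", "Greg"}

  process
     repeat over names
        ; #item is assumed to be the position of the current item
        output "%d(#item). " || names || "%n"
     again

This should output "1. Bob", "2. Doug", "3. Andy", and "4. Greg", each on its own line.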

Arithmetic and comparisons

If you've got two or more values or variables and you want to do something with them or to them, you need an operator.

The most common sorts of operators are arithmetic operators, such as those which perform addition or multiplication. For example:

  process
     local counter x
     set x to 1 + 1
     output "%d(x)%n"

The arithmetic operators available in OmniMark are + (addition), - (subtraction), * (multiplication), / (division), and modulo (the remainder you get when you divide the number by the base value).
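
Because counters hold whole numbers, division and modulo complement each other. The following minimal sketch splits 17 into a whole quotient and a remainder. The variable names "whole" and "left-over" are chosen only for illustration.

  process
     local counter whole
     local counter left-over

     set whole to 17 / 5            ; counter division yields a whole number
     set left-over to 17 modulo 5   ; what remains after the division

     output "%d(whole) remainder %d(left-over)%n"

This should output "3 remainder 2".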

OmniMark also provides a full set of operators that are used to compare two or more numeric values. For example:

  process
     local counter x initial {1}
     local counter y initial {2}
     do when x = y
        output "equal%n"
     else when x > y
        output "greater%n"
     else when x < y
        output "lesser%n"
     done

The other available numeric comparison operators are != (not equal), >= (greater than or equal to), and <= (less than or equal to).

Other common operators are & and | (the and and or Boolean operators). These are usually used to create more complex conditions and tests in OmniMark. For example:

  process
     local counter x initial {4}
     local counter y initial {2}
     local counter z initial {3}
     do when x = y & z > 4
        output "first test is true%n"
     else when y = z | x = 4
        output "second test is true%n"
     else
        output "neither test is true%n"
     done

Pattern matching

OmniMark allows you to search for particular strings in input data using find rules. For example, the following find rule will fire if the string "Hamlet:" is encountered in the input:

  find "Hamlet:"
     output "<b>Hamlet</b>: "

Using this method, however, you would have to write a separate find rule for each character name you wanted to enclose in HTML bold tags. For example:

  find "Hamlet:"
     output "<b>Hamlet</b>: "
  find "Horatio:"
     output "<b>Horatio</b>: "
  find "Bernardo:"
     output "<b>Bernardo</b>: "

As you can imagine, this is a pretty inefficient way to program.

This is where OmniMark "patterns" come in. OmniMark has rich, built-in, pattern-matching capabilities which allow you to match strings by way of a more abstract "model" of a string rather than matching a specific string. For example:

  find letter+ ":"

This find rule will match any string of one or more letters followed immediately by a colon.

Unfortunately, the pattern described in this find rule isn't specific enough to flawlessly match only character names. It will match any string of letters that is followed by a colon that appears anywhere in the text, meaning that words in the middle of sentences will be matched.

Words that appear in the middle of sentences rarely begin with an uppercased letter, while names usually do. This allows us to add further detail to our find rule:

  find uc letter+ ":"

This find rule matches any string that begins with an uppercase letter (uc) followed by at least one other letter (letter+) and a colon (":").

If we were actually trying to mark up an ASCII copy of "Hamlet", however, our find rule would only match character names that contain a single word, such as "Hamlet", "Ophelia", or "Horatio". Only the second part of two-part names would be matched, so the names of "Queen Gertrude", "Lord Polonius", and so forth, would be incorrectly marked up.

In order to match these more complex names as well as the single-word names, we'll have to further refine our find rule:

  find uc letter+ (white-space+ uc letter+)? ":"

In this version of the find rule, the pattern can match a second word prior to the colon. The pattern (white-space+ uc letter+)? can match one or more white-space characters followed by an uppercase letter and one or more letters. All of this allows the find rule to match character names that consist of one or two words.

If you wanted to match a series of three digits, you could use the following pattern:

  find digit {3}

To match a date that occurs in the "yy/mm/dd" format, the following pattern could be used:

  find digit {2} "/" digit {2} "/" digit {2}

A postal code could be matched with the following pattern:

  find letter digit letter "-" digit letter digit

The letter and uc keywords that are used to create the patterns shown above are called "character classes". OmniMark provides a variety of these built-in character classes:

letter -- matches a single letter character, uppercase or lowercase
uc -- matches a single uppercased letter
lc -- matches a single lowercased letter
digit -- matches a single digit (0-9)
space -- matches a single space character
blank -- matches a single space or tab character
white-space -- matches a single space, tab, or newline character
any-text -- matches any single character except for a newline
any -- matches any single character

Any pattern can be modified through the use of occurrence operators:

+ (one or more)
* (zero or more)
? (zero or one)

For example, letter+ matches one or more letters, letter* matches zero or more letters, and uc? matches zero or one uppercase letter. As shown in the find rules above, a number in curly braces, as in digit {3}, specifies an exact number of occurrences.

It is also possible for you to define your own customized character classes. For example:

  find ["+-*/"]
     output "found an arithmetic operator%n"

This find rule would fire if any one of the four arithmetic operators was encountered in the input data.

Compound character classes can be created using the except or or keywords:

  find [any except "}"]

The find rule above would match any character except for a right brace.

This find rule would match any one of the arithmetic operators or a single digit:

  find ["+-*/" or digit]

This one would match any of the arithmetic operators or any digit except zero ("0"):

  find ["+-*/" or digit except "0"]
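
To see a custom character class at work in a complete program, the following minimal sketch counts the arithmetic operators in a short string. The counter name "ops" and the input string are chosen only for illustration. The second find rule consumes every other character so that nothing falls through to the output.

  global counter ops initial {0}

  process
     submit "1+2*3-4"
     output "%d(ops)%n"

  ; fires once for each operator in the input
  find ["+-*/"]
     increment ops

  ; swallow everything else
  find any

This should output "3".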

Pattern variables

If you want to use the data that a pattern matches, you must capture it in a pattern variable. Pattern variables are assigned using the => symbol and can then be referenced in the actions of the rule. For example, in the first find rule in the following program the matched input data is assigned to the "found-text" pattern variable.

  process
     submit "Mary had a little [white] lamb"

  find ("[" letter+ "]") => found-text
     output found-text

  find any

This program outputs "[white]".

What if you want to output only the word in the square brackets, but not the brackets themselves? Try this:

  process
     submit "Mary had a little [white] lamb"

  find "[" letter+ => found-text "]"
     output found-text

  find any

This program outputs "white". Here, the pattern variable is attached only to the part of the pattern immediately preceding the pattern variable assignment. In fact, this is the default behavior of pattern variables. That's why, to make the previous example work correctly, we had to surround the three elements of the pattern with parentheses to ensure that the text matched by the whole pattern was captured.

You can have more than one pattern variable in a pattern. You can even nest them. For example:

  process
     submit "Mary had a little [white] lamb"

  find ("[" => first-bracket
        letter+ => found-word
        "]" => second-bracket) => found-text

     output first-bracket
     output found-word
     output second-bracket
     output found-text

  find any

The output of this program would be "[white][white]". The first "[white]" is the result of the first three output actions, and the second the result of the fourth output action.
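
Pattern variables also remove the need for a separate rule for each character name in the earlier "Hamlet" example. The following minimal sketch captures a one-word or two-word name in a pattern variable and wraps it in HTML bold tags. The variable name "speaker" and the input text are chosen only for illustration. Text that is not matched by the rule falls through to the output unchanged.

  process
     submit "Hamlet: Words, words, words.%nLord Polonius: What is the matter?%n"

  ; the parentheses make "speaker" capture the whole name, not just its last part
  find (uc letter+ (white-space+ uc letter+)?) => speaker ":"
     output "<b>" || speaker || "</b>:"

This should output "<b>Hamlet</b>: Words, words, words." on the first line and "<b>Lord Polonius</b>: What is the matter?" on the second.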

Organizing your program

OmniMark programs have a definite style which reflects the kind of programming you do with OmniMark. OmniMark is used for manipulating and transforming data, either as text or as markup and, in doing so, it responds to events which occur in the data. Since there is no way to predict in advance the order or relationships of data events, there is no way to predict the order of execution of an OmniMark program.

Many programming languages encourage nested code, with functions calling functions calling functions. This helps modularize functionality in a regular programming language. It also makes the execution path rigid and makes it difficult to react to complex sequences of events. OmniMark code is very flat. While you can define and use functions, they are used only within OmniMark's principal execution unit, the rule, and cannot contain rules themselves. All OmniMark rules exist at the base level of the program. In OmniMark you tend to find not nested code, but nested execution.

In processing complex markup, with many nested elements, rules are invoked at each level as appropriate. If you are seven layers of markup deep, seven rules are in mid-execution. This means that you do not have to maintain complex state tables or parse trees. The current execution state of the OmniMark program itself maintains the current parse state for you and makes it easily addressable.

Since you cannot tell in advance the order in which the execution of rules may be nested, nesting the rules themselves would make no sense. Hence the simplicity and flatness of a typical OmniMark program.

Nevertheless, you can and should encapsulate common functionality in your OmniMark programs. OmniMark provides several facilities to do this including functions, groups, macros, and include files.

If you are writing a program that does batch translation, you can save a lot of time and code for initialization and flow control by using one of OmniMark's aided translation types.

Functions

If you have a piece of code that you want to execute repeatedly in a program, you might want to define that code as a function. One property of functions is that they are encapsulations of code that can be "called" or executed from another point in a program. For example, you could define a function that writes error messages to a log file:

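  ; the stream that the function writes to must be declared
  global stream log-file

  define function Report
     value stream msg
  as
     reopen log-file as file "MyProgram.log"
     ; each item in a date format string is introduced by "="
     put log-file date "=xY/=M/=D =h:=m:=s" || " MyProgram: %g(msg)%n"
     close log-file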

It is possible to define functions that return numeric, string, or Boolean values using the counter, stream, and switch keywords, and you can also define functions that don't return values. A function name can be anything you want it to be, as long as it is a single string of characters. For example, "add_total", "do-this", or "MacBeth" could all be function names.

The function shown above takes only one argument, "msg", which is passed as a stream value.

Rule groups

By default, all OmniMark rules are active all the time. You can change this by bundling your rules into groups:

  group mary
     find "lamb"
        ...
     find "school"
        ...

  group tom
     find "piper"
        ...
     find "pig"
        ...

  group #implied
  process-start
     using group mary
     do
        submit "Mary had a little lamb"
     done
     using group tom
     do
        submit "Tom, Tom, the piper's son"
     done

In this program, only rules in the group "mary" are used to process "Mary had a little lamb". Only rules in the group "tom" are used to process "Tom, Tom, the piper's son".

Why the group #implied before the process-start rule? The process-start rule is a rule like any other, so it is affected by groups like any other rule. group #implied stands for the default group. (In a program with no groups, all rules are in the default group.) Only the default group is active when a program starts. All other groups are inactive. So, you have to have at least one rule in the default group in order to activate any of the other groups. If we didn't place the process-start rule into the default group, no rules would ever be active in this program.

Any rule that occurs before the first group statement in your program automatically belongs to the default group, but, if you use groups, it is usually a good idea to place your global rules explicitly into group #implied. (Consider what would happen if you included a file that contained group statements at the top of your main program file and didn't explicitly assign your global rules to group #implied.)

All rules in the default group are global. You cannot disable the default group, so rules in the default group are always active. For this reason, you may want to keep the number of rules in the default group to a minimum (but remember, you must have at least one).

Can you have more than one group active at a time? Certainly:

  using group mary and tom and dick and harry

You can also add a group to the current set of active groups using "#group" to represent all active groups:

  using group mary and tom and #group

Constants and macros

If you are repeating a section of code or text multiple times in a program, you might want to create a macro to simplify program creation and maintenance.

Essentially, a macro is just a method for creating a shorthand reference that will later be replaced by a larger piece of code or text as specified in the macro definition. For example, if you want to include a piece of debugging code or if you are repeating the name of a company multiple times in a program, you could create a macro that contains that code or company name, and instead of typing the full text or code every time you need it, you could simply use the shorthand version that you have defined in the macro. Not only does this reduce the time required to create a program (by cutting down on typing), it also reduces the number of potential typos. Additionally, if the code or the name of the company should change, rather than searching through an entire program to replace each occurrence, you need only change the text contained in the macro, and it will automatically be changed in the rest of the program.

A macro is created using the macro and macro-end keywords. The macro-end keyword is required because a macro can contain any text or code, so there is no other way for the program to know where the macro definition ends and the rest of the program begins. The following macro definition creates a shorthand reference for the company name "OmniMark":

  macro om is
     "OmniMark"
  macro-end

All this does is tell the program that every time it encounters the short form "om", it's supposed to replace that short form with the full text OmniMark.

Although the following macro definition looks significantly more complex, the basic principles are the same:

  ; Macro to dump a switch shelf (for debugging purposes)
  ;
  macro Dump Switch token s is
     do
        output "Switch %@(s) has " || "d" format number of s || " items%n"
        repeat over s
           output "  %@(s) @ %d(#item)"
           output " ^ " || key of s when s is keyed
           output " = "
           do when s
              output "true"
           else
              output "false"
           done
           output "%n"
        again
     done
  macro-end

This macro will replace each occurrence of the macro name "Dump Switch" with the code that appears between the keywords "is" and "macro-end" in the macro definition.

Once defined, macros are very simple to use. For example, the following program will output "Welcome to OmniMark. OmniMark is located in Ottawa, Ontario, Canada.":

  macro om is
     "OmniMark"
  macro-end

  process
     output "Welcome to " || om || ". "
     output om || " is located in Ottawa, Ontario, Canada.%n"

Macros don't have to be exact repetitions of a chunk of text. Macros can take "arguments", which are simply values given to the macro and used to change slightly the text that replaces the shorthand reference. The macro "Dump Switch", shown above, takes one token argument ("s").

One thing to note about macros is that all macro references are replaced with the full text of the macro before the program is compiled and run. What this means is that if you are using OmniMark LE, defining a section of code in a macro won't cut down on the total number of actions that appear in your program. Each time a macro is used, any actions contained in that macro are counted. So, if you define a macro that contains two actions and you use that macro five times in a program, it will count as ten actions towards the total number of actions in the program.

Including code from other files

OmniMark allows you to include code that is contained in another file in an OmniMark program by way of an include declaration. This feature allows you to easily recycle useful bits of code without having to resort to "copy and paste". Additionally, this means that if you decide you want to change something in that particular piece of code, rather than having to track down every individual usage of it, you need only make the changes in the single file that you have "included" in the other programs.

Suppose you have defined an OmniMark function that you are particularly proud of, and that is useful in several different programs you're working on, such as the function "Report":

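  ; the stream that the function writes to must be declared
  global stream log-file

  define function Report
     value stream msg
  as
     reopen log-file as file "MyProgram.log"
     ; each item in a date format string is introduced by "="
     put log-file date "=xY/=M/=D =h:=m:=s" || " MyProgram: " || msg || "%n"
     close log-file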

The function Report simply outputs the value of the stream "msg" into the file MyProgram.log with a time stamp. If this function were the only thing to be in a file called report.xin, that function could then be included in any OmniMark program by naming that file in an include declaration:

  include "report.xin"
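
As a sketch of how this might look in practice, the following program includes report.xin and calls Report from a process rule. It assumes that "log-file" is declared as a global stream before the include declaration, since the function refers to it without declaring it. The message text is purely illustrative:

  global stream log-file

  include "report.xin"

  process
     Report "translation started"
     ; ... main processing would go here ...
     Report "translation finished"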

Initialization and termination rules

In regular OmniMark programs, there is no real distinction between process-start, process, and process-end rules except that they are performed in that order. Additionally, process-start and process-end rules can be used in any of the aided-translation-type programs. OmniMark doesn't distinguish what can be done with these rules, but process-start and process-end rules should be used only for performing processes that must be executed at the beginning or end of a program, respectively. Usually these processes include whole-program initiation and termination functions. process rules should be used for the main processing within a program.
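
As a minimal illustration of this ordering, the following program (with purely illustrative output text) contains one rule of each kind:

  process-start
     output "setting up%n"

  process
     output "main processing%n"

  process-end
     output "cleaning up%n"

Run as written, it should produce its three lines of output in the order shown.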

process-start rules allow you to do processing and produce output at the earliest stages of a program, more or less adjacent to macro and function definitions and global variable declarations. One use of a process-start rule would be to allocate handles and connect to a database:

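  ; Declared global (not local) so that the process-end rule
  ; shown below can use the same handles.
  global SQL_Handle_type EnvironmentHandle
  global SQL_Handle_type ConnectionHandle
  global SQL_Handle_type StatementHandle
  global counter RetCode

  process-start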

  set RetCode to SQLAllocEnv (EnvironmentHandle)
  output "Allocating environment handle - "
  do when RetCode != SQL_SUCCESS
     output "failed%n"
     halt with 1
  else
     output "passed%n"
  done

  set RetCode to SQLAllocHandle
     ( SQL_HANDLE_DBC, EnvironmentHandle, ConnectionHandle )
  output "Allocating connection handle - "
  do when RetCode != SQL_SUCCESS
     output "failed%n"
     halt with 1
  else
     output "passed%n"
  done

  set RetCode to SQLConnect ( ConnectionHandle, "omodbc", 20, "", 0, "", 0 )
  output "Connecting to database - "
  do when RetCode != SQL_SUCCESS
     output "failed%n"
     halt with 1
  else
     output "passed%n"
  done

Similarly, a process-end rule could be used to disconnect from the database and free the handle resources:

  process-end

  set RetCode to SQLDisconnect ( ConnectionHandle )
  output "Disconnecting from database - "
  do when RetCode != SQL_SUCCESS
     output "failed%n"
     halt with 1
  else
     output "passed%n"
  done

  set RetCode to SQLFreeHandle (SQL_HANDLE_DBC, ConnectionHandle)
  output "Freeing connection handle resources - "
  do when RetCode != SQL_SUCCESS
     output "failed%n"
     halt with 1
  else
     output "passed%n"
  done

  set RetCode to SQLFreeHandle (SQL_HANDLE_ENV, EnvironmentHandle)
  output "Freeing environment handle resources - "
  do when RetCode != SQL_SUCCESS
     output "failed%n"
     halt with 1
  else
     output "passed%n"
  done

Aided translation types

If you are building filters or other types of batch translation programs, you can use OmniMark's aided translation types to simplify your code.

The submit, do sgml-parse, and do xml-parse actions allow you to direct input to the pattern processor and the markup processor within a normal OmniMark program. The using output as and output-to actions let you control where your output will go. Aided translation types simplify matters by automatically sending input from a source specified on the OmniMark command line directly to a particular processor, and output directly to the appropriate destination for the type of translation. Each translation type hooks up the text and the markup processors in a different way.

The cross-translate translation type is used for conventional pattern processing chores. It sends input directly to the pattern processor; output goes directly to #main-output.
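
As a minimal sketch, the following cross-translate program replaces every occurrence of the word "lamb" in its input with "sheep". The file names used on the command line are hypothetical:

  cross-translate

  find "lamb"
     output "sheep"

Input that is not matched by a find rule should pass through to #main-output unchanged. Assuming the program is saved as "sheep.xom" and the input text is in "mary.txt", it could be run as follows:

  omnimark -s sheep.xom mary.txt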

The up-translate translation type is used to add markup to data. It sends input directly to the pattern processor and sends output to #main-output, just as in a cross-translate. However, it also sends a copy of the output to the markup processor. The markup processor does not produce any output, but it will raise errors if the markup it receives is incorrect. This checks the validity of the markup that is produced.

The down-translate translation type is used to convert marked-up data into another form (which may or may not contain markup). A down-translate sends input directly to the markup processor; output goes directly to #main-output.

The context-translate translation type is used to convert data from one format to another using a particular markup as an intermediate format. A context-translate sends input directly to the pattern processor and sends the output of the pattern processor to the markup processor. Output from the markup processor goes to #main-output.

You cannot use the process rule in a program that uses an aided translation type. (Though you can use process-start and process-end rules.)

There are also some rule types designed specially for use in aided translation programs, such as the document-start and document-end rules described under "Markup rules".

Aided translation types take their main input from the command line or files named on the command line. However, you can still use submit, do sgml-parse, and do xml-parse to submit data to the text or markup processors. Note that you cannot use do sgml-parse or do xml-parse in a cross-translation.

User interface

Depending on how you look at it, OmniMark has either the most sophisticated user interface of any programming language, or no user interface at all.

OmniMark has no built-in commands for communicating with the user while the program is running. There is no equivalent to Basic's input or C's scanf. Nor is there any window management facility. OmniMark is designed for creating programs that run either in batch mode or as servers, so complex communication with the local user is not appropriate.

On the other hand, OmniMark excels at parsing information, and is therefore adept at communicating with other programs by means of messages. This gives OmniMark great user interface flexibility. You can create a user interface to an OmniMark program or server in any language, on any platform, over any network that supports TCP/IP. This may include many different user interfaces for different users. A web browser makes an excellent user interface tool for an OmniMark server. (After all, OmniMark is the most powerful server-side programming language for the web.)

Ways of communicating with an OmniMark program or server include:

The command line. You give initial instructions to an OmniMark CI program on the command line.
The console. Running OmniMark programs and servers can output messages to your screen (technically, the standard output of your particular operating system).
Log files. You can have your OmniMark program write messages to a log file or logging process.
Web pages. Your OmniMark server can create web pages on the fly to present information to the user.
Web forms. You can use web forms to allow the user to send information to your OmniMark server. URLs that point to your OmniMark server also communicate information to the server.
Custom clients. You can program a GUI application in the language of your choice and communicate with an OmniMark server using TCP/IP. (This works on a local machine as well as over a network.)
File-based integration. You can communicate with an OmniMark program by exchanging files. For example, your GUI-based application could write out a file and invoke an OmniMark program to process that file. The OmniMark program would process the file and write out the result to the file specified by the calling application.

Command-line interface

If you are writing non-server-type OmniMark programs, you will run those programs using the command-line interface. This means that you will issue an "omnimark" command (or an "omle" command if you're using OMLE) followed by one or more command-line options and arguments, and at least one filename.

For example, if you had a program called "test2.xom" that didn't require any further options or filenames to run successfully, you would execute it by typing the following on the command line:

  omnimark -s test2.xom

"omnimark" (or "omle") will always be the first word in any set of instructions issued on the command line. The "-s" is a command-line option (short for "source program") that precedes the name of the file that contains the OmniMark program you want to run. In this case that file is "test2.xom".

In addition to command-line options, the names of files can be specified on the command line. Anything that appears on the command line that is not recognized as a command-line option (that is, anything not preceded by a "-letter") is placed on the built-in #command-line-names shelf.
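
As a minimal sketch, the following process rule lists whatever has been placed on the #command-line-names shelf, one item per line:

  process
     repeat over #command-line-names
        output #command-line-names || "%n"
     again

If this program were saved in a file called "names.xom" (a hypothetical name) and run with

  omnimark -s names.xom chapter1.txt chapter2.txt

it should output the two file names, since neither is preceded by an option.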

In addition to the "-s", there are numerous other options that can be specified on the command line when running an OmniMark program. The most commonly used of these options are as follows (this is only a partial list; a complete list is available):

-activate switch-name

activates the OmniMark switch variable named switch-name. "-a" can be used as a short form of this option.

-alog log-path

specifies that any error messages that OmniMark produces are appended to the error file given by "log-path". Unlike "-log" (below), "-alog" appends messages to the end of the file specified by "log-path", if the file exists. If the file does not exist, "-alog" will create it. If neither "-log" nor "-alog" is specified on the command line, error messages are written to the program's standard error stream. On most computer systems, errors will be displayed on the user's screen.

-aof output

specifies "output", the system-specific name of a file into which standard OmniMark output (#main-output) is written. The difference between this option and the "-of" option is that output is written to the end of the named file if it already exists. The file specified using the "-aof" or "-of" command-line options becomes the destination of the built-in stream called #main-output. This stream is the default program output stream. If there is no "-aof" or "-of" on the command line, then #main-output identifies the same output destination as #process-output (which is standard output).

-argsfile command-file-path

specifies that some of the contents of the command line are in the arguments file given by "command-file-path". When an "-argsfile command-file-path" option is encountered, the contents of "command-file-path" are immediately processed as if they appeared on the command line. Multiple arguments files may be specified on the command line, and arguments files can refer to other arguments files. "-f" can be used as a short form of this option.

-counter counter-name value

sets the OmniMark counter named "counter-name" to "value", prior to running an OmniMark program. Any initial specification for the specified counter is ignored. "-c" can be used as a short form of this option.

-define stream-name content

specifies the stream "stream-name", which OmniMark opens as a buffer, places content specified by "content" in it, and then closes. The effect of this argument is to have a buffer with defined content when OmniMark starts processing. Each stream may be defined only once. This argument cannot be specified when the "-source" and "-save" options have both been specified. "-d" can be used as a short form of this option.

-help

causes OmniMark to display a list of its command-line options.

-include include-path

specifies a directory in which to look for files to be included. If the file specified in an include declaration within the program cannot be opened, OmniMark looks for a file of the same name in directory "include-path". There can be multiple occurrences of this argument on a command line. OmniMark will inspect the directories in the order in which they occur on the command line until a file of the specified name is found. OmniMark processes "include-path" in a system-independent manner. To look in the "include-path" directory, OmniMark simply prepends "include-path" to the file name it is trying to include. This has two consequences. First, the "include-path" must have a trailing directory name separator, as required by the operating system on which OmniMark is being run. Second, OmniMark will not remove any directory name prefixes from a file name before prepending the "include-path". "-i" can be used as a short form of this option.

-log output-file

causes OmniMark to write any error messages to the system-specific file named "output-file". If neither this argument nor "-alog" is specified, error messages are written to the program's standard error stream. On many computer systems, they appear on the user's screen.

-of output

specifies "output", the system-specific name of a file into which standard OmniMark output (#main-output) is written. This argument is equivalent to "-os output #main-output". This argument cannot be specified when the "-source" and "-save" options have both been specified.

-temppfx temppfx

specifies "temppfx", a prefix that OmniMark uses to create temporary files.

-version

causes OmniMark to write its banner, with the copyright, date, and version information, to standard error and then halt.

-warning

enables the display of informative messages that indicate possible trouble areas in the OmniMark program being executed. Usually these warning messages are suppressed.

-x path-name/=L

specifies a path to an external function library. When using this option to specify the path and file name extension of the external function libraries, "=L" must be used to mark the place in the libpath where the file name of the library (as specified in the library declaration in the OmniMark program) would appear. For example, a typical command line specifying both the path and file name extension for the external function libraries on a UNIX system could be:

  omnimark -xflpath /common/omnimark/lib/=L.so
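
As a sketch of how some of the options described above reach a program, the following example declares a switch, a counter, and a stream, and reports their values. The variable names and the file name "options.xom" are hypothetical:

  global switch verbose
  global counter max-lines initial {10}
  global stream doc-title

  process
     output "Title: " || doc-title || "%n"
     output "Maximum lines: %d(max-lines)%n"
     output "Verbose reporting is on%n" when active verbose

It could then be run with a command line such as:

  omnimark -s options.xom -activate verbose -counter max-lines 25 -define doc-title "Hamlet"

Here "-counter" replaces the initial value of 10 with 25, and "-define" supplies the content of the stream "doc-title" before the program starts.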

Markup languages

In OmniMark documentation, almost all references to "markup languages" are actually references to element-based markup languages that have been created using either SGML (Standard Generalized Markup Language) or XML (the eXtensible Markup Language). A markup language is a full set of markup instructions which can be used to comprehensively describe the structural information content of a piece of text. Markup tags are the actual pieces of code that are added to the electronic document.

When you create a markup language using SGML or XML, you are defining a set of tags which can then be used to demarcate the structure of your documents. Because SGML and XML are used to create sets of markup tags, they are "metalanguages", languages that describe other languages. The benefits of using SGML and XML to create these markup languages are numerous. Because both are internationally recognized standards, marked-up documents are portable across platforms. Additionally, with SGML or XML you are able to create fully customized languages that will most comprehensively treat your unique markup requirements.

Markup instructions can be interpreted by applications that use the markup to determine the formatting of a document. When used this way, the markup usually has an immediate and specific effect on the text, either by changing the appearance of the characters (by rendering them in a bold or italic font, for example), or by affecting the positioning of the text (such as by changing the margin, indent, and spacing values).

For example, HTML is a markup language whose elements describe the formatting of a document when that document is processed by an HTML browser or similar application:

  <html>
  <head>
  <title>Hamlet</title>
  </head>
  <body bgcolor="#ffffff" text="#000000">
  <div align=center>
  <font size=5>
  <b>Hamlet</b>
  <p><font size=3><b>Act I, Scene I</b>
  <p><i>Francisco at his post. Enter to him Bernardo</i>
  </div>
  <p><b>Bernardo:</b> Who's there?
  <p><b>Francisco:</b> Nay, answer me: stand and unfold yourself.
  <p><b>Bernardo:</b> Long live the king!
  </body>
  </html>

The elements in this short HTML document affect the alignment, size, and appearance of the text.

When interpreted by applications, specific markup languages also detail the structure of a document, identifying the various internal components of which it is made. These components can include things such as paragraphs, headings, sections, subsections, names, titles, chapters, volumes, articles, and so on. The possible list of document components is endless, but each specific markup language can only be used to identify a small set of these.

For example, the following document is marked up using a very simple language created with XML:

  <play>
  <title>Hamlet</title>
  <act><scene>
  <scenedesc>Elsinore. A platform before the castle.</scenedesc>
  <stagedir>Francisco at his post. Enter to him Bernardo</stagedir>
  <char>Bernardo</char>
  <line>Who's there?</line>
  <char>Francisco</char>
  <line>Nay, answer me: stand, and unfold yourself.</line>
  <char>Bernardo</char>
  <line>Long live the king!</line>
  </scene></act>
  </play>

The elements in this short XML document are used to identify, to an XML application, the components and structure of the information it contains.

Wherever possible, OmniMark uses the same names and terminology as the SGML and XML specifications.

Markup rules

OmniMark provides a complete set of markup rules that can be used to process documents that have been marked up with SGML- or XML-based markup languages. These rules correspond to all the features of SGML and XML, and are as follows:

data-content rules, allowing you to capture the parsed character data content of elements.

document-end rules, fired immediately after the parsing of an SGML or XML document has been completed in an aided translation type program. document-end rules are "termination" rules, meaning that they're useful for doing process cleanup and final processing before a program completes.

document-start rules, fired just before the implicit parsing of an SGML or XML document begins. document-start rules are "initialization" rules, making them useful for doing any sort of program setup that has to be done before the main processing begins. These rules can only be used in aided translation type programs.

dtd-end rules, used in programs that process marked-up documents that contain a DTD. dtd-end rules are fired after the DTD has been completely processed.

dtd-start rules, fired after the doctype element has been specified in a DTD, but before the main part of the DTD is processed.

element rules, used to execute specified actions when the element named in the element rule is encountered in the input document. It is important to note that each element that appears in an SGML or XML document must be uniquely accounted for in an OmniMark program. You must have an element rule that can be fired for each individual occurrence of every element in a document.

epilog-start rules, used in programs that process marked-up documents that contain a document epilog. epilog-start rules fire just before the processing of the document epilog begins.

external-data-entity rules, used to specify special processing of external data entities that are encountered in SGML or XML documents. Note that you must have an external-data-entity rule that can be fired for each occurrence of an external data entity in a document.

external-text-entity rules, used to provide the full-text replacement for each external text entity that appears in the input document.

invalid-data rules, used in processing SGML and XML documents and giving you control over how erroneous data appearing in the input document is processed.

marked-section rules, provided so that you can specify the processing of any type of marked section that appears in an SGML or XML document. Marked section types include cdata, rcdata, ignore, and include.

markup-comment rules, fired whenever a markup comment is encountered in an input document and allowing you to control how the content of the comment is processed.

markup-error rules, fired if an error is encountered in the markup of an input document.

processing-instruction rules, giving you control over how processing instructions that are encountered in SGML and XML documents are processed.

prolog-end rules, used in programs that process marked-up documents that include a document prolog. prolog-end rules are fired just after the prolog has been completely processed.

prolog-in-error rules, fired if an error is encountered in the prolog of a marked-up input document.

sgml-declaration-end rules, used in programs that process SGML documents. All SGML documents contain an SGML Declaration, whether it be explicit or implicit, so these rules will always fire if they are used. sgml-declaration-end rules fire after the declaration has been completely processed.

translate rules, fired when data content matching a specified pattern occurs within an element of an SGML or XML input document.
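
As a sketch of element rules in use, the following program parses the XML "play" document shown earlier and outputs each character's name followed by the line spoken. It assumes the document is stored in a file called "hamlet.xml" (a hypothetical name):

  process
     do xml-parse document scan file "hamlet.xml"
        output "%c"
     done

  element title
     output "%c%n"

  element char
     output "%c: "

  element line
     output "%c%n"

  element scenedesc
     suppress

  element stagedir
     suppress

  element #implied
     output "%c"

The "%c" format item stands for the parsed content of the current element. The final element #implied rule is there so that a rule can fire for any element that has no rule of its own (here play, act, and scene).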

------

Generated: April 21, 1999 at 2:01:33 pm