
    "The Official Guide to Programming with OmniMark"


OmniMark® Programmer's Guide Version 3

1. Introduction


Next chapter is Chapter 2, "Types of OmniMark Programs".

1.1 What is OmniMark?

OmniMark provides a powerful programming environment for the serious professional programmer who needs a language tuned to both text-based and binary stream processing. OmniMark combines:

Applications for OmniMark extend throughout the spectrum of text-processing, including, but not limited to:

OmniMark integrates a fully ISO 8879 conformant SGML parser with an English-like, rule-based programming language. This gives OmniMark programmers a number of choices for processing a document:

The document you are reading is written and maintained in SGML. All of the indexing, cross-referencing, formatting, and hypertext linking is automatically generated (or configured) by an OmniMark program. Other formats of this document are produced by other OmniMark programs.


1.2 OmniMark's Users

This manual does not assume that the reader has previous experience with OmniMark or is a software engineer. To implement simple OmniMark applications, and to understand this document, the reader needs no more programming skill than is required to write SGML Document Type Definitions or formatter macros, or to work with the fourth-generation languages that many interactive database management systems provide. Complex translations, however, are best attempted by experienced software developers.

Many parts of this manual assume familiarity with SGML.


1.3 How OmniMark Works

OmniMark is a rule-based language. An OmniMark program primarily consists of an optional translation type and one or more rules.

The translation type is a statement that determines which kinds of rules are permitted in the program and which subsystems of OmniMark are active, and that automatically sets up the interaction of these subsystems. Translation types are described in Chapter 2, "Types of OmniMark Programs".

A rule is a part of the program which executes when a particular event occurs in the input document. A rule consists of a rule header, an optional condition and zero or more actions.

A rule header describes the event that causes the rule to execute.

A condition is a phrase applied to a rule or action and contains a test which evaluates to either true or false. Conditions control whether the rule or action can be activated. If the test in the condition evaluates to false, the rule or action is not performed.

An action is a statement that OmniMark executes. Actions usually change the state of the program.
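The relationship between a rule header, its optional condition, and its actions can be pictured with a small model. The Python sketch below is illustrative only and is not how OmniMark is implemented: the `Rule` class, the `dispatch` and `emit` helpers, the event strings, and the policy that only the first rule (in program order) whose header matches and whose condition holds is executed are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

State = dict  # program state that actions may read and change


@dataclass
class Rule:
    header: str  # the event that triggers the rule, e.g. "element par"
    actions: List[Callable[[State], None]] = field(default_factory=list)
    condition: Optional[Callable[[State], bool]] = None  # optional WHEN-style test

    def applies_to(self, event: str, state: State) -> bool:
        # The header must match the event, and the condition (if any) must be true.
        if self.header != event:
            return False
        return self.condition is None or self.condition(state)


def dispatch(rules: List[Rule], event: str, state: State) -> bool:
    """Run the actions of the first applicable rule; return True if one fired."""
    for rule in rules:
        if rule.applies_to(event, state):
            for action in rule.actions:  # zero or more actions, in order
                action(state)
            return True
    return False


def emit(text: str) -> Callable[[State], None]:
    """Build an action that appends text to the program's output."""
    return lambda state: state["output"].append(text)


# Two rules for the same event, distinguished by a condition.
rules = [
    Rule("element par", [emit("first paragraph")],
         condition=lambda s: s["chapter-start"]),
    Rule("element par", [emit("paragraph")]),
]
state = {"chapter-start": True, "output": []}
dispatch(rules, "element par", state)  # condition true: first rule fires
state["chapter-start"] = False
dispatch(rules, "element par", state)  # condition false: second rule fires
print(state["output"])  # ['first paragraph', 'paragraph']
```

In this model a false condition simply lets the search move on to the next rule, which mirrors the statement above that a rule whose condition evaluates to false is not performed.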


1.4 The High-Level Structure of OmniMark Programs

In addition to the translation type and rules, an OmniMark program can also contain declarations, functions, and macros.

Declarations are used for declaring variables and for modifying the behaviour of the OmniMark program.

Functions encapsulate a set of actions that perform a specific task or calculate a result. Instead of repeating the code every time the task or calculation is needed, the programmer writes the code once, in the function, and calls the function wherever that activity is required.

Macros define parameterized text substitutions. They can be used to encapsulate a sequence of code that needs to be placed at more than one location in the program. By placing the code in a macro, the programmer can invoke the macro instead of repeating the code. Unlike a function, a macro can contain anything which may appear in an OmniMark program.
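Because a macro is a text substitution rather than a call, its body is spliced into the program text before that text is interpreted. The Python sketch below models only the simplest case, a macro without parameters, using the `sgml-name` macro from Section 1.5.4. The `expand` function and its name-boundary rule are assumptions made for illustration; OmniMark's real macro processor, including its handling of parameters, is not modelled.

```python
import re


def expand(source: str, macros: dict) -> str:
    """Replace each whole-name occurrence of a macro name with its body."""
    for name, body in macros.items():
        # Letters, digits, "_", "-" and "." can continue an OmniMark name, so
        # the macro name must not be preceded or followed by one of them;
        # otherwise "sgml-name" would also match inside "sgml-names".
        pattern = r"(?<![\w.-])" + re.escape(name) + r"(?![\w.-])"
        source = re.sub(pattern, lambda _match: body, source)
    return source


macros = {"sgml-name": '(LETTER [LETTER | DIGIT | "-."]*)'}
print(expand("FIND sgml-name", macros))
# prints: FIND (LETTER [LETTER | DIGIT | "-."]*)
print(expand("FIND sgml-names", macros))
# prints: FIND sgml-names  (a different name, left alone)
```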


1.5 Syntax

OmniMark is a free-format, English-like language. Keywords are used to introduce OmniMark rules, declarations, actions, conditions, function definitions, and macro definitions.

Rules, declarations, and function definitions are all terminated when a keyword introducing a new rule, declaration, function definition, or macro definition is encountered. Actions and conditions are terminated by the start of another action, rule, declaration, function definition, or macro definition.

Only macro definitions have explicit terminators. This is because a macro defines a very general text substitution. The text to be substituted can contain anything at all, including whole rules, or declarations, or even definitions of other macros. Because anything is permitted inside a macro, macros must be explicitly terminated with the MACRO-END keyword.
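These termination rules mean that a construct ends where the next introducing keyword begins, except for a macro definition, which runs to MACRO-END. The Python sketch below is a rough model of that splitting. It works on an already tokenized program, does not distinguish actions or conditions, and the keyword set `STARTERS` is a small hypothetical subset chosen for illustration, not OmniMark's real list.

```python
from typing import List

# Hypothetical subset of the keywords that introduce a rule, declaration,
# function definition, or macro definition.
STARTERS = {"ELEMENT", "FIND", "GLOBAL", "DEFINE", "MACRO"}


def split_constructs(tokens: List[str]) -> List[List[str]]:
    """Group a token list into top-level constructs."""
    constructs: List[List[str]] = []
    in_macro = False
    start_new = True  # the first token always starts a construct
    for token in tokens:
        word = token.upper()  # keywords are not case sensitive
        if in_macro:
            # Inside a macro anything goes, including other introducing
            # keywords, until the explicit MACRO-END terminator.
            constructs[-1].append(token)
            if word == "MACRO-END":
                in_macro = False
                start_new = True
            continue
        if start_new or word in STARTERS:
            # A new introducing keyword implicitly ends the previous construct.
            constructs.append([token])
            in_macro = word == "MACRO"
            start_new = False
        else:
            constructs[-1].append(token)
    return constructs


tokens = ["MACRO", "m", "IS", "ELEMENT", "x", "MACRO-END",
          "GLOBAL", "COUNTER", "n",
          "find", '"<"', "OUTPUT", '"&lt;"']
for construct in split_constructs(tokens):
    print(construct)
```

Note that the ELEMENT keyword inside the macro does not start a new construct, while the lower-case `find` does.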

1.5.1 Declaration Examples

Each declaration begins either with a keyword that describes its purpose or with the word DECLARE. Declarations are used to declare variables and to customize aspects of OmniMark's behaviour.

Some example declarations are:

Example A

   DECLARE DATA-LETTERS "%142#%143#" "EE"

Example B

   ESCAPE "!"

Example C

   GLOBAL COUNTER number-of-columns

Variable declarations are described in Section 6.1, "Introduction to Variable Declarations" and Section 7.1, "Shelf Declarations". Other declarations are described in Chapter 19, "Customizing OmniMark Behaviour".

1.5.2 Rule Examples

A rule begins with a keyword that identifies the event that triggers it. ELEMENT rules recognize SGML elements. FIND rules recognize text patterns. PROCESSING-INSTRUCTION rules recognize SGML processing instructions.

Sample rule headers include:

Example A

   FIND-START

Example B

   FIND "<" LOOKAHEAD ( LETTER | DIGIT )

Example C

   DATA-CONTENT

Example D

   BREAK-WIDTH 72 80

Different types of rules are described in different chapters depending on their function:

1.5.3 Function Examples

Function definitions always begin with the keyword DEFINE, followed by the keyword EXTERNAL if the function is an external function, an optional return type, and then the keyword FUNCTION.

Functions can either use the conventional comma-and-parenthesis style for delimiting arguments, or they can use a more expressive form that uses name tokens to herald the arguments.

Some example function definition headers are:

Example A

   DEFINE COUNTER FUNCTION raise VALUE COUNTER base
                              to VALUE COUNTER exponent
      AS

Example B

   DEFINE COUNTER FUNCTION pow (VALUE COUNTER base, VALUE COUNTER exponent)
      AS

Functions are described in Chapter 12, "Functions".

1.5.4 Example Macro Definitions

Macro definitions begin with the keyword MACRO and end with the keyword MACRO-END. An example of a macro definition is:

   MACRO sgml-name IS (LETTER [LETTER | DIGIT | "-."]*) MACRO-END
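The pattern in this macro matches a letter followed by any number of letters, digits, hyphens, or periods. For readers who know regular expressions, a roughly equivalent Python pattern is sketched below. It assumes that LETTER and DIGIT match only ASCII letters and digits, which may not hold for every OmniMark configuration, and `is_sgml_name` is a hypothetical helper that tests a whole string rather than searching input text.

```python
import re

# LETTER [LETTER | DIGIT | "-."]*  is modelled as: one letter, then zero or
# more letters, digits, hyphens, or periods (ASCII only, an assumption).
SGML_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_sgml_name(text: str) -> bool:
    """True if the whole of text matches the sgml-name pattern."""
    return SGML_NAME.fullmatch(text) is not None


for candidate in ["chapter", "h1.title", "x-ref", "1st", "-list", "a_b"]:
    print(f"{candidate!r}: {is_sgml_name(candidate)}")
```

The last three candidates fail: two do not start with a letter, and the pattern does not allow underscores.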

Macros are described in Chapter 20, "Macros".

1.5.5 Examples of Actions

Actions generally begin with a verb. INCREMENT, OPEN, and SET are all examples of actions. DO and REPEAT are examples of compound actions, which enclose a sequence of actions up to the terminating DONE or AGAIN, respectively.

Some example actions are:

Example A

   INCREMENT i

Example B

   OPEN out AS FILE "myprog.out"

Example C

   CLEAR x

Actions are described throughout this manual, according to their purpose.

1.5.6 Examples of Conditions

Conditions consist of WHEN or UNLESS followed by a test expression. (Test expressions are described in Section 9.3, "Test Expressions".)

Conditions can appear in several places:

The following is an example of a condition:

   WHEN ATTRIBUTE docno IS SPECIFIED OR (chapno > 1) AND xref ISNT OPEN

1.6 Low-Level Syntax

At the lowest level, OmniMark programs are ultimately composed of keywords, names, quoted strings, numbers, punctuation characters, white space, and comments.

1.6.1 White Space

White space is used to separate keywords from each other, and can be used to separate punctuation characters from keywords or from each other. White space is made up of spaces, tabs, carriage returns, and newline characters. (On some systems, the RETURN or ENTER key inserts a newline character; on others it inserts both a carriage return and a newline.)

The amount of white space that a programmer uses is a stylistic decision, and generally has no effect on the behaviour of the OmniMark program. Thus, the following examples are equivalent:

Example A

   ELEMENT (blist | nlist)

Example B

   ELEMENT(blist |
      nlist)

There is one situation in which white space is needed around punctuation. A hyphen can be used either within an OmniMark name or on its own as a minus sign. To use a hyphen as a minus sign, the programmer must separate it from any preceding name with white space. Consider the following:

   a-b

Here a-b is treated as a single name. If white space is placed around the hyphen character:

   a - b

OmniMark interprets the hyphen character as a minus sign, and the expression represents the difference between the counter a and the counter b. For this reason, it is good practice to surround all operators with spaces.
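The hyphen rule follows from longest-match scanning: a name greedily absorbs any hyphens that follow it, so only a hyphen that cannot continue a name is read as a minus sign. The Python tokenizer below is a simplified, hypothetical model of that behaviour, covering names and single-character operators only; it is not OmniMark's actual scanner and does not handle numbers, strings, or keywords.

```python
import re

# A name starts with a letter and may continue with letters, digits, "-", "_",
# or "."; operators are single characters. Numbers are not modelled here.
TOKEN = re.compile(r"\s*(?:(?P<name>[A-Za-z][A-Za-z0-9_.-]*)|(?P<op>[-+*/()|]))")


def tokenize(text: str) -> list:
    """Split text into names and operators using longest-match scanning."""
    tokens = []
    text = text.rstrip()  # trailing white space separates nothing
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group("name") or match.group("op"))
        pos = match.end()
    return tokens


print(tokenize("a-b"))    # ['a-b']          one name
print(tokenize("a - b"))  # ['a', '-', 'b']  a minus b
print(tokenize("a- b"))   # ['a-', 'b']      the hyphen still joins the name
```

The last case shows why white space before the hyphen is the part that matters: a space after it alone does not stop the hyphen from joining the preceding name.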

OmniMark programs are easier to read if they are formatted in a consistent manner. In examples in this manual, actions are indented and rules are usually separated by blank lines. Programmers are encouraged to establish conventions they find convenient.

1.6.2 Keywords

Keywords are words that have a special meaning to OmniMark. Keywords are never case sensitive.

If a programmer defines a variable or a function whose name is the same as a keyword, that word is no longer a keyword; it becomes just a name. To keep using such a word as a keyword, it must be immediately preceded by a backquote ("`"). (This is discussed in greater detail in Section 21.1.2.2, "Backquoting Names".) In general, it is considered bad practice to give a function or a variable a name that is also an OmniMark keyword.

SGML names differ from programmer-defined names in this regard. Because the names of SGML objects are always heralded by a keyword like ELEMENT or ATTRIBUTE, there is no ambiguity when such an object has a name which is also a keyword. Context will determine the meaning of the word.

1.6.3 Names

Names in OmniMark may refer to SGML constructs (such as elements, entities, attributes, and notations) or to OmniMark objects (such as variables, functions, and macros).

Generally, names are tokens beginning with a letter and containing letters, digits, hyphens ("-"), underscores ("_"), or periods ("."). (Because a hyphen can be interpreted as part of a name, the programmer must surround the character with spaces when they want the hyphen interpreted as a minus sign.)

When heralded, a quoted string may also be used as a name. Quoted names allow names to contain characters that normally would not be permitted. This can be useful when a programmer is generating an OmniMark program from another source.

Heralding is the practice of preceding a variable name with a keyword indicating its type. (Heralding names is described in Section 6.1.4, "Use of Heralds" and Section 21.1, "The Reduced Language".)

Names of SGML objects must always be heralded because they are not declared in the OmniMark program. OmniMark variables, such as pattern variables, switches, counters, and streams, may be heralded or not, as the programmer chooses. If they are not heralded, quoted strings cannot be used as their names.

Because of the heralding, OmniMark can always determine the type of an SGML object from the way its name is used, and so there is no problem with having different kinds of SGML objects with the same name. For instance, an element can have the same name as an attribute.
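One way to picture this is a table in which each SGML object name is stored together with the kind named by its herald keyword, so the same spelling can identify several different objects. The Python sketch below is a hypothetical illustration of that idea, not OmniMark's internal design; `register`, `lookup`, and `sgml_objects` are names invented for the example.

```python
# Illustrative model only: each SGML object name is stored together with the
# kind named by its herald keyword, so one spelling can name several objects.
sgml_objects: dict = {}


def register(herald: str, name: str, description: str) -> None:
    # Keywords are not case sensitive, so the herald is normalized; the SGML
    # name itself is kept as written, since its case handling is configurable.
    sgml_objects[(herald.upper(), name)] = description


def lookup(herald: str, name: str) -> str:
    return sgml_objects[(herald.upper(), name)]


register("ELEMENT", "title", "the title element")
register("ATTRIBUTE", "title", "the title attribute")
print(lookup("element", "title"))    # prints: the title element
print(lookup("ATTRIBUTE", "title"))  # prints: the title attribute
```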

Different OmniMark variables may only have the same name if they are declared in different scopes. This is discussed further in Chapter 8, "Variable Scopes". Once a function is declared, its name becomes unavailable as a name for other objects.

OmniMark puts no limit on the length of a name other than the 2,048 character limit on the length of a string. However, names of SGML constructs will only be processed correctly if they conform to names in the relevant concrete syntax.

1.6.4 Punctuation

Punctuation characters usually occur:

The hyphen ("-") can occur both as the minus operator ("-") and within names. Because of this, it is strongly recommended that the hyphen be preceded by white space whenever it is used as the minus operator.

1.6.5 Comments

The OmniMark programmer can annotate a program with descriptive text (called a comment) that is ignored by OmniMark. Comments can appear in an OmniMark program anywhere white space is allowed; that is, anywhere except within a name, keyword, or string.

Comments begin with a semicolon and continue for the rest of the line.

Comments help make a program understandable. A sample use of a comment is shown below:

   ; Rule for paragraphs invoked after the start of a chapter
   ELEMENT par WHEN  chapter-start
      OUTPUT "\firstpar{}%sc%n"
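Because a comment runs from a semicolon to the end of the line, but a semicolon inside a quoted string is just an ordinary character, removing comments requires tracking whether the scanner is inside a string. The Python function below is a simplified model of that rule. It treats both single and double quotation marks as string delimiters, as described in Section 1.6.6, and it assumes no escape sequences inside strings need special handling; the function name is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Remove a ';' comment from one line, ignoring ';' inside quoted strings."""
    quote = None  # the quote character of the currently open string, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None  # the string is closed
        elif ch in "\"'":
            quote = ch  # a string is opened
        elif ch == ";":
            return line[:i].rstrip()  # the comment starts here
    return line.rstrip()


program = [
    '; Rule for paragraphs invoked after the start of a chapter',
    'ELEMENT par WHEN  chapter-start',
    '   OUTPUT "a;b"   ; the ";" inside the string is kept',
]
for line in program:
    print(repr(strip_comment(line)))
```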

1.6.6 Quoted Strings

Many OmniMark constructs include programmer-specified sequences of characters (called strings). Strings in an OmniMark program are surrounded by single or double quotation marks.

Quoted strings can be concatenated with underscore characters ("_"). These concatenated strings are treated by OmniMark as if they were a single quoted string. This is useful when:

The maximum length of a quoted string (after concatenation) in OmniMark is 2048 characters.

Examples of strings are:

Example A

   "Hello world"

Example B

   'This is one very long string that contains a'
     _" ' character and a"
     _' " character.'
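Concatenation with the underscore happens when the program is read, so Example B is a single string constant. The Python sketch below joins a sequence of quoted pieces in the same way and enforces the 2,048-character limit on the result. It assumes the pieces have already been separated from the underscores and that no escape sequences need interpreting; `concatenate` and `MAX_STRING` are hypothetical names.

```python
MAX_STRING = 2048  # limit on a quoted string after concatenation


def concatenate(pieces: list) -> str:
    """Join quoted-string tokens (with their quotes) into one string value."""
    parts = []
    for piece in pieces:
        if len(piece) < 2 or piece[0] not in "\"'" or piece[-1] != piece[0]:
            raise ValueError(f"not a quoted string: {piece!r}")
        parts.append(piece[1:-1])  # drop the surrounding quotation marks
    value = "".join(parts)
    if len(value) > MAX_STRING:
        raise ValueError(f"string is {len(value)} characters; limit is {MAX_STRING}")
    return value


# The three pieces of Example B, which are joined by "_" in the program text.
example_b = ["'This is one very long string that contains a'",
             "\" ' character and a\"",
             "' \" character.'"]
print(concatenate(example_b))
```

Each piece may use whichever quotation mark lets it contain the other kind, which is exactly the trick Example B relies on.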

More complex string expressions are described in Section 9.2, "String Expressions".

1.6.7 Numbers

Numbers in OmniMark are sequences of digits with an optional preceding minus sign ("-"). (The minus sign and the hyphen are the same character.) Examples of numbers are:

Example A

   0

Example B

   1234

Example C

   -9871234

The maximum absolute value of an OmniMark number is 2,147,483,647. (Commas are added here for clarity. They are not permitted in numbers in OmniMark programs.)
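The limit 2,147,483,647 is 2^31 - 1, so a number is valid only if its absolute value does not exceed that. The Python sketch below checks a numeric literal against the two rules in this section: digits with an optional leading minus sign, and the range limit. It is a hypothetical helper for illustration and does not model numeric expressions.

```python
import re

MAX_ABS = 2147483647  # 2**31 - 1, the largest absolute value allowed


def parse_number(literal: str) -> int:
    """Parse a number literal: an optional '-' followed by digits."""
    if not re.fullmatch(r"-?[0-9]+", literal):
        raise ValueError(f"not a number literal: {literal!r}")  # e.g. commas
    value = int(literal)
    if abs(value) > MAX_ABS:
        raise ValueError(f"{literal} is out of range")
    return value


for text in ["0", "1234", "-9871234"]:
    print(parse_number(text))
```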

More complex numeric expressions are described in Section 9.1, "Numeric Expressions".


1.7 Capitalization

In most situations, OmniMark does not distinguish between upper-case letters and the corresponding lower-case letters in names or keywords. In some cases, however, the distinction is significant.

In particular, case is never significant in OmniMark keywords such as ELEMENT. Thus, the following are all equivalent:

Example A

   ELEMENT list

Example B

   element list

Example C

   Element list

Similarly, case is never significant in the names of OmniMark objects. All of the following examples set the same variable (first-chapter):

Example A

   SET First-Chapter TO TRUE

Example B

   SET FIRST-CHAPTER TO TRUE

Example C

   SET first-chapter TO TRUE
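One way to picture this rule is a symbol table that folds every OmniMark object name to lower case before storing or looking it up, so all three spellings above reach the same entry. The `SymbolTable` class below is a hypothetical Python sketch under that assumption; it deliberately does not cover SGML names, whose case sensitivity is under programmer control.

```python
class SymbolTable:
    """Case-insensitive store for OmniMark object names (illustrative)."""

    def __init__(self) -> None:
        self._values = {}

    def set(self, name: str, value) -> None:
        self._values[name.lower()] = value  # fold case when storing

    def get(self, name: str):
        return self._values[name.lower()]  # fold case when looking up

    def __len__(self) -> int:
        return len(self._values)


table = SymbolTable()
for spelling in ["First-Chapter", "FIRST-CHAPTER", "first-chapter"]:
    table.set(spelling, True)  # models SET <spelling> TO TRUE
print(len(table))                  # prints 1: one variable, not three
print(table.get("first-chapter"))  # prints True
```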

Under programmer control, case can be significant in SGML names, name tokens, and number tokens that appear in a program. This determination is independent of the naming rules in the relevant SGML Declaration. Chapter 19, "Customizing OmniMark Behaviour" discusses how the programmer sets the significance of capitalization in SGML names.

Most OmniMark programmers will find that lower-case names are easier to enter and that OmniMark programs mostly in lower-case are easier to read.

To clarify the distinction between OmniMark keywords and programmer-defined names in the examples in this manual, OmniMark keywords are usually shown in upper-case and programmer-defined names are usually in lower-case. This stylistic convention is intended to remind the reader of the distinction and thus serve as a learning aid.


1.8 Legend

1.8.1 Prose Formatting

In this document, the following formatting is used to distinguish specific kinds of text from the surrounding prose:

OmniMark keywords are usually presented in upper-case to distinguish them from examples of programmer-defined names.

1.8.2 Formatting of Syntax Descriptions

This document introduces language constructs with a syntax description. This section describes the conventions used in syntax descriptions.

In the following syntax:

Syntax

   REFERENTS (KEY | ^) string-expression

In addition to the above conventions, repetition and optionality are shown in a syntax description with the following characters:


Copyright © OmniMark Technologies Corporation, 1988-1997. All rights reserved.
EUM27, release 2, 1997/04/11.
