"The Official Guide to Programming with OmniMark"
International Edition
Next chapter is Chapter 2, "Types of OmniMark Programs".
OmniMark provides a powerful programming environment for the professional programmer who needs a language tuned to both text-based and binary stream processing. OmniMark combines:
Applications for OmniMark extend throughout the spectrum of text-processing, including, but not limited to:
OmniMark integrates a fully ISO 8879-conformant SGML parser with an English-like rule-based programming language. This gives OmniMark programmers a number of choices for processing a document:
The document you are reading is written and maintained in SGML. All of the indexing, cross-referencing, formatting, and hypertext linking is automatically generated (or configured) by an OmniMark program. Other formats of this document are produced by other OmniMark programs.
This manual does not assume previous experience with OmniMark, and the reader need not be a software engineer. To implement simple OmniMark applications, and to understand this document, the reader needs no more programming skills than those used in writing SGML Document Type Definitions, writing formatter macros, or working with the fourth-generation computer languages that many interactive database management systems provide. Complex translations, however, are better attempted by experienced software developers.
Many parts of this manual assume familiarity with SGML.
OmniMark is a rule-based language. An OmniMark program primarily consists of an optional translation type and one or more rules.
The translation type is a statement that determines what kinds of rules are permitted in the program and which subsystems of OmniMark will be active. It also automatically sets up the interaction of these subsystems. Translation types are described in Chapter 2, "Types of OmniMark Programs".
A rule is a part of the program which executes when a particular event occurs in the input document. A rule consists of a rule header, an optional condition and zero or more actions.
A rule header describes the event that causes the rule to execute.
A condition is a phrase applied to a rule or action and contains a test which evaluates to either true or false. Conditions control whether the rule or action can be activated. If the test in the condition evaluates to false, the rule or action is not performed.
An action is a statement that OmniMark executes. Actions usually change the state of the program.
In addition to the translation type and rules, an OmniMark program can also contain declarations, functions, and macros.
Declarations are used for declaring variables and for modifying the behaviour of the OmniMark program.
Functions are used to encapsulate a set of actions that perform a specific task or calculate a result. Instead of being repeated every time the task or calculation is needed, the code appears in only one place, the function. The function is then called whenever that activity needs to be done.
Macros define parameterized text substitutions. They can be used to encapsulate a sequence of code that needs to be placed at more than one location in the program. By placing the code in a macro, the programmer can invoke the macro instead of repeating the code. Unlike a function, a macro can contain anything which may appear in an OmniMark program.
OmniMark is a free-format, English-like language. Keywords are used to introduce OmniMark rules, declarations, actions, conditions, function definitions, and macro definitions.
Rules, declarations, and function definitions are all terminated when a keyword introducing a new rule, declaration, function definition, or macro definition is encountered. Actions and conditions are terminated by the start of another action, rule, declaration, function definition, or macro definition.
Only macro definitions have explicit terminators. This is because a macro defines a very general text substitution. The text to be substituted can contain anything at all, including whole rules, or declarations, or even definitions of other macros. Because anything is permitted inside a macro, macros must be explicitly terminated with the MACRO-END keyword.
A declaration begins with a keyword that describes its purpose, or with the word DECLARE. Declarations are used to customize aspects of OmniMark's behaviour.
Some example declarations are:
Example A
DECLARE DATA-LETTERS "%142#%143#" "EE"
Example B
ESCAPE "!"
Example C
GLOBAL COUNTER number-of-columns
Variable declarations are described in Section 6.1, "Introduction to Variable Declarations" and Section 7.1, "Shelf Declarations". Other declarations are described in Chapter 19, "Customizing OmniMark Behaviour".
A rule begins with a keyword that identifies the event that triggers it. ELEMENT rules recognize SGML elements. FIND rules recognize text patterns. PROCESSING-INSTRUCTION rules recognize SGML processing instructions.
Sample rule headers include:
Example A
FIND-START
Example B
FIND "<" LOOKAHEAD ( LETTER | DIGIT )
Example C
DATA-CONTENT
Example D
BREAK-WIDTH 72 80
Different types of rules are described in different chapters depending on their function:
Function definitions always begin with the keyword DEFINE, followed by the keyword EXTERNAL if the function is an external function, an optional return type, and then the keyword FUNCTION.
Functions can either use the conventional comma-and-parenthesis style for delimiting arguments, or they can use a more expressive form that uses name tokens to herald the arguments.
Some example function definition headers are:
Example A
DEFINE COUNTER FUNCTION raise VALUE COUNTER base to VALUE COUNTER exponent AS
Example B
DEFINE COUNTER FUNCTION pow (VALUE COUNTER base, VALUE COUNTER exponent) AS
Functions are described in Chapter 12, "Functions".
Macro definitions begin with the keyword MACRO and end with the keyword MACRO-END. An example of a macro definition is:
MACRO sgml-name IS (LETTER [LETTER | DIGIT | "-."]*) MACRO-END
Macros are described in Chapter 20, "Macros".
Actions generally begin with a verb. INCREMENT, OPEN, and SET are all examples of actions. DO and REPEAT are examples of compound actions that enclose a sequence of actions up to the terminating DONE or AGAIN respectively.
Some example actions are:
Example A
INCREMENT i
Example B
OPEN out AS FILE "myprog.out"
Example C
CLEAR x
Actions are described throughout this manual, according to their purpose.
Conditions consist of WHEN or UNLESS followed by a test expression. (Test expressions are described in Section 9.3, "Test Expressions".)
Conditions can appear in several places:
A condition within a pattern allows the programmer to use information gathered in an initial part of the pattern to determine whether the pattern should continue.
The following is an example of a condition:
WHEN ATTRIBUTE docno IS SPECIFIED OR (chapno > 1) AND xref ISNT OPEN
At the lowest level, OmniMark programs are ultimately composed of keywords, names, quoted strings, numbers, punctuation characters, white space, and comments.
White space is used to separate keywords from each other, and can be used to separate punctuation characters from keywords, or from each other. White space is made up of spaces, tabs, carriage returns, and newline characters. (On some systems, the "RETURN" key (or "ENTER" key) inserts newline characters, and on others it inserts both a carriage return and a newline.)
The amount of white space that a programmer uses is a stylistic decision, and generally has no effect on the behaviour of the OmniMark program. Thus, the following examples are equivalent:
Example A
ELEMENT (blist | nlist)
Example B
ELEMENT(blist | nlist)
There is one situation in which white space is needed around punctuation. A hyphen can be used either within an OmniMark name or alone as a minus sign. To use a hyphen as a minus sign, the programmer must separate it from any preceding name with white space. Without that white space, as in:
a-b
a-b is considered to be a single name. If white space is included around the hyphen:
a - b
OmniMark interprets the hyphen character as a minus sign and the expression represents the difference between the counter a and the counter b. For this reason, it is probably wise to adopt the habit of surrounding all operators with spaces.
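The effect of this rule can be modelled with a short Python sketch. Python is used here only for illustration; the token pattern is an assumption based on the description of names in this chapter and is not OmniMark's actual scanner.

```python
import re

# Assumed token shapes, taken from the prose of this chapter: a name starts
# with a letter and may contain letters, digits, hyphens, underscores, and
# periods. Anything else that is not white space is treated as punctuation.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*|[0-9]+|\S")

def tokenize(source):
    """Split source text into name, number, and punctuation tokens."""
    return TOKEN.findall(source)

print(tokenize("a-b"))    # ['a-b']
print(tokenize("a - b"))  # ['a', '-', 'b']
```

Because the name pattern consumes hyphens greedily, the first input yields a single name, while the white space in the second input ends the name before the hyphen is reached.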
OmniMark programs are easier to read if they are formatted in a consistent manner. In examples in this manual, actions are indented and rules are usually separated by blank lines. Programmers are encouraged to establish conventions they find convenient.
Keywords are words that have a special meaning to OmniMark. Keywords are never case sensitive.
If a programmer defines a variable or a function with a name that is the same as a keyword, then that word is no longer a keyword. It becomes just a name. In order to keep using a word as a keyword when it is also being used as the name of a variable or function, the word must be immediately preceded by a backquote ("`"). (This is discussed in greater detail in Section 21.1.2.2, "Backquoting Names".) In general, it is considered bad practice to give a function or a variable a name that is also an OmniMark keyword.
SGML names differ from programmer-defined names in this regard. Because the names of SGML objects are always heralded by a keyword like ELEMENT or ATTRIBUTE, there is no ambiguity when such an object has a name which is also a keyword. Context will determine the meaning of the word.
Names in OmniMark may refer to SGML constructs (such as elements, entities, attributes, and notations) or to OmniMark objects (such as variables, functions, and macros).
Generally, names are tokens beginning with a letter and containing letters, digits, hyphens ("-"), underscores ("_"), or periods ("."). (Because a hyphen can be interpreted as part of a name, the programmer must surround the character with spaces when they want the hyphen interpreted as a minus sign.)
When heralded, a quoted string may also be used as a name. Quoted names allow names to contain characters that normally would not be permitted. This can be useful when a programmer is generating an OmniMark program from another source.
Heralding is the practice of preceding a variable name with a keyword indicating its type. (Heralding names is described in Section 6.1.4, "Use of Heralds" and Section 21.1, "The Reduced Language".)
Names of SGML objects must always be heralded because they are not declared in the OmniMark program. OmniMark variables, such as pattern variables, switches, counters, and streams may be heralded or not, as the programmer chooses. If they are not heralded, then quoted strings may not be used for their names.
Because of the heralding, OmniMark can always determine the type of an SGML object from the way its name is used, and so there is no problem with having different kinds of SGML objects with the same name. For instance, an element can have the same name as an attribute.
Different OmniMark variables may only have the same name if they are declared in different scopes. This is discussed further in Chapter 8, "Variable Scopes". Once a function is declared, its name becomes unavailable as a name for other objects.
OmniMark puts no limit on the length of a name other than the 2,048-character limit on the length of a string. However, names of SGML constructs will only be processed correctly if they conform to the rules for names in the relevant concrete syntax.
Punctuation characters usually occur:
The hyphen ("-") can occur both as the minus operator ("-") and within names. Because of this, it is strongly recommended that the hyphen be preceded by white space whenever it is used as the minus operator.
The OmniMark programmer can annotate a program with descriptive text (called a comment) that is ignored by OmniMark. Comments can appear in an OmniMark program anywhere white space is allowed; that is, anywhere except within a name, keyword, or string.
Comments begin with a semicolon and continue for the rest of the line.
Comments help make a program understandable. A sample use of a comment is shown below:
; Rule for paragraphs invoked after the start of a chapter
ELEMENT par WHEN chapter-start
   OUTPUT "\firstpar{}%sc%n"
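As an illustration of the comment rule, the following Python sketch removes a comment from a single source line. It is a simplified model, not OmniMark's own scanner: it assumes that a semicolon inside a quoted string is ordinary text, and it does not handle escape sequences within strings. The function name strip_comment is hypothetical.

```python
def strip_comment(line):
    """Return the line without its ';' comment.

    Simplifying assumption: a ';' inside single or double quotes is part of
    the string, and strings contain no escaped quotation marks.
    """
    quote = None  # the quotation mark that opened the current string, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None        # the string ends here
        elif ch in "\"'":
            quote = ch              # a string starts here
        elif ch == ";":
            return line[:i].rstrip()  # the rest of the line is a comment
    return line.rstrip()

print(strip_comment('OUTPUT "a;b" ; emit the pair'))  # OUTPUT "a;b"
```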
Many OmniMark constructs include programmer-specified sequences of characters (called strings). Strings in an OmniMark program are surrounded by single or double quotation marks.
Quoted strings can be concatenated with underscore characters ("_"). These concatenated strings are treated by OmniMark as if they were a single quoted string. This is useful when:
The maximum length of a quoted string (after concatenation) in OmniMark is 2,048 characters.
Examples of strings are:
Example A
"Hello world"
Example B
'This is one very long string that contains a' _
" ' character and a" _
' " character.'
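To show what value a concatenated string stands for, the following Python sketch joins quoted pieces separated by underscores and applies the 2,048-character limit stated above. It is a model for illustration only: the function name join_concatenated is hypothetical, and escape sequences inside strings are deliberately ignored.

```python
import re

MAX_STRING_LENGTH = 2048  # limit stated in the text, applied after concatenation

# One quoted piece, single- or double-quoted, with optional surrounding
# white space. Escape sequences are not handled (a simplification).
PIECE = re.compile(r"""\s*(?:"([^"]*)"|'([^']*)')\s*""")

def join_concatenated(literal):
    """Join quoted pieces separated by '_' into a single string value."""
    pos, parts = 0, []
    while True:
        m = PIECE.match(literal, pos)
        if not m:
            raise ValueError("expected a quoted string at position %d" % pos)
        parts.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
        if pos == len(literal):
            break
        if literal[pos] != "_":
            raise ValueError("expected '_' at position %d" % pos)
        pos += 1
    value = "".join(parts)
    if len(value) > MAX_STRING_LENGTH:
        raise ValueError("string longer than %d characters" % MAX_STRING_LENGTH)
    return value

example_b = """'This is one very long string that contains a' _" ' character and a" _' " character.'"""
print(join_concatenated(example_b))
# This is one very long string that contains a ' character and a " character.
```

Alternating the two kinds of quotation mark, as Example B does, is what lets one string value contain both a single and a double quotation mark.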
More complex string expressions are described in Section 9.2, "String Expressions".
Numbers in OmniMark are sequences of digits with an optional preceding minus sign ("-"). (The minus sign and the hyphen are the same character.) Examples of numbers are:
Example A
0
Example B
1234
Example C
-9871234
The maximum absolute value of an OmniMark number is 2,147,483,647. (Commas are added here for clarity. They are not permitted in numbers in OmniMark programs.)
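The stated limit equals 2^31 - 1. The Python sketch below models the rules for writing a number: digits only, an optional leading minus sign, no commas, and an absolute value no greater than the limit. The function name parse_number is hypothetical, and nothing here is meant to describe how OmniMark stores numbers internally.

```python
import re

MAX_ABS = 2_147_483_647  # stated maximum absolute value; equal to 2**31 - 1

def parse_number(token):
    """Parse a sequence of digits with an optional leading minus sign."""
    if not re.fullmatch(r"-?[0-9]+", token):
        raise ValueError("not a number: %r" % token)
    value = int(token)
    if abs(value) > MAX_ABS:
        raise ValueError("absolute value exceeds %d" % MAX_ABS)
    return value

print(parse_number("-9871234"))    # -9871234
print(parse_number("2147483647"))  # 2147483647
```

Under this model, "2,147,483,647" is rejected because of its commas, and "2147483648" is rejected because it exceeds the limit.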
More complex numeric expressions are described in Section 9.1, "Numeric Expressions".
In most situations, OmniMark does not distinguish between the use of upper-case letters or the corresponding lower-case letters in names or keywords. In some cases, such distinctions are significant.
In particular, case is never significant in OmniMark keywords such as ELEMENT. Thus, the following are all equivalent:
Example A
ELEMENT list
Example B
element list
Example C
Element list
Similarly, case is never significant in the names of OmniMark objects. All of the following examples set the same variable (first-chapter):
Example A
SET First-Chapter TO TRUE
Example B
SET FIRST-CHAPTER TO TRUE
Example C
SET first-chapter TO TRUE
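One way to picture this behaviour is a table that stores each value under a lower-cased form of its name. The Python sketch below is only a model of that idea; the class name VariableTable is hypothetical, and the sketch makes no claim about how OmniMark implements name lookup.

```python
class VariableTable:
    """Stores values under lower-cased names, so lookups ignore letter case."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        self._values[name.lower()] = value

    def get(self, name):
        return self._values[name.lower()]

table = VariableTable()
table.set("First-Chapter", True)
print(table.get("FIRST-CHAPTER"))  # True
print(table.get("first-chapter"))  # True
```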
Under programmer control, case can be significant in SGML names, name tokens, and number tokens that appear in a program. This determination is independent of the naming rules in the relevant SGML Declaration. Chapter 19, "Customizing OmniMark Behaviour" discusses how the programmer sets the significance of capitalization in SGML names.
Most OmniMark programmers will find that lower-case names are easier to enter and that OmniMark programs mostly in lower-case are easier to read.
To clarify the distinction between OmniMark keywords and programmer-defined names in the examples in this manual, OmniMark keywords are usually shown in upper-case and programmer-defined names are usually in lower-case. This stylistic convention is intended to remind the reader of the distinction and thus serve as a learning aid.
In this document, the following formatting is used to distinguish specific kinds of text from the surrounding prose:
OmniMark keywords are usually presented in upper-case to distinguish them from examples of programmer-defined names.
This document introduces language constructs with a syntax description. This section describes the conventions used in syntax descriptions.
In the following syntax:
Syntax
REFERENTS (KEY | ^) string-expression
In addition to the above conventions, repetition and optionality are shown in a syntax description with the following characters:
Syntax
DEFINE FUNCTION function-name function-argument*
Syntax
text-expression (= text-expression)+
Syntax
ITEM OF shelf-type? shelf-name indexer?
Copyright © OmniMark Technologies Corporation, 1988-1997. All rights reserved.
EUM27, release 2, 1997/04/11.