OmniMark V3 Quick Reference Guide


"The Official Guide to Programming with OmniMark"

Disclaimer

Information in this document is subject to change without notice and does not represent a commitment on the part of OmniMark Technologies Corporation. OmniMark Technologies Corporation makes no representation or warranty, express or implied with respect to this publication. In no event shall OmniMark Technologies Corporation, its employees or contractors be liable for specific, indirect, or consequential damages.

Copyright

All rights reserved by OmniMark Technologies Corporation. This material contains protected properties of OmniMark Technologies Corporation. No part of this document may be reproduced, translated or transmitted in any form or by any means, electronic, mechanical, or otherwise, including but not limited to, photocopying and recording, without express written permission from OmniMark Technologies Corporation.

Copyright © OmniMark Technologies Corporation, 1988 - 1997. All rights reserved.

OmniMark and the Swirl logo are registered trademarks of OmniMark Technologies Corporation. All rights reserved. Trademarks other than those owned by OmniMark Technologies Corporation used in this document belong to their registered owners and are only represented here for editorial reasons.

1.0	The OmniMark® Programming Model
2.0	OmniMark® Quick Reference
3.0	Declarations
4.0	Rules
5.0	Actions
6.0	Elements, Attributes, and Data-Attributes
7.0	Conditions
8.0	Patterns
9.0	String Expressions
10.0	Numeric Expressions
11.0	Miscellaneous Components
12.0	Functions
13.0	Macros

1.0 The OmniMark® Programming Model

Overview

OmniMark has been designed to be as easy to learn and use as possible. This is accomplished in part by providing you with a model of document processing which is intuitive but not limiting.

Many programming problems, such as event, hierarchy, and memory management, are handled automatically by OmniMark, freeing you to concentrate on the task you are trying to accomplish. At the same time, OmniMark does not do so much for you automatically that it gets in the way. OmniMark works with you by providing a simple, straightforward operating model and a powerful problem-solving programming environment.

OmniMark is a data-driven language. That is, events that occur in the data being read by OmniMark cause a rule to be selected. Two fundamental types of data events are recognized by OmniMark: events in SGML documents and patterns in non-SGML documents. Common examples of events in SGML documents include the start of an element or an entity reference. In other documents, an event is finding a unique pattern of bits and characters in the document.

Declarations and rules make up OmniMark programs. Declarations affect overall processing. Rules are triggered by various events, and contain actions that specify which calculations and operations to carry out. Both individual actions and rules may have conditions attached to them. Such a rule or action is carried out only if its condition is true.

Translation Types

With Version 3, OmniMark now provides a traditional 3GL (third generation language) mode, as well as a 4GL.

As a 4GL, OmniMark performs four different types of document conversions: down-translation, up-translation, context-translation, and cross-translation. These are called aided translation types.

In the following illustrations, shaded line boxes indicate the OmniMark functions not participating in that particular type of translation.

Down-Translation

A down-translation is a translation whose input is an SGML document. The output of a down-translation could be the input of a text formatter or the input to a database. The output could also be another SGML document: either a cleaned-up version of the input, or some restructuring of it.
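
As a rough illustration, a down-translation that rewrites one element as HTML-style markup might look like the sketch below. The element name TITLE and the output markup are assumptions made for the example; "%c" marks the place where the element's own content is processed.

```omnimark
; Sketch only: TITLE is a hypothetical element name.
DOWN-TRANSLATE

ELEMENT TITLE
   OUTPUT "<H1>%c</H1>%n"   ; "%c" is where the element's content is written

ELEMENT #IMPLIED            ; selected for any element without its own rule
   OUTPUT "%c"
```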

Up-Translation

An up-translation is a translation whose output is a complete SGML document or a document instance. If the output does not contain a DTD, OmniMark must be told where to find it. In either case, the generated document instance is parsed with respect to the specified DTD and any errors are reported. OmniMark places no restrictions on the format of the input to an up-translation; most often the input is a data file compatible with a non-SGML text processing system.

Context-Translation

A context-translation is a translation that converts text from a non-SGML representation of a structured document to another form (often SGML), using SGML as an intermediate form. Patterns in the original document suggest its structure and allow conversion to SGML. OmniMark parses the SGML form and, using the parser, corrects structure errors. The final output makes use of the structure discovered by the parser to produce a fully marked-up document, a minimized document, or some non-SGML form.

Cross-Translation

A cross-translation is a translation that converts a document from one arbitrary form to another. Cross-translations do not make any use of the parser.
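
A cross-translation can be as small as a single FIND rule. The following is a minimal sketch; the two strings are arbitrary examples, not part of any real conversion.

```omnimark
; Sketch only: the strings are arbitrary examples.
CROSS-TRANSLATE

FIND "colour"
   OUTPUT "color"
```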

3GL Programs

When no translation type is specified, OmniMark runs as a program. PROCESS rules explicitly process files from the command-line (for example). SUBMIT actions are used to emulate cross-translations. SGML-PARSE actions are used to emulate the other three aided translation types.
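
The sketch below shows how a program might emulate a cross-translation with SUBMIT. It assumes a file named "input.txt" (a hypothetical name) and uses arbitrary example strings.

```omnimark
; Sketch only: "input.txt" is a hypothetical file name.
; No translation type is declared, so this runs as a program.
PROCESS
   SUBMIT FILE "input.txt"   ; hand the file's text to the FIND rules

FIND "colour"
   OUTPUT "color"
```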

2.0 OmniMark® Quick Reference

Overview

This guide provides a concise overview of the OmniMark Programming Language. Its purpose is to show the organization of the language. Consult the OmniMark Programmer's Guide for further details on how to write OmniMark programs.

Notation

Much of this guide consists of lists showing OmniMark syntax and then a description of that syntax. When looking at the syntax of a statement, remember that:

Keywords appear in capital letters in this guide.
Italicized terms are either parameters or refer to OmniMark constructs. If the italicized term is not described where it is used, check the table at the end of this section. This table describes the term or indicates where it is described.
Where a single term or keyword is optional, it appears underlined.

Grouping is denoted in two different forms. In the first form, one or more terms appear between braces ("{" and "}"), and support the following repetition factors:

A question mark ("?") after the braces indicates that the terms are optional.
An asterisk ("*") indicates that they are optional and may be repeated.
A plus sign ("+") indicates that they are required and may be repeated.

Options within a group are separated by vertical bars ("|"). If the specific order of two or more options doesn't matter, they are separated with an ampersand ("&"). Parentheses ("(" and ")") denote themselves.
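
To show how the notation is read, the lines below spell out concrete forms of two syntax entries that appear later in this guide, "BREAK-WIDTH number {TO number}?" and "NAMECASE GENERAL {YES | NO}". The numeric values are arbitrary, and the lines are alternatives to compare rather than one program.

```omnimark
; BREAK-WIDTH number {TO number}?   (the braced group is optional)
BREAK-WIDTH 72
BREAK-WIDTH 60 TO 72

; NAMECASE GENERAL {YES | NO}       (choose exactly one of the options)
NAMECASE GENERAL NO
```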

OmniMark comments start with a semi-colon (";") anywhere on a line, except inside strings, and continue to the end of the line.
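
A short sketch of the comment rule; the output string is an arbitrary example.

```omnimark
; This whole line is a comment.
PROCESS
   OUTPUT "a;b%n"   ; the ";" inside the string is ordinary text, not a comment
```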

Term	Description
action	See Actions.
aided translation types	See Translation Types.
alias	A name that stands in place of the real name of an element or attribute. Used in iterations as a convenient way of referring to the current item.
ancestry-qualifier	See Ancestry Qualifiers.
attribute	Name of an attribute as it appears in the document type definition (DTD).
attribute-identifier	See Attribute Identifiers.
character	A single character (or text representing a specific character) within quotation marks. For example, "a", "#", and "%n" are characters.
comparer	See Comparisons.
condition	See Conditions.
counter	Name of a counter. See Shelf Declarations.
data-attribute	Name of a data-attribute as it appears in the DTD.
data-attribute-identifier	See Attribute Identifiers.
entity	Name of an entity as it appears in the DTD.
gi	The generic identifier (name) of an element as it appears in the DTD.
group	Name of a group as defined in the GROUP declaration (see Other Declarations).
index/item	See Shelf Indexes.
insertion	See Insertion Points.
item	See Shelf Indexes.
local	See Local Declarations.
name	Text (without quotation marks) specifying a name.
number	An integer between -2,147,483,647 and 2,147,483,647.
numeric-expression	See Numeric Expressions.
pattern	See Patterns.
pattern-primary	See Primary Patterns.
program	3GL-like OmniMark programs without an explicit translation-type.
shelf-declaration	See Shelf Declarations.
shelf-name	Name of a shelf. See Shelf Declarations.
shelf-type	See Shelf Declarations.
static-string	See Static Strings.
stream	Name of a stream. See Special Streams.
string-expression	Any of the types of strings defined in String Expressions.
switch	Name of a switch. See Shelf Declarations.
translation-type	See Aided Translation Types.

The term "scope" is now used to describe parts of OmniMark V3. A scope is a sequence of zero or more actions that may be headed by a sequence of declarations. The declarations are in effect only for the duration of the sequence of actions. Examples of OmniMark scopes include the actions of a rule, the actions between a "DO" and an "ELSE" or "DONE", and the actions between a "REPEAT" and the matching "AGAIN". Scopes can nest.

3.0 Declarations

Aided Translation Types

CONTEXT-TRANSLATE

Convert non-SGML documents from one format to another (including SGML) using an intermediate SGML representation.

CROSS-TRANSLATE

Convert documents from one format to another without using SGML parsing.

DOWN-TRANSLATE

Convert SGML documents to some other format (including SGML). This is the default.

UP-TRANSLATE

Convert non-SGML documents into SGML.

At the Start of the Program

BINARY-INPUT number

Default byte-to-number conversions carried out with the ordering specified by number.

BINARY-OUTPUT number

Default number-to-byte conversions carried out with the ordering specified by number.

ESCAPE character

Use character instead of % as the format item escape character.

DECLARE DATA-LETTERS static-string static-string

Extends the set of characters to be considered alphabetic. Default is the 26 English lower-case letters and their upper-case equivalents. (LETTERS in V2).

DECLARE FUNCTION-LIBRARY string-expression

Specifies the default location of external functions.

DECLARE HERALDED-NAMES

Run programs written for versions previous to V3. There is no unaided translation type; programs default to down-translations. Shelf types must always be specified.

DECLARE #MAIN-OUTPUT HAS referent-status

Referent-status is one of REFERENTS-ALLOWED, REFERENTS-DISPLAYED, or REFERENTS-NOT-ALLOWED.

DECLARE #MAIN-OUTPUT HAS text-mode-status

Text-mode-status is one of BINARY-MODE or TEXT-MODE.

DECLARE NAME-LETTERS static-string static-string

Used to specify how OmniMark should map characters in names to upper-case.

DEFINE function-declaration

See Functions.

NAMECASE ENTITY {YES | NO}

Tells OmniMark compiler to ignore upper/lower-case distinctions for external entity names (YES) or to not ignore these distinctions (NO). NO is the default.

NAMECASE GENERAL {YES | NO}

Tells OmniMark compiler to ignore upper/lower-case distinctions for all other SGML names (YES) or to not ignore these distinctions (NO). YES is the default.

NEWLINE static-string

Use static-string as the newline sequence (%n) instead of the system's newline sequence, and open files as binary files rather than text files. This declaration is deprecated in OmniMark V3, in favor of using the TEXT-MODE and BINARY-MODE stream modifiers where needed.

Other Declarations

BREAK-WIDTH number {TO number}?

The first value gives the preferred width of the OUTPUT stream; the second value, if given, is the maximum allowed width. The actual line-breaking is controlled by the REPLACEMENT-BREAK and INSERTION-BREAK rules.

DELIMITER static-string

Multi-token attribute values delimited by static-string when printed with the %v format item.

GLOBAL shelf-declaration

Defines types and sizes of shelves (arrays) used throughout the program (shelf-declaration is defined in Shelf Declarations).

GROUP name

Precedes all rules belonging to the named group. Takes effect until another GROUP declaration is given.

INCLUDE static-string

Temporarily suspend processing this file and compile the OmniMark declarations and rules given in the file static-string.

INSERTION-BREAK static-string condition

Lines written to the OUTPUT stream can be broken with static-string in order to meet the width restrictions given in the BREAK-WIDTH declaration. Unlike most other declarations, INSERTION-BREAK can be qualified by a condition, to indicate where in the processing the declaration applies.

LIBRARY static-string static-string

The second string gives the system-specific name of the file associated with the public identifier given in the first string.

MACRO macro-definition

Defines macros used by the OmniMark compiler (see Macros).

REPLACEMENT-BREAK character static-string condition

Lines written to the OUTPUT stream can be broken by replacing character with static-string in order to meet the width restrictions given in the BREAK-WIDTH declaration. Unlike most other declarations, REPLACEMENT-BREAK can be qualified by a condition, to indicate where in the processing the declaration applies.

SYMBOL {static-string }+

Gives the symbolic strings to associate with numeric values as accessed by the %y format item.

4.0 Rules

All OmniMark rules consist of a header that determines when the rule is selected for processing, an optional set of local declarations, and an optional sequence of actions to be performed when the rule is selected. The output streams available to a rule depend on which "domain" it belongs to: the Element Domain, the Find Domain, or the SGML Error Domain.

SGML Processing Rules

The rules in this section belong to the Element Domain unless otherwise indicated. Element Domain rules are allowed in context-translations and down-translations. In addition, the SGML-ERROR rule and EXTERNAL-TEXT-ENTITY rule are allowed in up-translations.

DATA-CONTENT condition local* action+

Used to process strings of data characters within an SGML document. Only one DATA-CONTENT rule may have a successful condition (or no condition) for any segment of PCDATA.

DOCUMENT-END condition local* action*

Performs the specified actions after the end of the SGML document. DOCUMENT-END rules are performed in the order they appear in the OmniMark program. In context-translations, DOCUMENT-END rules are performed after the FIND-END rules.

DOCUMENT-START condition local* action*

Performs the specified actions before the start of the SGML document. DOCUMENT-START rules are performed in the order they appear in the OmniMark program. In context-translations, DOCUMENT-START rules are performed before the FIND-START rules.

DTD-END condition local* action*

Performs the specified actions immediately following the end of the DTD. DTD-END rules are performed in the order they appear in the OmniMark program.

DTD-START condition local* action*

Performs the specified actions as soon as the SGML parser has determined the name of the document element. Usually occurs one character after the end of the element name, as in:

<!DOCTYPE HTMLPLUS -- DTD-START fires at the space after the "S" -- [

ELEMENT {gi | #IMPLIED} condition local* action*

Performs the specified actions when the specified generic identifier occurs in the SGML document. Only one ELEMENT rule may have a successful condition (or no condition) for any element's start-tag. The #IMPLIED form of these rules is selected when no rule that explicitly lists generic identifiers or entity names can be selected and any condition is met.

EPILOG-START condition local* action*

Performs the specified actions immediately following the end tag of the document element in the document instance, or, if the document element's end tag is omitted, at the end of the document prior to performing any DOCUMENT-END rule. EPILOG-START rules are performed in the order they appear in the OmniMark program.

EXTERNAL-DATA-ENTITY {entity | #IMPLIED} condition local* action*

Performs the specified actions when the specified external entity occurs in the SGML document. Only one EXTERNAL-DATA-ENTITY rule may have a successful condition (or no condition) for any reference to an external data entity. (EXTERNAL-ENTITY is a synonym for EXTERNAL-DATA-ENTITY.)

EXTERNAL-TEXT-ENTITY {#CAPACITY | #CHARSET | #DOCUMENT | #DTD | #SYNTAX | #IMPLIED | entity} condition local* action*

Performs the specified actions when an SGML entity occurs that matches the specified type or name. Output in an EXTERNAL-TEXT-ENTITY rule is done as part of the Find Domain, because its output is input to the SGML parser.

INVALID-DATA condition local* action*

Performs the specified actions when the SGML parser detects text that it cannot coerce into a valid form.

MARKED-SECTION marked-section-type condition local* action*

Performs the specified actions when a marked section of the specified type is encountered. Valid values for marked-section-type are CDATA, IGNORE, INCLUDE-END, INCLUDE-START, and RCDATA.

PROCESSING-INSTRUCTION pattern condition local* action*

Performs the specified actions when a processing instruction encountered in the SGML document matches the given pattern. The first rule found in the OmniMark program is used; other ones that might also match are ignored.

PROLOG-END condition local* action*

Performs the specified actions immediately prior to the start tag of the document element in the document instance, or, if the document element's start tag is omitted, immediately prior to the start tag, text, marked section declaration or USEMAP declaration that starts the instance. PROLOG-END rules are performed in the order they appear in the OmniMark program.

PROLOG-IN-ERROR condition local* action*

This rule fires at the end of an SGML Document prolog (SGML Declaration, if given, and DTD) that contains errors. More than one can fire. OmniMark halts by default after the last one fires.

SGML-COMMENT condition local* action*

Performs the specified actions when an SGML comment occurs in the document instance.

SGML-DECLARATION-END condition local* action*

Performs the specified actions after the end of the SGML Declaration. SGML-DECLARATION-END rules are performed in the order they appear in the OmniMark program.

SGML-ERROR condition {LOG}? local* action*

Performs the specified actions whenever an error detected by the SGML parser satisfies the condition, if given. If the LOG keyword appears, the error message as provided to OmniMark is written to the log stream. In addition, the condition and actions can access parts of this message by examining several pre-defined pattern variables.

TRANSLATE pattern condition local* action*

Performs the specified actions whenever text from the SGML document matches the given pattern. The first rule found in the OmniMark program is used; other ones that might also match are ignored.
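
The sketch below combines a few of these rules in one down-translation. The wrapper markup is an arbitrary example chosen for illustration; only the rule names and the "%c" marker come from this guide.

```omnimark
; Sketch only: the <HTML> wrapper is an arbitrary example.
DOWN-TRANSLATE

DOCUMENT-START              ; performed before the start of the SGML document
   OUTPUT "<HTML>%n"

ELEMENT #IMPLIED            ; selected for every element, as no other ELEMENT rule exists
   OUTPUT "%c"

DOCUMENT-END                ; performed after the end of the SGML document
   OUTPUT "%n</HTML>%n"
```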

Input Processing

Input Processing rules are used in context-translations and up-translations to preprocess data prior to it being fed to the SGML parser. Input Processing rules are also used in cross-translations, in which case they do all the processing of the input data.

FIND pattern condition local* action*

Performs the specified actions whenever text from the input matches the given pattern. The first rule is used; other ones that might also match are ignored.

FIND-END condition local* action*

Performs the specified actions after the end of the input document has been processed. FIND-END rules are performed in the order they appear in the OmniMark program. In context-translations, FIND-END rules are performed before the DOCUMENT-END rules.

FIND-START condition local* action*

Performs the specified actions before the start of the input document. FIND-START rules are performed in the order they appear in the OmniMark program. In context-translations, FIND-START rules are performed after the DOCUMENT-START rules.
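
Because the first matching FIND rule is used, rule order matters. In the sketch below (arbitrary strings), wherever the second rule's pattern could match, the first rule's pattern matches as well, so the second rule is never selected. Placing the longer pattern first avoids this.

```omnimark
; Sketch only: illustrates rule order, not a useful conversion.
CROSS-TRANSLATE

FIND "cat"
   OUTPUT "[cat]"

FIND "catalog"              ; never selected: the rule above always matches first
   OUTPUT "[catalog]"
```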

Process Program Rules

Programs that do not start with an aided translation type or "DECLARE HERALDED-NAMES" are process programs. Input file processing and SGML processing must be started explicitly with the following rules:

PROCESS-END condition local* action*

Performs the specified actions after all other rules have fired. PROCESS-END rules are performed in the order they appear in the OmniMark program. PROCESS-END rules are allowed in all aided translation types as well as programs.

PROCESS-START condition local* action*

Performs the specified actions before any other rules fire. PROCESS-START rules are performed in the order they appear in the OmniMark program. PROCESS-START rules are allowed in all aided translation types as well as programs.

PROCESS condition local* action*

Performs the specified actions after all PROCESS-START rules have been performed. PROCESS rules are performed in the order they appear in the OmniMark program. PROCESS rules are not allowed when an aided translation type is specified.
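
A minimal sketch of the three rule types together. The rules are deliberately written out of order to show that, per the descriptions above, PROCESS-START is performed first and PROCESS-END last regardless of position. The output strings are arbitrary.

```omnimark
; Sketch only: the output strings are arbitrary examples.
PROCESS-END
   OUTPUT "end%n"

PROCESS
   OUTPUT "body%n"

PROCESS-START
   OUTPUT "start%n"
```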

5.0 Actions

Any action may be guarded by placing a condition after it. Although the action appears before the condition, the condition is tested first. The action is then carried out only if the condition is true.
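
The sketch below shows two guarded actions. It assumes that a guard is written as WHEN followed by a test, and that numeric comparisons such as ">" and "=" are available (see Conditions); the counter name and its value are hypothetical.

```omnimark
; Sketch only: "errors" is a hypothetical counter.
PROCESS
   LOCAL COUNTER errors
   SET COUNTER errors TO 2
   OUTPUT "errors found%n" WHEN errors > 0   ; guard is tested before the OUTPUT
   OUTPUT "clean%n" WHEN errors = 0
```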

Basic Actions

ACTIVATE switch index

Turns the indicated switch on.

CLEAR shelf-type shelf-name

Removes all items from a shelf.

CLOSE stream index

Closes the indicated stream.

COPY shelf-type shelf-name TO shelf-name

Copies the entire contents of a shelf to another shelf.

COPY-CLEAR shelf-type shelf-name TO shelf-name

Moves the entire contents of a shelf to another shelf, effectively clearing the first shelf.

DEACTIVATE switch index

Turns the indicated switch off.

DECREMENT counter index {BY numeric-expression}?

Decreases the value of the indicated counter by the value of numeric-expression, or by one (1) if no numeric expression is included.

DISCARD stream index

Closes and removes the contents of the indicated stream.

EXIT

Terminates the currently nested level of a REPEAT or a REPEAT SCAN.

FAIL

Returns FALSE from SWITCH functions.

HALT {WITH numeric-expression}?

Stops processing under program control.

HALT-EVERYTHING {WITH numeric-expression}?

Stops processing under program control. Referents are not expanded.

INCREMENT counter index {BY numeric-expression}?

Increases the value of the indicated counter by the value of numeric-expression, or by one (1) if no numeric expression is specified.

NEW shelf-type shelf-name {KEY string-expression}? insertion

Adds a new value to the shelf, initializing the new value according to its type: counters are set to zero, switches are deactivated, and streams unopened. If no insertion is specified, the item is added to the right side of the shelf.

NEXT GROUP IS {group | #IMPLIED}

When group is specified, that group of rules becomes active; any rules in the #IMPLIED group are still active. When #IMPLIED is specified, all groups of rules except the #IMPLIED group become inactive.

OPEN stream index stream-modifiers AS {BUFFER | REFERENT string-expression | FILE string-expression | function call}

If a function call is given, it must be an EXTERNAL OUTPUT function.

OUTPUT stream-expression

Writes text to the active stream or streams.

OUTPUT-TO stream index stream-modifiers

Redefines the active streams in the FIND domain. The streams listed in the OUTPUT-TO action become the only active streams.

PUT stream index stream-modifiers string-expression

Puts the value of string-expression into stream.

REMOVE shelf-type shelf-name index

Removes one item from the indicated shelf.

REMOVE KEY OF shelf-type shelf-name index

Removes the key of the indicated shelf item.

REOPEN stream index stream-modifiers {AS {BUFFER | REFERENT string-expression | FILE string-expression | function call}}?

Allows additional text to be written to the end of a buffer, referent, file, or EXTERNAL OUTPUT function, that has been opened and closed.

RESET counter index {TO numeric-expression}?

Sets the value of the indicated counter to numeric-expression, or to zero (0) if no numeric expression is included.

RETURN {numeric-expression | string-expression | boolean-expression}?

Return from the current function, with the specified value if it's a value-returning function.

SET ...

See The SET Action.

SGML-IN {string-expression | #NONE}

Specifies how newline sequences written to the SGML stream are to be converted into record-end and record-start characters as used by the SGML parser.

SGML-OUT {string-expression | #NONE}

Specifies how record-end characters emitted by the SGML parser are to be transformed for the element rules.

SGML-PARSE

See The SGML-PARSE Action.

SUBMIT {FILE}? string-expression

Suspends processing of the current input file while applying the FIND rules to the text of string-expression. If FILE is specified, string-expression is treated as a file name and that file's text is submitted to the FIND rules.

SUCCEED

Returns TRUE from SWITCH functions.

SUPPRESS

Discard the content of an ELEMENT, DATA-CONTENT, MARKED-SECTION or SGML-COMMENT.

TEST-SYSTEM {WITH string-expression}?

Runs internal system-integrity tests.
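
As an illustration of how the stream actions above fit together, the sketch below opens a file-backed stream, writes to it, closes it, and then reopens it to append more text. The stream name and the file name "notes.txt" are hypothetical.

```omnimark
; Sketch only: "notes" and "notes.txt" are hypothetical names.
PROCESS
   LOCAL STREAM notes
   OPEN notes AS FILE "notes.txt"
   PUT notes "first line%n"
   CLOSE notes
   REOPEN notes                 ; continue writing at the end of the same file
   PUT notes "second line%n"
   CLOSE notes
```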

The SET Action

The SET action is OmniMark's general assignment statement. It has the following forms:

SET {BUFFER | STREAM}? stream index stream-modifiers TO string-expression

Initializes the contents of a buffer and associates the buffer with the indicated stream.

SET COUNTER counter index TO numeric-expression

Initializes the counter to the specified value.

SET function-call stream-modifiers TO string-expression

Initializes the target indicated by the call to an EXTERNAL OUTPUT function.

SET EXTERNAL-FUNCTION function-name TO string-expression

Specifies the external name that function-name is defined as.

SET FILE string-expression1 stream-modifiers TO string-expression2

Opens an unnamed stream, attaches the stream to a file named string-expression1, writes the text in string-expression2 to the stream, and then closes the stream.

SET FUNCTION-LIBRARY OF EXTERNAL-FUNCTION function-name TO string-expression

Specifies the library where function-name can be found.

SET KEY OF shelf-type shelf-name index TO string-expression

Sets the key of the indicated shelf item to the specified value.

SET NEW shelf-type shelf-name index {KEY string-expression}? insertion stream-modifiers TO string-expression

Creates a new item on the shelf, with an optional key, and initializes its value. Modifiers are allowed only for 'SET NEW STREAM' ('SET NEW BUFFER' is a synonym). If no insertion is specified, the item is added to the right side of the shelf.

SET REFERENT string-expression1 stream-modifiers TO string-expression2

Opens an unnamed stream, attaches the stream to a referent named string-expression1, writes the text in string-expression2 to the stream, and then closes the stream.

SET SWITCH switch index TO boolean-expression

Initializes the switch to the specified value.

SET THIS REFERENT TO string-expression

Initializes the current referent's value in a 'REPEAT OVER REFERENTS' action.
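
A brief sketch of three of the SET forms listed above. The shelf names, the value 42, and the file name "total.txt" are hypothetical; "%d(total)" is assumed to format the counter's value as decimal text.

```omnimark
; Sketch only: all names and values are hypothetical.
PROCESS
   LOCAL COUNTER total
   LOCAL STREAM title
   SET COUNTER total TO 42
   SET STREAM title TO "Summary"                  ; buffer associated with the stream
   SET FILE "total.txt" TO "total = %d(total)%n"  ; open, write, and close in one step
```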

The SGML-PARSE Action

This action starts processing another document through the SGML parser and suspends the current activity at the point of the "%c". Its form is:

DO SGML-PARSE document-type SCAN source-type
    local*
    action+ ; includes a "%c"
DONE

document-type is one of:

DOCUMENT {CREATING DTDS KEY string-expression}?

A complete SGML document, with its own SGML Declaration.

INSTANCE {WITH DOCUMENT-ELEMENT string-expression}? WITH {CURRENT DTD | DTDS KEY string-expression}

An instance, or part of an instance with outer element given by the named DOCUMENT-ELEMENT.

SUBDOCUMENT {CREATING DTDS KEY string-expression}?

An SGML document that uses the current SGML Declaration.

and source-type is one of:

INPUT input function call

input function call is an internal void function. Its OUTPUT actions write to the SGML parser.

string-expression

Feeds string-expression to the SGML parser.
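
The following sketch shows one way a program might emulate a down-translation with SGML-PARSE. It assumes a file named "manual.sgm" (hypothetical) that holds a complete SGML document, and it relies on FILE followed by a file name being usable as a string-expression.

```omnimark
; Sketch only: "manual.sgm" is a hypothetical file name.
PROCESS
   DO SGML-PARSE DOCUMENT SCAN FILE "manual.sgm"
      OUTPUT "%c"            ; parsing, and the ELEMENT rules, run at the "%c"
   DONE

ELEMENT #IMPLIED             ; handles every element of the parsed document
   OUTPUT "%c"
```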

Control Structures

Control structures provide ways of grouping several actions together.

Standard Control Structures

DO WHEN condition local* action* {ELSE WHEN condition local* action*}* {ELSE local* action*}? DONE

In the DO WHEN-ELSE control structure, actions are performed when the condition is met. This control structure is analogous to the IF-THEN-ELSE construct used in some programming languages.

DO SELECT numeric-expression {CASE constant numeric-expression local* action*}* {ELSE local* action*}? DONE

Similar to DO-WHEN-ELSE: executes faster, but all the CASE values must be constant.

DO local* action* DONE

This second, simpler form of the DO control structure just groups several actions together. Its most common use is to let a using-prefix apply to more than one action.

REPEAT local* action+ AGAIN

The actions are repeated in order until an EXIT action is met. One of the actions must be or contain an EXIT action.
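
The sketch below combines REPEAT with DO WHEN. The counter name and the output strings are hypothetical, and the comparison operators are assumed to be those described under Conditions. The guarded EXIT ends the loop once the counter passes 3.

```omnimark
; Sketch only: "n" and the output strings are hypothetical.
PROCESS
   LOCAL COUNTER n
   SET COUNTER n TO 1
   REPEAT
      EXIT WHEN n > 3          ; the EXIT that every REPEAT must contain
      DO WHEN n = 1
         OUTPUT "first%n"
      ELSE WHEN n = 2
         OUTPUT "second%n"
      ELSE
         OUTPUT "later%n"
      DONE
      INCREMENT n
   AGAIN
```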

Iterating Control Structures

The actions within a REPEAT OVER structure are repeated once for each item on the shelf. The #FIRST, #LAST, and #ITEM pre-defined values can be used inside REPEAT OVER actions.

REPEAT OVER shelf-type shelf-name local* action* AGAIN

Iterate through shelves of counters, switches or streams.

REPEAT OVER REFERENTS local* action* AGAIN

Iterate through the defined and used referents.

REPEAT OVER ATTRIBUTE attribute ancestry-qualifier local* action* AGAIN

Iterates through the tokens of a list-valued element attribute (NAMES, NUMBERS, NMTOKENS, NUTOKENS, ENTITIES or IDREFS).

REPEAT OVER DATA-ATTRIBUTE data-attribute {OF (attribute-identifier)} local* action* AGAIN

Iterates through the tokens of a list-valued data-attribute (NAMES, NUMBERS, NMTOKENS or NUTOKENS).

REPEAT OVER {SPECIFIED}? ATTRIBUTES ancestry-qualifier AS alias local* action* AGAIN

Iterates through the attributes associated with an element. If SPECIFIED appears, the shelf of attributes only contains those specified in the element's start tag.

REPEAT OVER {SPECIFIED}? DATA-ATTRIBUTES {OF (attribute-identifier)}? AS alias local* action* AGAIN

Iterates through the data-attributes associated with an attribute. If SPECIFIED appears, the shelf of data-attributes only contains those specified in the entity's declaration.

REPEAT OVER {REVERSED}? CURRENT ELEMENTS AS alias local* action* AGAIN

Iterates through all currently open elements. When REVERSED is specified, the iteration starts from the last (innermost) opened element. Otherwise, it begins from the first (outermost) opened one.

Text Scanning

The following control structures can be used for analyzing text:

DO SCAN {FILE}? string-expression {MATCH {UNANCHORED}? pattern condition local* action*}+ {ELSE local* action*}? DONE

The DO SCAN construct examines a stream (string) or file using the given patterns and conditions. Each condition and pattern is tested. For the first condition, if any, and pattern that succeeds, the following set of actions is performed. If no conditions and patterns succeed, the set of actions following ELSE (if any) are performed. If UNANCHORED is not specified following MATCH, a pattern must match at the start of the stream or file. If UNANCHORED is specified, a pattern can match anywhere within the stream or file.

REPEAT SCAN {FILE}? string-expression {MATCH {UNANCHORED}? pattern condition local* action*}+ AGAIN

The REPEAT SCAN construct examines a stream or file in the same manner as DO SCAN, except that after the actions for a successful match have been performed, the scan is repeated, starting immediately after the last text that was matched. REPEAT SCAN terminates when no pattern and condition succeed.
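
A minimal REPEAT SCAN sketch. The scanned string is an arbitrary example, and DIGIT+ (one or more digits) is assumed from the Patterns section. Each pass tries the MATCH alternatives in order at the current position, and the loop ends when none of them matches.

```omnimark
; Sketch only: the scanned string and the replacement text are arbitrary.
PROCESS
   REPEAT SCAN "1997-06-30"
   MATCH "-"
      OUTPUT "/"             ; replace each hyphen
   MATCH DIGIT+
      OUTPUT "n"             ; stand in for each run of digits
   AGAIN
```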

Text Skipping

These control structures are used to quickly skip over the input while matching it. They can appear in FIND-START and FIND rules.

DO SKIP PAST numeric-expression {OVER string-expression}? local* action* {ELSE local* action*}? DONE

The number of characters given by numeric-expression is skipped over; if there are not that many characters left in the input to process, an error message is printed and the system halts. If OVER string-expression is included, that number of characters is skipped over before searching for string-expression.

DO SKIP OVER string-expression local* action* {ELSE local* action*}? DONE

Characters are skipped over until string-expression is found. If string-expression is not found, the actions under the ELSE are performed, and no characters will remain in the input.

USING Prefixes

A USING prefix selects a shelf item, attribute, attribute token, set of output streams, or set of groups for use within an action. The USING prefix precedes the action (or DO {action }* DONE) to which it applies.

USING shelf-type shelf-name index
USING REFERENTS index
USING ATTRIBUTE attribute ancestry-qualifier item {AS alias }?
USING DATA-ATTRIBUTE data-attribute OF (attribute-identifier ) item {AS alias }?
USING SPECIFIED ATTRIBUTES index AS alias
USING SPECIFIED DATA-ATTRIBUTES index AS alias
USING GROUP group
USING NESTED REFERENTS
USING OUTPUT AS stream index stream-modifiers

6.0 Elements, Attributes, and Data-Attributes

OmniMark provides a set of features for specifying a particular SGML element, attribute, and data-attribute.

Ancestry Qualifiers

Often, an element can be referred to simply by its name. However, when the relationship between an element and another element is important, the element's name can be followed by an ancestry qualifier.

Ancestry qualifiers are used for examining an element's parent, grandparent, etc., and objects associated with them such as attributes and short reference maps. In the following list of ancestry qualifiers, gi refers to the name of an element in the document type definition (often referred to as a "generic identifier"). When ancestry-qualifier appears at the end of a form, that ancestry qualifier can itself be qualified by another ancestry qualifier. For example,

NAME OF PARENT OF CURRENT ELEMENT an-alias

is the name of the parent of the currently opened element identified by "an-alias".

The descriptions define what is meant by the term after the OF keyword.

OF ANCESTOR gi ancestry-qualifier

The most recently opened ancestor of the element that has the name gi.

OF CURRENT ELEMENT alias

The currently open element referred to by the alias defined in the REPEAT OVER CURRENT ELEMENTS AS alias action.

OF DOCTYPE

The outermost element of the instance. This is useful for accessing the document element without knowing its name.

OF ELEMENT

The most recently opened element.

OF OPEN ELEMENT gi ancestry-qualifier

The most recently opened element that has the name gi . It can be either the element that was just opened or an ancestor of it.

OF PARENT ancestry-qualifier

The parent of the element.

OF PREPARENT gi ancestry-qualifier

An ancestor other than the parent.

Attribute Identifiers

ATTRIBUTE attribute ancestry-qualifier item

Identify an element's attribute by its name.

DATA-ATTRIBUTE data-attribute OF (attribute-identifier ) item

Identify a data-attribute by its name.

SPECIFIED ATTRIBUTES index ancestry-qualifier

Identify an element's attribute by its shelf index. If the keyword SPECIFIED appears, the shelf contains only the attributes specified in the element's start tag.

SPECIFIED DATA-ATTRIBUTES OF (attribute-identifier ) index

Identify a data-attribute by its shelf index. If the keyword SPECIFIED appears, the shelf contains only the data-attributes specified in the entity declaration instance.

{ATTRIBUTE | DATA-ATTRIBUTE} attribute-alias

Identify an attribute by an alias defined by a REPEAT OVER ATTRIBUTES or USING ATTRIBUTES.

7.0 Conditions

Conditions take two forms. The form WHEN condition succeeds when condition is true. The form UNLESS condition succeeds when condition is false.

Conditions are grouped here according to the type of test.

Numeric Tests

numeric-expression {comparer numeric-expression }+: numeric-expression is defined in Numeric Values.

Pattern Tests

PATTERN pattern {IS | ISNT} SPECIFIED

String Tests

string-expression {comparer string-expression }+
string-expression MATCHES UNANCHORED primary-pattern: When UNANCHORED is included, the match is successful if the pattern appears anywhere in the text being scanned, not just at the beginning.

Boolean Tests

Boolean values may be compared:

boolean-value {= boolean-value }*
boolean-value != boolean-value

A boolean value is one of the following:

TRUE
FALSE
SWITCH switch indexer
function call
( boolean expression )

A boolean expression is any general test.

Stream Tests

FILE string-expression EXISTS
FILE string-expression {IS | ISNT} {READABLE | WRITABLE | FILE | DIRECTORY}
REFERENTS index {IS | ISNT} ATTACHED
STREAM stream index HAS NAME
STREAM stream index {IS | ISNT} {ATTACHED | BUFFER | CLOSED | EXTERNAL | FILE | OPEN | REFERENT | SGML-PARSER}
THIS REFERENT {IS | ISNT} ATTACHED

Shelf Tests

SPECIFIED ATTRIBUTES ancestry-qualifier HAS KEY string-expression
SPECIFIED DATA-ATTRIBUTES {OF (attribute-identifier )}? HAS KEY string-expression
DTDS HAS KEY string-expression
REFERENTS HAS KEY string-expression
shelf-type shelf-name HAS KEY string-expression
shelf-type shelf-name index {IS | ISNT} KEYED

Element Tests

ANCESTOR ancestry-qualifier {IS | ISNT} element
CURRENT ELEMENT alias {IS | ISNT} gi
DOCTYPE {IS | ISNT} gi
ELEMENT ancestry-qualifier {IS | ISNT} gi
LAST PROPER SUBELEMENT ancestry-qualifier {IS | ISNT} gi
OPEN ELEMENT ancestry-qualifier {IS | ISNT} gi
PARENT ancestry-qualifier {IS | ISNT} gi
PREPARENT ancestry-qualifier {IS | ISNT} gi
STATUS OF LAST SUBELEMENT {IS | ISNT} {PROPER | INCLUSION}
USEMAP {IS | ISNT} {short-reference-map | #EMPTY | #NONE }: short-reference-map is the name of a short reference map as defined in the document type definition.

Attribute Tests

attribute-identifier {IS | ISNT} attribute-type

where attribute type is one of SPECIFIED, DEFAULTED, IMPLIED, CDATA, ENTITY, ENTITIES, GROUP, ID, IDREF, IDREFS, NAME, NAMES, NMTOKEN, NMTOKENS, NOTATION, NUMBER, NUMBERS, NUTOKEN or NUTOKENS.

data-attribute-identifier {IS | ISNT} attribute-type

where attribute type is one of SPECIFIED, DEFAULTED, IMPLIED, CDATA, GROUP, NAME, NAMES, NMTOKEN, NMTOKENS, NUMBER, NUMBERS, NUTOKEN or NUTOKENS.

Element Content Tests

CONTENT ancestry-qualifier {IS | ISNT} {ANY | CDATA | CONREF | ELEMENT | EMPTY | MIXED | RCDATA}
LAST PROPER CONTENT ancestry-qualifier {IS | ISNT} {gi | #DATA}
PREVIOUS ancestry-qualifier {IS | ISNT} gi

Entity Tests

attribute-identifier {IS | ISNT} {EXTERNAL | PUBLIC | SYSTEM | IN-LIBRARY}
ENTITY {IS | ISNT} {#CAPACITY | CDATA-ENTITY | #CHARSET | #DTD | IN-LIBRARY | NDATA-ENTITY | PUBLIC | SDATA-ENTITY | #SYNTAX | SUBDOC-ENTITY | SYSTEM}
NOTATION {OF attribute-identifier }? {IS | ISNT} {IN-LIBRARY | PUBLIC | SYSTEM}
NOTATION {OF attribute-identifier }? {IS | ISNT} EQUAL notation: where notation is the name of the notation specified in the document type definition.

Comparisons

Common forms are <, <=, =, >=, >, and != (inequality).

Multiple values may be compared in a single chain, as long as all the values can be given the same type. All comparisons in a chain must run in the same direction: either non-descending (no > or >=) or non-ascending (no < or <=). != can only be used to compare exactly two values.

PROCESS
LOCAL COUNTER i INITIAL {3}
LOCAL STREAM s INITIAL {"b"}
LOCAL SWITCH w INITIAL {TRUE}

DO WHEN 1 < i <= "5" = 5 IS EQUAL 5 <= 99
  OUTPUT "I knew that.%n"
DONE
DO WHEN 'a' < s < UL "C"
  OUTPUT "I knew that too.%n"
DONE
DO WHEN true = TRUE = (! FALSE) = w = (3 + 4 = 2 + 5)
  OUTPUT "Tell me something I don't know.%n"
DONE
HALT
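The direction rule for chained comparisons can be checked mechanically. The following Python sketch is a model of the rule only, not part of OmniMark; the function name check_comparison_chain is hypothetical, and the comparers are passed as a list of strings.

```python
def check_comparison_chain(comparers):
    """Return True if a chain of comparers obeys the direction rule.

    `comparers` lists the operators in order, e.g. ["<", "<=", "="].
    """
    if "!=" in comparers:
        # != may only compare exactly two values, so it must be the sole comparer.
        return len(comparers) == 1
    ascending = any(op in ("<", "<=") for op in comparers)
    descending = any(op in (">", ">=") for op in comparers)
    # Mixing the two directions in one chain is not allowed.
    return not (ascending and descending)
```

The first test in the example above, 1 < i <= "5" = 5 IS EQUAL 5 <= 99, corresponds to the comparer list ["<", "<=", "=", "=", "<="], which this check accepts.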

8.0 Patterns

Patterns consist of primary patterns, connected together with various operators, qualifiers, and occurrence indicators, as described in the following sub-sections:

Primary Patterns

UL string-expression

Matches the text of string-expression. If UL is not specified, the text is matched exactly. If it is specified, case is ignored.

UL [character-set]
UL [character-set EXCEPT character-set]

Where character-set is:

static-string : any character in static-string .
character-class-name : any character in character-class-name .
character1 TO character2 : any character in the range from character1 to character2 (character1 and character2 must be single-character strings).
character-set1 OR character-set2 : any of the characters in character-set1 or character-set2 .

character-class-name

The character classes can be used by themselves or combined in character sets:

ANY Matches any character.

ANY-TEXT Matches any character except the character that represents the end of line.

SPACE Matches a single space character.

BLANK Matches either a single space character or a single tab character.

WHITE-SPACE Matches a single space character, tab character, or end of line character.

DIGIT Matches any of the characters "0" through "9".

LETTER Matches any of the 26 upper-case or lower-case letters of the Roman alphabet.

LC Matches any lower-case letter of the Roman alphabet.

UC Matches any upper-case letter of the Roman alphabet.

position-pattern

The position patterns match a point on either side of a character:

CONTENT-START Recognizes the beginning of an element's content.

CONTENT-END Recognizes the end of an element's content.

LINE-START Recognizes the beginning of a line.

LINE-END Recognizes the end of a line.

VALUE-START Recognizes the beginning of a scanned value.

VALUE-END Recognizes the end of a scanned value.

WORD-START Recognizes the beginning of a string of letters and digits.

WORD-END Recognizes the end of a string of letters and digits.

UL (pattern condition )

Succeeds only if both the condition is true and pattern succeeds. If UL is specified, it applies only to pattern .

UL (condition )

Succeeds if condition is true. UL has no effect here if specified.

Occurrence Indicators

In the following list P is a primary pattern. Note: In the following only, the braces represent themselves ("{" and "}") rather than the meta-syntax used elsewhere in this document.

P ?

P may occur zero or one times.

P *

P may occur zero or many times.

P +

P must occur at least once and may occur many times.

P {numeric-expression }

Succeeds if P is matched exactly numeric-expression times.

P {numeric-expression1 TO numeric-expression2 }

Succeeds if P is matched at least numeric-expression1 times. Input is consumed until either there are no more matches or P has been matched numeric-expression2 times.

P {numeric-expression }+

Succeeds if P is matched at least numeric-expression times. It continues to consume its input until there are no more matches.
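Readers familiar with regular expressions may find it useful to compare the counted occurrence indicators with regex quantifiers. The Python sketch below is an analogy under stated assumptions, not OmniMark code: the helper name occurrence_to_regex is hypothetical, P is given as a regex fragment, and regex backtracking may differ from OmniMark's matching for more complex patterns.

```python
import re


def occurrence_to_regex(primary, low, high=None, open_ended=False):
    """Build a regex corresponding to P {low}, P {low TO high}, or P {low}+."""
    if open_ended:
        quantifier = "{%d,}" % low          # P {n}+  : at least n, then as many as match
    elif high is None:
        quantifier = "{%d}" % low           # P {n}   : exactly n
    else:
        quantifier = "{%d,%d}" % (low, high)  # P {n TO m} : at least n, at most m consumed
    return "(?:%s)%s" % (primary, quantifier)


# DIGIT {2 TO 4} against "12345" consumes four digits and leaves the fifth.
match = re.match(occurrence_to_regex(r"\d", 2, 4), "12345")
consumed = match.group()
```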

Saving Text

P occurrence-indicator = pattern-variable: The text matched by P occurrence-indicator is saved in the pattern variable pattern-variable , and can be accessed by just referencing pattern-variable . The PATTERN keyword can be given before it. The ANOTHER keyword can be used inside a pattern, and is equivalent to the PATTERN keyword. The = is the supported V2 form of indicating pattern assignment.

Matching Entities

By default, patterns in TRANSLATE rules match any kind of text from the SGML document. They can also be used to match the replacement text of internal entities or the names of internal entities or both. For example:

TRANSLATE SDATA NAMED "amp"

Match the expansion of the "&amp;" SDATA entity.

TRANSLATE SDATA VALUED "[amp ]"

Match any SDATA entity whose value is "[amp ]". (The keyword VALUED is optional if only the value is being matched.)

NAMED and VALUED can be used together, and the text following NAMED or VALUED can be replaced by any primary pattern (so that the name or the value of an entity can be captured). These patterns can also be combined with others in the same TRANSLATE rule. The types of entities are:

CDATA

Match an internal CDATA entity. A CDATA entity contains text that is treated as character data when referenced. It is independent of specific systems, devices, or application processes.

SDATA

Match an SDATA entity. An SDATA entity contains text which is treated as system data when referenced. The text is dependent on a specific system, device, or application process.

ENTITY

Match either a CDATA or SDATA entity.

Text that is not part of a CDATA or SDATA entity can also be matched (the NAMED part is not allowed in these cases):

NON-CDATA

Match text that does not include replacement text of a CDATA entity. The matched text can include all or part of the replacement text for one or more SDATA entities.

PCDATA

Match text that does not include replacement text for a CDATA or SDATA entity. PCDATA (parsed character data) is zero or more characters that can be parsed without any markup being found.

NON-SDATA

Match text that does not include replacement text of an SDATA entity. The matched text can include all or part of the replacement text for one or more CDATA entities.

TEXT

Match any text, including replacement text for CDATA or SDATA entities. This is the default.

NAMED and/or VALUED can also be used in the pattern at the start of a PROCESSING-INSTRUCTION rule to capture a processing instruction that is the expansion of a PI entity.

Combining Patterns

P₁ P₂

Matches P₁ then P₂ .

P₁ LOOKAHEAD P₂

Matches P₁ only if it is immediately followed by text that matches P₂ . Text corresponding to P₂ is not "consumed" as part of the match.

P₁ LOOKAHEAD NOT P₂

Matches P₁ only if it is not immediately followed by text that matches P₂. A pattern can be inserted between LOOKAHEAD and NOT. The following matches P₁ only if it is immediately followed by text that matches P₂ and that text in turn is not immediately followed by text that matches P₃:

P₁ LOOKAHEAD P₂ NOT P₃

P₁ OR P₂

Matches either P₁ or P₂ .

The above list gives the precedence of combining patterns, from highest to lowest. For example, the pattern:

P₁ P₂ LOOKAHEAD P₃ NOT P₄ OR P₅ LOOKAHEAD NOT P₆ P₇

has the following implicit parentheses:

((P₁ P₂) LOOKAHEAD P₃ NOT P₄) OR (P₅ LOOKAHEAD NOT (P₆ P₇))
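LOOKAHEAD and LOOKAHEAD NOT behave much like the lookahead assertions found in regular expressions. The Python sketch below illustrates the "not consumed" property with a fixed-text P₁. It is an analogy, not OmniMark code: the helper name lookahead is hypothetical, and for a variable-length P₁ regex backtracking may give results that differ from OmniMark's.

```python
import re


def lookahead(p1, p2, negate=False):
    """Regex for `P1 LOOKAHEAD P2`, or `P1 LOOKAHEAD NOT P2` when negate is True."""
    return "(?:%s)(?%s%s)" % (p1, "!" if negate else "=", p2)


# "foo" is matched only when "bar" follows, and "bar" itself is not consumed.
m = re.match(lookahead("foo", "bar"), "foobar")
matched_text, end_position = m.group(), m.end()
```

After the match, end_position is 3, so a following pattern would still see "bar" in the input.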

9.0 String Expressions

OmniMark has three different types of strings: static, dynamic, and those represented by an OmniMark expression. The term static-string refers to only static strings. However, the term string-expression refers to any type of string.

Static Strings

A static string can contain any character (except the newline or carriage-return character) and static format items.

Static Format Items

Static format items are characters that can be evaluated without reference to the values of any shelves or variables. The following are static format items:

%% A percent sign

%_ Alternative form for a space

%n A newline sequence

%t A tab character

%# #

%br{d1,d2,...,dn} Characters whose values are given by d1, d2, ..., dn in base b

%) )

%" "

%' '

%@ Interpolates arguments inside macro expansions

%@% % inside macro expansions (used to dynamically construct format items)

"%" can be replaced using the ESCAPE declaration.

Dynamic String Format Items

Dynamic strings can contain any character or format item allowed in static strings, as well as the following dynamic format items. There are two kinds of dynamic format items: those used to emit the value of a shelf or variable (value-oriented format items), and those that affect the actual processing of output (processing-oriented format items).

Value-Oriented Formats

The modifiers allowed for each format are in square brackets ("[" and "]"). One or more of these may be used, and in any order.

%[fkjluw]a Alpha representation of a counter

%[fnumber ]b Binary representation of a counter

%[hlsuz]c Contents of an element

%[fklrsuznumber ]d Decimal representation of a counter

%[fklu]g Contents of a stream buffer

%[fklu]i Roman numeral representation of a counter

%[fklu]q Name of the current element

%[fklu]q Name of the current entity (in EXTERNAL-DATA-ENTITY and EXTERNAL-TEXT-ENTITY rules)

%[eop]q Public and system identifier information associated with the current entity (in EXTERNAL-DATA-ENTITY and EXTERNAL-TEXT-ENTITY rules). The 'o' modifier cannot be used in EXTERNAL-TEXT-ENTITY rules.

%[fklu]v Contents of an attribute (in general)

%[hlsuz]v Contents of a CDATA element attribute

%[eop]v Contents of an ENTITY or ENTITIES attribute

%[ep]v Contents of a NOTATION attribute

%[fklu]x Contents of a pattern variable

%y Symbolic representation of a counter.

Processing-Oriented Formats

%sn	Strippable newline: emit a newline sequence only if one would not be emitted at the current point
%st	Strippable tab: emit a tab character only if no white-space character would be emitted at the current point
%s_	Strippable space: emit a space only if no space or newline sequence would be emitted at the current point
%/	Make the next character breakable
%[	Stop counting characters towards the preferred width
%]	Restart counting characters again towards the preferred width

List of Modifiers

e

The system identifier associated with an attribute, entity, or notation, or the replacement text of an internal entity

ep

The system identifier associated with the public identifier according to a LIBRARY declaration

number f

Use a field width that contains at least number characters

h

Suppress line-breaking

j

Do not use "i", "l", or "o" in alphabetic lists (military style)

k

Pad characters to the left of the value

l

Convert alphabetic characters to their lower-case representations

o

Access a notation associated with an attribute or external entity

p

The public identifier associated with an attribute, entity, or notation

number r

Print a number using base number

s

For "%sc", strip extra white-space from the element's content

s

For "%sd", remove trailing zeros after the decimal point, and the decimal point, if all digits following it are zeros

u

Convert alphabetic characters to their upper-case representations

w

For alphabetic lists, use "aa", "bb", "cc", etc., instead of "aa", "ab", "ac"

z

For "%c" and "%v", suppress applying TRANSLATE rules to element's data content and to CDATA attributes

z

For "%d", pad the value with zeros on the left rather than with spaces

number

For "%d", move the decimal point number places to the left

number

For "%b", use the ordering given by number . number should be one of:

0	First byte contains high-order bits of the counter, last byte contains low-order bits.
1	Same as 0 but with pairs of bytes flipped.
2	Same as 3 but with pairs of bytes flipped.
3	First byte contains low-order bits of the counter, last byte contains high-order bits.
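The four byte orderings can be illustrated with a short Python sketch. It is a model of the table above, not OmniMark code: the function name counter_bytes is hypothetical, and the 4-byte width and non-negative value are assumptions made for the example, since the size of a counter is not stated here.

```python
def counter_bytes(value, ordering, width=4):
    """Return non-negative `value` as `width` bytes in byte ordering 0, 1, 2, or 3."""
    if ordering not in (0, 1, 2, 3):
        raise ValueError("ordering must be 0, 1, 2, or 3")
    # Orderings 0 and 1 start high-order first; 2 and 3 start low-order first.
    data = value.to_bytes(width, "big" if ordering in (0, 1) else "little")
    if ordering in (1, 2):
        # Flip each pair of adjacent bytes.
        flipped = bytearray()
        for i in range(0, width, 2):
            flipped += data[i:i + 2][::-1]
        data = bytes(flipped)
    return data


# The bytes 01 02 03 04 make each ordering easy to see.
orderings = {n: counter_bytes(0x01020304, n).hex() for n in range(4)}
```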

Other String Expressions

ATTRIBUTE attribute ancestry-qualifier item

Returns the value of an attribute.

SPECIFIED ATTRIBUTES ancestry-qualifier item

Returns the value of an attribute on the ATTRIBUTES shelf.

DATA-ATTRIBUTE data-attribute OF (attribute-identifier )

Returns the value of a data-attribute.

SPECIFIED DATA-ATTRIBUTES OF (attribute-identifier ) item

Returns the value of a data-attribute on the DATA-ATTRIBUTES shelf.

DATE string-expression

Uses special date format items to return the string value of date and time information.

COMPILED-DATE string-expression

Similar to DATE, but calculates the information at compile-time. Useful for labelling .csc files.

EXTERNAL-FUNCTION function-name

Returns the external name that function-name is defined as.

FUNCTION-LIBRARY OF EXTERNAL-FUNCTION function-name

Returns the name of the library where function-name can be found.

KEY OF ATTRIBUTE alias

Returns the real name of the attribute associated with the alias name.

KEY OF SPECIFIED ATTRIBUTES ancestry-qualifier item

Returns the name of an opened element's attribute (i.e. its "key" on the ATTRIBUTES shelf).

KEY OF DATA-ATTRIBUTE OF (attribute-identifier ) alias

Returns the real name of the data-attribute associated with the alias name.

KEY OF SPECIFIED DATA-ATTRIBUTES OF (attribute-identifier ) item

Returns the name of a data-attribute (i.e. its "key" on the DATA-ATTRIBUTES shelf).

KEY OF REFERENTS item

Returns the key of a referent shelf.

KEY OF shelf-type shelf-name index

Returns the key of a stream, counter, or switch shelf.

KEY OF THIS REFERENT

Returns the key of the selected referent. It can be used only within a USING REFERENTS or REPEAT OVER REFERENTS body.

NAME ancestry-qualifier

Returns the name of the element defined by the ancestry qualifier.

NAME OF STREAM stream index

Returns the name of the file or referent associated with the given stream.

STREAM stream index

Returns the text of a stream that was opened as a buffer or referent and then later closed.

PATTERN pattern

Returns the text of the indicated pattern variable.

REFERENT string-expression

Output a value to be determined at some other point of processing.

SILENT-REFERENT string-expression

Like REFERENT, but the named referent doesn't go on the REFERENTS shelf.

function call

Call to a string function.

String Operators

string1 JOIN string2
string1 || string2

Concatenates the two strings.

string REPEATED number
string ||* number

Produces string repeated number times.

10.0 Numeric Expressions

Numeric Values

A numeric-value can take the following forms:

BINARY string-expression

Converts a string of characters to a number as if the string were the binary representation of that number.

string-expression BINARY numeric-value

As the one-argument form of BINARY except that the numeric-value specifies the order in which the bytes in the string are to be converted.

string-expression BASE numeric-value

Convert the number represented by string expression to a number using numeric-value as the base. Letters are used to represent digits above 9 for bases 11 up to 36 inclusive. A leading sign is allowed.

CHILDREN ancestry-qualifier

Returns the number of subelements of an element.

COUNTER {counter index | #ITEM}

Returns the current value of the counter.

ITEM OF SPECIFIED ATTRIBUTES KEY string-expression ancestry-qualifier

Returns the position on the attributes' shelf of the attribute with key string-expression .

ITEM OF SPECIFIED DATA-ATTRIBUTES KEY string-expression OF (attribute-identifier )

Returns the position on the data-attributes' shelf of the data-attribute with key string-expression .

ITEM OF shelf-type shelf-name KEY string-expression

Returns the position of the item on the shelf with the given key.

LENGTH OF string-expression

Returns the length of the indicated string.

NUMBER OF SPECIFIED ATTRIBUTES ancestry-qualifier

Returns the number of attributes associated with an element. If SPECIFIED appears, only attributes specified in the document instance are counted.

NUMBER OF CURRENT ELEMENTS ancestry-qualifier

Returns the number of currently open elements.

NUMBER OF CURRENT SUBDOCUMENTS ancestry-qualifier

Returns the number of currently open subdocuments invoked by the DO SGML-PARSE SUBDOCUMENT ... compound action.

NUMBER OF SPECIFIED DATA-ATTRIBUTES OF (attribute-identifier )

Returns the number of data-attributes associated with an attribute. If SPECIFIED appears, only data-attributes specified in the document instance are counted.

NUMBER OF shelf-type shelf-name

Returns the number of items currently on the shelf.

NUMBER OF REFERENTS

Returns the number of referents that currently exist.

OCCURRENCE ancestry-qualifier

Returns the number of consecutive occurrences of a currently open element.

SYSTEM-CALL string-expression

Calls another program while OmniMark is running. The parameter string-expression is the command to execute, which typically consists of a program name followed by the program's arguments. The value returned is the value the called program returns to the operating system.

VALUE numeric-expression

V2 holdover for coercing comparisons to numeric expressions. No longer necessary.

number

A base-10 number.

string-expression

Any string-expression found where a numeric-value is expected is interpreted as a base-10 number, with a leading sign allowed.

function call

Call to a numeric function.
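The conversion performed by the form string-expression BASE numeric-value resembles Python's built-in int() with an explicit base. The sketch below is an analogy only, not OmniMark code: the function name base_value is hypothetical, the lower limit of base 2 is an assumption, and Python's int() also tolerates surrounding white-space and underscores, which may not hold for OmniMark.

```python
def base_value(text, base):
    """Convert `text` to a number in `base`, allowing a leading sign.

    Letters stand for digits above 9, so bases up to 36 are possible.
    """
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    return int(text, base)


hex_value = base_value("ff", 16)       # letters as digits above 9
signed_value = base_value("-101", 2)   # a leading sign is allowed
decimal_value = base_value("-42", 10)  # a plain string-expression is read as base 10
```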

Numeric Operators

The following operators are allowed, in the order of highest precedence to lowest:

Unary operators

+, -, bit-wise COMPLEMENT

High binary operators

*, / and MODULO, bit-wise MASK and SHIFT

Low binary operators

+ and -, bit-wise UNION and DIFFERENCE

11.0 Miscellaneous Components

This section defines many of the terms mentioned but not defined in the previous sections.

Local Declarations

Local declarations are in effect for a single scope. Any local declaration can be used at the start of any scope.

LOCAL shelf-declaration

The shelf can be used in any of the actions that follow it in the scope.

SAVE shelf-type shelf-name

The shelf's value is saved at the start of the scope and restored at the end of the scope. The shelf can be used by any rule that is active inside of the current one.

SAVE-CLEAR shelf-type shelf-name

Like SAVE, except that the shelf is, in addition, CLEARed at the start of the scope.

SAVE GROUPS

Save away all the active groups at the start of the scope, and restore them at the scope's end.

Shelf Declarations

Shelves are the OmniMark equivalent of an array. They can be of type COUNTER, STREAM, or SWITCH. A shelf declaration has the following forms:

shelf-type shelf-name VARIABLE {TO number }? size init-values
shelf-type shelf-name SIZE number init-values
shelf-type shelf-name init-values

size has the form

INITIAL-SIZE constant numeric expression.

init-values has the form

INITIAL '{' init-value {, init-value}* '}'

init-value has the form

value {WITH KEY constant string expression}?

value must be constant and of the appropriate type.

Shelf Indexes

Items on a shelf (array) are accessed using an index. OmniMark provides three indexes:

ITEM numeric-value

by position.

KEY string-expression

by key value.

LASTMOST

use the last item of the shelf, even if in a USING block, REPEAT OVER block, or function.

Insertion Points

When a new item on a shelf is created, it can be explicitly placed either before a point in the shelf or after:

BEFORE ITEM n

1 <= n <= shelf-size + 1

AFTER ITEM n

0 <= n <= shelf-size

BEFORE KEY str

The shelf must have KEY str .

AFTER KEY str

As with BEFORE KEY, the shelf must have KEY str.

If no insertion point is specified, the new item is placed on the end of the shelf.
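The valid ranges for BEFORE ITEM and AFTER ITEM follow from the fact that shelf positions are counted from 1. The Python sketch below models a shelf as a list to show the bounds. It is not OmniMark code, and the function names insert_before_item and insert_after_item are hypothetical.

```python
def insert_before_item(shelf, n, value):
    """Model of BEFORE ITEM n: valid for 1 <= n <= shelf-size + 1."""
    if not 1 <= n <= len(shelf) + 1:
        raise IndexError("BEFORE ITEM position out of range")
    shelf.insert(n - 1, value)  # shelf positions are 1-based, list indexes 0-based


def insert_after_item(shelf, n, value):
    """Model of AFTER ITEM n: valid for 0 <= n <= shelf-size."""
    if not 0 <= n <= len(shelf):
        raise IndexError("AFTER ITEM position out of range")
    shelf.insert(n, value)


shelf = ["a", "b", "c"]
insert_before_item(shelf, 1, "first")  # new item becomes item 1
insert_after_item(shelf, 4, "last")    # AFTER the last item appends to the end
```

BEFORE ITEM shelf-size + 1 and AFTER ITEM shelf-size both place the new item at the end of the shelf, which is also where it goes when no insertion point is given.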

Streams

OmniMark provides a single data structure, called a stream, for referring to different files, saving portions of a document for later reference, data movement forward or backward in a file, and manipulating strings of text.

Special Streams

#APPINFO

Provides access to the APPINFO parameter in the SGML Declaration.

#CURRENT-OUTPUT

Refers to the active streams in the domain in which it is referenced (#OUTPUT in V2).

#DOCTYPE

Provides the name of the document element.

#ERROR

Refers to the standard error output.

#LIBPATH

A shelf of streams. Each stream contains a string corresponding to the value of one of the libpath parameters entered on the command line.

#LIBRARY

A shelf of streams. Each stream contains a string corresponding to a string specified in a LIBRARY declaration (See Other Declarations).

#LIBVALUE

A shelf of streams used for matching public identifiers. The key of each stream is the public identifier and the value of the stream is the value of that public identifier.

#MAIN-INPUT

The contents of the filenames specified on the command-line.

#MAIN-OUTPUT

Refers to standard output. The #MAIN-OUTPUT stream is the target of the -of control argument given on the command line. (OUTPUT in V2).

#PROCESS-INPUT

Refers to the standard input. This can only be read.

#PROCESS-OUTPUT

Refers to the standard output independently of where the OUTPUT stream is bound (#CONSOLE in V2).

#SGML

Refers to the stream attached to the input of the SGML parser (SGML in V2).

#SUPPRESS

Anything written to the #SUPPRESS stream is discarded.

Stream Modifiers

The following modifiers may be used when opening streams

WITH BINARY numeric-expression

Specifies the ordering for all %b format items written to the named stream.

WITH BREAK-WIDTH numeric-expression {TO numeric-expression }?

Defines acceptable line widths.

WITH DOMAIN-FREE

Indicates the named stream may be manipulated by the FIND and the ELEMENT domains, and if there is one, the SGML-ERROR domain.

WITH REFERENTS-ALLOWED

Referents may be written to the indicated destination.

WITH REFERENTS-DISPLAYED

Referents may be written to the indicated destination, but only their names will be written to it, not their eventual value.

WITH REFERENTS-NOT-ALLOWED

Referents may not be written to the indicated destination.

WITH "[hlsuz]"

Any combination of these element content modifiers (except "u" and "l" together) may be used to specify how element content should be processed when written to the specified stream.

All of the above modifiers are allowed for OPEN and REOPEN actions. Only the BINARY and element content modifiers are allowed for PUT actions.

Writing to a Stream

OUTPUT string-content

Write string-content to the active streams.

OUTPUT-TO stream

Change active streams to stream .

PUT stream string-content

Write string-content to stream , temporarily making it the active stream.

The #MAIN-OUTPUT stream

The initial active stream ("standard output" or the destination specified with the -of command-line option).

The #CURRENT-OUTPUT streams

The currently active streams.

Special Counters

#ITEM

Position in a REPEAT OVER loop.

#SGML-ERROR-COUNT

Number of errors encountered in current sgml-parse.

#SGML-WARNING-COUNT

Number of warnings encountered in current sgml-parse.

#SGML-ERROR-TOTAL

Number of errors encountered in processing so far.

#SGML-WARNING-TOTAL

Number of warnings encountered in processing so far.

Date Format Items

All date format items for the DATE string expression start with an equal sign ("="). A lower-case "x" is optional in the formats where it is shown. When used, it signifies that the number should be formatted as two digits, with a leading zero added if required, or in the case of "=xY", with the century part removed. When "x" is not used, exactly as many digits as are required are used (four for "=Y", and one or two otherwise). The valid format items are:

=xH Hour in 24-hour form

=xh Hour in 12-hour form

=xm Minute

=xs Seconds (no fraction)

=xS Seconds (with a fraction if available on the host system)

=a The letter "a" for the first 12 hours of the day, "p" otherwise.

=A Like "=a", only this produces "A" or "P".

=xY The year

=xM The number of the month (1 for January, etc.)

=n The English name of the month (e.g. January)

=xD The number of the day in the month

=W The English name of the day of the week (e.g. Tuesday)

== A single "=" character
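The effect of the optional "x" can be seen by expanding a few format strings. The Python sketch below models a subset of the date format items (it omits "=h" and "=S"). It is not OmniMark code, and the function name format_date is hypothetical.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]


def format_date(fmt, when):
    """Expand a subset of the date format items for the datetime `when`."""
    numeric = {"H": when.hour, "m": when.minute, "s": when.second,
               "Y": when.year, "M": when.month, "D": when.day}
    text = {"a": "a" if when.hour < 12 else "p",
            "A": "A" if when.hour < 12 else "P",
            "n": MONTHS[when.month - 1],
            "W": WEEKDAYS[when.weekday()],
            "=": "="}
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "=":
            out.append(fmt[i])  # ordinary characters are copied unchanged
            i += 1
            continue
        two_digits = fmt[i + 1:i + 2] == "x"
        code = fmt[i + 2:i + 3] if two_digits else fmt[i + 1:i + 2]
        i += 3 if two_digits else 2
        if code in numeric:
            value = numeric[code]
            # "x" pads to two digits and drops the century from the year.
            out.append("%02d" % (value % 100) if two_digits else str(value))
        elif code in text and not two_digits:
            out.append(text[code])
        else:
            raise ValueError("unsupported date format item")
    return "".join(out)


when = datetime(1997, 3, 4, 9, 5, 7)
long_form = format_date("=W, =n =D, =Y", when)
short_form = format_date("=xD/=xM/=xY =xH:=xm:=xs =A", when)
```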

12.0 Functions

Internal Functions

DEFINE {COUNTER | STREAM | SWITCH}? FUNCTION function name 
  prototype ELSEWHERE

DEFINE {COUNTER | STREAM | SWITCH}? FUNCTION function name 
  prototype AS
local*
action*

External Functions

DEFINE EXTERNAL 
    {COUNTER | STREAM | SWITCH | SOURCE | OUTPUT}?
    FUNCTION function name prototype AS constant string expression 
    {IN FUNCTION-LIBRARY constant string expression}?

The first string gives the name of the external function as it is defined in the external source file. The second string gives the library to look up the function in. If 'DECLARE FUNCTION-LIBRARY' has been given, the function-library need not be specified.

Function Prototypes

When the prototype is parenthesized in the definition, the call's arguments must also be parenthesized. Forms are:

()

No arguments. This form is for parenthesized prototypes only.

(arg {psep arg }*)

In unparenthesized functions, psep is a name. In parenthesized functions, it may be a name or a comma. arg has the form:

arg-type shelf-type shelf-name opt-init

arg-type is one of

MODIFIABLE

Argument is a shelf that can be changed in any way.

READ-ONLY

Argument is a shelf that can be read, but not modified.

VALUE

Argument is a single-item shelf that can only be read.

REMAINDER

Argument is a multi-item shelf that can only be read, and is composed of multiple elements when the function is called. Not supported in external functions.

opt-init has the form:

OPTIONAL INITIAL {value}

The OPTIONAL keyword may only be used in internal functions, and cannot be applied to REMAINDER arguments.

INITIAL values may only be specified for OPTIONAL VALUE arguments.
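The restrictions on arg-type and opt-init interact, so it can help to see them written as checks. The following is a minimal sketch in Python, not OmniMark code. The names Arg and check_arg are hypothetical, and the sketch encodes only the rules stated above.

```python
from dataclasses import dataclass
from typing import List

ARG_TYPES = {"MODIFIABLE", "READ-ONLY", "VALUE", "REMAINDER"}


@dataclass
class Arg:
    """One prototype argument (hypothetical model, not an OmniMark API)."""
    arg_type: str
    optional: bool = False
    has_initial: bool = False


def check_arg(arg: Arg, external: bool = False) -> List[str]:
    """Return the rules from the text that this argument violates."""
    problems = []
    if arg.arg_type not in ARG_TYPES:
        problems.append("unknown arg-type")
    if arg.arg_type == "REMAINDER" and external:
        problems.append("REMAINDER is not supported in external functions")
    if arg.optional and external:
        problems.append("OPTIONAL may only be used in internal functions")
    if arg.optional and arg.arg_type == "REMAINDER":
        problems.append("OPTIONAL cannot be applied to REMAINDER arguments")
    if arg.has_initial and not (arg.optional and arg.arg_type == "VALUE"):
        problems.append("INITIAL requires an OPTIONAL VALUE argument")
    return problems


# An OPTIONAL VALUE argument with an INITIAL value passes every check.
print(check_arg(Arg("VALUE", optional=True, has_initial=True)))
# An INITIAL value on a MODIFIABLE argument is reported.
print(check_arg(Arg("MODIFIABLE", has_initial=True)))
```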

Function Calls

The name of the function, followed by a correct argument list, is sufficient to identify the function. There is no keyword that says "This is a function call."

13.0 Macros

OmniMark provides a general-purpose macro capability that allows a user-defined name to abbreviate a more complicated expression. Macros are a programming convenience. They can be assigned to delimiter characters so that a special character substitutes for a longer expression. Macros can also be parameterized so that a repeating but variable pattern can be shortened.

Macro Definitions

The syntax for declaring a macro is as follows:

MACRO macro-name {token}+
   arguments
IS
   replacement
MACRO-END

where

{token}+

is one or more OmniMark names or delimiters which must follow the argument when the macro is used. Possible delimiters include any of these characters: ( ) { } [ ] ! @ $ % ^ & * - + = | \ ~ ` : < , > . ? /

arguments

is a name that is replaced by the text specified when the macro is invoked.

replacement

is the text inserted when the macro is called.

Macro Calls

The syntax for calling a macro is as follows:

macro-name arguments

Term	Description
action	See Actions.
aided translation types	See Translation Types.
alias	A name that stands in place of the real name of an element or attribute. Used in iterations as a convenient way of referring to the current item.
ancestry-qualifier	See Ancestry Qualifiers.
attribute	Name of an attribute as it appears in the document type definition (DTD).
attribute-identifier	See Attribute Identifiers.
character	A single character (or text representing a specific character) within quotation marks. For example, "a", "#", and "%n" are characters.
comparer	See Comparisons.
condition	See Conditions.
counter	Name of a counter. See Shelf Declarations.
data-attribute	Name of a data-attribute as it appears in the DTD.
data-attribute-identifier	See Attribute Identifiers.
entity	Name of an entity as it appears in the DTD.
gi	The generic identifier (name) of an element as it appears in the DTD.
group	Name of a group as defined in the GROUP declaration (see Other Declarations).
index/item	See Shelf Indexes.
insertion	See Insertion Points.
item	See Shelf Indexes.
local	See Local Declarations.
name	Text (without quotation marks) specifying a name.
number	An integer between -2,147,483,647 and 2,147,483,647.
numeric-expression	See Numeric Expressions.
pattern	See Patterns.
pattern-primary	See Primary Patterns.
program	3GL-like OmniMark programs without an explicit translation-type.
shelf-declaration	See Shelf Declarations.
shelf-name	Name of a shelf. See Shelf Declarations.
shelf-type	See Shelf Declarations.
static-string	See Static Strings.
stream	Name of a stream. See Special Streams.
string-expression	Any of the types of strings defined in String Expressions.
switch	Name of a switch. See Shelf Declarations.
translation-type	See Aided Translation Types.

ANY	Matches any character.
ANY-TEXT	Matches any character except the character that represents the end of line.
SPACE	Matches a single space character.
BLANK	Matches either a single space character or a single tab character.
WHITE-SPACE	Matches a single space character, tab character, or end of line character.
DIGIT	Matches any of the characters "0" through "9".
LETTER	Matches any of the 26 upper-case or lower-case letters of the Roman alphabet.
LC	Matches any lower-case letter of the Roman alphabet.
UC	Matches any upper-case letter of the Roman alphabet.

CONTENT-START	Recognizes the beginning of an element's content.
CONTENT-END	Recognizes the end of an element's content.
LINE-START	Recognizes the beginning of a line.
LINE-END	Recognizes the end of a line.
VALUE-START	Recognizes the beginning of a scanned value.
VALUE-END	Recognizes the end of a scanned value.
WORD-START	Recognizes the beginning of a string of letters and digits.
WORD-END	Recognizes the end of a string of letters and digits.

%%	A percent sign
%_	Alternative form for a space
%n	A newline sequence
%t	A tab character
%#	#
%br{d1,d2,...,dn}	Characters whose values are given by d1, d2, ..., dn in base b
%)	)
%"	"
%'	'
%@	Interpolates arguments inside macro expansions
%@%	% inside macro expansions (used to dynamically construct format items)

%[fkjluw]a	Alpha representation of a counter
%[fnumber]b	Binary representation of a counter
%[hlsuz]c	Contents of an element
%[fklrsuznumber]d	Decimal representation of a counter
%[fklu]g	Contents of a stream buffer
%[fklu]i	Roman numeral representation of a counter
%[fklu]q	Name of the current element
%[fklu]q	Name of the current entity (in EXTERNAL-DATA-ENTITY and EXTERNAL-TEXT-ENTITY rules)
%[eop]q	Public and system identifier information associated with the current entity (in EXTERNAL-DATA-ENTITY and EXTERNAL-TEXT-ENTITY rules). The 'o' modifier cannot be used in EXTERNAL-TEXT-ENTITY rules.
%[fklu]v	Contents of an attribute (in general)
%[hlsuz]v	Contents of a CDATA element attribute
%[eop]v	Contents of an ENTITY or ENTITIES attribute
%[ep]v	Contents of a NOTATION attribute
%[fklu]x	Contents of a pattern variable
%y	Symbolic representation of a counter.
