The Official Guide to Programming with OmniMark
International Edition
Previous chapter is Chapter 8, "Variable Scopes".
Next chapter is Chapter 10, "Accessing the External World".
An expression is simply a value that can be used in a computation. OmniMark has three kinds of expressions: numeric expressions, string expressions, and test expressions.
An expression can be as simple as a quoted string, a number, or a variable name. More complex expressions can be built up from these by using operators. An operator is a keyword or a phrase that uses one or more values (or "operands") to calculate a new value. Parentheses can be used to surround the sub-expressions within an expression to clarify which sub-expressions are operands to particular operators.
Whenever possible, OmniMark will evaluate an expression at compile-time. Compile-time expressions can be used wherever a constant numeric or string expression is required. There are some contexts in which only compile-time expressions can be used.
Numeric expressions can be used in:
In OmniMark, a numeric expression is an integer value between -2,147,483,647 and 2,147,483,647.
The simplest forms of a numeric expression are:
The following are valid numeric expressions:
Example A
0
Example B
-5014
Example C
2147483647
Example D
"-443"
Example E
"100000000"
The following are illegal:
Example A
2,147,483,637
Example B
-3000000000
Example C
"--100"
Example D
3.145
Because any string expression can be used as a numeric value, saved pattern text, attribute values and even element content can be used as a numeric value providing that the contents form a valid number.
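The following Python sketch models when a string can serve as a numeric value. It accepts an optional leading "+" or "-" followed only by decimal digits, within the range given at the start of this section. It is a Python illustration, not OmniMark code, and the function name to_number is hypothetical.

```python
import re

# Range of an OmniMark numeric expression, as stated in this section.
MIN_VALUE = -2_147_483_647
MAX_VALUE = 2_147_483_647

# An optional single leading sign followed by one or more decimal digits.
_NUMBER = re.compile(r"[+-]?[0-9]+")


def to_number(text: str) -> int:
    """Hypothetical model of string-to-number conversion with a range check."""
    if not _NUMBER.fullmatch(text):
        raise ValueError(f"not a valid number: {text!r}")
    value = int(text)
    if not MIN_VALUE <= value <= MAX_VALUE:
        raise ValueError(f"out of range: {text!r}")
    return value


for candidate in ["0", "-5014", "2147483647", "-443", "100000000",
                  "2,147,483,637", "-3000000000", "--100", "3.145"]:
    try:
        print(candidate, "->", to_number(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

The first five candidates are the valid examples above and the last four are the illegal ones.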
Numbers with a radix other than 10 must be expressed as a string expression, and can be converted to a numeric expression using the operator BASE as described in Section 9.2.2.1, "Converting Numbers From Other Bases".
The types of numeric operators are:
In most places in OmniMark, wherever a number is allowed to appear, a more general numeric expression may be used. While numeric expressions in actions and conditions are evaluated when the rules are executed, numeric expressions in declarations are evaluated immediately, and therefore may not mention dynamic objects (i.e. objects that have values only while the program is being executed, such as counters, attributes, and pattern variables).
This subsection describes the operators which can operate on numeric expressions.
OmniMark supports two sign operators: "+" and "-". The sign precedes a numeric expression.
(+ | VALUE) numeric-expression
Example A
+432
Example B
+("4987" + 315 * 12)
Example C
+ "3" > "04"
Example D
VALUE "3" > "04"
The positive sign does not change the value of the numeric expression on which it operates. It is provided:
The plus sign ("+") is encouraged over the keyword VALUE, because VALUE is used to refer to a type of function argument in OmniMark V3. (VALUE is provided for compatibility with previous releases of OmniMark, although it was more restricted in its use then.)
(- | NEGATE) numeric-expression
Example A
-432
Example B
-("4987" + 315 * 12)
Example C
- "3" > "04"
Example D
NEGATE "3" > "04"
The negative sign changes the sign of the numeric expression on which it operates. If applied to a negative numeric expression, the result is positive. If applied to a positive expression, the result is negative. The negative sign does not change the absolute value of the numeric expression.
The punctuational form ("-") is encouraged over the keyword form (NEGATE).
Arithmetic operators are used for performing arithmetic on numeric expressions. OmniMark supports the following five arithmetic operations: addition, subtraction, multiplication, division, and modulo.
numeric-expression (+ | PLUS) numeric-expression
Example A
43 + 967
Example B
12 + 19 + -35
Example C
LOCAL COUNTER a
LOCAL COUNTER b
LOCAL COUNTER c
LOCAL COUNTER d
...
a + b + c + d
Example D
LOCAL COUNTER a
LOCAL COUNTER b
LOCAL COUNTER c
LOCAL COUNTER d
...
a PLUS b PLUS c PLUS d
PLUS or "+" implements addition in OmniMark. "+" is the preferred form.
numeric-expression (- | MINUS) numeric-expression
Example A
43 - 967
Example B
12 - 19 - -35
Example C
LOCAL COUNTER a
LOCAL COUNTER b
...
a - b
Example D
LOCAL COUNTER a
LOCAL COUNTER b
...
a MINUS b
MINUS or "-" implements subtraction in OmniMark. "-" is the preferred form.
Note: because the hyphen character used in variable names is the same character as the minus sign, OmniMark programmers must be careful to separate minus signs from names with white space.
For instance, a - b is interpreted as a subtraction, while a-b is interpreted as a single name.
numeric-expression (* | TIMES) numeric-expression
Example A
43 * 967
Example B
12 * 19 * -35
Example C
LOCAL COUNTER a
LOCAL COUNTER b
LOCAL COUNTER c
LOCAL COUNTER d
...
a * b * c * d
Example D
LOCAL COUNTER a
LOCAL COUNTER b
LOCAL COUNTER c
LOCAL COUNTER d
...
a TIMES b TIMES c TIMES d
TIMES or "*" implements multiplication in OmniMark. "*" is the preferred form.
numeric-expression (/ | DIVIDE) numeric-expression
Example A
43 / 967
Example B
12 / 19 / -35
Example C
LOCAL COUNTER a
LOCAL COUNTER b
...
a / b
Example D
LOCAL COUNTER a
LOCAL COUNTER b
...
a DIVIDE b
DIVIDE or "/" implements division in OmniMark. "/" is the preferred form.
numeric-expression MODULO numeric-expression
Example
LOCAL COUNTER seconds
LOCAL COUNTER minutes
LOCAL COUNTER hours
LOCAL COUNTER days
...
SET minutes TO seconds / 60
SET seconds TO seconds MODULO 60
SET hours TO minutes / 60
SET minutes TO minutes MODULO 60
SET days TO hours / 24
SET hours TO hours MODULO 24
MODULO implements the operation of taking the modulus of a number with respect to a base value. The modulus is the remainder that you get if you divide the number by the base value.
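The following Python sketch mirrors the arithmetic of the MODULO example above. Python's // and % stand in for OmniMark's "/" and MODULO. It is only meant for non-negative values: Python rounds division of negative numbers toward negative infinity, and this section does not say how OmniMark treats negative operands. The function name split_seconds is hypothetical.

```python
def split_seconds(seconds: int) -> tuple[int, int, int, int]:
    """Hypothetical helper: split non-negative seconds into days, h, m, s."""
    minutes = seconds // 60   # whole minutes
    seconds = seconds % 60    # seconds left over
    hours = minutes // 60     # whole hours
    minutes = minutes % 60    # minutes left over
    days = hours // 24        # whole days
    hours = hours % 24        # hours left over
    return days, hours, minutes, seconds


print(split_seconds(93784))   # (1, 2, 3, 4)
```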
Characters and numbers in non-text files sometimes represent codes with individual bit fields in them. OmniMark provides bit-oriented operators that treat numeric expressions as sequences of bits rather than as values.
In OmniMark, numeric expressions are equivalent to bit sequences 32 bits long. 0 is equivalent to a bit sequence containing all zeros.
Five bit-oriented operators are provided to help in isolating, processing and creating binary values.
numeric-expression MASK numeric-expression
Example
DO WHEN c MASK 1 != 0 ... DONE
The MASK of two bit sequences is a new sequence where each bit is 1 if the corresponding bits of both operands are 1, and 0 otherwise.
The above example shows a DO block which will be executed if the lowest order bit in c has the value 1.
For another example, the value of 5 MASK 6 is 4: the third bit from the bottom is the only one that has the value 1 in both numbers when they are converted to bit sequences.
Other languages sometimes refer to this operation as a "bit-wise and" because the operation is analogous to the logical "and" operation. To avoid confusion, OmniMark reserves the keyword AND and the operator "&" for the logical "and" operation.
numeric-expression UNION numeric-expression
Example
SET c TO c UNION 1
The UNION of two bit sequences is a new sequence where each bit is 1 if the corresponding bit is 1 in either operand (or in both), and 0 otherwise.
The above example turns on the lowest order bit in c.
For another example, the value of 5 UNION 6 is 7: each of the bottom three bits has the value 1 in at least one of the operands.
Other languages sometimes refer to this operation as a "bit-wise or" because the operation is analogous to the logical "or" operation. To avoid confusion, OmniMark reserves the operator "|" (OR) for the logical "or" operation.
numeric-expression DIFFERENCE numeric-expression
Example
DO WHEN c DIFFERENCE d != 0 ... DONE
The DIFFERENCE of two bit sequences is a new sequence where each bit is 1 if the corresponding bits of the two operands have different values, and 0 if they have the same value.
The above example performs the actions within the DO block if any corresponding bits of c and d have different values (that is, if c and d are not equal).
For another example, the value of 5 DIFFERENCE 6 is 3: only the bottom two bits are different. The third bits are both 1, and the remaining bits are all 0.
Other languages sometimes refer to this operation as a "bit-wise exclusive-or".
COMPLEMENT numeric-expression
Example A
SET c TO COMPLEMENT 1
Example B
SET c TO c MASK COMPLEMENT 1
The COMPLEMENT of a bit sequence is a new sequence where each bit is "flipped". In other words, each bit is 1 if the corresponding bit of the operand is 0, and 0 if it is 1.
For example, the complement of 0 is "11111111111111111111111111111111" BASE 2. (This actually results in a negative number on most systems because the topmost bit is a sign bit. A "1" in this position usually indicates a negative number. This is one reason to avoid dealing with numeric expressions as both bit sequences and numbers at the same time.)
Other languages sometimes refer to this operation as a "bit-wise not" because the operation is analogous to the logical "not" operation. To avoid confusion, OmniMark reserves the operator "!" (NOT) for the logical "not" operation.
numeric-expression SHIFT numeric-expression
Example
SET c TO 1 SHIFT 1
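The result of the SHIFT operator is the bit sequence specified by the first operand, shifted up by the number of places indicated by the second operand. If the second operand is negative, the bits are shifted down by the absolute value of that operand instead. If it is zero, the bit sequence is unchanged.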
As bits are shifted, the positions which are no longer occupied are filled with zeros. Thus, shifting a bit sequence by 32 or more in either direction will result in a bit sequence of all zeros.
Bits which get shifted "off the end" are discarded. The result is always a sequence of 32 bits.
The above example sets the second from the bottom bit to 1, and the rest to 0.
For another example, the value of 5 SHIFT 6 is 320: the bits are moved up by six places.
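The following is a minimal Python model of SHIFT. It assumes 32-bit two's-complement numbers (as on most systems), and it assumes that a negative second operand shifts the bits down, as implied by the statement that shifts can go "in either direction". The helper names shift and to_signed32 are hypothetical.

```python
def to_signed32(bits: int) -> int:
    """Hypothetical helper: read the low 32 bits as a two's-complement number."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


def shift(value: int, places: int) -> int:
    """Hypothetical model of `value SHIFT places` on 32-bit sequences."""
    bits = value & 0xFFFFFFFF                    # raw 32-bit pattern
    if abs(places) >= 32:
        return 0                                 # every bit is shifted off the end
    if places >= 0:
        bits = (bits << places) & 0xFFFFFFFF     # discard bits pushed past the top
    else:
        bits >>= -places                         # zeros fill in from the top
    return to_signed32(bits)


print(shift(1, 1))      # 2
print(shift(5, 6))      # 320
print(shift(320, -6))   # 5
print(shift(5, 32))     # 0
```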
The following table shows the effect of the above operators on the sample values 45 ("101101" BASE 2) and 6 ("110" BASE 2 or "000110" BASE 2).
Operation (operands in base 2)   Result (base 10)   Result (base 2)
101101 MASK 110                  4                  000100
101101 UNION 110                 47                 101111
101101 DIFFERENCE 110            43                 101011
101101 SHIFT 110                 2880               101101000000
COMPLEMENT 101101                -46                11111111111111111111111111010010
COMPLEMENT 110                   -7                 11111111111111111111111111111001
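The MASK, UNION, DIFFERENCE and COMPLEMENT rows of this table can be reproduced with Python's &, |, ^ and ~ operators, assuming 32-bit two's-complement numbers as on most systems. This is a sketch in Python rather than OmniMark. The functions mask, union, difference, complement and bits32 are hypothetical helpers named after the OmniMark operators.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(bits: int) -> int:
    """Hypothetical helper: read the low 32 bits as a two's-complement number."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def mask(a: int, b: int) -> int:
    return to_signed32(a & b)     # 1 where both bits are 1


def union(a: int, b: int) -> int:
    return to_signed32(a | b)     # 1 where either bit is 1


def difference(a: int, b: int) -> int:
    return to_signed32(a ^ b)     # 1 where the bits differ


def complement(a: int) -> int:
    return to_signed32(~a)        # every bit flipped


def bits32(value: int) -> str:
    """Show a value as its full 32-bit pattern."""
    return format(value & MASK32, "032b")


a, b = 0b101101, 0b110            # 45 and 6
rows = [
    ("101101 MASK 110", mask(a, b)),
    ("101101 UNION 110", union(a, b)),
    ("101101 DIFFERENCE 110", difference(a, b)),
    ("COMPLEMENT 101101", complement(a)),
    ("COMPLEMENT 110", complement(b)),
]
for label, result in rows:
    print(f"{label:22} {result:>4} {bits32(result)}")
```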
In OmniMark, it is possible to compare two or more numeric values. The following comparison operators are available:
numeric-expression (= | IS EQUAL) numeric-expression
Example
DO WHEN a = b ... DONE
The "=" operator compares two numeric expressions. The result is TRUE if the numeric expressions have the same value, and FALSE if they are different.
numeric-expression (ISNT EQUAL | !=) numeric-expression
Example
DO WHEN a != b ... DONE
The "!=" operator compares two numeric expressions. The result is TRUE if the numeric expressions have different values, and FALSE if they are the same.
numeric-expression (IS GREATER-THAN | >) numeric-expression
Example
DO WHEN a > b ... DONE
The ">" operator compares two numeric expressions. The result is TRUE if the first numeric expression has a greater value than the second, and FALSE otherwise.
The preferred form is ">". "ISNT LESS-EQUAL" is also provided as a synonym.
numeric-expression >= numeric-expression
Example
DO WHEN a >= b ... DONE
The ">=" operator compares two numeric expressions. The result is TRUE if the first numeric expression has a value greater than or equal to that of the second, and FALSE otherwise.
The preferred form is ">=". "ISNT LESS-THAN" is also provided as a synonym.
numeric-expression < numeric-expression
Example
DO WHEN a < b ... DONE
The "<" operator compares two numeric expressions. The result is TRUE if the first expression has a lesser value than the second, and FALSE otherwise.
The preferred form is "<". "ISNT GREATER-EQUAL" is also provided as a synonym.
numeric-expression <= numeric-expression
Example
DO WHEN a <= b ... DONE
The "<=" operator compares two numeric expressions. The result is TRUE if the first expression has a lesser or the same value as the second expression, and is FALSE otherwise.
The preferred form is "<=". "ISNT GREATER-THAN" is also provided as a synonym.
An unparenthesized comparison can consist of more than two parts, provided the parts are compatible. The operator "!=" is never permitted in a multi-part comparison, and "<" or "<=" cannot be mixed with ">" or ">=".
A multi-part numeric comparison must consist of operators from either one or the other of the following sets (but not both): "=", "<" and "<="; or "=", ">" and ">=".
Examples of multi-part comparisons are:
Example A
DO WHEN a = b = c = d ... DONE
Example B
DO WHEN a < b < c < d ... DONE
Example C
DO WHEN a >= b >= c >= d ... DONE
Example D
DO WHEN a <= b = c < d ... DONE
These multi-part comparisons are equivalent to the following tests constructed from single-part comparisons:
Example A
DO WHEN a = b & b = c & c = d ... DONE
Example B
DO WHEN a < b & b < c & c < d ... DONE
Example C
DO WHEN a >= b & b >= c & c >= d ... DONE
Example D
DO WHEN a <= b & b = c & c < d ... DONE
The following multi-part comparisons are illegal. Each either contains a comparison that is not permitted in a multi-part comparison, or combines comparison operators from both of the sets listed above:
Example A
DO WHEN a != b != c ... DONE
Example B
DO WHEN a != b = c ... DONE
Example C
DO WHEN a >= b <= c ... DONE
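The rules for multi-part comparisons can be summarized in a short Python sketch. It rejects operator combinations that are not drawn entirely from one of the two permitted sets. It then evaluates the comparison as a chain of two-part tests joined by "&". The function name multi_compare is hypothetical. Python's own chained comparisons (a < b < c) happen to use the same pairwise expansion.

```python
import operator

# Hypothetical tables: two-part operators and the sets allowed in chains.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}
ASCENDING_SET = {"=", "<", "<="}
DESCENDING_SET = {"=", ">", ">="}


def multi_compare(operands: list, ops: list) -> bool:
    """Hypothetical model of a multi-part comparison, per the rules above."""
    if len(operands) != len(ops) + 1:
        raise ValueError("need exactly one more operand than operators")
    if len(ops) > 1:
        used = set(ops)
        if not (used <= ASCENDING_SET or used <= DESCENDING_SET):
            raise ValueError(f"illegal multi-part comparison: {' '.join(ops)}")
    # a op1 b op2 c ... means (a op1 b) & (b op2 c) & ...
    return all(OPERATORS[op](left, right)
               for left, op, right in zip(operands, ops, operands[1:]))


print(multi_compare([1, 2, 2, 5], ["<=", "=", "<"]))    # True
print(multi_compare([4, 3, 3, 1], [">=", ">=", ">="]))  # True
```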
Because string expressions can be used as numeric expressions, two nearly identical tests can have opposite results. For example, the following tests are string comparisons. The first one evaluates to FALSE because "01" and "001" are different strings:
Example A
DO WHEN "01" = "001" ... DONE
Example B
DO WHEN ATTRIBUTE col-count = "1" ... DONE
Adding the VALUE or "+" monadic operator changes the above string comparison tests into numeric tests:
Example A
DO WHEN + "01" = "001" ... DONE
Example B
DO WHEN + ATTRIBUTE col-count = "1" ... DONE
Now, in the first example, both "01" and "001" are converted to numeric equivalents. Leading zeros are ignored, giving the value of 1 on both sides, and hence the test evaluates to TRUE.
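The same distinction can be seen in Python. Comparing two str values is a character-by-character test, while converting them with int() first gives a numeric test. This is only an analogy for the behavior described above, not OmniMark code.

```python
left, right = "01", "001"

# Comparing the strings themselves: they differ in length and content.
print(left == right)               # False

# Converting each side to a number first: leading zeros no longer matter.
print(int(left) == int(right))     # True
```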
Operator precedence is the order in which operators are evaluated. When two operators apply to the same term, the one with the higher precedence is evaluated first.
Monadic operators are operators which only take a single operand. Dyadic operators are operators which take two operands. Monadic operators always have precedence over dyadic operators.
The precedence table for numeric operators from highest precedence to lowest is:
Bit-oriented and arithmetic operators may be combined in numeric expressions. For example, the following expressions are equivalent:
Example A
11 MASK 5 * 3 + 14 UNION 35 SHIFT 5 / 56 + COMPLEMENT 3
Example B
((((11 MASK 5) * 3) + 14) UNION ((35 SHIFT 5) / 56)) + (COMPLEMENT 3)
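This example is evaluated as follows, starting with the innermost parentheses:
- 11 MASK 5 is 1.
- 1 * 3 is 3.
- 3 + 14 is 17.
- 35 SHIFT 5 is 1120.
- 1120 / 56 is 20.
- 17 UNION 20 is 21.
- COMPLEMENT 3 is -4.
- Finally, 21 + -4 gives 17.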
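The following Python sketch walks through the fully parenthesized form of the example, one step per operator. It assumes 32-bit two's-complement numbers for SHIFT and COMPLEMENT. It also uses Python's // for "/", which is safe here because 1120 / 56 divides exactly. The helper to_signed32 is hypothetical.

```python
def to_signed32(bits: int) -> int:
    """Hypothetical helper: read the low 32 bits as a two's-complement number."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


# ((((11 MASK 5) * 3) + 14) UNION ((35 SHIFT 5) / 56)) + (COMPLEMENT 3)
step1 = 11 & 5                  # 11 MASK 5     -> 1
step2 = step1 * 3               # 1 * 3         -> 3
step3 = step2 + 14              # 3 + 14        -> 17
step4 = to_signed32(35 << 5)    # 35 SHIFT 5    -> 1120
step5 = step4 // 56             # 1120 / 56     -> 20
step6 = step3 | step5           # 17 UNION 20   -> 21
step7 = to_signed32(~3)         # COMPLEMENT 3  -> -4
result = step6 + step7          # 21 + -4       -> 17
print(result)                   # 17
```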
Parentheses can be used to group subexpressions to override the precedence. For example, in:
Example A
3 * 5 + 2
Example B
3 * (5 + 2)
The first example evaluates to 17, while the second one evaluates to 21.
Leading minus signs can be parenthesized with their arguments as well:
-2 * (-3)
is evaluated as 6.
A string expression is an expression that evaluates to a sequence of zero or more characters. String expressions are allowed in most contexts where quoted strings are allowed.
Only constant string expressions are allowed for SGML names or the names of OmniMark objects, or in declarations.
The simplest forms of a string expression are:
Quoted strings are programmer-specified sequences of characters. They can be surrounded by either single or double quotation marks. The delimiting quotation mark cannot appear as a character within the string unless it is escaped. (Escaping characters within a quoted string is described in Section 9.2.1.1, "Constant Format Items In Strings".)
All characters that don't have specific control-character significance are allowed in a quoted string. In particular, the only characters not allowed are:
These characters can either be represented in a quoted string by the "%n" format item, or by using the "%#" format item to specify the character value explicitly.
(In versions of OmniMark prior to V3, only non-control characters below ASCII value 127 were permitted in quoted strings. Tabs were disallowed because they are difficult to distinguish from spaces. The "%t" format item is still recommended over an actual tab character because it can be distinguished from spaces.)
A string can be entered over multiple input lines only if it is divided into quote-delimited parts that appear on different lines and are joined with the underscore ("_") character.
A format item is a sequence of characters with a special meaning that occurs inside of a string.
The basic structure of a format item is the escape character followed by zero or more format modifiers, and then the identifying character of the format. Format items which operate on OmniMark variables specify the name of the variable in parentheses after the identifying character.
By default, the escape character is the percent ("%") character. The escape character can be changed using the ESCAPE declaration. The escape character lets OmniMark know that the next few characters will require special processing. In order to use the escape character as itself, it must be entered twice.
Format modifiers are usually a single letter. Some modifiers require a preceding integer.
Example A
OUTPUT "There were %k3fd(item-number) items."
Example B
OUTPUT "Processing is complete.%n"
The examples above show the "%d" and the "%n" format items. Two modifiers are applied to the "%d" format item: the "f" format modifier which takes a preceding integer argument (in this case, 3) and the "k" format modifier. Note that the "%d" format item is also followed by the name of the variable in parentheses.
Format items which do not reference SGML context or OmniMark variables are referred to as constant format items. Constant format items are the only format items which can occur in constant strings.
The constant format items are:
The %" format item (the escape character followed by a double quote) can be used to enter a double quote character in a string that is surrounded with double quotes. In single-quoted strings, the double quote character can be entered with or without escaping.
The %' format item (the escape character followed by a single quote) can be used to enter a single quote character in a string that is surrounded with single quotes. In double-quoted strings, the single quote character can be entered with or without escaping.
The escape character always indicates a format item. In order to enter the escape character itself in a string, it must be typed twice. The first one escapes the second one. By default, the escape character is the percent ("%"), so two percents in a row will be interpreted as a single percent character.
Similarly, if the escape character is changed, then entering two escape characters in a row will be interpreted as a single escape character. The percent character would no longer be special.
The "%n" format item is used to enter the text-mode newline sequence in a string. By default, this is just the line feed character (ASCII 10). This value can be changed by the deprecated NEWLINE declaration. See Section 19.1.6, "The NEWLINE Declaration and Binary I/O".
The "%t" format item is used to enter the tab character (ASCII 9). The tab character can also be entered directly, but it is difficult for a reader of the program to distinguish a tab from a run of spaces, especially in a printout of the program. Using the "%t" format item can contribute greatly to program readability.
The "%_" format item is used to enter a space character. Although the space character can be entered directly, using a visible character to represent a space sometimes improves readability.
The "%#" format item is used to enter a character by its numeric code. The decimal code of the character is written between the escape character ("%") and the octothorpe ("#").
The "%r{...}" form can be used to enter longer sequences of such characters.
More detailed descriptions of some of these format items are given below. In the syntax descriptions below, the parts of each format item are separated by spaces for readability. These spaces are not entered when the format item is used in a program.
The format item for newlines has the following syntax:
% modifier* n
This is the normalized newline character for text-mode files. The character is also known as the "line feed" (ASCII 10).
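Different operating systems use different sequences for newline characters. For example, UNIX systems use a line feed alone, while DOS and Windows systems use a carriage return followed by a line feed.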
When inputting a file in text-mode, the newline sequence is automatically converted to a single line feed character. This makes programs which deal with text-mode files more platform independent. On output, the reverse translation is done. Thus, the OmniMark programmer sees the line feed character, but the external system sees what it expects to see for the line end sequence.
The only modifier permitted is "s", which indicates that white space stripping can be applied to this newline.
White space stripping is the process of collapsing sequences of white space characters to a single character. If the sequence contained a newline, the character is the newline character. Otherwise the character is a space. White space stripping is described in Section 4.1.2, "Processing Content".
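The collapsing rule just described can be sketched in Python with a regular expression. This sketch assumes that space, tab and newline are the only white space characters involved. OmniMark decides which white space is actually eligible for stripping (see Section 4.1.2); this sketch simply collapses every run it finds. The function name strip_white_space is hypothetical.

```python
import re

# Assumption: space, tab and newline are the white space characters involved.
_WHITE_RUN = re.compile(r"[ \t\n]+")


def strip_white_space(text: str) -> str:
    """Hypothetical helper: collapse each run to a newline if it has one, else a space."""
    return _WHITE_RUN.sub(lambda m: "\n" if "\n" in m.group() else " ", text)


print(repr(strip_white_space("a  \t b")))       # 'a b'
print(repr(strip_white_space("a \n\t\n  b")))   # 'a\nb'
```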
The format item for the space character has the syntax:
% modifier? _
With no modifiers, the "%_" format item has the same effect as a space character. However, this format item also permits the "s" modifier. The "s" modifier indicates that white space stripping can be applied to this space. White space stripping is described in Section 4.1.2, "Processing Content".
Normally, space characters in an OUTPUT action are intended to appear in the final OmniMark output. Use of the "s" modifier allows the programmer to avoid strings of white space characters that would otherwise be generated if successive OUTPUT actions (perhaps from rules in different parts of a program) happen to produce adjacent white space characters.
The tab format item has the syntax:
% modifier? t
With no modifiers, this format item has the same effect as a tab character. However, this format item also permits the "s" modifier. The "s" modifier indicates that white space stripping can be applied to this tab. White space stripping is described in Section 4.1.2, "Processing Content".
Normally, tab characters in an OUTPUT action are intended to appear in the final OmniMark output. The "s" modifier lets the programmer avoid runs of white space characters. Such runs would otherwise be generated when successive OUTPUT actions (perhaps from rules in different parts of a program) happen to produce adjacent white space characters.
Programmers may wish to enter control characters or characters not available on a particular keyboard by referring to their character code. The format "%#" allows the programmer to enter the character using its numeric code. The "%#" format item has the syntax:
% number #
The number is given in decimal. For example, since the bell character is represented by 7 in the ASCII character set, this character can be entered in an OmniMark program with the format "%7#".
No modifiers are permitted with this format item.
An alternate syntax is available when many such characters are to be entered, or if it is more convenient to enter the characters using a different radix. Often, it may be more convenient to enter the numeric code using hexadecimal or octal numbers. The alternate syntax is:
% number r { number (, number)* }
The first number is the radix with which to interpret the numbers in the braces. It must be between 2 and 36 inclusive. For example, the following is a fifteen-character sequence entered in hexadecimal notation:
OUTPUT "%16r{4F,6D,6E,69,4D,61,72,6B,20,52,75,6C,65,73,21}%n"
Letters which are used to represent digits greater than 9 may be lower-case or upper-case, or any combination.
The "r" modifier is not optional. It must always be present, even if the radix is 10.
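To check what the hexadecimal example above produces, the following Python sketch decodes the comma-separated codes of a "%r{...}" item in a given radix. It assumes the codes are ASCII (and therefore Unicode) code points. The function name decode_radix_items is hypothetical.

```python
def decode_radix_items(radix: int, codes: str) -> str:
    """Hypothetical helper: turn the codes of a %r{...} item into characters."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36 inclusive")
    # int() accepts either case for digits above 9, like the format item does.
    return "".join(chr(int(code, radix)) for code in codes.split(","))


text = decode_radix_items(16, "4F,6D,6E,69,4D,61,72,6B,20,52,75,6C,65,73,21")
print(text, len(text))   # OmniMark Rules! 15
```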
Strings can be divided into separate, quote-delimited parts that are joined together with an underscore character. There are no restrictions on the use of white space around the string parts or the underscore. This convention is especially useful for entering long strings that do not fit on a single line within a program and for entering strings that contain both single and double quotation marks. For example, the following two examples are different ways of representing the same string:
Example A
"ABCDEFGHIJKLMNOPQRSTUVWXYZ" _ "abcdefghijklmnopqrstuvwxyz"
Example B
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
The maximum length of an OmniMark string, excluding the surrounding quotation marks and any underscores used to enter long strings, is 2,048 characters.
There are several methods for entering the string that consists of a double quotation mark surrounded by single quotation marks. Possibilities include the following:
Example A
"'%"'"
Example B
'%'"%''
Example C
"'" _ '"' _ "'"
Example D
"'%34#'"
(The last example is only valid when the ASCII character set is being used.)
When a string contains only digits, with an optional leading "+" or "-", it may be used as a numeric value wherever a numeric expression is required. For example, the following sets counter i to 15:
LOCAL COUNTER i
SET i TO "15"
This is commonly used for capturing attribute values. For example, if an element table has an attribute numcols, the attribute value could be captured in a counter as follows:
LOCAL COUNTER number-of-cols
SET number-of-cols TO ATTRIBUTE numcols WHEN ATTRIBUTE numcols IS SPECIFIED
Even the content of an element can be converted to a number, if it is guaranteed to be convertible. An example would be:
ELEMENT month
   LOCAL COUNTER month-date
   ; strip away leading and trailing white-space
   SET month-date TO "%sc"
string-expression BASE numeric-expression
Example
FIND [DIGIT | "ABCDEFabcdef"]+ => hex-location
   LOCAL COUNTER location
   SET location TO hex-location BASE 16
The operator BASE is used to convert a string representation of a non-decimal number into a numeric value.
The first operand is a string representation of the number in the specified base. Letters are used to represent digits above 9. For instance, hexadecimal (base 16) numbers can consist of the digits "0123456789abcdef". Octal (base 8) numbers can contain only the digits "01234567". When letters are used, they can be either lower case or upper case.
The second operand is the radix. The radix must be between 2 and 36 inclusive.
It is an error if a character in the string expression is not a valid digit in the specified base. For example, the expression "16a" BASE 10 would cause an error because "a" is not a valid digit in the base 10 system.
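The following Python sketch models BASE. It checks every character against the digits of the radix rather than relying on Python's int(), which also accepts signs, underscores and a "0x" prefix. Rejecting an empty string is an assumption, since the text does not cover that case. The function name base is hypothetical.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base(text: str, radix: int) -> int:
    """Hypothetical model of `text BASE radix`: every character must be a digit."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36 inclusive")
    if not text:
        raise ValueError("empty string")   # assumption: not covered by the text
    value = 0
    for ch in text.lower():                # letters may be in either case
        digit = DIGITS.find(ch)
        if not 0 <= digit < radix:
            raise ValueError(f"{ch!r} is not a valid digit in base {radix}")
        value = value * radix + digit
    return value


print(base("ff", 16), base("FF", 16))   # 255 255
print(base("777", 8))                    # 511
```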
string-expression BINARY numeric-expression
Example
CROSS-TRANSLATE
FIND ANY {1 TO 4} => word
   LOCAL COUNTER i
   SET i TO word BINARY 0
   OUTPUT "The value is %d(i)%n"
BINARY string-expression
Example
CROSS-TRANSLATE
FIND ANY {1 TO 4} => word
   LOCAL COUNTER i
   SET i TO BINARY word
   OUTPUT "The value is %d(i)%n"
The BINARY operator is used to process a string of 1 to 4 characters as if it were a sequence of 8 to 32 bits, and yield the number represented by that bit sequence.
This can be very useful when processing binary data in files. Numbers are often stored as bit sequences rather than string representations in such files.
For example, a word processor that uses escape codes and character counts to indicate how long a given sequence is might specify that the next 90 bytes should be underlined by the two characters "%21#%90#". (The "%21#" character is also known as a "Control-U" character, obtained by pressing the control key and "u" key simultaneously. The "%90#" character is also the letter "Z".) This representation is more compact than using the characters "90" to represent the length.
In OmniMark, converting a character such as "Z" into its numeric value (in this example, 90) is done using the BINARY operator.
As long as the string values are always one character long, byte order is not an issue. (A zero-length string cannot be converted.)
A problem arises when a numeric value is represented with more than one byte. On some systems, the higher the byte position, the more significant its value; on others the lower bytes are more significant. Some other systems only swap bytes within each pair: the higher pair could be the more significant, but within each pair the lower byte would be more significant.
The BINARY operator can be used in two different ways. The general form takes two operands: the string expression containing the binary representation of the number, and a numeric expression specifying how the bytes are ordered in that representation. The more compact form takes just the string expression operand and uses the default ordering. The default ordering is 0 unless it is changed with the BINARY-INPUT declaration.
The following describes the meaning of each byte ordering value, using the conversion of the four-character string "%10r{1,2,3,4}" as an example.
Ordering 0 (highest byte first): the string is converted to the number 1 * 16,777,216 + 2 * 65,536 + 3 * 256 + 4 = 16,909,060.
Ordering 1 (high pair first, low byte first within each pair): the string is converted to 2 * 16,777,216 + 1 * 65,536 + 4 * 256 + 3 = 33,620,995.
Ordering 2 (low pair first, high byte first within each pair): the string is converted to 3 * 16,777,216 + 4 * 65,536 + 1 * 256 + 2 = 50,594,050.
Ordering 3 (lowest byte first): the string is converted to 4 * 16,777,216 + 3 * 65,536 + 2 * 256 + 1 = 67,305,985.
Code numbers greater than 3 are divided by 4, and the remainder is used (i.e. the code number "modulo" 4 is used). Code numbers less than zero are in error.
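The four byte orderings can be modeled in Python by listing, for each code, which input byte is most significant, which is next most significant, and so on. The mappings are taken from the worked conversions of "%10r{1,2,3,4}". This sketch handles exactly four bytes only, because the text does not spell out how shorter strings are ordered. It also returns the unsigned result and does not address values whose top bit is set. The names ORDER and binary are hypothetical.

```python
# Hypothetical table: for each byte-ordering code, the input positions
# listed from the most significant byte to the least significant byte.
ORDER = {
    0: (0, 1, 2, 3),   # highest byte first
    1: (1, 0, 3, 2),   # high pair first, low byte first within each pair
    2: (2, 3, 0, 1),   # low pair first, high byte first within each pair
    3: (3, 2, 1, 0),   # lowest byte first
}


def binary(data: bytes, code: int = 0) -> int:
    """Hypothetical model of `data BINARY code` for four bytes (unsigned result)."""
    if len(data) != 4:
        raise ValueError("this sketch only handles exactly four bytes")
    if code < 0:
        raise ValueError("negative byte-ordering codes are errors")
    value = 0
    for pos in ORDER[code % 4]:              # codes above 3 are taken modulo 4
        value = value * 256 + data[pos]      # most significant byte first
    return value


sample = bytes([1, 2, 3, 4])                 # the string "%10r{1,2,3,4}"
for code in range(4):
    print(code, binary(sample, code))
```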
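BINARY can be used as a prefix operator when the desired byte ordering is the same as the default ordering. The default is 0 unless it has been changed with the BINARY-INPUT declaration.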
For example, the following two expressions are equivalent:
Example A
...
BINARY-INPUT 3
...
LOCAL STREAM s
LOCAL COUNTER c
SET c TO BINARY s
Example B
...
BINARY-INPUT 4   ; any value can be used instead of 4
...
LOCAL STREAM s
LOCAL COUNTER c
SET c TO s BINARY 3
The following complete sample program shows how to switch the bytes in a file from Code-2 style ordering to Code-0 style ordering. Code-2 ordering places the low pair before the high pair, with the high byte before the low byte within each pair. Code-0 ordering places the highest byte first and the lowest byte last, in every four bytes. For every four bytes in the input (and whatever remains at the end of the file), the program carries out a binary-input conversion using the input code of 2 on the bytes captured in the pattern variable word, placing the result in the counter i. It then converts that number to its output representation, as determined by code 0, and writes that representation to the output stream.
CROSS-TRANSLATE
BINARY-INPUT 2
BINARY-OUTPUT 0
FIND ANY {1 TO 4} => word
   LOCAL COUNTER i
   SET i TO BINARY word
   OUTPUT "%4fb(i)"
In this program, the BINARY-INPUT and BINARY-OUTPUT declarations determine the byte orderings used by the conversion. Alternatively, the two declarations could have been left off. In the next sample, explicit codes are used to direct the conversion:
CROSS-TRANSLATE
FIND ANY {1 TO 4} => word
   LOCAL COUNTER i
   SET i TO word BINARY 2
   OUTPUT "%4f0b(i)"
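The effect of these programs on whole four-byte words can be sketched in Python. Each word is read using ordering 2, and the resulting number is written back using ordering 0. Unlike the OmniMark programs, this sketch does not handle a partial word at the end of the data. The names ORDER, reorder and convert are hypothetical, and the ORDER mappings follow the worked conversions of "%10r{1,2,3,4}".

```python
# Hypothetical table: input positions from most to least significant byte.
# Each mapping is its own inverse, so it serves for reading and writing.
ORDER = {0: (0, 1, 2, 3), 1: (1, 0, 3, 2), 2: (2, 3, 0, 1), 3: (3, 2, 1, 0)}


def reorder(word: bytes, code: int) -> bytes:
    """Rearrange four bytes according to a byte-ordering code."""
    return bytes(word[pos] for pos in ORDER[code])


def convert(data: bytes, in_code: int = 2, out_code: int = 0) -> bytes:
    """Re-encode every four-byte word from one byte ordering to another."""
    if len(data) % 4:
        raise ValueError("this sketch assumes whole four-byte words")
    out = bytearray()
    for start in range(0, len(data), 4):
        word = data[start:start + 4]
        value = int.from_bytes(reorder(word, in_code), "big")   # read the number
        out += reorder(value.to_bytes(4, "big"), out_code)      # write it back
    return bytes(out)


print(convert(bytes([1, 2, 3, 4, 5, 6, 7, 8])).hex(" "))   # 03 04 01 02 07 08 05 06
```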
The test in the following example is true when the characters saved in the pattern variable named sequence-length describe a binary value greater than 100.
BINARY-INPUT 3
...
FIND ANY => sequence-length
   ... WHEN BINARY sequence-length > 100
The above BINARY-INPUT declaration indicates that the most significant byte comes last. For example, if sequence-length contains the text "e%0#%0#%0#", binary evaluation of this value gives 101, the character code of "e".
LENGTH OF string-expression
Example A
DO WHEN LENGTH OF "123" = 3 ... DONE
Example B
DO WHEN LENGTH OF DATE "%g(hold-buf)--%x(sequence-length)%eov(attr)%n" <= list-height * 10 + sequence-count ... DONE
The "LENGTH OF" operator calculates the length of a string expression. The result is a numeric expression.
string-expression (JOIN | ||) string-expression
The "||" operator concatenates its two string arguments.
LOCAL STREAM a
LOCAL STREAM b
LOCAL STREAM c
...
SET a TO b || "," || c
The keyword JOIN can be used as a synonym for the "||" operator.
string-expression ||* numeric-expression
The "||*" operator concatenates numeric-expression copies of the specified string-expression. The numeric-expression must be zero or more.
OUTPUT "=" REPEATED 80 || "%n"
The keyword REPEATED can be used as a synonym for the "||*" operator.
Prior to OmniMark V3, BASE and BINARY had higher precedence than "@" (ITEM). This made these operators different from almost every other dyadic operator.
To avoid the confusion and errors this could cause, OmniMark V3 changes the precedence so that "@" (ITEM) takes precedence over BASE and BINARY.
For example, in OmniMark V2, Examples A and B are equivalent. In OmniMark V3, Examples A and C are equivalent.
Example A
LOCAL COUNTER x
LOCAL STREAM y VARIABLE
...
SET x TO y @ "10" BASE 2
Example B
LOCAL COUNTER x
LOCAL STREAM y VARIABLE
...
SET x TO y @ ("10" BASE 2)
Example C
LOCAL COUNTER x
LOCAL STREAM y VARIABLE
...
SET x TO (y @ "10") BASE 2
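The two readings of Example A can produce different results. The following Python sketch (not OmniMark) shows this with hypothetical contents for the stream shelf y. It assumes that item numbers start at 1 and that a selected item's text is converted to a number in the usual way. The helper item is also hypothetical.

```python
# Python model (not OmniMark) of the two readings of: SET x TO y @ "10" BASE 2
# Hypothetical contents for stream shelf y; item numbers start at 1.
y = ["0"] * 10
y[2 - 1] = "7"      # item 2
y[10 - 1] = "101"   # item 10


def item(shelf, n):
    """Model of shelf @ n, using 1-based item numbers."""
    return shelf[n - 1]


# OmniMark V3: (y @ "10") BASE 2 selects item 10 and reads it as base 2.
x_v3 = int(item(y, int("10")), 2)

# OmniMark V2: y @ ("10" BASE 2) selects item 2 and reads it as decimal.
x_v2 = int(item(y, int("10", 2)))

print(x_v3, x_v2)  # prints 5 7
```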
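In OmniMark, two or more string values can be compared lexically. The following string comparison operators are available: "<" (less than), "<=" (less than or equal to), "=" (equal to), "!=" (not equal to), ">" (greater than), and ">=" (greater than or equal to).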
String comparisons have the following form:
Syntax
string-expression (< | <= | = | != | > | >=) UL? string-expression
The optional keyword UL means that case is ignored in the comparison.
Lexical comparisons are done on a character by character basis. Corresponding characters of the two strings are compared, and the first characters that are different determine the result of the comparison. The character comparisons are based on the numeric value of the character. For instance, the space (ASCII 32) sorts before the letter "A" (ASCII 65) on systems using the ASCII representation of characters. (On EBCDIC systems, the EBCDIC values would be used.)
If all of the characters are the same up to the end of one of the strings, then the shorter string is considered to be less than the longer one. Thus, "aa" is less than "aaa".
If the UL option is given, then for the purpose of the comparison, every letter that has both an upper-case and a lower-case value is mapped to its lower-case value. Without UL, comparisons follow the character codes directly. For example, on ASCII systems, the letter "A" (ASCII 65) is less than "B" (ASCII 66), but "a" (ASCII 97) is greater than "B".
Example A
DO WHEN "a" < UL "B" ... DONE
Example B
DO WHEN "a" < "B" ... DONE
Example C
DO WHEN "a" < "b" ... DONE
In the above examples, the first and third comparisons are equivalent and yield TRUE, while the second one yields FALSE.
The "DECLARE DATA-LETTERS" declaration can specify upper-case/lower-case relationships for other characters, such as accented characters. See Section 19.1.2.4, "Letter Characters in Data".
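The comparison rules above can be restated as a small Python function. This is a model, not OmniMark code. It assumes ASCII text and, following the description of UL, maps letters to lower case before comparing. It does not model the character relationships that DECLARE DATA-LETTERS can add. The function name lexical_less is hypothetical.

```python
def lexical_less(a: str, b: str, ul: bool = False) -> bool:
    """Model of a < b (or a < UL b) for ASCII text."""
    if ul:
        # Assumption: letters are mapped to their lower-case values.
        a, b = a.lower(), b.lower()
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing characters decide, by character code.
            return ord(ca) < ord(cb)
    # All compared characters are equal: the shorter string is less.
    return len(a) < len(b)


print(lexical_less("a", "B", ul=True))  # True  (Example A)
print(lexical_less("a", "B"))           # False (Example B)
print(lexical_less("a", "b"))           # True  (Example C)
print(lexical_less("aa", "aaa"))        # True
print(lexical_less(" ", "A"))           # True
```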
More than one string expression can be combined on the right-hand side of a string test, surrounded by parentheses and separated by the "|" operator or the keyword OR. The following tests are equivalent:
Example A
GLOBAL STREAM alpha-number-text
...
WHEN alpha-number-text > UL ("M" | ATTRIBUTE limit)
Example B
GLOBAL STREAM alpha-number-text
...
WHEN (alpha-number-text > UL "M") | (alpha-number-text > UL ATTRIBUTE limit)
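The expansion of a parenthesized list of alternatives on the right-hand side can be modelled in Python, which is not OmniMark. The test succeeds if any single comparison succeeds. The values for alpha-number-text and ATTRIBUTE limit are hypothetical. The UL mapping is modelled as lower-casing, as assumed for the comparison rules above.

```python
def gt(a, b, ul=False):
    """Model of a > b (or a > UL b)."""
    if ul:
        a, b = a.lower(), b.lower()
    return a > b


def gt_any(lhs, alternatives, ul=False):
    """Model of lhs > UL? (alt1 | alt2 | ...): true if any comparison holds."""
    return any(gt(lhs, alt, ul) for alt in alternatives)


alpha_number_text = "n12"  # hypothetical stream contents
limit = "Z9"               # hypothetical value of ATTRIBUTE limit

example_a = gt_any(alpha_number_text, ["M", limit], ul=True)
example_b = gt(alpha_number_text, "M", ul=True) or gt(alpha_number_text, limit, ul=True)
print(example_a, example_b)  # prints True True
```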
Like numeric comparisons, an unparenthesized comparison can consist of more than two parts, provided the parts are compatible. The operator "!=" is never permitted in a multi-part comparison, and "<" or "<=" cannot be mixed with ">" or ">=". Furthermore, UL comparisons cannot be mixed with non-UL comparisons.
A multi-part string comparison must use operators from only one of the following sets: "=" alone; "=", "<" and "<="; or "=", ">" and ">=".
Examples of multi-part comparisons are:
Example A
DO WHEN a = b = c = d ... DONE
Example B
DO WHEN a < b < c < d ... DONE
Example C
DO WHEN a >= b >= c >= d ... DONE
Example D
DO WHEN a <= b = c < d ... DONE
These multi-part comparisons are equivalent to the following tests constructed from single-part comparisons:
Example A
DO WHEN a = b & b = c & c = d ... DONE
Example B
DO WHEN a < b & b < c & c < d ... DONE
Example C
DO WHEN a >= b & b >= c & c >= d ... DONE
Example D
DO WHEN a <= b & b = c & c < d ... DONE
The following multi-part comparisons are illegal:
Example A
DO WHEN a != b != c ... DONE
Example B
DO WHEN a != b = c ... DONE
Example C
DO WHEN a >= b <= c ... DONE
Example D
DO WHEN a > UL b >= c ... DONE
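The legality rules and the expansion into single-part comparisons can be sketched in Python. This is a model of the rules stated above, not OmniMark's implementation. Each comparison is written as a hypothetical (operator, uses_ul) pair, and UL is modelled as lower-casing.

```python
import operator

# Python model (not OmniMark) of multi-part string comparisons.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">": operator.gt, ">=": operator.ge, "!=": operator.ne}

# Operator sets permitted in a multi-part comparison, per the rules above.
ALLOWED_SETS = [{"="}, {"=", "<", "<="}, {"=", ">", ">="}]


def is_legal(ops):
    """ops is a list of (operator, uses_ul) pairs, one per comparison."""
    if len(ops) == 1:
        return True                          # ordinary two-part comparison
    if len({ul for _, ul in ops}) > 1:
        return False                         # UL mixed with non-UL
    names = {op for op, _ in ops}
    return any(names <= allowed for allowed in ALLOWED_SETS)


def evaluate(operands, ops):
    """Expand a op1 b op2 c ... into (a op1 b) & (b op2 c) & ..."""
    if not is_legal(ops):
        raise ValueError("illegal multi-part comparison")
    for left, right, (op, ul) in zip(operands, operands[1:], ops):
        if ul:
            left, right = left.lower(), right.lower()
        if not OPS[op](left, right):
            return False                     # "&" stops at the first FALSE
    return True


# Example D: a <= b = c < d
print(evaluate(["a", "b", "b", "c"], [("<=", False), ("=", False), ("<", False)]))  # True
# Illegal Example C: a >= b <= c
print(is_legal([(">=", False), ("<=", False)]))  # False
```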
string-expression MATCHES UNANCHORED? pattern
Example
DO WHEN ATTRIBUTE codes MATCHES UNANCHORED (WORD-START "OK" WORD-END) ... DONE
The MATCHES operator provides a more general way to test the contents of a string expression: it matches the string against an OmniMark pattern. When only one pattern is being tried, MATCHES is often the most convenient way to do this.
The MATCHES operator takes two operands: the string expression being scanned and the pattern against which it is matched, in the form shown in the syntax above.
The keyword UNANCHORED is used when the pattern can appear anywhere in string-expression. Without it, the pattern must match at the beginning of string-expression. For example, the example above succeeds if the attribute "codes" contains the word "OK" anywhere in its value.
Parentheses must be used to enclose the pattern when it contains more than one component (as it does above).
When more than one pattern is being used, it is often clearer to use the "DO SCAN" construct. See Section 3.2.2.1, "Scanning Input With a Single Pattern".
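The anchored and UNANCHORED forms correspond roughly to Python's re.match (the match must start at the beginning) and re.search (the match may start anywhere). This sketch is an analogy, not OmniMark. The \b word boundary stands in for WORD-START and WORD-END and may not match OmniMark's definition of a word exactly. The value of ATTRIBUTE codes is hypothetical.

```python
import re

# Rough stand-in for (WORD-START "OK" WORD-END).
ok_word = re.compile(r"\bOK\b")


def matches(text, pattern, unanchored=False):
    """Anchored: the pattern must match at the start; unanchored: anywhere."""
    if unanchored:
        return pattern.search(text) is not None
    return pattern.match(text) is not None


codes = "A7 OK Z3"  # hypothetical value of ATTRIBUTE codes
print(matches(codes, ok_word))                   # False: "OK" is not at the start
print(matches(codes, ok_word, unanchored=True))  # True
```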
String expressions can be used to create numeric values, as in the following examples:
LOCAL COUNTER n
LOCAL STREAM n-text
LOCAL STREAM s VARIABLE
...
SET n TO (n-text || "0")
OUTPUT s @ ("1" ||* 3)
Any dyadic string operators occurring in a numeric expression must be protected by parentheses. For example, the following is not allowed, even though it has only one reasonable interpretation:
"1" || "2" - 3
It must be written as follows, with the order of evaluation made explicit:
("1" || "2") - 3
This requirement exists because some expressions would otherwise be ambiguous. Without it, the following would have two valid but distinct interpretations:
LOCAL COUNTER n
LOCAL STREAM s VARIABLE
...
SET n TO s @ "1" || "2" + 1
The parenthesization requirement means that a correct OmniMark program will always clearly show the desired interpretation, as follows:
LOCAL COUNTER n
LOCAL STREAM s VARIABLE
...
SET n TO (s @ "1" || "2") + 1
SET n TO s @ ("1" || "2") + 1
SET n TO (s @ ("1" || "2")) + 1
The second and third examples are equivalent. The "@" operator always has higher precedence than addition, and the value of the selected stream item will be converted to a numeric value and added to 1.
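The two interpretations can give different numbers. The following Python sketch (not OmniMark) shows this with a hypothetical stream shelf s whose items are "10", "20", ..., "120". It assumes 1-based item numbers and conversion of item text to a number. The helper item is hypothetical.

```python
s = [str(n * 10) for n in range(1, 13)]  # hypothetical items "10" ... "120"


def item(shelf, n):
    """Model of shelf @ n, using 1-based item numbers."""
    return shelf[n - 1]


# (s @ "1" || "2") + 1 : item 1 is "10", joined with "2" gives "102", plus 1
first = int(item(s, int("1")) + "2") + 1

# s @ ("1" || "2") + 1 : item 12 is "120", plus 1
second = int(item(s, int("1" + "2"))) + 1

print(first, second)  # prints 103 121
```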
A test expression is any expression that evaluates to either TRUE or FALSE. Comparisons are all test expressions. Test expressions can only be used in conditions or saved in SWITCH items. They have no numeric or string representations.
The simplest forms of a test expression are:
Comparisons are examples of test expressions. They take either numeric, string, or test expressions as operands and produce a test value (TRUE or FALSE) as a result.
Test expressions can be combined with the logical operators "&" (AND) and "|" (OR). Test expressions can be negated with the operator "!" (NOT).
test-expression (AND | &) test-expression
Example
DO WHEN a < b & c < d ... DONE
When two test expressions are combined with "&", the resulting test expression is TRUE if both of the operands evaluate to TRUE and FALSE if either or both of the operands evaluate to FALSE. If the first operand evaluates to FALSE, then the second operand is not evaluated.
test-expression (OR | |) test-expression
Example
DO WHEN a < b | c < d ... DONE
When two test expressions are combined with "|" (OR), the resulting test expression is TRUE if either or both of the operands evaluate to TRUE. If both of the operands evaluate to FALSE then the resulting expression also evaluates to FALSE. If the first operand evaluates to TRUE, the second operand is not evaluated.
(NOT | !) test-expression
Example
DO WHEN a < b & NOT c < d ... DONE
The operator "!" simply negates its operand. The resulting expression returns FALSE if the operand evaluates to TRUE, and TRUE if the operand evaluates to FALSE.
OmniMark does not evaluate the second operand of an "&" or "|" expression when the result of the first operand is sufficient to determine the result of the whole expression.
This is very useful when the first part of the test ensures that the second part will not cause an error. For instance, you can test in a single expression whether a pattern variable has been set and whether it has a particular value:
CROSS-TRANSLATE
FIND DIGIT+ => int ("." DIGIT* => frac)?
   DO WHEN frac IS SPECIFIED & frac != ""
      ...
   DONE
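The same guard can be modelled in Python, whose "and" also skips its second operand when the first is false. This sketch is an illustration, not OmniMark. A regular expression with an optional group stands in for a FIND pattern with an optional fractional part. The hypothetical helper frac_nonempty records each call, so you can see when the second operand is skipped.

```python
import re

# Digits, optionally followed by "." and more digits (captured as frac).
number = re.compile(r"(?P<int>[0-9]+)(?P<frac>\.[0-9]*)?")

calls = []


def frac_nonempty(frac):
    """Second operand: records that it was evaluated."""
    calls.append(frac)
    return frac != ""


def check(text):
    m = number.match(text)
    frac = m.group("frac")  # None when the optional part did not match
    # "and" stops after a False first operand, so frac_nonempty never sees None.
    return frac is not None and frac_nonempty(frac)


print(check("42"), check("3.14"))  # prints False True
print(calls)                       # prints ['.14']
```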
Conditional evaluation also lets OmniMark programmers use their knowledge of how their programs behave to gain some efficiency.
For instance, when combining two test expressions with an "&" or an "|", if it is known that one of the operands takes significantly longer than the other, then the less expensive one can be placed first.
Another technique can be used when combining two test expressions with an "&" and one of the expressions usually evaluates to FALSE. That operand should be placed first, so that the second expression need rarely be executed. The same technique can be used when using "|" and one of the operands usually evaluates to TRUE.
The precedence of logical operators, from highest to lowest, is: "!" (NOT), "&" (AND), "|" (OR).
"!" has precedence over the rest because unary operators always have precedence over binary operators.
For example, the condition:
LOCAL COUNTER chapno
LOCAL STREAM xref
...
WHEN ATTRIBUTE docno IS SPECIFIED | chapno > 1 & ! xref IS OPEN
is true if an explicit value for the attribute docno occurs in the document or if both the value of the counter chapno is greater than 1 and the stream xref is not open.
Parentheses can be used to override this priority:
LOCAL COUNTER chapno
LOCAL STREAM xref
...
WHEN (ATTRIBUTE docno IS SPECIFIED | chapno > 1) & ! xref IS OPEN
is true if and only if xref is not open and either docno occurs in the document or chapno is larger than 1.
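Python's "not", "and" and "or" have the same relative precedence as "!", "&" and "|". The two conditions above can therefore be modelled directly in Python (not OmniMark), with plain values standing in for the attribute test, the counter and the stream test. The case shown is one where the two groupings disagree.

```python
def condition_1(docno_specified, chapno, xref_open):
    # ATTRIBUTE docno IS SPECIFIED | chapno > 1 & ! xref IS OPEN
    # groups as: docno | (chapno > 1 & !xref-open)
    return docno_specified or chapno > 1 and not xref_open


def condition_2(docno_specified, chapno, xref_open):
    # (ATTRIBUTE docno IS SPECIFIED | chapno > 1) & ! xref IS OPEN
    return (docno_specified or chapno > 1) and not xref_open


print(condition_1(True, 1, True))  # True
print(condition_2(True, 1, True))  # False
```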
In OmniMark, it is possible to compare two test expressions for equality:
test-expression = test-expression
or inequality:
test-expression != test-expression
The following are equivalent:
LOCAL COUNTER col-count
...
WHEN (col-count = 1) = TRUE
WHEN (col-count = 1) != FALSE
WHEN col-count = 1
The test expression operator "!=" is equivalent to an exclusive-or operation in other languages. In other words, the following are equivalent:
Example A
switch-1 != switch-2
Example B
switch-1 & !switch-2 | !switch-1 & switch-2
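A quick way to confirm this equivalence is to check every combination of switch values. The following Python truth-table sketch (not OmniMark) does that, with Python booleans standing in for switch-1 and switch-2.

```python
from itertools import product

# Compare Example A and Example B for all four combinations of switch values.
rows = []
for switch_1, switch_2 in product([False, True], repeat=2):
    example_a = switch_1 != switch_2
    example_b = (switch_1 and not switch_2) or (not switch_1 and switch_2)
    rows.append((switch_1, switch_2, example_a, example_b))
    print(switch_1, switch_2, example_a)
```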
OmniMark V3 does allow multi-part switch comparisons with the "=" operator. The following examples are equivalent:
Example A
DO WHEN a = b = c = d ... DONE
Example B
DO WHEN (a = b) & (b = c) & (c = d) ... DONE
Note that the following two expressions have the same meaning as each other, but a different meaning from the preceding examples. The preceding examples test whether all of the switch expressions have the same value. The following ones test whether the relationship of a to b is the same as the relationship of c to d: both equal, or both not equal.
Example A
DO WHEN (a = b) = (c = d) ... DONE
Example B
DO WHEN ((a = b) & (c = d)) | ((a != b) & (c != d)) ... DONE
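To see how far apart the two kinds of test are, the following Python sketch (not OmniMark) enumerates all 16 settings of four switches. It counts the settings on which "a = b = c = d" and "(a = b) = (c = d)" disagree. The function names are hypothetical.

```python
from itertools import product


def all_equal(a, b, c, d):
    # a = b = c = d, i.e. (a = b) & (b = c) & (c = d)
    return a == b and b == c and c == d


def same_relationship(a, b, c, d):
    # (a = b) = (c = d)
    return (a == b) == (c == d)


differ = [combo for combo in product([False, True], repeat=4)
          if all_equal(*combo) != same_relationship(*combo)]
print(len(differ), differ[0])  # prints 6 (False, False, True, True)
```

In every one of the six differing settings, the second test is TRUE and the first is FALSE.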
The components of OmniMark actions and operators are evaluated in order, from left to right, and the action or operator is performed only after all its components (arguments) have been evaluated.
In particular, this means that the side effects of functions called during the evaluation of an action occur in the lexical order of the function calls in the action.
More particularly, the order of evaluation of the components of an action, operator or function call is as follows:
Value-returning expressions are those that produce COUNTER, SWITCH or STREAM values. Reference-returning expressions identify the item of a COUNTER, SWITCH or STREAM shelf rather than producing the value of that item. Reference-returning expressions are used as the left-hand side of SET actions and as MODIFIABLE and READ-ONLY function arguments.
These steps are applied recursively to the component expressions of an action, operator or function call, as those component expressions are evaluated.
The first and third stages are those at which function side-effects may occur.
The reason for this order of evaluation is to constrain the places at which checks for non-existent shelf items need be done. Note that a side effect of a function may be to create or delete a shelf item.
As an example of the above, in the following, f1 is called, f2 is called, then an item of COUNTER target is selected using the result of f1 as an item number, and finally the selected item's value is set to the result of f2:
SET COUNTER target @ f1 TO f2
A consequence of this left-to-right evaluation is that the selected item may not exist prior to the SET (or "SET COUNTER") action, but be actually created by either f1 or f2. It could even be removed by f1 and recreated by f2. (Needless to say, programming in this style is generally deprecated.)
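The order described above can be modelled in Python (not OmniMark) with a dictionary standing in for the COUNTER shelf target. In this example, f2 creates the item that f1 selects. The functions f1 and f2, and the helper set_counter_item, are hypothetical. The sketch sequences the calls explicitly because Python's own target[f1()] = f2() would call f2 first: Python evaluates the right-hand side of an assignment before the target.

```python
log = []
target = {}  # hypothetical COUNTER shelf: item number -> value


def f1():
    log.append("f1")
    return 3


def f2():
    log.append("f2")
    target[3] = 0  # side effect: creates the item that f1 selected
    return 42


def set_counter_item(shelf, index_fn, value_fn):
    """Model of SET COUNTER shelf @ index_fn() TO value_fn()."""
    index = index_fn()       # components are evaluated left to right
    value = value_fn()
    if index not in shelf:   # the item is selected only after both calls
        raise KeyError(f"item {index} does not exist")
    log.append("set")
    shelf[index] = value     # finally, the action is performed


set_counter_item(target, f1, f2)
print(log, target)  # prints ['f1', 'f2', 'set'] {3: 42}
```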
Copyright © OmniMark Technologies Corporation, 1988-1997. All rights reserved.
EUM27, release 2, 1997/04/11.