"The Official Guide to Programming with OmniMark"
International Edition
Previous chapter is Chapter 13, "The Current Output Stream Set".
Next chapter is Chapter 15, "Processing SGML Errors".
OmniMark provides powerful SGML query facilities for retrieving information about the current SGML context. The following information is available:
In addition, OmniMark provides:
OmniMark provides powerful and expressive facilities to obtain information about the element context at the current position in the document, and to obtain detailed information about any or all of the elements that are open at that position.
Normally, in OmniMark, element references and attribute references refer to the innermost open element at the current point in the document instance. The references can be modified with element qualifiers to specify a different element.
The terms used in element qualifiers are:
The following element qualifiers are always available:
specifies the current element. It is normally redundant within element-related tests. It is usually used to make explicit which attribute is being referenced.
specifies the parent of an element.
specifies an ancestor with a particular element name of an element.
specifies a preparent with a particular element name.
specifies the outermost element of the instance. This is used for discussing the outermost element without knowing its name.
specifies either the current element or an ancestor with a particular element name.
The element-name-list above is a parenthesized list of element-names separated by either the "|" operator or the keyword OR:
Syntax
( element-name ( (OR | |) element-name)* )
Except for "OF DOCTYPE", element qualifiers can themselves be qualified. Thus, constructs such as the following are permissible:
... PARENT OF PARENT OF ANCESTOR listitem ...
OmniMark provides a variety of ways to determine the element context at the current point in a document. There are tests:
Several tests pertain to open elements. Often these tests refer to an element with a specified element name. These tests are defined below.
In these test definitions, element-qualifiers represents the optional use of one or more of the above element qualifiers. If no element qualifiers are present, the test always applies to the current element.
All of these tests allow the programmer to specify a parenthesized list of element-names separated by OR or "|" instead of a single element name. The test succeeds if the reference element is identified by any of the names in the element-name-list.
ELEMENT element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "ELEMENT IS" test checks whether the element defined by the element-qualifiers has one of the specified names.
PARENT element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "PARENT IS" test checks whether the parent of the qualified element has a particular element-name.
ANCESTOR element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "ANCESTOR IS" test checks if the element indicated by the qualifiers has any ancestor with a particular element name.
PREPARENT element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "PREPARENT IS" test checks if a specified element has any preparent with a particular element name. In other words, it checks if the parent of the specified element has a particular ancestor.
OPEN ELEMENT element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "OPEN ELEMENT IS" test checks whether the specified element, or any of its ancestors, has a particular element name.
The "OPEN ELEMENT IS" test is equivalent to, but more efficient than, the "ANCESTOR IS" test combined with the "ELEMENT IS" test. The following examples are equivalent:
Example A
DO WHEN OPEN ELEMENT IS chapter ... DONE
Example B
DO WHEN ANCESTOR IS chapter | ELEMENT IS chapter ... DONE
Example C
DO WHEN PREPARENT IS chapter | PARENT IS chapter | ELEMENT IS chapter ... DONE
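The relationships among these tests can be modelled outside OmniMark. The following Python sketch is not OmniMark code and is only an illustration: it assumes the open elements are held as a list of names, outermost first, and every function name in it is hypothetical. It checks that the formulations in Examples A, B and C agree at every nesting depth.

```python
# Hypothetical model: the open elements as a list of names, outermost first.
# The last item is the current (innermost) element.

def element_is(stack, names):
    """ELEMENT IS: the current element has one of the names."""
    return bool(stack) and stack[-1] in names

def parent_is(stack, names):
    """PARENT IS: the element enclosing the current one has one of the names."""
    return len(stack) >= 2 and stack[-2] in names

def ancestor_is(stack, names):
    """ANCESTOR IS: some element strictly above the current one matches."""
    return any(name in names for name in stack[:-1])

def preparent_is(stack, names):
    """PREPARENT IS: some element strictly above the parent matches."""
    return any(name in names for name in stack[:-2])

def open_element_is(stack, names):
    """OPEN ELEMENT IS: the current element or any ancestor matches."""
    return any(name in names for name in stack)

open_elements = ["book", "chapter", "list", "listitem"]
wanted = {"chapter"}

# Compare Examples A, B and C with each open element in turn as "current".
for depth in range(1, len(open_elements) + 1):
    stack = open_elements[:depth]
    example_a = open_element_is(stack, wanted)
    example_b = ancestor_is(stack, wanted) or element_is(stack, wanted)
    example_c = (preparent_is(stack, wanted)
                 or parent_is(stack, wanted)
                 or element_is(stack, wanted))
    assert example_a == example_b == example_c
```

The model only looks at element names; it says nothing about the relative efficiency of the tests.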
OmniMark also permits testing recently closed elements.
PREVIOUS element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "PREVIOUS IS" test succeeds if the qualified element is not the first subelement of its parent, and if the previous subelement of the parent has a specified element name. Otherwise, it fails.
For instance,
DO WHEN PREVIOUS IS par ... DONE
tests if the current element follows a paragraph element.
The "PREVIOUS IS" test ignores inclusions and data content.
LAST PROPER? SUBELEMENT element-qualifier* (IS | ISNT) (element-name | element-name-list)
The "LAST SUBELEMENT IS" test succeeds if the most recently closed subelement of the qualified element has a specified element name.
When the keyword PROPER is specified, the "LAST SUBELEMENT IS" test ignores included elements. If all of the subelements of the referenced element have been included elements, then the test will fail.
When the "LAST SUBELEMENT IS" test is applied to the current element from an ELEMENT rule, it fails if the content has not yet been processed.
When invoked on ancestor elements, the "LAST SUBELEMENT IS" test is related to the "PREVIOUS IS" test. The following two examples are equivalent:
Example A
DO WHEN PREVIOUS IS par ... DONE
Example B
DO WHEN LAST PROPER SUBELEMENT OF PARENT IS par ... DONE
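The equivalence of Examples A and B can be illustrated with a small model. The Python sketch below is an assumption-laden illustration, not OmniMark: it represents the already-closed subelements of an element as (name, is_inclusion) pairs, leaves data content out of the list altogether, and uses hypothetical function names.

```python
def last_subelement_is(closed_subelements, names, proper=False):
    """Model of LAST [PROPER] SUBELEMENT IS.

    closed_subelements: (name, is_inclusion) pairs for the subelements
    that have already been closed, in document order.
    """
    candidates = [name for name, is_inclusion in closed_subelements
                  if not (proper and is_inclusion)]
    # With no candidate subelement at all, the test fails.
    return bool(candidates) and candidates[-1] in names

def previous_is(parent_closed_subelements, names):
    """Model of PREVIOUS IS, seen from inside the current element.

    The current element is still open, so the parent's last closed proper
    subelement is the current element's previous proper sibling.
    """
    return last_subelement_is(parent_closed_subelements, names, proper=True)

# The parent has closed a paragraph and then an included footnote.
parent_closed = [("par", False), ("footnote", True)]
follows_par = previous_is(parent_closed, {"par"})
last_any_is_par = last_subelement_is(parent_closed, {"par"})
```

In this model the inclusion is skipped when PROPER is in effect, so follows_par is true while last_any_is_par is false.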
OmniMark also provides a contextual test that can test for the presence of data content as well as specific elements.
LAST PROPER? CONTENT element-qualifier* (IS | ISNT) (#DATA | element-name | content-identifier-list)
The "LAST CONTENT IS" test succeeds if the last content of the specified element was one of the specified elements or, when #DATA is specified, if the last content was data content.
When the keyword PROPER is specified, the "LAST CONTENT IS" test ignores included elements. If all of the subelements of the referenced element have been included elements, and if there was no data content in the element, then the test will fail.
The following example precedes the content of the keyword subelement with a slash ("/") only if it is preceded in its containing element by data content or an entity reference:
ELEMENT keyword
   OUTPUT "/" WHEN LAST CONTENT OF PARENT IS #DATA
   OUTPUT "%sc"
Note that at the start of an ELEMENT rule, there is no "LAST CONTENT" of that element, so any test for it will fail.
The test "LAST CONTENT IS #DATA" is true in a DATA-CONTENT rule only if the data content is immediately preceded either by a non-SGML external entity reference or by one or more processing instructions which are in turn preceded either by data content or by a non-SGML external entity reference. It never applies to the content processed by the DATA-CONTENT rule.
In the following example, the test succeeds only if the text processed by the rule is itself preceded by other data content or an entity reference. It does not succeed just because the data content of this rule has been processed prior to the test:
DATA-CONTENT
   OUTPUT "%c"
   OUTPUT " (again)" WHEN LAST CONTENT IS #DATA
There are three ways to classify elements based on why they are permitted at a given point in the document.
OmniMark provides the STATUS test to determine why an element was allowed:
STATUS OF LAST SUBELEMENT? element-qualifier* (IS | ISNT) (PROPER | INCLUSION)
The STATUS of the specified element is
These are opposites. "STATUS IS PROPER" is equivalent to "STATUS ISNT INCLUSION", and "STATUS IS INCLUSION" is equivalent to "STATUS ISNT PROPER".
If the "OF LAST SUBELEMENT" phrase is used, then the test is performed on the last subelement of the specified element. Otherwise, the test is performed on the specified element itself.
The following test succeeds only if the subelement immediately preceding the current element was an inclusion:
OUTPUT "\par %n" WHEN STATUS OF LAST SUBELEMENT OF PARENT IS INCLUSION
If the specified element does not exist, then the test fails.
There are two ways to check the position of an element with respect to the surrounding elements:
OCCURRENCE element-qualifier*
The OCCURRENCE operator returns the number of consecutive subelements of the parent of the current element, ending with the current element itself, that have the same element name as the current element. If OCCURRENCE is followed by an element-qualifier, the count is made for the indicated element instead of the current element.
Inclusions are counted as well as proper subelements.
The first subelement of a parent, any subelement that is not of the same element type as its immediately previous sibling, and a subelement that immediately follows data content in its parent element will always have an OCCURRENCE count of one (1).
CHILDREN element-qualifier*
When an element-qualifier is not given, the CHILDREN operator returns the number of subelements of the current element. If used before the content of the element has been processed, it will always return zero.
When an element-qualifier is given, the children of the indicated element, including the currently open child of the indicated element and all its preceding subelements, are counted.
Inclusions are counted as well as proper subelements.
The following condition will succeed if the element in which the test occurs is the first subelement of its parent:
DO WHEN CHILDREN OF PARENT = 1 ... DONE
The following condition will succeed if the element is the first in a series of like elements, but not the first subelement of its parent:
DO WHEN OCCURRENCE = 1 & CHILDREN OF PARENT > 1 ... DONE
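The counting rules for OCCURRENCE and "CHILDREN OF PARENT" can be sketched as follows. This Python model is only an illustration under stated assumptions: the parent's content so far is a list ending with the current element, each item is an element name, and None stands for a run of data content. The function names are hypothetical.

```python
def occurrence(parent_content):
    """Model of OCCURRENCE for the current element.

    parent_content: the parent's content so far, in order, ending with the
    current element. Items are element names; None marks data content.
    """
    current = parent_content[-1]
    count = 0
    for item in reversed(parent_content):
        # A different element name, or data content, ends the run.
        if item != current:
            break
        count += 1
    return count

def children_of_parent(parent_content):
    """Model of CHILDREN OF PARENT: every subelement so far, current included."""
    return sum(1 for item in parent_content if item is not None)

def first_of_series_but_not_first_child(parent_content):
    """Model of: OCCURRENCE = 1 & CHILDREN OF PARENT > 1."""
    return occurrence(parent_content) == 1 and children_of_parent(parent_content) > 1

only_child = ["title"]
second_par = ["title", "par", "par"]
par_after_data = ["title", "par", "par", None, "par"]
```

For par_after_data the model restarts the count at one, because the current paragraph immediately follows data content in its parent.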
OmniMark also allows the programmer to retrieve information from the Document Type Declaration about elements that have been encountered in the document instance.
CONTENT element-qualifier* (IS | ISNT) (content-type | content-type-list)
"CONTENT IS" tests the declared content type of the element.
The content-type is one of the following:
A content-type-list is a list of content-types separated by OR or "|":
Syntax
( content-type ((OR | |) content-type)* )
If element-qualifiers are specified, the test applies to the qualified element, not the current element.
The following example shows one way of analyzing every element in a document. For the following input:
<!DOCTYPE a [
<!ELEMENT a  - - (b1|b2|b3)+       -- content type: ELEMENT -->
<!ELEMENT b1 - O EMPTY             -- EMPTY -->
<!ELEMENT b2 - O CDATA             -- CDATA -->
<!ELEMENT b3 - O (c|d|e|#PCDATA)*  -- MIXED -->
<!ELEMENT c  - O RCDATA            -- RCDATA -->
<!ELEMENT d  - O ANY               -- ANY -->
<!ELEMENT e  - O (#PCDATA)         -- MIXED (CONREF) -->
<!ATTLIST e con CDATA #CONREF>
]>
<a>
<b1>
<b2>Some text for b2</b2>
<b3>
<c>Some text for c</c>
<d>Text for d
<b1>
<b2>b2 text inside d</b2>
</d>
<e>e without the conref
<e con="e with the conref">
</a>
the following program prints an analysis:
DOWN-TRANSLATE

ELEMENT #IMPLIED
   OUTPUT "%n<%q>: "
   DO WHEN CONTENT IS ELEMENT
      OUTPUT "element"
   ELSE WHEN CONTENT IS ANY
      OUTPUT "any"
   ELSE WHEN CONTENT IS MIXED
      OUTPUT "mixed"
   ELSE WHEN CONTENT IS CDATA
      OUTPUT "cdata"
   ELSE WHEN CONTENT IS RCDATA
      OUTPUT "rcdata"
   ELSE WHEN CONTENT IS EMPTY
      OUTPUT "empty"
   DONE
   OUTPUT " (conref)" WHEN CONTENT IS conref
   OUTPUT "%n"
   ; Finally, process the content:
   OUTPUT "%c"
Note that:
USEMAP element-qualifier* (IS | ISNT) (#NONE | #EMPTY | usemap-name) ( | (usemap-name | #NONE | #EMPTY))*
The usemap-name is the name of a USEMAP declaration in the DTD.
Elements which have had short reference maps associated with them by USEMAP declarations can be identified by the "USEMAP IS" test. For example:
OUTPUT "<!USEMAP #EMPTY>" UNLESS USEMAP IS (#NONE | #EMPTY)
The "USEMAP IS" test checks whether the short reference map associated with the currently opened element (or with the element specified by the element qualifier following the keyword USEMAP) is one of the maps named in the test.
The test succeeds:
Knowing whether or not an element type has an associated USEMAP is important if the OmniMark program is creating an SGML document from an input SGML document: the USEMAP may affect the way short reference delimiters are interpreted. When "normalizing" a document, it is usually safest to issue a "<!USEMAP #EMPTY>" declaration at the start of any element with which a USEMAP declaration for a map other than #EMPTY has been associated in the DTD.
This test is for detecting the association of USEMAPs with elements in the DTD, not in the document instance.
If the element-qualifier specifies an element which does not exist, then the test fails.
Short reference map names are subject to NAMELEN and NAMECASE GENERAL.
The following syntactic variations are permitted:
In versions of OmniMark prior to V3, certain SGML enquiries fail rather than abort when their component enquiries fail. For example, in the following, if there is no ANCESTOR chapter, then the test fails, and the "=" test is never done:
LOCAL COUNTER n
...
DO WHEN OCCURRENCE OF ANCESTOR chapter = n
   ...
DONE
Contrast this case with the following, in which the test is in error if there are fewer than seven items on the COUNTER shelf:
LOCAL COUNTER word-count
LOCAL COUNTER n
...
DO WHEN word-count @ 7 = n
   ...
DONE
Or the following:
LOCAL COUNTER n
...
DO WHEN n = OCCURRENCE OF ANCESTOR chapter
   ...
DONE
which would both seem to be similar to the first case in their intended effect, but which are both in error if there is no ANCESTOR chapter.
OmniMark V3 standardizes these cases so that SGML enquiries of the first type are now also in error when the identified element does not exist. This new interpretation means that all of the following are equivalent in their behaviour:
WHEN OCCURRENCE OF ANCESTOR chapter = n
WHEN n = OCCURRENCE OF ANCESTOR chapter
WHEN (OCCURRENCE OF ANCESTOR chapter) = n
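The difference between the two interpretations can be modelled in a few lines. The Python sketch below is purely illustrative and is not how OmniMark is implemented: it assumes a dictionary mapping each open ancestor's name to its OCCURRENCE count, and all names in it are hypothetical.

```python
def occurrence_of_ancestor(ancestor_counts, name):
    """Model of OCCURRENCE OF ANCESTOR name.

    ancestor_counts: hypothetical mapping from the name of each open
    ancestor to its OCCURRENCE count.
    """
    if name not in ancestor_counts:
        raise LookupError(f"there is no ANCESTOR {name}")
    return ancestor_counts[name]

def compare_pre_v3(ancestor_counts, name, n):
    """Pre-V3 reading of: OCCURRENCE OF ANCESTOR name = n.

    A failed enquiry on the left makes the whole test fail quietly.
    """
    try:
        return occurrence_of_ancestor(ancestor_counts, name) == n
    except LookupError:
        return False

def compare_v3(ancestor_counts, name, n):
    """V3 reading: a missing element is an error, whichever side it is on."""
    return occurrence_of_ancestor(ancestor_counts, name) == n

inside_chapter = {"book": 1, "chapter": 3}
outside_chapter = {"book": 1}
```

With outside_chapter, compare_pre_v3 simply returns false, while compare_v3 raises the error, which corresponds to the standardized V3 behaviour described above.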
The name of the current element can be obtained using the format item "%q" everywhere except in an EXTERNAL-TEXT-ENTITY or EXTERNAL-DATA-ENTITY rule. In those rules, the "%q" format item yields the name of the entity being processed.
The names of other elements can be obtained using the operator "NAME OF".
% format-modifier* q
The "%q" format item refers to the currently opened element everywhere except in EXTERNAL-TEXT-ENTITY and EXTERNAL-DATA-ENTITY rules. In functions, even if the function is called from an EXTERNAL-TEXT-ENTITY or EXTERNAL-DATA-ENTITY rule, "%q" still refers to the currently opened element. This is to ensure that a function always behaves the same regardless of what rule it is called from.
The use of the "%q" format item in EXTERNAL-DATA-ENTITY or EXTERNAL-TEXT-ENTITY rules is described in Section 14.2.1, "Formatting Entity Names".
When referring to an element, the "%q" format can have the following modifiers:
The "l" modifier converts all of the text to lower-case.
The "l" modifier cannot be used with the "u" modifier.
The "u" modifier converts all of the text to upper-case.
The "u" modifier cannot be used with the "l" modifier.
The field-width modifier, "f", is allowed with the "%q" format. If the specified number is less than the minimum number of characters needed to format the value, the modifier is ignored. If it is greater, space characters are added to the right of the value to fill it out to the field width.
This is the right-justification modifier. It is allowed when the field-width modifier is given. It causes padding to be done on the left side of the field instead of the right.
The "k" modifier requires the "f" modifier.
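The combined effect of these modifiers can be sketched as a small formatting function. This Python model is an illustration only: the parameter names are hypothetical, and it does not reproduce the way the modifiers are actually written inside a format item.

```python
def format_name(name, case=None, width=None, right_justify=False):
    """Model of the "l", "u", "f" and "k" modifiers applied to a name.

    case: "l" for lower-case, "u" for upper-case, or None. Taking a single
          value models the rule that "l" and "u" cannot be combined.
    width: the field width given with "f", or None.
    right_justify: True models "k", which is only allowed together with "f".
    """
    if right_justify and width is None:
        raise ValueError('the "k" modifier requires the "f" modifier')
    if case == "l":
        name = name.lower()
    elif case == "u":
        name = name.upper()
    elif case is not None:
        raise ValueError('case must be "l", "u" or None')
    # A width no larger than the name is ignored.
    if width is not None and width > len(name):
        padding = " " * (width - len(name))
        name = padding + name if right_justify else name + padding
    return name

upper = format_name("Chapter", case="u")
padded_right = format_name("par", width=6)
padded_left = format_name("par", width=6, right_justify=True)
too_narrow = format_name("chapter", width=3)
```

Here padded_right is the name followed by three spaces, padded_left is the name preceded by three spaces, and too_narrow is returned unchanged because the width is smaller than the name.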
The "NAME OF" operator can be used to provide the name of any opened element. It can be used in a number of different ways:
The element qualifier that follows "NAME OF" must identify a currently opened element. Otherwise, it is an error.
The form "NAME OF DOCTYPE" will, within a document instance, return the name of the document element (i.e. the topmost element). As described in Section 16.4.1, "The Public Identifier at the Start of the DTD", it is often of interest to know the name of the document element outside of the instance -- in particular, when processing the external identifier at the start of the DTD. For this purpose, the #DOCTYPE stream is provided. It contains the name of the document element, even outside of the document instance.
The #DOCTYPE stream is "attached" as soon as OmniMark encounters the document element name at the start of the DTD, following the DOCTYPE keyword. Prior to that point, the #DOCTYPE stream is "unattached". The "STREAM #DOCTYPE IS ATTACHED" test can be used to distinguish whether or not the document element name is available.
The #DOCTYPE stream is "read-only". Its value cannot be changed by an OmniMark program, nor can the #DOCTYPE "stream shelf" be cleared or added to (with NEW).
At any one point in processing an SGML document instance there are one or more elements open, starting with the document element. "CURRENT ELEMENTS" provides shelf-like access to all of these elements and their attributes.
Like a shelf, the "CURRENT ELEMENTS" stack is an ordered set of things (elements) with names.
It differs from a shelf in the following ways:
As a consequence of these differences between "CURRENT ELEMENTS" and other shelf-like things, different terminology is used. The relationship qualifiers are used to select an element instead of keys, the "NAME OF" operation is used instead of "KEY OF", and a variant of "NUMBER OF" is used instead of "ITEM OF".
"CURRENT ELEMENTS" makes a number of things easy to do:
In these applications, attribute values can contain the names of other elements and element qualifier-like conditions on transformations, which can readily be implemented by stepping over the "CURRENT ELEMENTS".
NUMBER OF CURRENT ELEMENTS element-qualifier*
The number of currently opened elements, including the currently opened element (if any), is available using "NUMBER OF CURRENT ELEMENTS":
ELEMENT #IMPLIED
   SET depth TO NUMBER OF CURRENT ELEMENTS
   OUTPUT "Element %q is nested %d(depth) deep.%n"
   OUTPUT "%c"
Similarly, the number of opened elements down to and including a specified opened element can be determined using "NUMBER OF CURRENT ELEMENTS" together with an element-qualifier. For example,
NUMBER OF CURRENT ELEMENTS OF ANCESTOR chapter
is the "depth" of the most recently opened (and still open) chapter element, starting with the document element as having depth 1 (one). "NUMBER OF CURRENT ELEMENTS OF DOCTYPE" is always 1 when there are any opened elements.
If "NUMBER OF CURRENT ELEMENTS" is followed by OF and an element-qualifier, then that qualifier must identify a currently opened element. On the other hand, "NUMBER OF CURRENT ELEMENTS" can always be used without a qualifier. If there are no opened elements (e.g. if the macro is used in a DOCUMENT-START or DOCUMENT-END rule), then "NUMBER OF CURRENT ELEMENTS" without a following element-qualifier has a value of zero.
MACRO where am I IS
   DO WHEN NUMBER OF CURRENT ELEMENTS = 0
      OUTPUT "%"Where am I%" was called outside the document element.%n"
   ELSE
      OUTPUT "%"Where am I%" was called inside the document element.%n"
   DONE
MACRO-END
REPEAT OVER REVERSED? CURRENT ELEMENTS element-qualifier* AS alias-name local-declaration* action* AGAIN
The opened elements can be iterated over and information about them accessed using a new form of the "REPEAT OVER" action. For example, the following action lists the currently opened elements, document element first, together with each of the element's non-implied attribute values (listed following each element name, indented by three spaces):
REPEAT OVER CURRENT ELEMENTS AS this-element
   OUTPUT NAME OF CURRENT ELEMENT this-element
   OUTPUT "%n"
   REPEAT OVER ATTRIBUTES OF CURRENT ELEMENT this-element AS this-attribute
      DO UNLESS ATTRIBUTE this-attribute IS IMPLIED
         OUTPUT "   "
         OUTPUT KEY OF ATTRIBUTE this-attribute
         OUTPUT " = %"%v(this-attribute)%"%n"
      DONE
   AGAIN
AGAIN
Within a "REPEAT OVER CURRENT ELEMENTS" action, "CURRENT ELEMENT" alias-name (as in "CURRENT ELEMENT" this-element) identifies the element selected by the current iteration. ELEMENT alone always identifies the currently opened element, and not the current item of the iteration.
"REPEAT OVER CURRENT ELEMENTS" must define an "alias" for the opened element selected on each iteration. The form "CURRENT ELEMENT" followed by the alias name is used to identify any reference to the selected element (see Section 14.1.4.3, "Element Alias Names"). (This technique must also be used when repeating over the attributes of an element.)
If the OmniMark program is to process the opened elements in most-recently-opened-first order, it can use the REVERSED option: "REPEAT OVER REVERSED CURRENT ELEMENTS".
The element selected by the current iteration (or the elements, because a single "REPEAT OVER" can be applied to both "CURRENT ELEMENTS" and "REVERSED CURRENT ELEMENTS" simultaneously) is always identified by "CURRENT ELEMENT" followed by the element alias name given in the head of the "REPEAT OVER" action.
REPEAT OVER CURRENT ELEMENTS OF ANCESTOR chapter
A "%q" format item, references to attributes, and element tests are not affected by a "REPEAT OVER CURRENT ELEMENTS" action. Within the loop, these apply to the same element as they would outside of it. In other words, a "%q" in such a "REPEAT OVER" action does not give the name of the element selected by the current iteration, but rather that of the most recently opened element.
The alias-name must be used with "REPEAT OVER CURRENT ELEMENTS". The alias-name is used to name the element selected by each iteration over the set of currently opened elements. "REPEAT OVER CURRENT ELEMENTS" is like "REPEAT OVER ATTRIBUTES" or "REPEAT OVER DATA-ATTRIBUTES" in requiring an alias-name.
An element alias-name can only be used as an "element name" following the keyword "CURRENT ELEMENT".
"CURRENT ELEMENT" alias-name can be used in a number of contexts within a "REPEAT OVER CURRENT ELEMENTS" loop:
OUTPUT ATTRIBUTE type OF CURRENT ELEMENT this-element
DO WHEN CURRENT ELEMENT this-element IS (chapter | annex) ...
OUTPUT NAME OF CURRENT ELEMENT this-element
SET N TO NUMBER OF CURRENT ELEMENTS OF CURRENT ELEMENT this-element
(Note that #ITEM can typically be used to get this information more concisely.)
It is possible to use a name as both an element alias-name and as a "real" element name. If such a name is used in any context other than immediately following the keyword "CURRENT ELEMENT", then it refers to the element with that name and not to the alias-name.
Element alias names are subject to the setting of the "GENERAL NAMECASE" declaration in the same way as all other element names, even though they are not, in a strict sense, SGML element names.
If a "REPEAT OVER" action in the input processor uses "CURRENT ELEMENTS" then all the text written to the #SGML stream within that "REPEAT OVER" action is "buffered": none of the text is actually passed to OmniMark's built-in SGML parser until after the end of the "REPEAT OVER" action. So, for example, in the following (assuming it is performed in the input processor) all of the currently opened elements will be closed, but only after the end of the "REPEAT OVER":
REPEAT OVER REVERSED CURRENT ELEMENTS AS this-element
   OUTPUT "</"
   OUTPUT NAME OF CURRENT ELEMENT this-element
   OUTPUT ">"
AGAIN
Similarly, if a "REPEAT OVER" action in the input processor iterates over a set of attributes or over the tokens of an attribute, or if an attribute or attribute token is identified by a USING prefix (ATTRIBUTE, ATTRIBUTES, DATA-ATTRIBUTES or "USING ATTRIBUTE") then all text written to the #SGML stream within that "REPEAT OVER" action is "buffered" in the same manner as for "CURRENT ELEMENTS".
A "USING ATTRIBUTE" or "USING ATTRIBUTES" prefix will also cause input to the SGML parser to be buffered until the action to which it applies completes.
Buffering #SGML stream text ensures that the current elements don't change in the middle of a "REPEAT OVER CURRENT ELEMENTS".
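The effect of buffering can be illustrated with a toy model. The Python sketch below is not OmniMark and makes strong simplifying assumptions: ModelParser is a hypothetical stand-in that tracks only the names of the open elements and understands only end tags of the form </name>.

```python
import re

class ModelParser:
    """Hypothetical stand-in for the SGML parser: tracks open elements only."""

    def __init__(self, open_elements):
        self.open_elements = list(open_elements)  # outermost first

    def feed(self, text):
        # Only end tags of the form </name> are understood by this model.
        for name in re.findall(r"</([^>]+)>", text):
            if not self.open_elements or self.open_elements[-1] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            self.open_elements.pop()

def close_all_open_elements(parser):
    """Model of REPEAT OVER REVERSED CURRENT ELEMENTS writing end tags."""
    buffered = []
    # The set of open elements is not modified while it is iterated over,
    # because nothing is passed to the parser inside the loop.
    for name in reversed(parser.open_elements):
        buffered.append(f"</{name}>")
    text = "".join(buffered)
    # Only after the loop ends is the buffered text handed to the parser.
    parser.feed(text)
    return text

parser = ModelParser(["doc", "chapter", "par"])
written = close_all_open_elements(parser)
```

After the call, written holds the three end tags, innermost element first, and the model parser has no open elements left.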
Entities are an important component in SGML document instances. They are used for a number of different purposes:
OmniMark processes different kinds of entities in different ways:
In general, internal entities which require application-specific processing should be encoded as SDATA entities.
A programmer can provide alternative behaviour by specifying an EXTERNAL-TEXT-ENTITY rule to handle external text entities explicitly.
This section documents the operations that can be applied to entity references:
In EXTERNAL-DATA-ENTITY and EXTERNAL-TEXT-ENTITY rules, the "%q" format item can be used to return the name of the entity currently being processed.
% format-modifier* q
The "%q" format item only refers to the current entity in the actions of an EXTERNAL-DATA-ENTITY or EXTERNAL-TEXT-ENTITY rule. It refers to the current element everywhere else, including functions which are called from the EXTERNAL-DATA-ENTITY or EXTERNAL-TEXT-ENTITY rule.
When referring to an entity, the "%q" format can have the following modifiers:
The "l" modifier converts all of the text to lower-case.
The "l" modifier cannot be used with the "u" modifier.
The "u" modifier converts all of the text to upper-case.
The "u" modifier cannot be used with the "l" modifier.
The field-width modifier, "f", is allowed with the "%q" format. If the specified number is less than the minimum number of characters needed to format the value, the modifier is ignored. If it is greater, space characters are added to the right of the value to fill it out to the field width.
This is the right-justification modifier. It is allowed when the field-width modifier is given. It causes padding to be done on the left side of the field instead of the right.
The "k" modifier requires the "f" modifier.
The "%q" format item also has modifiers that cause it to return other information about the current entity:
This modifier can only be used in an EXTERNAL-DATA-ENTITY rule because external text entities do not have a notation.
This is the only format modifier of this set that can be combined with the "f" or "k" format modifiers described above.
These modifiers can be combined as follows:
This combination can only be used in an EXTERNAL-DATA-ENTITY rule because external text entities do not have notations.
This combination can only be used in an EXTERNAL-DATA-ENTITY rule because external text entities do not have notations.
This combination can only be used in an EXTERNAL-DATA-ENTITY rule because external text entities do not have notations.
If an entity has no system identifier, then the "e" format modifier acts like "ep".
If an entity has no public identifier, or if the program has no LIBRARY rule to associate a system identifier with the entity's public identifier, then it is an error to use the "ep" format modifier combination. If such an entity also does not declare a system identifier in the entity declaration, then it is also an error to use the "e" format modifier alone.
The same observation applies to the system identifier of the entity's notation when using the above format modifiers in combination with the "o" format modifier.
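The fallback rules for the "e" and "ep" modifiers can be summarized in a short model. The following Python sketch is an illustration under stated assumptions: the library argument is a hypothetical dictionary standing in for the program's LIBRARY mappings from public identifiers to system identifiers, and the function names are invented for the example.

```python
def resolve_ep(public_id, library):
    """Model of "ep": the system identifier that the library associates
    with the entity's public identifier."""
    if public_id is None or public_id not in library:
        raise LookupError("no public identifier, or no library entry for it")
    return library[public_id]

def resolve_e(system_id, public_id, library):
    """Model of "e": the declared system identifier, or "ep" when the
    entity declares none."""
    if system_id is not None:
        return system_id
    return resolve_ep(public_id, library)

# Hypothetical library mapping and identifiers, for illustration only.
library = {"-//EXAMPLE//ENTITIES Common//EN": "common.ent"}

declared = resolve_e("chapter1.sgm", None, library)
from_library = resolve_e(None, "-//EXAMPLE//ENTITIES Common//EN", library)
```

In the model, an entity with neither a declared system identifier nor a usable library entry raises an error for both modifiers, matching the error cases described above.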
All of the above combinations may be further combined with the "l" or "u" format modifiers. Additionally, the "o" format modifier can also be combined with the "f" and the "k" format modifiers, provided it is not also combined with the "e" or "p" modifiers.
The "f" and the "k" format modifiers can only be used with entity names and notation names.
Several tests can be applied to entities.
The entity tests are:
ENTITY (IS | ISNT) INTERNAL
The "IS INTERNAL" test succeeds if the entity is an internal entity.
The "IS INTERNAL" test is primarily useful when testing the entities named by the values of ENTITY or ENTITIES attributes. (See Section 14.4.3.3, "Entity and Notation Attribute Tests".) This test will always be false in an EXTERNAL-TEXT-ENTITY or EXTERNAL-DATA-ENTITY rule.
ENTITY (IS | ISNT) EXTERNAL
"IS EXTERNAL" succeeds only if the entity is an external entity.
The "IS EXTERNAL" test is primarily useful when testing the entities named by the values of ENTITY or ENTITIES attributes. (See Section 14.4.3.3, "Entity and Notation Attribute Tests".) This test will always be true in an EXTERNAL-TEXT-ENTITY or EXTERNAL-DATA-ENTITY rule.
ENTITY (IS | ISNT) PUBLIC
"IS PUBLIC" succeeds only if the entity is an external entity and was declared with a public identifier.
ENTITY (IS | ISNT) SYSTEM
"IS SYSTEM" succeeds only if the entity is an external entity and was declared with a system identifier.
ENTITY (IS | ISNT) IN-LIBRARY
The LIBRARY declaration explained in Section 19.1.4.1, "Mapping Public Ids To System Ids" associates system identifiers (usually file names) with public identifiers. The IN-LIBRARY test is used to determine whether the program contains LIBRARY rules for the specified entity.
"IS IN-LIBRARY" succeeds only if the entity is an external entity with a public identifier that is mapped to a system identifier in an OmniMark LIBRARY declaration.
ENTITY (IS | ISNT) CDATA-ENTITY
The "IS CDATA-ENTITY" test succeeds if the entity is an external or internal CDATA entity.
ENTITY (IS | ISNT) SDATA-ENTITY
The "IS SDATA-ENTITY" test succeeds if the entity is an external or internal SDATA entity.
ENTITY (IS | ISNT) NDATA-ENTITY
The "IS NDATA-ENTITY" test succeeds if the entity is an external NDATA entity.
ENTITY (IS | ISNT) SUBDOC-ENTITY
The "IS SUBDOC-ENTITY" test succeeds if the entity is an external subdocument entity.
ENTITY (IS | ISNT) DEFAULT-ENTITY
An SGML DTD can contain a declaration for the default general entity. For example:
<!ENTITY #DEFAULT SYSTEM "default.txt">
Any general entity reference that contains a name that was not defined as a general entity in the DTD is "rerouted" to the default general entity. For the example above, any reference to an undefined entity would get an external text entity with a system identifier of "default.txt". The entity names in ENTITY and ENTITIES attribute values are also satisfied using the default general entity.
If there is no default general entity, a general entity reference containing an undefined name is an error. An undefined parameter entity is always an error: there is no such thing as the "default parameter entity".
The default general entity can be any type of general entity: internal or external, text or non-SGML. It is the job of the DTD designer and the creators of SGML documents to ensure that whenever an undefined general entity is referenced or its name is used in an ENTITY or ENTITIES attribute value, the default general entity is of the appropriate type. For example, if an undefined entity name is used as the value of an ENTITY attribute, the default general entity must be a non-SGML entity (CDATA, SDATA, NDATA or SUBDOC) and not a text entity, or an error will result.
The OmniMark program can determine that the default general entity is used by using the "ENTITY IS DEFAULT-ENTITY" test. The OmniMark program may provide some special processing for entities resolved using the default general entity, or may just list undefined entities.
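The lookup rules for undefined entity names can be sketched as follows. This Python model is only an illustration: the two dictionaries are hypothetical stand-ins for the general and parameter entity declarations in a DTD, and the returned flag plays the role of the "ENTITY IS DEFAULT-ENTITY" test.

```python
DEFAULT = "#DEFAULT"

def resolve_entity(name, general_entities, parameter_entities, is_parameter=False):
    """Return (declaration, used_default) for an entity reference.

    general_entities / parameter_entities: hypothetical mappings from entity
    names to their declarations; the default general entity, if declared,
    is stored under the key "#DEFAULT".
    """
    if is_parameter:
        # There is no default parameter entity: undefined is always an error.
        if name not in parameter_entities:
            raise LookupError(f"parameter entity {name} is undefined")
        return parameter_entities[name], False
    if name in general_entities:
        return general_entities[name], False
    if DEFAULT in general_entities:
        # An undefined general entity is rerouted to the default entity.
        return general_entities[DEFAULT], True
    raise LookupError(f"general entity {name} is undefined")

general = {"chapter1": "chapter1.sgm", DEFAULT: "default.txt"}
parameter = {"comdcl": "common.dcl"}

defined = resolve_entity("chapter1", general, parameter)
rerouted = resolve_entity("missing", general, parameter)
```

In this sketch, rerouted carries the default declaration together with a true flag, while an undefined parameter entity name raises an error even though a default general entity exists.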
The following example will always output the text "[DEFAULT]" for references to an undefined general entity, independently of any public or system identifier the default general entity may have, so long as the default general entity is an external text entity. In addition, it displays a message on the error output indicating which undefined entities are referenced in the document:
EXTERNAL-TEXT-ENTITY #IMPLIED WHEN ENTITY IS DEFAULT-ENTITY
   OUTPUT "[DEFAULT]"
   PUT #ERROR "General entity %q is undefined.%n"
It is an error if the referenced entity was not declared.
External text entities can be either general entities (introduced by "&") or parameter entities (introduced by "%"). EXTERNAL-TEXT-ENTITY rules are processed for both general and parameter external text entities, so the OmniMark programmer has to be ready to handle both kinds of entities.
If an entity has a system identifier or a public identifier (or both) it usually doesn't much matter whether it is a general entity or parameter entity; the rules for how to find the information referenced by the system identifier are usually the same.
However, if only an entity name is provided, as in the following two declarations, and the OmniMark program is written to use the entity name as part of a file name, for example, the program may wish to do so differently for general and parameter entities.
<!ENTITY chapter1 SYSTEM -- text of the first chapter -->
<!ENTITY % comdcl SYSTEM -- common declarations -->
There are two tests that can be used in an EXTERNAL-TEXT-ENTITY rule that allow for distinguishing between general and parameter entities:
ENTITY IS GENERAL
ENTITY IS PARAMETER
One or the other of these two tests is always true of an entity in an EXTERNAL-TEXT-ENTITY rule, so "ENTITY IS GENERAL" is equivalent to "ENTITY ISNT PARAMETER" and "ENTITY IS PARAMETER" is equivalent to "ENTITY ISNT GENERAL".
These tests can also be used in the EXTERNAL-DATA-ENTITY rule, or with ENTITY or ENTITIES attribute values. However, in both these cases, the entity is always a general entity.
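The naming decision described above can be modeled outside OmniMark. The following Python sketch is only an illustration, not OmniMark code: the `Entity` class, the `file_name_for` function, and the `.sgm` and `.dcl` suffixes are all assumptions made for the example. It shows that the two tests are complementary and that a program could pick a different file name for each kind of entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an external text entity declared by name only."""
    name: str
    is_parameter: bool  # True for "%" entities, False for "&" entities

    @property
    def is_general(self) -> bool:
        # Exactly one of the two tests holds, so this is simply the negation.
        return not self.is_parameter


def file_name_for(entity: Entity) -> str:
    """Derive a file name from the entity name (both suffixes are assumptions)."""
    suffix = ".dcl" if entity.is_parameter else ".sgm"
    return entity.name + suffix
```

With the two declarations shown earlier, `chapter1` would map to `chapter1.sgm` and `comdcl` to `comdcl.dcl` under these assumed suffixes.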
An entity manager designer should note that parameter entity references usually occur in the DTD and general entity references usually occur in the document instance. There are, however, two exceptions to this rule:
In summary, both general and parameter external entity references can occur in both the DTD and the document instance.
The test:
ENTITY (IS | ISNT) GENERAL
returns TRUE if the external text entity being processed is a general entity.
The test:
ENTITY (IS | ISNT) PARAMETER
returns TRUE if the external text entity being processed is a parameter entity.
The above entity tests can be combined:
EXTERNAL-TEXT-ENTITY #IMPLIED
   OUTPUT FILE "%eq" WHEN ENTITY IS (SYSTEM & EXTERNAL)
This example outputs the file identified by the entity's system identifier, but only if the entity is an external entity and has a system identifier.
Notation references have the syntax:
NOTATION
Different aspects of the notation of the current external data entity being processed can be queried using the keyword NOTATION.
Several tests can be applied to notations.
The notation tests are:
NOTATION (IS | ISNT) PUBLIC
"IS PUBLIC" succeeds only if the notation was declared with a public identifier.
NOTATION (IS | ISNT) SYSTEM
"IS SYSTEM" succeeds only if the notation was declared with a system identifier.
NOTATION (IS | ISNT) IN-LIBRARY
The LIBRARY declaration explained in Section 19.1.4.1, "Mapping Public Ids To System Ids" associates system identifiers (usually file names) with public identifiers. The IN-LIBRARY test is used to determine whether the program contains LIBRARY rules for the public identifier of the specified notation.
"IS IN-LIBRARY" succeeds only if the notation has a public identifier that is mapped to a system identifier in an OmniMark LIBRARY declaration.
The above notation tests can be combined:
EXTERNAL-DATA-ENTITY #IMPLIED
   OUTPUT "%eoq" WHEN NOTATION IS (SYSTEM | IN-LIBRARY)
This example outputs the name of the file identified by the notation's system identifier, but only if the notation has a system identifier or has a public identifier mapped to a system identifier in a LIBRARY declaration.
The tests can be combined by either AND (or "&") or OR (or "|").
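The three notation tests and their combination can be modeled as plain predicates. The Python sketch below is illustrative only and is not OmniMark code: the `Notation` class, the function names, and the sample public identifier and file name in `LIBRARY` are assumptions. The dictionary stands in for the public-to-system mapping that LIBRARY declarations provide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Notation:
    """Hypothetical record of a notation declaration."""
    name: str
    public_id: Optional[str] = None
    system_id: Optional[str] = None


def is_public(n: Notation) -> bool:
    # Models "NOTATION IS PUBLIC": declared with a public identifier.
    return n.public_id is not None


def is_system(n: Notation) -> bool:
    # Models "NOTATION IS SYSTEM": declared with a system identifier.
    return n.system_id is not None


def is_in_library(n: Notation, library: Dict[str, str]) -> bool:
    # Models "NOTATION IS IN-LIBRARY": the public identifier is mapped.
    return n.public_id is not None and n.public_id in library


def system_or_in_library(n: Notation, library: Dict[str, str]) -> bool:
    # Models the combined test "NOTATION IS (SYSTEM | IN-LIBRARY)".
    return is_system(n) or is_in_library(n, library)


# Assumed mapping, standing in for LIBRARY declarations.
LIBRARY = {"-//EXAMPLE//NOTATION Sample//EN": "sample.ntn"}
```

A notation declared with only an unmapped public identifier fails the combined test, which is the case the WHEN condition in the example guards against.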
NOTATION (=|!=) (notation-name | notation-name-list)
The "=" test for a NOTATION tests if its name is one of those given as the notation-name or in the notation-name-list. The notation-names must be constant quoted strings or OmniMark names.
LOCAL STREAM standard-prefix
...
DO WHEN NOTATION = giff
   ...
DONE
Several OmniMark features allow the programmer to manipulate the attribute values that are a central aspect of the SGML language. There are two kinds of attributes:
OmniMark generally uses DATA-ATTRIBUTE to refer to data attributes, and ATTRIBUTE to refer to element attributes. The exceptions to this are:
In those contexts, either ATTRIBUTE or DATA-ATTRIBUTE can be used.
Attribute references have the syntax:
Syntax
ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)?
Unlike programmer-defined data types, attribute references always require the ATTRIBUTE herald. This is because attributes are defined in the SGML document and not in the OmniMark program.
Attribute references are always treated as string expressions, even if the attribute was declared in the SGML document to be of type NUMBER. However, string expressions which contain a valid representation of a decimal number can be used anywhere a numeric expression is permitted, so this interpretation places no restriction on the use of attributes.
Attributes can always be further identified by following the attribute name with an element-qualifier. Element-qualifiers are described in Section 14.1.1, "Element Qualifiers" and further clarified in Section 14.1.4.3, "Element Alias Names". For example:
ATTRIBUTE date OF ANCESTOR change
refers to the date attribute of an enclosing element whose element name is change.
Within the EXTERNAL-DATA-ENTITY rule (described in Section 16.2.1, "Processing External Data and Subdocument Entities"), unqualified attributes are data attributes; in other contexts, they are attributes of the current element.
The ITEM (or "@") indexer can be used if the attribute was declared as a list-valued attribute. (See Section 14.4.2, "List-Valued Attributes".) However, unlike shelves, for which the right-most value is selected unless indication is made otherwise, attributes do not have a "default" selected value: the whole attribute value is tested or output as a single unit if no index is specified.
ATTRIBUTE references provide the attribute value unmodified, except for:
In particular, TRANSLATE rule processing is not performed on the value of an attribute when it is referenced using the ATTRIBUTE herald. (This contrasts with references to an attribute value using the "%v" modifier. See Section 14.4.4, "Attribute Format Items".)
Attempting to use an ATTRIBUTE value in a string expression will cause an error:
When using element-qualifiers in an attribute reference, the programmer should be aware that:
ATTRIBUTE indent OF ANCESTOR (numlist | bullist | deflist)
refers to the indent attribute of a containing numlist, bullist, or deflist, whichever comes first. It would be an error if element numlist came first but did not have a value for attribute indent.
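The lookup just described can be sketched as a walk over the open elements. This Python model is an illustration under stated assumptions, not OmniMark code: the function name and the data layout are invented for the example, "whichever comes first" is read here as "the nearest enclosing element", and ANCESTOR is taken to exclude the current element.

```python
from typing import Dict, List, Sequence, Tuple

# (element name, attributes that have a value); layout is an assumption.
OpenElement = Tuple[str, Dict[str, str]]


def attribute_of_ancestor(
    open_elements: List[OpenElement],
    names: Sequence[str],
    attribute: str,
) -> str:
    """Return the attribute of the nearest ancestor whose name is in names.

    open_elements is ordered outermost first; the last entry is the
    current element and is skipped, because it is not an ancestor.
    """
    for name, attributes in reversed(open_elements[:-1]):
        if name in names:
            # The first match decides the result; the search does not
            # continue outward if this element has no value.
            if attribute not in attributes:
                raise LookupError(f"{name} has no value for {attribute}")
            return attributes[attribute]
    raise LookupError("no matching ancestor is open")
```

The model makes the error case visible: once the nearest matching element is found, a missing value is an error even if an outer element in the list has one.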
In the header and body of an EXTERNAL-DATA-ENTITY rule, all unqualified references to attributes actually refer to data attributes of the external entity being processed. In all other rules, they refer to element attributes (attributes of element start-tags). See Section 14.4.3.4, "Data Attributes Associated With Entity Attributes" for an explanation of how the DATA-ATTRIBUTE keyword can be used with qualifiers.
Unqualified references to attributes inside functions always refer to element attributes. In order to refer to the data attributes of an external entity being processed, the qualifier "OF ENTITY" must be specified.
Element qualifiers can themselves be qualified. Thus, constructs such as the following are permissible:
ATTRIBUTE type OF PARENT OF ANCESTOR listitem
This last example refers to an attribute of the parent (presumably a list element) of an ancestor called listitem.
USING ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)?
The USING prefix allows attributes to be referenced without repeating the element-qualifiers or the ITEM (or "@") indexer. It is also useful when using the "%v" format item on an attribute which does not belong to the current element.
In the action in the following rule, the qualifier "OF PARENT" is implied when the attribute chapno is named:
ELEMENT section
   USING ATTRIBUTE chapno OF PARENT
      OUTPUT "%v(chapno).%d(sectno). %c%n"
Either the element-qualifier or the ITEM (or "@") indexer must be specified.
A list-valued attribute is one whose declaration is one of:
When the attribute is a list-valued attribute, a particular item in the list can be accessed with an ITEM (or "@") phrase. For example:
ATTRIBUTE col-w @ 3
refers to the third item in the list of values specified for attribute col-w.
The selector must not be greater than the number of items in the attribute value. The number of items in a list-valued attribute can be determined by using the "NUMBER OF" operator described in Section 7.4.1, "Determining the Size of a Shelf".
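Item selection and counting can be modeled with ordinary string splitting. The Python sketch below is illustrative only, not OmniMark code: the function names are hypothetical, and `LIST_TYPES` is an assumption based on the plural declared values of SGML. It treats a list-valued attribute as whitespace-separated tokens with 1-based indexing, and any other attribute as a single unit.

```python
# Plural SGML declared values, assumed here to be the list-valued types.
LIST_TYPES = {"ENTITIES", "IDREFS", "NAMES", "NMTOKENS", "NUMBERS", "NUTOKENS"}


def tokens(value: str, declared_type: str) -> list:
    """Split a list-valued attribute into tokens; keep others whole."""
    if declared_type in LIST_TYPES:
        return value.split()
    return [value]


def number_of(value: str, declared_type: str) -> int:
    """Model of "NUMBER OF": token count, or 1 for non-list types."""
    return len(tokens(value, declared_type))


def item(value: str, declared_type: str, index: int) -> str:
    """Model of the ITEM (or "@") indexer, which counts from 1."""
    toks = tokens(value, declared_type)
    if not 1 <= index <= len(toks):
        raise IndexError(f"item {index} is outside 1..{len(toks)}")
    return toks[index - 1]
```

Checking `number_of` before calling `item` mirrors the advice above: the selector is only valid up to the number of items in the value.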
Determining the system or public identifier or notation of an entity name used in an ENTITIES attribute requires the use of the ITEM (or "@") phrase.
List-valued attributes are different from shelves in the following ways:
NUMBER OF ATTRIBUTE attribute-name element-qualifier*
"NUMBER OF ATTRIBUTE" returns the number of tokens in a list-valued attribute. When the attribute does not have a list-valued type, "NUMBER OF ATTRIBUTE" will always yield the value one (1).
REPEAT OVER ATTRIBUTE attribute-name element-qualifier* local-declaration* action* AGAIN
"REPEAT OVER ATTRIBUTE" iterates over the values in a list-valued attribute.
One or more list-valued attributes and/or shelves can be combined in a single "REPEAT OVER" when they each have the same number of values:
...
ELEMENT e
   LOCAL COUNTER attribute-value-length VARIABLE
   ...
   REPEAT OVER attribute-value-length & ATTRIBUTE multi
      SET attribute-value-length TO LENGTH OF ATTRIBUTE multi
   AGAIN
The previous example initializes a COUNTER shelf with the lengths of the corresponding attribute values.
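The parallel iteration can be pictured with a short model. This Python sketch is illustrative only, not OmniMark code: `fill_lengths` is a hypothetical name, and the explicit size check stands in for the requirement that every shelf and attribute in a combined "REPEAT OVER" has the same number of values.

```python
from typing import List


def fill_lengths(counter_shelf: List[int], attribute_value: str) -> None:
    """Store the length of each attribute token in the matching shelf item."""
    tokens = attribute_value.split()
    # A combined iteration is only meaningful when the sizes agree.
    if len(counter_shelf) != len(tokens):
        raise ValueError("shelf and attribute have different numbers of values")
    for position, token in enumerate(tokens):
        counter_shelf[position] = len(token)
```

For an attribute value of `alpha be gamma` and a three-item shelf, the shelf ends up holding 5, 2, and 5.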
There are three operators for testing where the value of an attribute is set:
Unlike all other attribute references, it is not an error if the specified attribute does not exist, or was not given a value.
If the element-qualifier references an element that does not exist, or if the specified attribute is not declared, then the IS form of the test always fails, and the ISNT form always succeeds.
Since the test does not actually use the value, it is not an error for the value to be unset.
The test:
ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)? (IS|ISNT) SPECIFIED
succeeds when the referenced:
The "IS SPECIFIED" attribute test fails otherwise. Using ISNT instead of IS reverses the result.
The test:
ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)? (IS|ISNT) DEFAULTED
succeeds when the referenced:
The "IS DEFAULTED" attribute test fails otherwise. Using ISNT instead of IS reverses the result.
The test:
ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)? (IS|ISNT) IMPLIED
succeeds when the referenced:
The "IS IMPLIED" attribute test fails otherwise. Using ISNT instead of IS reverses the result.
The type of an element attribute or data attribute can be tested for. An attribute can be declared with one of the following types:
or it can be declared as a name token group. (A name token group attribute can only have one of the values specified in the parenthesized list of names.)
If the specified attribute was not declared for the qualified element (or entity), or if the qualified element does not exist, then an error message is printed, and OmniMark halts. The error message can be avoided by using the "IS SPECIFIED", "IS DEFAULTED", or "IS IMPLIED" tests.
The type of the attribute can be tested for with the following operators:
ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)? (IS | ISNT) (CDATA | NAME | NAMES | NUMBER | NUMBERS | NMTOKEN | NMTOKENS | NUTOKEN | NUTOKENS | ID | IDREF | IDREFS | NOTATION | ENTITY | ENTITIES | GROUP)
This test succeeds if the attribute was declared with the given type. The "IS GROUP" test succeeds if the given attribute was declared to take its values from a name token group.
Attribute type tests can be combined by separating the types with OR or "|" and parenthesizing them. For example, the following is legal:
GLOBAL COUNTER id-uses
...
DO WHEN ATTRIBUTE id IS (IDREF | IDREFS)
   REPEAT OVER ATTRIBUTE id
      DO WHEN id-uses HAS KEY ATTRIBUTE id
         INCREMENT id-uses ^ ATTRIBUTE id
      ELSE
         SET NEW id-uses ^ ATTRIBUTE id TO 1
      DONE
   AGAIN
DONE
Note that, for a data attribute, a test of its type for ID, IDREF, IDREFS, NOTATION, ENTITY or ENTITIES will always fail, because those types of attributes cannot be associated with a notation.
The notation and entity tests (described in Section 14.3, "Notations" and Section 14.2, "Entities") can be applied directly to attribute values which are declared as ENTITY or NOTATION, or directly to an item of an attribute value declared as ENTITIES.
... DO WHEN ATTRIBUTE IS (EXTERNAL & IN-LIBRARY) ... DONE
It is usually wise to test that the attribute value is an ENTITY, ENTITIES, or NOTATION attribute before applying the entity or notation tests.
In EXTERNAL-DATA-ENTITY rules, the attributes of the entity that triggered the rule are referred to using the ATTRIBUTE herald.
However, entities named in ENTITY and ENTITIES attributes may also have data attributes. To differentiate these attributes from the attributes that belong to the current element or external entity being processed, the data attributes that belong to an ENTITY or ENTITIES attribute item are heralded with the keyword DATA-ATTRIBUTE.
The keyword DATA-ATTRIBUTE can be used in the same way that ATTRIBUTE is used elsewhere. The data attribute name must always be followed by OF and, in parentheses, the identification of the attribute value or item containing the entity name. The syntax is:
Syntax
DATA-ATTRIBUTE data-attribute-name OF ( ATTRIBUTE attribute-name element-qualifier* ((ITEM | @) numeric-expression)? ) ((ITEM | @) numeric-expression)?
The first optional ITEM (or "@") indexer (inside the parentheses) is associated with the attribute of the qualified element or the external entity currently being processed. It is this attribute which must be declared as ENTITY or ENTITIES.
The second optional ITEM (or "@") indexer (outside the parentheses) is associated with the data attribute named in the parenthesized attribute value item.
The parentheses eliminate the confusion between the two ITEM (or "@") indexers.
For example:
OUTPUT DATA-ATTRIBUTE widths OF (ATTRIBUTE name) @ 1 OUTPUT "%v(widths)"
finds the attribute named name in the current element, verifies that it is of type ENTITY, finds the data attribute widths associated with it, and prints out its first item.
Note that the ATTRIBUTE keyword is used in EXTERNAL-DATA-ENTITY rules to refer to a data attribute of the current entity. The DATA-ATTRIBUTE keyword is only used to refer to the data attributes of an ENTITY or ENTITIES attribute value.
For simplicity from this point on, the syntax of the DATA-ATTRIBUTE will be given as:
Syntax
DATA-ATTRIBUTE data-attribute-name OF ( attribute-reference ) ((ITEM | @) numeric-expression)?
where attribute-reference is understood to mean an ATTRIBUTE reference with optional element-qualifiers and an optional ITEM (or "@") indexer.
DATA-ATTRIBUTE references can be used in any context that permits ATTRIBUTE references.
For example, if tableref is an element that references external entities containing tables, and name is an ENTITY attribute giving the name of the external entity, then the following will compute the number of columns in a table from the number of column widths entered in a list-valued attribute:
ELEMENT tableref
   LOCAL COUNTER column-count
   ...
   SET column-count TO NUMBER OF DATA-ATTRIBUTE colwidth OF (ATTRIBUTE name)
Note that DATA-ATTRIBUTE references are also permitted inside the parentheses of other DATA-ATTRIBUTE references, allowing as many levels of indirection as necessary:
ELEMENT tableref
   LOCAL COUNTER column-count
   ...
   SET column-count TO NUMBER OF DATA-ATTRIBUTE colwidth OF
      (DATA-ATTRIBUTE table OF (ATTRIBUTE name))
REPEAT OVER DATA-ATTRIBUTE attribute-name OF ( attribute-reference ) local-declaration* action* AGAIN
The "REPEAT OVER" action can be applied to DATA-ATTRIBUTE references in exactly the same way as ATTRIBUTE references.
Inside the "REPEAT OVER", the data attribute being iterated over can either be referred to using the DATA-ATTRIBUTE herald, or the ATTRIBUTE herald.
For instance, the following are equivalent:
Example A
REPEAT OVER DATA-ATTRIBUTE col-width OF (ATTRIBUTE table-ref)
   OUTPUT DATA-ATTRIBUTE col-width
AGAIN
Example B
REPEAT OVER DATA-ATTRIBUTE col-width OF (ATTRIBUTE table-ref)
   OUTPUT ATTRIBUTE col-width
AGAIN
Example C
REPEAT OVER DATA-ATTRIBUTE col-width OF (ATTRIBUTE table-ref)
   OUTPUT "%zv(col-width)"
AGAIN
The "z" format modifier is used to turn off TRANSLATE rules to make the "%v" format item behave exactly like the ATTRIBUTE reference. (See Section 14.4.4, "Attribute Format Items".)
USING DATA-ATTRIBUTE attribute-name OF ( attribute-reference ) ((ITEM | @) numeric-expression)?
The USING prefix can also be applied to DATA-ATTRIBUTE references in exactly the same way as ATTRIBUTE references. In the action within the USING, the keyword ATTRIBUTE can be used to indicate the selected attribute instead of the keyword DATA-ATTRIBUTE.
The following examples are equivalent:
Example A
OUTPUT DATA-ATTRIBUTE size OF (ATTRIBUTE id OF PARENT)
Example B
USING DATA-ATTRIBUTE size OF (ATTRIBUTE id OF PARENT)
   OUTPUT DATA-ATTRIBUTE size
Example C
USING DATA-ATTRIBUTE size OF (ATTRIBUTE id OF PARENT)
   OUTPUT ATTRIBUTE size
Example D
USING DATA-ATTRIBUTE size OF (ATTRIBUTE id OF PARENT)
   OUTPUT "%zv(size)"
The "z" format modifier is used to turn off TRANSLATE rules to make the "%v" format item behave exactly like the ATTRIBUTE reference. (See Section 14.4.4, "Attribute Format Items".)
This section describes the "%v" format item used to format attribute values. It also describes the different format modifiers that are available for different types of attribute values.
The same principles that apply to the "%q" format item also apply to the "%v" format item. It refers to the attributes of the currently opened element, except when used directly in the body of an EXTERNAL-DATA-ENTITY rule. There it refers to the entity's data attributes. Unlike "%q", "%v" in an EXTERNAL-TEXT-ENTITY rule refers to the currently opened element's attributes -- there are no such things as "text" attributes.
The USING prefix is helpful when an attribute of other than the current element or entity reference needs to be manipulated or displayed. It can also be used to select a particular token of a list-valued attribute.
% format-modifier* v( attribute-name )
In ELEMENT rules, the named attribute must be an attribute of the element; in EXTERNAL-DATA-ENTITY rules it must be a data attribute of the entity being processed. In all other rules the named attribute must be an attribute of the containing element.
The following modifiers can always be used with the "%v" format.
Forces letters in the attribute value to lower-case.
Forces letters in the attribute value to upper-case.
The field-width modifier is allowed with the "%v" format (although it is ignored for CDATA attributes). If number is less than the minimum number of characters needed to print the attribute value, it is ignored. If it is greater, space characters are added to the right of the value to fill it out to the field width.
Allowed only with the field-width modifier, this puts spaces to the left of the value instead of the right.
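The padding behavior of the field-width modifier can be modeled in a few lines. This Python sketch is illustrative only, not OmniMark code: `apply_field_width` and its `pad_left` flag are hypothetical names for the field width and for the modifier that moves the padding to the left.

```python
def apply_field_width(value: str, width: int, pad_left: bool = False) -> str:
    """Pad value with spaces up to width; never truncate it.

    A width smaller than the value is ignored, which is also how
    str.ljust and str.rjust behave.
    """
    if pad_left:
        return value.rjust(width)  # spaces go to the left of the value
    return value.ljust(width)      # default: spaces go to the right
```

For example, `apply_field_width("abc", 6)` gives `"abc   "`, and passing `pad_left=True` gives `"   abc"` instead.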
If the attribute has a CDATA declared type the following modifiers can also be used:
Prevents insertion of line breaks (see Section 19.1.5, "Line Breaking").
Minimizes white space in the processed content as follows:
Prevents selection of any TRANSLATE rule that would otherwise apply to all or part of the attribute value (see Section 4.2.2.2, "Translating Patterns in Data Content").
If the attribute's declared type is ENTITY or ENTITIES, and the entity name refers to an external entity, the following modifiers can also be specified:
Causes OmniMark to access the system identifier from the entity declaration instead of accessing the entity name. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
Causes OmniMark to access the notation name from the entity declaration instead of the entity name.
Causes OmniMark to access the public identifier from the entity declaration instead of the entity name. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
These modifiers can be combined as follows:
Causes OmniMark to access the system identifier found by searching for the entity's public identifier in the LIBRARY rules. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
Causes OmniMark to access the system identifier declared for the notation associated with the entity. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
Causes OmniMark to access the public identifier declared for the notation associated with the entity. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
Causes OmniMark to access the system identifier found by searching for the entity's notation's public identifier in the LIBRARY rules. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
If an entity has no system identifier, then "e" acts like ep. It is an error if either "e" or ep is used and the entity has no system identifier or no public identifier bound by a LIBRARY rule to a system identifier.
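The fallback just described for "e" can be written out as a small resolution function. This Python sketch is one reading of the rule, not OmniMark code: `resolve_e` is a hypothetical name, the dictionary stands in for the LIBRARY rules, and the error case is interpreted as "neither a system identifier nor a mapped public identifier is available".

```python
from typing import Dict, Optional


def resolve_e(
    system_id: Optional[str],
    public_id: Optional[str],
    library: Dict[str, str],
) -> str:
    """Model of the "e" modifier: system identifier first, library second."""
    if system_id is not None:
        return system_id
    # No system identifier: behave like "ep" and consult the library.
    if public_id is not None and public_id in library:
        return library[public_id]
    raise LookupError("no system identifier and no mapped public identifier")
```

The sample identifiers used to exercise such a function would be assumptions too; only the order of the two lookups comes from the text.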
This format accesses letters within system and public identifiers in upper-case or lower-case as they appear in the entity declaration. Letters in element, entity, or notation names appear in upper-case or lower-case as they appear in the processed document unless the SGML Declaration specifies upper-case substitution for that class of name. If so, the name is accessed with letters forced to upper-case. Thus, in the Reference Concrete Syntax, by default, element and notation names appear in upper-case while entity names appear as entered in the document.
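The case rule for names can be modeled as a lookup on the NAMECASE settings of the SGML Declaration. This Python sketch is illustrative only, not OmniMark code: `name_as_accessed` and the `kind` labels are hypothetical, and the default settings reflect the Reference Concrete Syntax, where NAMECASE is GENERAL YES and ENTITY NO.

```python
from typing import Dict, Optional

# NAMECASE settings of the Reference Concrete Syntax.
REFERENCE_NAMECASE = {"GENERAL": True, "ENTITY": False}


def name_as_accessed(
    name: str,
    kind: str,
    namecase: Optional[Dict[str, bool]] = None,
) -> str:
    """Return a name as a format item would access it.

    kind is "element", "notation", or "entity" (labels are assumptions).
    Entity names follow NAMECASE ENTITY; the others follow NAMECASE GENERAL.
    """
    settings = REFERENCE_NAMECASE if namecase is None else namecase
    name_class = "ENTITY" if kind == "entity" else "GENERAL"
    return name.upper() if settings[name_class] else name
```

Under the default settings an element name such as `para` is accessed as `PARA`, while an entity name such as `chapter1` is left as entered.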
Only the "o" format modifier can be combined with the "f", "k", "u", or "l" format modifiers.
For an ENTITIES attribute, if the attribute value contains more than one entity name, the USING prefix, described in Section 14.4.1.1.1, "The Using Prefix and Attribute Values", must be used to select one entity whose system or public identifier is to be manipulated or displayed.
If the value of an ENTITY or ENTITIES attribute is the name of an internal CDATA or SDATA entity then the "%ev" format can be used to determine the replacement text of the internal entity. (The only kinds of internal entities that can be used in an ENTITY or ENTITIES attribute are CDATA or SDATA.) See the example below for the different handling of external and internal entities.
In the following example, the element as-is has a single required ENTITY attribute text. The entity named by the attribute value simply provides the text that is to replace the element, wherever it occurs in a document.
<!ELEMENT as-is - O EMPTY>
<!ATTLIST as-is text ENTITY #REQUIRED>
The following ELEMENT rule for processing the as-is element does the following:
Example
ELEMENT as-is
   DO WHEN ATTRIBUTE text IS ENTITY
      DO WHEN ATTRIBUTE text IS EXTERNAL
         OUTPUT FILE "%ev(text)"
      ELSE
         OUTPUT "%ev(text)"
      DONE
   DONE
   SUPPRESS
Note that "%ev" returns one of two things, depending on whether the entity named by the attribute to which it is applied is INTERNAL or EXTERNAL:
The EXTERNAL test, and other tests that can be used for attributes like text are described in Section 14.4.3.3, "Entity and Notation Attribute Tests". These tests allow the OmniMark program not only to distinguish between internal and external entities, but also whether an attribute is an ENTITY or ENTITIES attribute in the first place, and whether an internal or external entity is CDATA or SDATA.
Some of the format modifiers available for ENTITY or ENTITIES attributes are also available for NOTATION attributes. Specifically, the following modifiers can be specified:
Causes OmniMark to access the system identifier from the notation declaration instead of the notation name. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
Causes OmniMark to access the public identifier from the notation declaration instead of the notation name. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
Causes OmniMark to access the system identifier found by searching for the notation's public identifier in the LIBRARY rules. The "f", "k", "l", and "u" format modifiers cannot be used with this combination.
If a notation has no system identifier, then "e" acts like ep. It is an error if either "e" or ep is used and the notation has no system identifier or no public identifier bound by a LIBRARY rule to a system identifier.
These formats access letters within system and public identifiers in upper-case or lower-case as they appear in the notation declaration. Letters in element or notation names appear in upper-case or lower-case as they appear in the processed document unless the SGML Declaration specifies upper-case substitution for that class of name. If so, the name is accessed with letters forced to upper-case. Thus, in the Reference Concrete Syntax, by default, element and notation names appear in upper-case.
None of them can be used with the "f", "k", "l", or "u" format modifiers.
There are many types of applications where it is undesirable to reference each attribute by name. Some examples are:
For such applications, OmniMark provides the ATTRIBUTES object which allows all the declared attributes or specified attributes of an element or external entity to be treated as a shelf in the following way:
It differs from other shelves in:
References to an ATTRIBUTES shelf have the syntax:
Syntax
SPECIFIED? ATTRIBUTES element-qualifier*
References to a DATA-ATTRIBUTES shelf have the syntax:
Syntax
SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))?
the shelf contains all of the declared attributes
the shelf contains all of the attributes specified in the element's start tag or in the external entity's declaration
The utility of the ATTRIBUTES shelf can be seen by the following OmniMark code fragment, which outputs "normalized" start and end tags around the content of the current element, with all specified attribute values included:
ELEMENT #IMPLIED
   OUTPUT "<%q"
   REPEAT OVER SPECIFIED ATTRIBUTES AS this-attribute
      OUTPUT " " || KEY OF ATTRIBUTE this-attribute ||
             "=%"%v(this-attribute)%""
   AGAIN
   OUTPUT ">" _ "%c"
   OUTPUT "</%q>" WHEN CONTENT ISNT (EMPTY | CONREF)
This example can be used as a complete but simple OmniMark program that "normalizes" an SGML document. In practice, such a program will also need to:
A complete normalizer is distributed with OmniMark.
Attributes can be accessed using KEY (or "^") and ITEM (or "@") indexers:
Syntax
SPECIFIED? ATTRIBUTES (KEY | ^) string-expression
Syntax
SPECIFIED? ATTRIBUTES (ITEM | @) numeric-expression
The key value used to index should always be in upper-case when NAMECASE GENERAL YES applies to the SGML document being processed (e.g. "IDENT" above). Unlike names in OmniMark programs, string values used as keys are not automatically upper-cased -- doing so is the OmniMark programmer's responsibility. Upper-casing can be done by directly entering the appropriate values as above or by using the "u" format modifier.
For example, the following actions output the same value:
Example A
OUTPUT ATTRIBUTE ident
Example B
OUTPUT ATTRIBUTES ^ "IDENT"
Example C
LOCAL STREAM attribute-name
SET attribute-name TO "ident"
OUTPUT ATTRIBUTES ^ "%ug(attribute-name)"
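The need to upper-case keys can be seen in a small model of a keyed shelf. This Python sketch is illustrative only, not OmniMark code: `key_for` is a hypothetical helper, the dictionary stands in for the ATTRIBUTES shelf of an element whose document uses NAMECASE GENERAL YES, and the value `sec-1` is invented.

```python
from typing import Dict


def key_for(name: str, namecase_general: bool) -> str:
    """Fold a key the way the programmer must when NAMECASE GENERAL is YES."""
    return name.upper() if namecase_general else name


# Model of the shelf: keys are stored as the upper-cased attribute names.
attributes: Dict[str, str] = {"IDENT": "sec-1"}

# An exact-match lookup succeeds only with the folded key.
value = attributes[key_for("ident", namecase_general=True)]
missing = "ident" in attributes  # False: the key is not folded automatically
```

Example C above corresponds to the `key_for` call: the folding happens in the program, not in the shelf lookup.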
The ATTRIBUTES shelf differs from other shelves in not having a "current item". So the following is always invalid:
OUTPUT ATTRIBUTES
In addition, even though some of the items of the ATTRIBUTES shelf may be list-valued, "double indexing" is not allowed. The following is invalid:
OUTPUT ATTRIBUTES ^ "IDENT" @ 1
The "double indexing" can be accomplished with the "USING ATTRIBUTES" prefix (Section 14.4.5.3, "Selecting an Item of the ATTRIBUTES and DATA-ATTRIBUTES Shelf"), which selects an attribute from the shelf so that a token can then be selected from the attribute value:
USING ATTRIBUTES ^ "IDENT" AS ident
   OUTPUT ATTRIBUTE ident @ 1
DATA-ATTRIBUTES shelves can be accessed in the same way:
Syntax
SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? (KEY | ^) string-expression
Syntax
SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? (ITEM | @) numeric-expression
An item of an ATTRIBUTES or DATA-ATTRIBUTES shelf can also be used in the parenthesized portion of a DATA-ATTRIBUTES or DATA-ATTRIBUTE reference:
Example A
... OUTPUT DATA-ATTRIBUTE OF (ATTRIBUTES ^ "REF")
Example B
... OUTPUT DATA-ATTRIBUTE OF (DATA-ATTRIBUTES OF (ATTRIBUTES ^ "REF") @ 1)
SPECIFIED? ATTRIBUTES HAS KEY string-expression
"HAS KEY" can be applied to the attribute shelf to determine whether an element has an attribute declared for it. Note that just because an attribute is declared does not mean that it has a value:
GLOBAL STREAM id-name
...
OUTPUT ATTRIBUTES ^ id-name
   WHEN ATTRIBUTES HAS KEY id-name & ATTRIBUTES ^ id-name ISNT IMPLIED
The "HAS KEY" operator can be very useful when combined with tests that determine the declared type of an attribute. (Remember that those tests cause an error when the attribute that is being referenced was not declared for the specified element.) The following example normalizes an SGML document and adds id attributes to elements that can have them, depending on their type.
DOWN-TRANSLATE
GLOBAL COUNTER id-count
ELEMENT #IMPLIED
   OUTPUT "<%q"
   DO WHEN ATTRIBUTES HAS KEY 'ID'
      DO WHEN ATTRIBUTE id IS CDATA
         OUTPUT " id='%q/%d(id-count)'"
      ELSE
         OUTPUT " id='%d(id-count)'"
      DONE
      INCREMENT id-count
   DONE
   OUTPUT ">%c"
   OUTPUT "</%q>" WHEN ELEMENT ISNT #EMPTY
The "HAS KEY" test is also useful in conjunction with attribute tests when the attribute name is known. The following two tests are equivalent:
Example A
OUTPUT ATTRIBUTE ident
   WHEN ATTRIBUTES HAS KEY "IDENT" & ATTRIBUTE ident ISNT IMPLIED
Example B
OUTPUT ATTRIBUTE ident
   WHEN ATTRIBUTE ident IS SPECIFIED | ATTRIBUTE ident IS DEFAULTED
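The equivalence of the two conditions can be checked with a small model. This Python sketch is illustrative only, not OmniMark code: it assumes that every declared attribute is in exactly one of three states (specified, defaulted, or implied), and that for an undeclared attribute every IS test fails. The names `State`, `example_a`, and `example_b` are hypothetical.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    SPECIFIED = "specified"
    DEFAULTED = "defaulted"
    IMPLIED = "implied"


def example_a(declared: Dict[str, State], key: str) -> bool:
    # Models: ATTRIBUTES HAS KEY key & ATTRIBUTE key ISNT IMPLIED
    return key in declared and declared[key] is not State.IMPLIED


def example_b(declared: Dict[str, State], key: str) -> bool:
    # Models: ATTRIBUTE key IS SPECIFIED | ATTRIBUTE key IS DEFAULTED
    state = declared.get(key)  # None when undeclared: both IS tests fail
    return state is State.SPECIFIED or state is State.DEFAULTED
```

Under these assumptions the two functions agree for all three states and for an undeclared attribute, which is the sense in which the two tests are equivalent.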
"HAS KEY" can be used on the DATA-ATTRIBUTES shelf as well:
Syntax
SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? HAS KEY string-expression
KEY OF SPECIFIED? ATTRIBUTES element-qualifier* @ numeric-expression
KEY OF SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? @ numeric-expression
KEY OF ATTRIBUTE attribute-name element-qualifier*
KEY OF DATA-ATTRIBUTE OF ( attribute-reference )
Each of the above forms returns the "true" name of the attribute. The name is upper-cased when NAMECASE GENERAL YES applies to the SGML document.
When "KEY OF" is applied to an item on the ATTRIBUTES or DATA-ATTRIBUTES shelf, the "@" (ITEM) indexer is required.
Because attributes are always considered to be part of a shelf of attributes, it always makes sense to ask for the "key" of an attribute, even if the ATTRIBUTE form is used rather than ATTRIBUTES.
This is most useful when retrieving the real name of an attribute that is being referenced through an alias. For example:
REPEAT OVER ATTRIBUTES AS this-one
   DO WHEN ATTRIBUTE this-one ISNT IMPLIED
      OUTPUT KEY OF ATTRIBUTE this-one
      OUTPUT "='%v(this-one)'%n"
   DONE
AGAIN
ITEM OF SPECIFIED? ATTRIBUTES element-qualifier* ^ string-expression
ITEM OF SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? ^ string-expression
When "ITEM OF" is applied to an item on the ATTRIBUTES or DATA-ATTRIBUTES shelf, the "^" (KEY) indexer is required.
In the same way as asking for the name of an attribute, "ITEM OF" can be used to determine the order in which the attribute was declared. When SPECIFIED is not used, the first declared attribute has item number 1, the second 2, and so on.
For example, the following outputs a line of text only when the IDENT attribute is the first one declared in its ATTLIST:
OUTPUT "IDENT is the first declared attribute%n" WHEN ITEM OF ATTRIBUTES ^ "IDENT" = 1
NUMBER OF SPECIFIED? ATTRIBUTES element-qualifier*
NUMBER OF SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))?
The number of attributes declared for an element can be determined by applying "NUMBER OF" to ATTRIBUTES, as in:
...
SET attribute-count TO NUMBER OF ATTRIBUTES
OUTPUT "Element %q has %d(attribute-count) attribute(s).%n"
SET attribute-count TO NUMBER OF ATTRIBUTES OF PARENT
OUTPUT "Element %q's parent has %d(attribute-count) attribute(s).%n"
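The ATTRIBUTES and the DATA-ATTRIBUTES shelves are indexed in the following order:
Without SPECIFIED, the shelf contains all of the declared attributes, in the order in which they are declared.
With SPECIFIED, the shelf contains only the attributes specified in the start tag, in the order in which they are specified.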
The following gives the value of the first attribute declared for the current element, no matter what its name is or where its value is specified in the start tag. (It is an error if there are no attributes declared for the currently opened element, or if the first declared attribute has neither a default nor a specified value.)
OUTPUT ATTRIBUTES @ 1
In the next case the value of the first attribute specified in the start tag is output, no matter what its declared order. (In this case it is an error if no attributes are specified in the start tag, even if there are declared attributes and they all have default values.)
OUTPUT SPECIFIED ATTRIBUTES @ 1
The following examples can give different values, because each shelf can contain a different number of attributes, and the "SPECIFIED ATTRIBUTES" shelf can be in a different order.
Example A
LOCAL COUNTER id-pos
SET id-pos TO ITEM OF ATTRIBUTES ^ "POS"
Example B
LOCAL COUNTER id-pos
SET id-pos TO ITEM OF SPECIFIED ATTRIBUTES ^ "POS"
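To make the difference concrete, here is a small worked sketch. The element name item, its attribute list, and the start tag are all hypothetical, chosen only for illustration; the values noted in the comments follow from the ordering rules described above, not from a recorded run.

```omnimark
; Assumed declaration (hypothetical):
;    <!ATTLIST item  id   ID     #IMPLIED
;                    pos  NUMBER #IMPLIED>
; Assumed start tag (hypothetical):
;    <item pos="3">
;
; Under these assumptions, POS is the second declared attribute
; but the first (and only) attribute specified in the start tag,
; so declared-pos should be 2 and specified-pos should be 1.
ELEMENT item
   LOCAL COUNTER declared-pos
   LOCAL COUNTER specified-pos
   SET declared-pos TO ITEM OF ATTRIBUTES ^ "POS"
   SET specified-pos TO ITEM OF SPECIFIED ATTRIBUTES ^ "POS"
   OUTPUT "declared: %d(declared-pos), specified: %d(specified-pos)%n"
   SUPPRESS
```

If the start tag had specified ID before POS, both counters would be expected to hold 2.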
REPEAT OVER SPECIFIED? ATTRIBUTES element-qualifier* AS alias-name local-declaration* action* AGAIN
The "REPEAT OVER" action can be used to iterate over all the attributes. For example:
REPEAT OVER SPECIFIED ATTRIBUTES AS this-one
   OUTPUT KEY OF ATTRIBUTE this-one
   OUTPUT "='%v(this-one)'%n"
AGAIN
The order in which the attributes are repeated over is described in Section 14.4.5.1.5, "The Order of Attributes".
"REPEAT OVER ATTRIBUTES" must specify an alias-name (following AS). This name is used to identify the attribute selected on each iteration within the "REPEAT OVER" action. Any name can be used; this-one in the example above was chosen arbitrarily.
Inside the "REPEAT OVER", use of the keyword ATTRIBUTE followed by the alias-name without any element-qualifiers always refers to the attribute identified by the alias.
If the OmniMark programmer wishes to refer to an attribute of the currently opened element with the same name as the alias-name being used, the element-qualifier "OF ELEMENT" can be used. An element-qualifier always indicates that the attribute belongs to the identified currently opened element.
For example, the following action outputs the value of the attribute named "THIS-ONE" of the currently opened element, even if there is an active attribute alias also using the name "THIS-ONE":
... OUTPUT ATTRIBUTE this-one OF ELEMENT ...
"REPEAT OVER" can also be applied to the DATA-ATTRIBUTES shelf, in which case the syntax is:
Syntax
REPEAT OVER SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? element-qualifier* AS alias-name local-declaration* action* AGAIN
Within a "REPEAT OVER DATA-ATTRIBUTES", the attribute being iterated over can be referenced with either the keyword DATA-ATTRIBUTE or ATTRIBUTE, followed by the alias-name.
REPEAT OVER SPECIFIED DATA-ATTRIBUTES OF (ATTRIBUTE ref) AS this-one
   OUTPUT KEY OF DATA-ATTRIBUTE this-one
   OUTPUT "='%v(this-one)'%n"
AGAIN
USING SPECIFIED? ATTRIBUTES element-qualifier* indexer AS alias-name action
USING SPECIFIED? DATA-ATTRIBUTES (OF ( attribute-reference ))? indexer AS alias-name action
USING can be applied to the ATTRIBUTES or the DATA-ATTRIBUTES shelf in the same way as it is applied to programmer-defined shelves. The indexer is required and has one of the following forms:
Syntax
(ITEM | @) numeric-expression
Syntax
(KEY | ^) string-expression
In the action following the USING prefix, the selected item is referenced by the keyword ATTRIBUTE followed by the alias-name defined in the USING prefix. The keyword DATA-ATTRIBUTE can be used when it is an item of the DATA-ATTRIBUTES shelf that has been selected.
The USING prefix must be used when accessing individual items from a list-valued attribute from the ATTRIBUTES or DATA-ATTRIBUTES shelf.
The following USING prefix and action outputs, one per line, the tokens of one of the currently opened element's parent's attributes. The parent's attribute that is selected is the one whose attribute name is the value of the stream variable name-to-be-used:
GLOBAL STREAM name-to-be-used
...
USING ATTRIBUTES OF PARENT ^ name-to-be-used AS named-attribute
   REPEAT OVER ATTRIBUTE named-attribute
      OUTPUT ATTRIBUTE named-attribute
      OUTPUT "%n"
   AGAIN
In this example, the attribute selected from the parent is assigned the alias-name named-attribute, and that alias is used to refer to the selected attribute within the action prefixed by "USING ATTRIBUTES".
Inside the "REPEAT OVER" action above, named-attribute serves two purposes:
it identifies the attribute to be "repeated over", and
it identifies the token of the attribute selected on each iteration.
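The ITEM ("@") form of the indexer can be used with USING in the same way as the KEY form. The following is a minimal sketch, not taken from the original examples: the alias name first-attribute is arbitrary, and the outer test guards against the error that arises when the current element has no declared attributes.

```omnimark
; Sketch only: select the first declared attribute of the current
; element by position and output its value, if it has one.
DO WHEN NUMBER OF ATTRIBUTES > 0
   USING ATTRIBUTES @ 1 AS first-attribute
   DO WHEN ATTRIBUTE first-attribute ISNT IMPLIED
      OUTPUT "First declared attribute: %v(first-attribute)%n"
   DONE
DONE
```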
Attribute aliases can be defined by "REPEAT OVER" and USING even when the name of the attribute is known. The attribute alias serves to simplify attribute identification when more than one opened element has an attribute with the same name. For example, in the following, the attribute alias parent-type means that the parent's type attribute can be easily identified, especially in "%v" formats, even if the currently opened element has an attribute named "type":
USING ATTRIBUTE type OF PARENT AS parent-type
DO WHEN ATTRIBUTE type != ATTRIBUTE parent-type
   OUTPUT "Type attributes differ:%n" _
          " current: %v(type)%n" _
          " parent's: %v(parent-type)%n"
DONE
An attribute alias can be defined on any form of "REPEAT OVER" or USING. Within such a context, the attribute alias-name takes precedence over an element attribute or data attribute with the same name: use of the alias-name refers to the attribute associated with the alias, and not to the attribute whose "real" name is the alias name, if any.
The only place where an attribute alias is required is when "REPEAT OVER" or USING is used with ATTRIBUTES, "SPECIFIED ATTRIBUTES", DATA-ATTRIBUTES or "SPECIFIED DATA-ATTRIBUTES". In these cases an alias is required so that the selected attribute or attributes can be identified within the "REPEAT OVER" or USING.
When the operator "KEY OF" is applied to the selected or iterated item inside a "USING ATTRIBUTES" or "REPEAT OVER ATTRIBUTES", it returns the real name of the attribute, and not the alias name.
In the input processor, the input to the SGML parser is "locked" whenever "REPEAT OVER" or USING is applied to any of:
This means that the available set of attributes does not change while the action prefixed by the USING is being performed.
While the USING or "REPEAT OVER" is being performed, any text written to the #SGML stream is "buffered" and not passed to the SGML parser. The SGML parser is quiescent, so there is no question about which set of attributes is identified by the USING or "REPEAT OVER".
In this regard, these forms are like "REPEAT OVER CURRENT ELEMENTS", which also needs to "lock" the SGML parser input in the same circumstances for the same reasons.
All the tests that can normally be applied to attributes can be used with any identification of an attribute, whether the attribute is identified by the attribute's name, by using an attribute alias name, or by selecting an item of ATTRIBUTES.
The following "REPEAT OVER" action only outputs attributes whose values are defaulted or specified (excluding the #IMPLIED and unspecified ones) and whose values consist entirely of letters. The attributes are output in the order that they are declared. It does so by testing an attribute identified by an attribute alias name:
OUTPUT "<%q"
REPEAT OVER ATTRIBUTES AS this-attribute
   DO WHEN ATTRIBUTE this-attribute ISNT IMPLIED &
           ATTRIBUTE this-attribute MATCHES (LETTER+ VALUE-END)
      OUTPUT " "
      OUTPUT KEY OF ATTRIBUTE this-attribute
      OUTPUT "=%"%v(this-attribute)%""
   DONE
AGAIN
OUTPUT ">"
The following test succeeds only if the first declared attribute for an element has a value of "FIRST":
OUTPUT "First attribute has a value of %"FIRST%"%n" WHEN ATTRIBUTES @ 1 = "FIRST"
Because all attributes of "SPECIFIED ATTRIBUTES" are, by definition, specified, the "IS SPECIFIED", "IS DEFAULTED" and "IS IMPLIED" tests don't make much sense when applied to it: "IS SPECIFIED" always succeeds, and "IS DEFAULTED" and "IS IMPLIED" always fail.
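These tests are therefore more informative on the full ATTRIBUTES shelf. The following sketch, which is an illustration rather than part of the original examples, reports for each declared attribute of the current element which of the three tests succeeds; the alias name this-one is arbitrary.

```omnimark
; Sketch only: classify each declared attribute of the current element.
REPEAT OVER ATTRIBUTES AS this-one
   OUTPUT KEY OF ATTRIBUTE this-one
   DO WHEN ATTRIBUTE this-one IS SPECIFIED
      OUTPUT " was specified in the start tag%n"
   ELSE WHEN ATTRIBUTE this-one IS DEFAULTED
      OUTPUT " was defaulted%n"
   ELSE WHEN ATTRIBUTE this-one IS IMPLIED
      OUTPUT " is implied%n"
   DONE
AGAIN
```

Run over "SPECIFIED ATTRIBUTES" instead, only the first branch would ever be taken.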
The APPINFO parameter of an SGML declaration is used to provide processing information to an application. The #APPINFO stream provides access to this information. The #APPINFO stream may appear only in a string expression, either in the format item:
"...%g(#appinfo)..."
or as the name of a stream:
#APPINFO
The #APPINFO stream may not be opened, written to, closed, or discarded. It is not ATTACHED if an SGML declaration was not given, if APPINFO NONE was specified in the SGML Declaration, or in cross-translations. If it is ATTACHED, it is also CLOSED.
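Because #APPINFO is not always ATTACHED, it is prudent to check before using it. The following is a hedged sketch: it assumes that the "IS ATTACHED" stream test may be applied to #APPINFO, as the description of its attached state suggests, and the message text is purely illustrative.

```omnimark
; Sketch only: assumes the IS ATTACHED test is permitted on #APPINFO.
DO WHEN #APPINFO IS ATTACHED
   OUTPUT "APPINFO: %g(#appinfo)%n"
ELSE
   OUTPUT "No APPINFO information is available.%n"
DONE
```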
Copyright © OmniMark Technologies Corporation, 1988-1997. All rights reserved.
EUM27, release 2, 1997/04/11.