Parsed data: formatting - OmniMark Concept

Parsed data: formatting

When you parse XML or SGML data, you have two choices for formatting the parsed data. The first is to capture the parsed data in OmniMark streams and then use regular OmniMark programming techniques to format the data. The other is to give instructions to the parser on how you want the data formatted. The latter approach is often faster and cleaner.

You instruct the parser by using format items and placing the appropriate format modifiers on the format commands "c", "v", and "q". In some cases, format modifiers also let you access different parts or interpretations of a piece of parsed data.

Formatting data content

Unless you intercept it in a data content rule, the parser outputs data content to the current output scope. You can add format modifiers to the parse continuation operator, "%c", to change how the data content is formatted.

The supported format modifiers are:

"h" -- prevents line-breaking rules like insertion-break and replacement-break from applying to the content of the current component.
"l" -- converts all text to lowercase. It only applies to letters in the processed document (data characters in content and attribute values) copied from the input to the output. It does not apply to letters in quoted strings in the OmniMark program.
"u" -- converts all of the text to uppercase. It only applies to letters in the processed document (data characters in content and attribute values) copied from the input to the output. It does not apply to letters in quoted strings in the OmniMark program.
"s" -- causes white space to be stripped in the processed content as follows:
1. Leading and trailing spaces and line-ends are removed from components.
2. Sequences of tabs and spaces are condensed to a single space.
3. Sequences of line-ends, together with any intervening, leading, or trailing tabs and spaces, are condensed to a single line-end.
The "s" modifier affects only text received directly from the SGML parser, or from characters specified with format items that explicitly allow stripping.
"z" -- turns off translate rules that would otherwise apply to all or part of the content.

It is possible to override the subcomponents, even those going into the same stream, by removing the modifier with the following syntax:

  put my-stream with "" "%c"

Formatting element names and external entities

The "%q" format item refers to the name of the currently opened element everywhere except in external-text-entity and external-data-entity rules. In functions, even if the function is called from an external-text-entity or an external-data-entity rule, the "%q" still refers to the currently opened element. This is to ensure that a function always behaves in the same way, regardless of what rule it is called from.

When referring to an element, the "%q" format can have the following modifiers:

"l": converts all text to lowercase. Cannot be used with the "u" modifier.
"u": converts all text to uppercase. Cannot be used with the "l" modifier.
number "f": the field-width modifier. If the specified number is less than the minimum number of characters needed to format the value, the modifier is ignored. If it is greater, space characters are added to the right of the value to fill it out to the field width.
"k": the right-justification modifier. It is allowed when the field-width modifier is given. It causes padding on the left side of the field instead of the right. The "k" modifier requires the "f" modifier.

In entity rules, the "%q" format item only refers to the name of the current entity in the actions of an external-data-entity or an external-text-entity rule. It refers to the current element everywhere else, including in functions, as mentioned above.

The following modifiers can be used to return other information about a current entity.

"e": causes OmniMark to access the system identifier from the entity declaration instead of the entity name.
"o": causes OmniMark to access the notation name from the entity declaration instead of the entity name. This modifier can only be used in external-data-entity rules because external text entities do not have a notation. This is the only format modifier that can be combined with the "f" or "k" format modifiers described above.
"p": causes OmniMark to access the public identifier from the entity declaration instead of the entity name.

These modifiers can be combined as follows:

"ep": causes OmniMark to access the system identifier obtained by searching for the entity's public identifier in the library rules.
"eo": causes OmniMark to access the system identifier declared for the notation associated with the entity. This combination can only be used in an external-data-entity rule because external text entities have no notations.
"eop": causes OmniMark to access the system identifier obtained by searching for the entity notation's public identifier in library rules. This combination can only be used in an external-data-entity rule because external text entities have no notations.

If an entity has no system identifier, then the "e" format modifier acts the way "ep" does.

If an entity has no public identifier, or if the program has no library rule to associate a system identifier with the entity's public identifier, then it is an error to use the "ep" format modifier combination. If such an entity also does not declare a system identifier in the entity declaration, then it is also an error to use the "e" format modifier.

The same observation applies to the system identifier of the entity's notation when using the above format modifiers in combination with the "o" format modifier.

All of the combinations above may be further combined with the "l" or "u" format modifiers. Additionally, the "o" format modifier can also be combined with the "f" and "k" format modifiers, provided that it is not also combined with the "e" or "p" modifiers.

The "f" and "k" format modifiers can only be used with entity names and notation names.
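
A minimal sketch of the entity modifiers in use is shown below. It assumes that every external entity in the document has a system identifier, or a public identifier that a library rule maps to one; otherwise "%eq" is an error, as described above.

  ; Resolve each external text entity by reading the file named by
  ; its system identifier ("e" acts as "ep" when none is declared).
  external-text-entity #implied
    output file "%eq"

  ; Report the name, notation name, and system identifier of each
  ; external data entity.
  external-data-entity #implied
    output "%q (%oq): %eq%n"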

Formatting attributes and external data entities

You use the "%v" format item to output an attribute of an element or of an external-data-entity.

The following example outputs the section ID (the attribute named "ID") when processing an SGML document:

  element section
    output "Section: " || "%v(id)" || "%c"

The DTD for the above example must contain lines similar to the following:

  <!element section - o (#PCDATA)>
  <!attlist section id number #REQUIRED>

In element rules, the named attribute must be an attribute of the element; in external-data-entity rules, it must be a data attribute of the entity being processed. In all other rules, the named attribute must be an attribute of the containing element.

The following modifiers can always be used with the "%v" format item:

"l": forces the letters in the attribute value to lowercase.
"u": forces the letters in the attribute value to uppercase.
"number f": the field-width modifier is allowed in the "%v" format (although it is ignored for CDATA attributes). If the number is less than the minimum number of characters needed to print the attribute value, it is ignored. If it is greater, space characters are added to the right of the value to fill out the field width.
"k": allowed only with "f", this puts the spaces to the left of the value instead of to the right.

If the attribute has a CDATA declared type, the following modifiers can also be used:

"h": prevents the insertion of line breaks.
"s": minimizes white spaces.
1. condenses sequences of white space not containing a line-end to a single space.
2. condenses sequences of white space containing a line-end into a single line-end.
"z": prevents selection of any translate rules that would otherwise apply to all or part of the attribute value.

If the attribute's declared type is entity or entities, and the entity name refers to an external entity, you can use the following modifiers (but not with the "f", "k", "l", and "u" modifiers):

"e": causes OmniMark to access the system identifier from the entity declaration instead of accessing the entity name.
"o": causes OmniMark to access the notation name from the entity declaration instead of from the entity name (exception: you can use the "f", "k", "l", and "u" modifiers in this combination).
"p": causes OmniMark to access the public identifier from the entity declaration instead of accessing the entity name.

These modifiers are combined as follows:

"%epv": causes OmniMark to access the system identifier found by searching for the entity's public identifier in the library rules.
"%eov": causes OmniMark to access the system identifier declared for the notation associated with the entity.
"%pov": causes OmniMark to access the public identifier declared for the notation associated with the entity.
"%epov": causes OmniMark to access the system identifier found by searching for the entity notation's public identifier in the library rules.

If an entity has no system identifier, then "e" acts as "ep" does. It is an error to use "e" or "ep" when the entity has neither a system identifier nor a public identifier that a library rule maps to a system identifier.

This format accesses letters within system and public identifiers in uppercase or lowercase as they appear in the entity declaration. Letters in element, entity, or notation names appear in uppercase or lowercase as they appear in the processed document, unless the SGML declaration specifies uppercase substitution for that class of name. If so, the name is accessed with letters forced to uppercase. Thus, in the Reference Concrete Syntax, by default, element and notation names appear in uppercase while entity names appear as entered in the document.

For an entities attribute whose value contains more than one entity name, you must use the "using" prefix to select the one entity whose system or public identifier is to be manipulated or displayed.

If the value of an entity or entities attribute is the name of an internal CDATA or SDATA entity, the "%ev" format can be used to determine the replacement text of the internal entity.

The "e", "p", and "ep" formats can also be used with notation attributes, under the same conditions as entity or entities.

This example illustrates how the "%ev" format handles internal and external entities differently.

The element "as-is" has a single required ENTITY attribute "text". The entity named by the attribute value simply provides the text that is to replace the element, wherever it occurs in a document.

  <!ELEMENT as-is - o EMPTY>
  <!ATTLIST as-is text ENTITY #REQUIRED>

The element rule for processing the "as-is" element does the following:

If the entity named by attribute "text" is an external entity, then the element rule uses the system identifier declared for the entity, or the system identifier associated by a library rule with the entity's declared public identifier, as a filename, and replaces the element with the contents of the file. (OmniMark reports an error if the entity does not have a system identifier and does not have a public identifier mapped to a system identifier, or if the system identifier names a nonexistent file. This may or may not be appropriate for a particular OmniMark program.)
If the entity is an internal entity, then the element rule uses the replacement text of the internal entity and replaces the element with that text.

Note that "%ev" returns one of two things, depending on whether the entity named by the attribute to which it is applied is internal or external:

For an attribute token that is the name of an external entity, it returns the system identifier.
For an attribute token that is the name of an internal entity, it returns that entity's replacement text.

  element as-is
    do when attribute text is external
      output file "%ev(text)"
    else
      output "%ev(text)"
    done
    suppress

Prerequisite Concepts
   Format items
   Markup rules
Related Syntax
   attribute
   external-data-entity
   external-text-entity

----

OmniMark 6.5 Documentation Generated: December 23, 2002 at 6:24:56 pm

Copyright © OmniMark Technologies Corporation, 1988-2002.