Base character sets: defining

In the notation used in ISO 8879, the syntax of an external character set description is:

  external character set description =
     ps+, (external character description, ps+)*
  external character description =
     external character number, ps+, (number of characters, ps+)?,
     (graphic character assignment | external character assignment)
  graphic character assignment =
     (lit, graphic character*, lit) |
     (lita, graphic character*, lita) |
     "TAB" |
     "B"
  external character assignment =
     "UCLETTER" | ("LCLETTER", ps+, external character number) |
     ("DIGIT", ps+, digit value) | "SPECIAL" | "DATA" | "CONTROL"
  external character number = number
  digit value = number

In this syntax:

"ps+" can be any white space or comment, as in the SGML Declaration.
"number of characters" and "number" are as defined in ISO 8879.
"graphic character" is any character other than either a control character or the delimiter that terminates the literal containing it.
Reserved names can be entered using any combination of uppercase and equivalent lowercase letters, as in the SGML Declaration.
"TAB" indicates that the external character number is that of the "tab" character.
"B" indicates that the external character number is that of the "uppercase letter B", used in B sequences in short reference delimiters. (Unless a "B" is specified, the character number associated with the graphic character "B" is used.)
A "number of characters" specified in conjunction with a graphic character assignment is ignored, unless its value is zero, in which case the whole assignment is ignored instead.
"RE", "RS", and "SPACE" characters are interpreted as specified in the concrete syntax.

A graphic character assignment indicates how characters in parameter literals in the concrete syntax (delimiter strings and the "LCNMSTRT", "UCNMSTRT", "LCNMCHAR" and "UCNMCHAR" strings) are to be interpreted, as in:

  65   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

This example indicates that the characters in the literal, when encountered in a parameter literal in the concrete syntax, are to be interpreted as the characters with numbers 65 through 90 inclusive, in the character set identified by the public identifier or in the entity containing the external character set description.
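The numbering implied by a graphic character assignment can be sketched in Python. This is only an illustration, not part of OmniMark; the function name assign_graphic_characters is hypothetical. Each character in the literal takes the next consecutive number, beginning at the external character number, so a 26-letter literal starting at 65 ends at 90.

```python
def assign_graphic_characters(start, literal):
    """Map each character of the literal to consecutive character numbers.

    Hypothetical helper: the first character gets `start`, the next
    gets `start + 1`, and so on.
    """
    return {ch: start + offset for offset, ch in enumerate(literal)}


mapping = assign_graphic_characters(65, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(mapping["A"], mapping["Z"])
```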

The character with a numeric value of zero (\0 or CTRL-@) should not be used in a "graphic character assignment". If it is used, it is ignored: it is treated as if it did not appear in the string. The "zero digit" character (0) can be used, because it is not the same as the zero-value character.
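The treatment of the zero-value character might be sketched as follows. This Python fragment is an illustration under one assumption: that "as if it did not appear in the string" means the character is dropped before numbering, so the characters after it are numbered without a gap. The function names are hypothetical.

```python
def strip_zero_characters(literal):
    """Drop every zero-value character (\\0) from the literal."""
    return literal.replace("\0", "")


def assign_ignoring_zero(start, literal):
    """Number the remaining characters consecutively from `start`.

    Assumption: removed characters do not consume a character number.
    """
    cleaned = strip_zero_characters(literal)
    return {ch: start + offset for offset, ch in enumerate(cleaned)}


# The digit "0" is kept; the zero-value character between the digits is not.
mapping = assign_ignoring_zero(48, "0\x001")
print(mapping)
```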

An external character assignment assigns characters in the base character set, starting with the "external character number" and continuing for "number of characters", to one of the following categories:

"UCLETTER", an uppercase letter.
"LCLETTER", a lowercase letter (the "external character number" following the keyword is the number of the corresponding uppercase letter).
"DIGIT".
"SPECIAL".
"CONTROL", a "control" character (as defined for the base character set).

All characters (in the range of allowed values, 0 to 255 in current versions of OmniMark) that are not placed in one of these categories are classified as non-significant, non-control data characters. (Note that the method for defining base character sets ensures that no character will ever be two or more of "LCLETTER", "UCLETTER", "DIGIT", "SPECIAL", or "CONTROL".)
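One way to picture the resulting classification is a table with one entry per character number. The Python sketch below is an illustration only. The function name classify, the "unassigned" label for characters left out of every category, and the rule that a later assignment overwrites an earlier one are all assumptions made for the example; the method OmniMark actually uses to keep the categories disjoint is not described here.

```python
CATEGORIES = {"UCLETTER", "LCLETTER", "DIGIT", "SPECIAL", "DATA", "CONTROL"}


def classify(assignments, size=256):
    """Build a category table from (start, count, category) triples.

    Hypothetical helper. Every character starts as "unassigned"; each
    assignment then labels `count` consecutive characters from `start`.
    Because each table slot holds one label, no character can end up in
    two categories (a later assignment simply replaces an earlier one).
    """
    table = ["unassigned"] * size
    for start, count, category in assignments:
        name = category.upper()  # reserved names are case-insensitive
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if start < 0 or start + count > size:
            raise ValueError(f"range {start}..{start + count - 1} out of bounds")
        for number in range(start, start + count):
            table[number] = name
    return table


table = classify([
    (0, 32, "control"),
    (48, 10, "digit"),
    (65, 26, "ucletter"),
    (97, 26, "lcletter"),
    (127, 1, "control"),
])
print(table[9], table[48], table[65], table[97], table[200])
```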

The text shown below defines the "ISO 646 (IRV)" character set. It can either be kept in a file or hard-coded into an OmniMark program, and can be provided to the SGML parser when it encounters the "#charset" entity in the SGML Declaration.

    9       tab
   32       ' !"#$%&'
   39       "'()*+,-./0123456789:;<=>?"
   64       "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
   96       "`abcdefghijklmnopqrstuvwxyz{|}~"
    0  32   control
   39   3   special
   43   5   special
   48  10   digit     0
   58       special
   61       special
   63       special
   65  26   ucletter
   66       b
   97  26   lcletter  65
  127       control
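To show how the syntax applies to this text, the following Python sketch splits the description into entries. It is an illustration, not OmniMark's parser. The names TOKEN and parse_description and the tuple layout are hypothetical, and the sketch assumes that "ps+" is plain white space only (comments are not handled).

```python
import re

ISO_646_IRV = r'''
    9       tab
   32       ' !"#$%&'
   39       "'()*+,-./0123456789:;<=>?"
   64       "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
   96       "`abcdefghijklmnopqrstuvwxyz{|}~"
    0  32   control
   39   3   special
   43   5   special
   48  10   digit     0
   58       special
   61       special
   63       special
   65  26   ucletter
   66       b
   97  26   lcletter  65
  127       control
'''

# A token is a literal in double quotes (lit), a literal in single quotes
# (lita), or any run of non-space characters (a number or reserved name).
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|\S+''')


def parse_description(text):
    """Return (number, count, kind, argument) tuples, one per entry.

    `count` is None when "number of characters" is omitted. `kind` is
    "literal" (argument is the literal's content) or an uppercased
    reserved name (argument is the trailing number for LCLETTER and
    DIGIT, otherwise None).
    """
    tokens = TOKEN.findall(text)
    entries = []
    i = 0
    while i < len(tokens):
        number = int(tokens[i])
        i += 1
        count = None
        if tokens[i].isdigit():  # optional "number of characters"
            count = int(tokens[i])
            i += 1
        item = tokens[i]
        i += 1
        if item[0] in "\"'":
            entries.append((number, count, "literal", item[1:-1]))
            continue
        keyword = item.upper()  # reserved names are case-insensitive
        argument = None
        if keyword in ("LCLETTER", "DIGIT"):
            argument = int(tokens[i])
            i += 1
        entries.append((number, count, keyword, argument))
    return entries


entries = parse_description(ISO_646_IRV)
for entry in entries:
    print(entry)
```

Each literal's length accounts for the gap to the next graphic character assignment: the literal at 32 holds 7 characters (32 through 38), the one at 39 holds 25 (39 through 63), the one at 64 holds 32 (64 through 95), and the one at 96 holds 31 (96 through 126).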

Prerequisite Concepts
Base character sets
