Base character sets: defining
In the notation used in ISO 8879, the syntax of an external character set description is:
external character set description = ps+, (external character description, ps+)*
external character description = external character number, ps+, (number of characters, ps+)?, (graphic character assignment | external character assignment)
graphic character assignment = (lit, graphic character*, lit) | (lita, graphic character*, lita) | "TAB" | "B"
external character assignment = "UCLETTER" | ("LCLETTER", ps+, external character number) | ("DIGIT", ps+, digit value) | "SPECIAL" | "DATA" | "CONTROL"
external character number = number
digit value = number
In this syntax:
A graphic character assignment indicates how characters in parameter literals in the concrete syntax (delimiter strings and the "LCNMSTRT", "UCNMSTRT", "LCNMCHAR" and "UCNMCHAR" strings) are to be interpreted, as in:
65 "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
This example indicates that the 26 characters in the literal, when encountered in a parameter literal in the concrete syntax, are to be interpreted as the characters numbered 65 through 90 inclusive, in the character set identified by the public identifier or in the entity containing the external character set description.
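The numbering in this example can be checked with a short Python sketch. It is an illustration only, not part of OmniMark; the function name `assign_graphic_characters` is hypothetical.

```python
def assign_graphic_characters(start, literal):
    """Give each character of the literal a consecutive number, beginning at start."""
    return {ch: start + offset for offset, ch in enumerate(literal)}


mapping = assign_graphic_characters(65, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(mapping["A"], mapping["Z"])  # 65 90
```

The first character takes the external character number itself, so a 26-character literal starting at 65 ends at 90.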
The character with a numeric value of zero (\0, &#0; or CTRL-@) should not be used in a "graphic character assignment". If it is used, it is ignored, as if it did not appear in the string. The "zero digit" character (0) can be used, because it is not the same as the zero-value character.
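A minimal Python sketch of this rule, assuming that "ignored" means the zero-value character is dropped before the literal's characters are numbered. The helper name `strip_zero_value` is hypothetical.

```python
def strip_zero_value(literal):
    """Drop zero-value characters; the zero digit "0" is a different character and stays."""
    return literal.replace("\x00", "")


cleaned = strip_zero_value("A\x00B0")
print(repr(cleaned))  # 'AB0'
```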
An external character assignment assigns characters in the base character set, starting with the "external character number" and continuing for "number of characters", to one of the following categories: "UCLETTER", "LCLETTER", "DIGIT", "SPECIAL", "DATA", or "CONTROL".
All characters (in the range of allowed values, 0 to 255 in current versions of OmniMark) that are not placed in one of these categories are classified as non-significant, non-control data characters. (Note that the method for defining base character sets ensures that no character will ever be two or more of "LCLETTER", "UCLETTER", "DIGIT", "SPECIAL", or "CONTROL".)
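The classification can be pictured as a 256-entry table, sketched here in Python. This illustrates the idea and is not OmniMark's implementation. The function name `classify` and the default label "data" (standing in for non-significant, non-control data characters) are hypothetical. Letting a later assignment replace an earlier one on overlap is also an assumption.

```python
def classify(assignments, size=256):
    """Build a table with one category label per character number."""
    # Hypothetical default label for characters placed in no category.
    table = ["data"] * size
    for start, count, category in assignments:
        for number in range(start, start + count):
            # Assumption: a later assignment replaces an earlier one.
            table[number] = category
    return table


table = classify([
    (0, 32, "control"),
    (48, 10, "digit"),
    (65, 26, "ucletter"),
    (97, 26, "lcletter"),
    (127, 1, "control"),
])
print(table[9], table[48], table[64], table[90], table[91])
```

Because each slot in the table holds exactly one label, a character can never belong to two categories at once.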
The text shown below defines the "ISO 646 (IRV)" character set, which can either be kept in a file or hard coded into an OmniMark program. This text can be provided to the SGML parser when it encounters the "#charset" entity in the SGML Declaration.
9 tab
32 ' !"#$%&'
39 "'()*+,-./0123456789:;<=>?"
64 "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
96 "`abcdefghijklmnopqrstuvwxyz{|}~"
0 32 control
39 3 special
43 5 special
48 10 digit 0
58 special
61 special
63 special
65 26 ucletter
66 b
97 26 lcletter 65
127 control
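To show how a description like this breaks down under the syntax given earlier, here is a rough Python sketch of a tokenizer and parser. It is an illustration under stated assumptions, not OmniMark's parser. The names `TOKEN`, `parse_charset_description` and `SAMPLE` are hypothetical, keywords are matched case-insensitively (inferred from the lowercase keywords in the text above), and an omitted "number of characters" is recorded as `None` rather than interpreted. The sample is a shortened excerpt of the definition.

```python
import re

# One token is a double-quoted literal, a single-quoted literal, a number or a keyword.
TOKEN = re.compile(r'''"([^"]*)"|'([^']*)'|(\d+)|([A-Za-z]+)''')


def parse_charset_description(text):
    """Return a list of (start, count, kind, value) tuples, one per description."""
    tokens = [(m.lastindex, m.group(m.lastindex)) for m in TOKEN.finditer(text)]
    entries = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind != 3:
            raise ValueError("expected an external character number, got %r" % value)
        start = int(value)
        i += 1
        count = None  # "number of characters" is optional
        if tokens[i][0] == 3:
            count = int(tokens[i][1])
            i += 1
        kind, value = tokens[i]
        i += 1
        if kind in (1, 2):  # graphic character assignment given as a literal
            entries.append((start, count, "literal", value))
            continue
        keyword = value.lower()
        argument = None
        if keyword in ("lcletter", "digit"):  # these keywords take a number
            argument = int(tokens[i][1])
            i += 1
        entries.append((start, count, keyword, argument))
    return entries


SAMPLE = r'''
9 tab
32 ' !"#$%&'
64 "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
0 32 control
48 10 digit 0
58 special
97 26 lcletter 65
127 control
'''

entries = parse_charset_description(SAMPLE)
for entry in entries:
    print(entry)
```

A number that directly follows the external character number is read as the count. This is why `58 special` and `48 10 digit 0` both parse unambiguously: the next token after the count, or after the start when no count is given, is always a literal or a keyword.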
Prerequisite Concepts: Base character sets