Base character sets: defining

In the notation used in ISO 8879, the syntax of an external character set description is:

  external character set description =
     ps+, (external character description, ps+)*
  external character description =
     external character number, ps+, (number of characters, ps+)?,
     (graphic character assignment | external character assignment)
  graphic character assignment =
     (lit, graphic character*, lit) |
     (lita, graphic character*, lita) |
     "TAB" |
     "B"
  external character assignment =
     "UCLETTER" | ("LCLETTER", ps+, external character number) |
     ("DIGIT", ps+, digit value) | "SPECIAL" | "DATA" | "CONTROL"
  external character number = number
  digit value = number
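As a rough illustration of how this grammar breaks down, here is a minimal Python sketch of a reader for external character set descriptions. It is not OmniMark's parser. The names `tokenize` and `parse` are hypothetical, `ps+` is simplified to plain whitespace (comments are not handled), and keywords are matched case-insensitively, an assumption suggested by the lower-case keywords in the ISO 646 example on this page. It returns one tuple per external character description and does not try to interpret `TAB` or `B`.

```python
import re

# Hypothetical reader for the external character set description grammar.
# Simplification: "ps+" is treated as plain whitespace (comments are not handled).
TOKEN = re.compile(r"""\s*(?:(\d+)|"([^"]*)"|'([^']*)'|([A-Za-z]+))""")

GRAPHIC_KEYWORDS = {"TAB", "B"}          # keyword forms of graphic character assignment
CATEGORIES = {"UCLETTER", "LCLETTER", "DIGIT", "SPECIAL", "DATA", "CONTROL"}
NEEDS_NUMBER = {"LCLETTER", "DIGIT"}     # categories that are followed by a number


def tokenize(text):
    """Split a description into ("num", int), ("lit", str) and ("kw", str) tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        num, dq, sq, word = m.groups()
        if num is not None:
            tokens.append(("num", int(num)))
        elif dq is not None or sq is not None:
            tokens.append(("lit", dq if dq is not None else sq))  # lit or lita form
        else:
            tokens.append(("kw", word.upper()))  # assumed case-insensitive
        pos = m.end()
    return tokens


def parse(text):
    """Return (external number, count or None, kind, argument) per description."""
    toks, i, entries = tokenize(text), 0, []

    def take_number(what):
        nonlocal i
        if i >= len(toks) or toks[i][0] != "num":
            raise ValueError(f"expected {what}")
        i += 1
        return toks[i - 1][1]

    while i < len(toks):
        start = take_number("external character number")
        count = None
        if i < len(toks) and toks[i][0] == "num":  # optional number of characters
            count = take_number("number of characters")
        if i >= len(toks):
            raise ValueError(f"missing assignment after {start}")
        kind, value = toks[i]
        i += 1
        if kind == "lit":
            entries.append((start, count, "graphic", value))
        elif value in GRAPHIC_KEYWORDS:
            entries.append((start, count, "graphic-keyword", value))
        elif value in CATEGORIES:
            arg = take_number(f"number after {value}") if value in NEEDS_NUMBER else None
            entries.append((start, count, value, arg))
        else:
            raise ValueError(f"unexpected token {value!r}")
    return entries


print(parse('9 tab 97 26 lcletter 65 48 10 digit 0 58 special 65 "AB"'))
```

Because an assignment never begins with a number, a second number after the external character number can only be the optional count, which keeps the sketch free of lookahead beyond one token.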

In this syntax:

A graphic character assignment indicates how characters in parameter literals in the concrete syntax (delimiter strings and the "LCNMSTRT", "UCNMSTRT", "LCNMCHAR" and "UCNMCHAR" strings) are to be interpreted, as in:

  65   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

This example indicates that the characters in the literal, when they appear in a parameter literal in the concrete syntax, are to be interpreted as the characters numbered 65 through 90 inclusive (one number for each of the 26 letters) in the character set identified by the public identifier, or in the entity containing the external character set description.

The character with numeric value zero (\0 or CTRL-@) should not be used in a graphic character assignment. If it is used, it is ignored: the assignment is treated as if that character did not appear in the string. The digit zero ("0", character 48 in ISO 646) can be used, because it is a different character from the one with value zero.
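The following Python sketch shows how a graphic character assignment pairs each character of the literal with a base character number, counting up from the external character number. The function name `graphic_assignment` is hypothetical, and dropping NUL before numbering (so that it does not shift later characters) is this sketch's reading of "treated as if it did not appear in the string".

```python
def graphic_assignment(start, literal):
    """Map each character of a graphic-assignment literal to a base character number.

    Hypothetical helper. NUL ("\\0") characters are dropped before numbering, so they
    neither receive a number nor shift the numbers of the characters after them.
    """
    chars = [c for c in literal if c != "\0"]
    return {c: start + offset for offset, c in enumerate(chars)}


upper = graphic_assignment(65, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(upper["A"], upper["Z"])          # 65 90

print(graphic_assignment(65, "A\0B"))  # {'A': 65, 'B': 66}
print(graphic_assignment(48, "0123"))  # {'0': 48, '1': 49, '2': 50, '3': 51}
```

The last call shows that the digit "0" is an ordinary graphic character and is numbered like any other.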

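An external character assignment assigns characters in the base character set, starting at the "external character number" and continuing for "number of characters" (a single character when the number is omitted), to one of the categories named in the syntax above: "UCLETTER", "LCLETTER", "DIGIT", "SPECIAL", "DATA", or "CONTROL". For "LCLETTER", the number that follows gives the external character number of the corresponding upper-case letter; for "DIGIT", it gives the digit value of the first character in the range.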

All characters in the range of allowed values (0 to 255 in current versions of OmniMark) that are not placed in one of these categories are classified as non-significant, non-control data characters. (The method for defining base character sets ensures that no character is ever assigned to more than one of "LCLETTER", "UCLETTER", "DIGIT", "SPECIAL", or "CONTROL".)
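A minimal Python sketch of expanding external character assignments into a per-character table for the 0 to 255 range follows. The names `classify` and `DEFAULT`, the tuple layout, and the choice to reject a second assignment to the same character are assumptions for illustration. So is reading the number after `LCLETTER` and `DIGIT` as belonging to the first character of the range and counting up with it, which is inferred from the ISO 646 example on this page (`97 26 lcletter 65`, `48 10 digit 0`).

```python
DEFAULT = "nonsignificant data"   # hypothetical label for unassigned characters


def classify(assignments, size=256):
    """Expand external character assignments into a per-character table.

    Each assignment is (external number, count or None, category, argument).
    A count of None means a single character. Characters never assigned keep
    DEFAULT. Returns (table, values): values holds the matching upper-case
    number for LCLETTER characters and the digit value for DIGIT characters.
    """
    table = [DEFAULT] * size
    values = {}
    for start, count, category, arg in assignments:
        for offset in range(1 if count is None else count):
            n = start + offset
            if not 0 <= n < size:
                raise ValueError(f"character {n} is outside 0..{size - 1}")
            if table[n] != DEFAULT:
                # Assumption: a second assignment is treated as an error.
                raise ValueError(f"character {n} is already {table[n]}")
            table[n] = category
            if category in ("LCLETTER", "DIGIT"):
                # Assumption: the argument applies to the first character and
                # counts up with it (97 26 LCLETTER 65 pairs a..z with A..Z).
                values[n] = arg + offset
    return table, values


table, values = classify([
    (0, 32, "CONTROL", None),
    (48, 10, "DIGIT", 0),
    (65, 26, "UCLETTER", None),
    (97, 26, "LCLETTER", 65),
    (127, None, "CONTROL", None),
])
print(table[ord("q")], values[ord("q")])  # LCLETTER 81
print(table[ord("7")], values[ord("7")])  # DIGIT 7
print(table[200])                         # nonsignificant data
```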

The text below defines the "ISO 646 (IRV)" character set. The text can be kept in a file or hard-coded into an OmniMark program, and can be provided to the SGML parser when it encounters the "#charset" entity in the SGML Declaration.

        9        tab
       32       ' !"#$%&'
       39       "'()*+,-./0123456789:;<=>?"
       64       "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
       96       "`abcdefghijklmnopqrstuvwxyz{|}~"
        0  32   control
       39   3   special
       43   5   special
       48  10  digit     0
       58        special
       61        special
       63        special
       65  26  ucletter
       66        b
       97  26  lcletter 65
      127       control
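One way to sanity-check a description like this is to confirm that its graphic literals line up. The Python sketch below hard-codes the entries above, leaving out the `9 tab` and `66 b` entries, whose keyword forms this page does not explain. It checks that each literal character receives the same number Python's `ord()` reports for it and that the literals cover 32 through 126 without gaps, then lists the characters assigned to "special". The helper names `check_graphics` and `members` are hypothetical.

```python
# Graphic assignments copied from the ISO 646 (IRV) example: (first number, literal).
GRAPHIC = [
    (32, ' !"#$%&'),
    (39, "'()*+,-./0123456789:;<=>?"),
    (64, "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"),
    (96, "`abcdefghijklmnopqrstuvwxyz{|}~"),
]
# External character assignments from the same example: (first number, count or None, category).
CATEGORIES = [
    (0, 32, "control"), (39, 3, "special"), (43, 5, "special"),
    (48, 10, "digit"), (58, None, "special"), (61, None, "special"),
    (63, None, "special"), (65, 26, "ucletter"), (97, 26, "lcletter"),
    (127, None, "control"),
]


def check_graphics(entries):
    """Check that literals are contiguous and that every character gets its ord() value.

    Returns the last character number covered by the literals.
    """
    expected_start = entries[0][0]
    for start, literal in entries:
        if start != expected_start:
            raise ValueError(f"literal at {start} should start at {expected_start}")
        for offset, ch in enumerate(literal):
            if ord(ch) != start + offset:
                raise ValueError(f"{ch!r} is numbered {start + offset}, not {ord(ch)}")
        expected_start = start + len(literal)
    return expected_start - 1


def members(entries, category):
    """Return the characters that the external character assignments place in category."""
    return "".join(
        chr(n)
        for start, count, name in entries if name == category
        for n in range(start, start + (count or 1))
    )


print(check_graphics(GRAPHIC))         # 126
print(members(CATEGORIES, "special"))  # '()+,-./:=?
```

The sketch also shows why the literal starting at 39 is separate from the one starting at 32: the apostrophe (39) cannot appear inside a literal delimited by apostrophes, so it begins a new literal delimited by double quotes. The eleven "special" characters it lists are the ones ISO 8879 calls the special minimum data characters.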

Prerequisite Concepts
     Base character sets
 
   
----


Generated: April 21, 1999 at 2:00:46 pm

Copyright © OmniMark Technologies Corporation, 1988-1999.