ISO/IEC 8859 (OMFF8859)

The OMFF8859 library provides functions for converting an ISO/IEC 8859 encoded stream to a UTF-8 encoded stream, and back. The ISO/IEC 8859 series of standards specifies how to extend 7-bit ASCII with high-bit characters, so that each single-byte encoding can represent up to 256 code points. Of the fifteen encodings the series specifies, OMFF8859 supports six:

  • ISO/IEC 8859-1: Latin-1 Western European,
  • ISO/IEC 8859-5: Latin/Cyrillic,
  • ISO/IEC 8859-6: Latin/Arabic,
  • ISO/IEC 8859-7: Latin/Greek,
  • ISO/IEC 8859-8: Latin/Hebrew, and
  • ISO/IEC 8859-11: Latin/Thai.
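To make the conversion concrete, here is a minimal sketch of an ISO/IEC 8859-1 to UTF-8 round trip at the byte level. It is written in Python purely for illustration and does not use OMFF8859; the function names latin1_to_utf8 and utf8_to_latin1 are hypothetical, and the codecs are Python's built-in ones.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO/IEC 8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value 0x00-0xFF is a valid ISO/IEC 8859-1 code point.
    return data.decode("iso8859_1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO/IEC 8859-1 (hypothetical helper)."""
    # Raises UnicodeEncodeError if a character has no 8859-1 equivalent.
    return data.decode("utf-8").encode("iso8859_1")


latin1_bytes = b"caf\xe9"                 # "cafe" with e-acute as one byte, 0xE9
utf8_bytes = latin1_to_utf8(latin1_bytes)  # b"caf\xc3\xa9": e-acute is now two bytes
round_trip = utf8_to_latin1(utf8_bytes)    # same bytes as latin1_bytes
```

The ASCII range is unchanged by the conversion; only the high-bit bytes grow into multi-byte UTF-8 sequences.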

The following example takes an ISO/IEC 8859-1 (Latin-1) encoded stream and converts it to UTF-8 before streaming it to the XML parser for further processing. It then converts the results back to ISO/IEC 8859-1 for output.

  import "omff8859.xmd" prefixed by iso8859.
  
  
  process
     using output as iso8859.writer in iso8859.encoding-8859-1 into #main-output
     do xml-parse scan iso8859.reader in iso8859.encoding-8859-1 from #main-input
        output "%c"
     done
  
  ; ...
          

As the example demonstrates, the conversions performed by OMFF8859 are configured by passing an encoding constant to the iso8859.reader and iso8859.writer functions. One constant is defined for each supported encoding:

  • encoding-8859-1: ISO/IEC 8859-1,
  • encoding-8859-5: ISO/IEC 8859-5,
  • encoding-8859-6: ISO/IEC 8859-6,
  • encoding-8859-7: ISO/IEC 8859-7,
  • encoding-8859-8: ISO/IEC 8859-8, and
  • encoding-8859-11: ISO/IEC 8859-11.
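The choice of constant matters because the same high-bit byte stands for a different character in each part of the standard. The following Python sketch illustrates this, again independently of OMFF8859. The CODECS table pairing each constant name with a Python codec name is an assumption made for this example, and decode_byte is a hypothetical helper.

```python
# Assumed pairing of the OMFF8859 constant names with Python's built-in codecs.
CODECS = {
    "encoding-8859-1": "iso8859_1",
    "encoding-8859-5": "iso8859_5",
    "encoding-8859-6": "iso8859_6",
    "encoding-8859-7": "iso8859_7",
    "encoding-8859-8": "iso8859_8",
    "encoding-8859-11": "iso8859_11",
}


def decode_byte(constant: str, byte: int) -> str:
    """Decode a single byte under the encoding named by `constant`."""
    return bytes([byte]).decode(CODECS[constant])


# Show the Unicode code point and UTF-8 bytes for byte 0xE1 in each encoding.
for name in CODECS:
    char = decode_byte(name, 0xE1)
    print(f"{name}: U+{ord(char):04X} -> UTF-8 {char.encode('utf-8').hex()}")
```

Reading a stream with the wrong constant therefore produces valid but incorrect UTF-8 rather than an error, so the reader and writer constants should match the actual encoding of the input and the intended encoding of the output.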