XML Parsing and UTF-8 Encoding
Version 4.0.1 of the OmniMark programming language supports UTF-8 encoding as part of the XML parser. So that characters can be processed uniformly, regardless of how they reach the XML parser, OmniMark converts decimal character references (such as "&#161;") and hexadecimal character references (such as "&#xA1;") into their corresponding UTF-8 encodings.
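To make the conversion concrete, the following is a minimal sketch in Python, not OmniMark, and not a description of how OmniMark implements it. It resolves a single XML character reference to its UTF-8 bytes. The function name char_ref_to_utf8 is hypothetical and chosen only for illustration.

```python
import re

# XML character references: decimal (&#161;) or hexadecimal (&#xA1;).
# XML only allows a lowercase "x" in the hexadecimal form.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def char_ref_to_utf8(ref: str) -> bytes:
    """Return the UTF-8 encoding of one character reference (hypothetical helper)."""
    m = _CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    hex_digits, dec_digits = m.groups()
    # Both forms name the same thing: a Unicode code point.
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return chr(code_point).encode("utf-8")
```

Under this sketch, "&#161;" and "&#xA1;" both name U+00A1 and therefore produce the same two-byte UTF-8 sequence, which is why downstream rules can treat them identically.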
Version 4.0 of OmniMark supported XML and UTF-8, but had some problems with character references. These problems have been fixed in version 4.0.1.
Version 4.0.1 fixes the following problems that occurred in version 4.0:
The following translate rule can be used to convert UTF-8 encoded characters outside the ASCII range back into hexadecimal character references:
  translate utf8-char => c
     local counter n
     set n to utf8-char-number c
     do when n <= "%16r{7F}"
        output c
     else
        output "&#x%16rud(n);"
     done
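For readers unfamiliar with OmniMark, the same logic can be sketched in Python. This is an illustrative equivalent under stated assumptions, not part of OmniMark: it works on an already decoded string rather than on a UTF-8 byte stream, the function name escape_non_ascii is hypothetical, and uppercase hexadecimal digits are an assumption about the intended output format.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each character above U+007F with a hexadecimal character reference."""
    out = []
    for ch in text:
        n = ord(ch)  # code point of the character
        if n <= 0x7F:
            out.append(ch)  # ASCII passes through unchanged
        else:
            out.append(f"&#x{n:X};")  # assumption: uppercase hex digits
    return "".join(out)
```

For example, under these assumptions the input "a¡b" becomes "a&#xA1;b", while pure ASCII input is returned unchanged.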
Prerequisite Concepts: XML document processing