function
Library: Unicode (OMUNICODE)
Import: omunicode.xmd
Returns: the two-letter general category value of the argument character
export string function general-category of value integer character
Use unicode.general-category to find the Unicode general category property of a
character code point, as defined in Unicode 5.1.0. The following general category values
can be returned by this function:
Lu : Letter, Uppercase
Ll : Letter, Lowercase
Lt : Letter, Titlecase
Lm : Letter, Modifier
Lo : Letter, Other
Mn : Mark, Nonspacing
Mc : Mark, Spacing Combining
Me : Mark, Enclosing
Nd : Number, Decimal Digit
Nl : Number, Letter
No : Number, Other
Pc : Punctuation, Connector
Pd : Punctuation, Dash
Ps : Punctuation, Open
Pe : Punctuation, Close
Pi : Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf : Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po : Punctuation, Other
Sm : Symbol, Math
Sc : Symbol, Currency
Sk : Symbol, Modifier
So : Symbol, Other
Zs : Separator, Space
Zl : Separator, Line
Zp : Separator, Paragraph
Cc : Other, Control
Cf : Other, Format
Cs : Other, Surrogate
Co : Other, Private Use
Cn : Other, Not Assigned
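The following pattern function matches a UTF-8 encoded white space character, that is, one whose general category begins with "Z" (Zs, Zl, or Zp). Control characters such as tab and line feed belong to category Cc, so this function does not match them: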
   import "omunicode.xmd" prefixed by unicode.
   import "omutf8.xmd" prefixed by utf8.

   define switch function
      unicode-whitespace ()
   as
      return #current-input matches (utf8.char => character
                                     (when unicode.general-category of utf8.code-point of character matches "Z"))
To use unicode.general-category, you must import OMUNICODE into your program
using an import declaration such as:
import "omunicode.xmd" prefixed by unicode.
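With that import in place, a direct call might look like the following minimal sketch. It assumes a standalone program with a process rule and passes the code points as decimal integers. The categories named in the comments are those the Unicode Character Database assigns to these characters, not output captured from a run:

   import "omunicode.xmd" prefixed by unicode.

   process
      ; U+0041 LATIN CAPITAL LETTER A, general category Lu
      output unicode.general-category of 65 || "%n"
      ; U+0030 DIGIT ZERO, general category Nd
      output unicode.general-category of 48 || "%n"
      ; U+0020 SPACE, general category Zs
      output unicode.general-category of 32 || "%n"

Because the function takes a code point rather than encoded bytes, characters read from UTF-8 input need to be decoded first, for example with utf8.code-point as in the pattern function above.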