uniname.pl -- Unicode character names

This library relates Unicode code points to their formal Unicode character names (the Name property of UnicodeData.txt). It ships its own compact UCD-derived table (about 360 KB) and is independent of library(unicode) and library(unicode_security).

Algorithmic name ranges (Hangul syllables, CJK and Tangut ideographs, and the various PREFIX-<hex> families) are synthesised from the code point and need no per-code-point storage; the remaining ~34,600 names are stored as a shared word table plus a packed token stream. To regenerate the table after a Unicode version bump, see etc/gen_uniname.pl in the package directory.
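To illustrate what "synthesised from the code point" means, here is a minimal sketch of the two kinds of rule, assuming SWI-Prolog. It follows the Hangul syllable name derivation published in the Unicode Standard; it is not taken from this library's source. The predicates hangul_name/2 and prefix_hex_name/3 and the jamo_* tables are hypothetical names used only for this example.

```prolog
:- use_module(library(lists)).          % nth0/3

% Short jamo names, indexed from 0, as tabulated in the Unicode Standard.
% Hypothetical helper facts; not exported by this library.
jamo_l(['G','GG','N','D','DD','R','M','B','BB','S','SS','',
        'J','JJ','C','K','T','P','H']).
jamo_v(['A','AE','YA','YAE','EO','E','YEO','YE','O','WA','WAE','OE',
        'YO','U','WEO','WE','WI','YU','EU','YI','I']).
jamo_t(['','G','GG','GS','N','NJ','NH','D','L','LG','LM','LB','LS','LT',
        'LP','LH','M','B','BS','S','SS','NG','J','C','K','T','P','H']).

%  hangul_name(+CodePoint, -Name) is semidet.
%  Decompose a precomposed syllable (U+AC00..U+D7A3) into leading
%  consonant, vowel and optional trailing consonant indices, then
%  concatenate their short names.
hangul_name(CP, Name) :-
    integer(CP),
    SIndex is CP - 0xAC00,
    SIndex >= 0, SIndex < 11172,        % 19 * 21 * 28 syllables
    LIndex is SIndex // 588,            % 588 = 21 * 28
    VIndex is (SIndex mod 588) // 28,
    TIndex is SIndex mod 28,
    jamo_l(Ls), nth0(LIndex, Ls, L),
    jamo_v(Vs), nth0(VIndex, Vs, V),
    jamo_t(Ts), nth0(TIndex, Ts, T),
    atomic_list_concat(['HANGUL SYLLABLE ', L, V, T], Name).

%  prefix_hex_name(+Prefix, +CodePoint, -Name) is det.
%  PREFIX-<hex> rule: the prefix followed by the code point in upper-case
%  hexadecimal. No zero padding is done here, so the sketch only suits
%  code points at or above U+1000.
prefix_hex_name(Prefix, CP, Name) :-
    format(atom(Name), '~w~16R', [Prefix, CP]).
```

For example, hangul_name(0xD55C, N) gives N = 'HANGUL SYLLABLE HAN', and prefix_hex_name('CJK UNIFIED IDEOGRAPH-', 0x4E00, N) gives N = 'CJK UNIFIED IDEOGRAPH-4E00'. Which ranges use which prefix depends on the Unicode version and is deliberately left out of the sketch.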

unicode_name(?CodePoint:integer, ?Name:atom) is nondet

True when Name is the Unicode character name of CodePoint. Usage:

unicode_name(+CodePoint, -Name) is semidet: Name is the name of CodePoint; fails when CodePoint has none (control, surrogate, private-use or unassigned code points).
unicode_name(-CodePoint, +Name) is semidet: CodePoint is the unique code point with the given name; fails when no code point has that name.
unicode_name(-CodePoint, -Name) is nondet: enumerates every named code point on backtracking.

Name is an atom holding the formal Unicode name in upper case, e.g.

?- unicode_name(0'A, N).
N = 'LATIN CAPITAL LETTER A'.

?- unicode_name(C, 'EURO SIGN').
C = 8364.

?- unicode_name(0xAC00, N).
N = 'HANGUL SYLLABLE GA'.
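
The queries above cover the two semidet modes. The enumeration mode can be combined with ordinary atom inspection to search names, as in the following sketch. It assumes SWI-Prolog and that the library loads as library(uniname), which is inferred from the file name rather than stated here; names_containing/2 is a hypothetical helper, not part of the library.

```prolog
:- use_module(library(uniname)).    % assumed load path, inferred from the file name

%  names_containing(+Sub, -Pairs) is det.
%  Pairs is a list of CodePoint-Name for every named code point whose
%  name contains the atom Sub. Hypothetical helper for illustration.
names_containing(Sub, Pairs) :-
    findall(CP-Name,
            (   unicode_name(CP, Name),              % enumerate on backtracking
                once(sub_atom(Name, _, _, _, Sub))   % keep each match only once
            ),
            Pairs).
```

A query such as names_containing('EURO', Pairs) walks every named code point, so it is best suited to one-off searches rather than use in a hot path.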