Unicode Information

class jkUnicode.UniInfo(uni: int | None = None)

The main Unicode Info object. It gets its Unicode information from the submodules aglfn, uniCase, uniCat, uniDecomposition, uniName, and uniRangesBits, which are generated from the official Unicode data. Tools to download and regenerate the data are in the tools subfolder.

The Unicode Info object is meant to be instantiated once and then reused to get information about different codepoints. Avoid instantiating it repeatedly, because instantiation is rather expensive.

Initialize the Info object with None, for example before a loop, and then inside the loop assign each codepoint you want information about to the unicode instance variable. Setting it automatically updates the other instance variables with the correct information from the Unicode standard.
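The reuse pattern described above can be sketched as a small helper. This is only an illustration: collect_names is a hypothetical name, not part of jkUnicode, and it relies solely on the documented unicode and name attributes of whatever info object it receives.

```python
def collect_names(codepoints, info):
    """Look up the Unicode name of each codepoint with ONE shared info object.

    `info` is expected to behave like jkUnicode.UniInfo: assigning to
    `info.unicode` refreshes `info.name`. (collect_names itself is a
    hypothetical helper, not a jkUnicode function.)
    """
    names = {}
    for cp in codepoints:
        info.unicode = cp        # triggers the lookup for this codepoint
        names[cp] = info.name    # read the refreshed value
    return names


# Intended use with the real library (requires jkUnicode to be installed):
#
#     from jkUnicode import UniInfo
#     ui = UniInfo(None)                      # instantiate once, outside the loop
#     names = collect_names([0x41, 0x20AC], ui)
```

Passing the info object in, rather than creating it inside the helper, keeps the expensive instantiation in the caller's hands.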

Parameters:

uni (int | None) – The codepoint, or None if the codepoint will be assigned later.

property block: str | None

The name of the block for the current codepoint.

property category: str | None

The name of the category for the current codepoint.

property category_short: str | None

The short name of the category for the current codepoint.

property char: str | None

The character for the current codepoint.

property decomposition_mapping: list[int]

The decomposition mapping for the current codepoint.

property glyphname: str | None

The AGLFN glyph name for the current codepoint.

property lc_mapping: int | None

The lowercase mapping for the current codepoint.

property name: str | None

The Unicode name for the current codepoint.

property nice_name: str | None

A more human-readable Unicode name for the current codepoint.

property uc_mapping: int | None

The uppercase mapping for the current codepoint.

property unicode: int | None

The Unicode codepoint. Setting this value will look up and fill the other pieces of information, like category, range, decomposition mapping, and case mapping.
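To illustrate how assigning to one attribute can refresh the others, here is a minimal stand-in built on Python's standard unicodedata module. It is not the jkUnicode implementation: MiniInfo is a hypothetical class, it covers only a few of the properties listed above, and its values come from the Unicode data bundled with the Python interpreter rather than from jkUnicode's generated submodules.

```python
import unicodedata


class MiniInfo:
    """Hypothetical stand-in showing a setter that refreshes derived fields."""

    def __init__(self, uni=None):
        self._unicode = None
        self.char = None
        self.name = None
        self.category_short = None
        self.unicode = uni  # goes through the setter below

    @property
    def unicode(self):
        return self._unicode

    @unicode.setter
    def unicode(self, value):
        self._unicode = value
        if value is None:
            # No codepoint: clear every derived field.
            self.char = self.name = self.category_short = None
            return
        self.char = chr(value)
        # unicodedata.name() returns the default (None) for unnamed codepoints.
        self.name = unicodedata.name(self.char, None)
        self.category_short = unicodedata.category(self.char)


info = MiniInfo()
info.unicode = 0x41
print(info.char, info.name, info.category_short)
```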

jkUnicode.getUnicodeChar(code: int) → str

Return the Unicode character for a Unicode codepoint.

Parameters:

code (int) – The codepoint.

jkUnicode.get_expanded_glyph_list(unicodes: list[int], ui: UniInfo | None = None) → list[tuple[int, str | None]]

“Expand” or annotate a list of codepoints.

For codepoints that have a case mapping (UC or LC), the target codepoint of the case mapping will be added to the list. AGLFN glyph names are added to the list too, so the returned list contains tuples of (codepoint, glyphname), sorted by the codepoint value.

Parameters:
  • unicodes (list) – A list of codepoints.

  • ui (UniInfo) – The UniInfo instance to use. If None, one will be instantiated.
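The expansion described above can be approximated with the standard library alone. The sketch below is not the jkUnicode implementation: expand_case_sketch and its glyph_names argument are hypothetical, the case mappings come from Python's str.upper() and str.lower() rather than from the uniCase submodule, and removing duplicates is an assumption about the intended result.

```python
def expand_case_sketch(unicodes, glyph_names):
    """Return sorted (codepoint, glyph name) tuples, including case partners.

    `glyph_names` is a hypothetical dict standing in for the AGLFN lookup;
    codepoints missing from it get None as their glyph name.
    """
    expanded = set()  # assumption: each codepoint appears once in the result
    for cp in unicodes:
        expanded.add(cp)
        ch = chr(cp)
        for mapped in (ch.upper(), ch.lower()):
            # Skip multi-character mappings such as "ß" -> "SS".
            if len(mapped) == 1:
                expanded.add(ord(mapped))
    return [(cp, glyph_names.get(cp)) for cp in sorted(expanded)]


names = {0x41: "A", 0x42: "B", 0x61: "a", 0x62: "b"}
for cp, glyph in expand_case_sketch([0x61, 0x42], names):
    print(f"U+{cp:04X}", glyph)
```

With the real function, passing an existing UniInfo instance as ui avoids instantiating a second one.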