std::unicode

Status: shipped

Unicode general-category predicates, casing, normalization, and segmentation.

Public items

Name             Kind  Description
is_letter        fn    True if r is in general-category group L.
is_digit         fn    True if r is a decimal digit (category Nd).
is_number        fn    True if r is in general-category group N (Nd, Nl, or No).
is_space         fn    True if r is whitespace (group Z, plus HT, LF, VT, FF, CR, NEL).
is_upper         fn    True if r is category Lu.
is_lower         fn    True if r is category Ll.
is_title         fn    True if r is category Lt.
is_punct         fn    True if r is in general-category group P.
is_symbol        fn    True if r is in general-category group S.
is_mark          fn    True if r is in general-category group M.
is_print         fn    True if r is printable (not Cc, Cf, Cs, Co, or Cn).
is_graphic       fn    True if r is graphic (printable and not whitespace).
is_control       fn    True if r is category Cc.
is_assigned      fn    True if r is an assigned code point (not Cn).
to_upper         fn    Simple uppercase mapping for one rune.
to_lower         fn    Simple lowercase mapping for one rune.
to_title         fn    Simple titlecase mapping for one rune.
simple_fold      fn    Next rune in the Unicode simple case-folding cycle containing r.
combining_class  fn    Canonical combining class (0-254) for r.
to_upper_str     fn    Full uppercase mapping for a string (ß -> SS).
to_lower_str     fn    Full lowercase mapping for a string.
fold_case        fn    Simple case-folded comparison form for a string.
nfc              fn    Normalize a string to NFC (canonical composition).
nfd              fn    Normalize a string to NFD (canonical decomposition).
nfkc             fn    Normalize a string to NFKC (compatibility decomposition, then canonical composition).
nfkd             fn    Normalize a string to NFKD (compatibility decomposition).
is_nfc           fn    True if a string is already in NFC.
is_nfd           fn    True if a string is already in NFD.
is_nfkc          fn    True if a string is already in NFKC.
is_nfkd          fn    True if a string is already in NFKD.
graphemes        fn    UAX #29 extended grapheme clusters of a string.
grapheme_count   fn    Number of UAX #29 grapheme clusters in a string.
words            fn    UAX #29 Unicode words in a string (skips punctuation and whitespace).
word_bounds      fn    Segments between UAX #29 word boundaries (includes punctuation and whitespace runs).
word_count       fn    Number of UAX #29 words in a string.
sentences        fn    UAX #29 Unicode sentences in a string.
sentence_count   fn    Number of UAX #29 sentences in a string.
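Every predicate in the table is defined by Unicode general categories, so the definitions can be checked against any Unicode database. The following is an illustrative Python sketch, not std::unicode's implementation. It assumes a rune is modeled as a one-character `str` and takes categories from Python's `unicodedata.category`. Python's Unicode version may differ from the one std::unicode uses, so results for recently assigned code points can disagree.

```python
import unicodedata

# Assumption: a rune is modeled as a one-character str. Categories come from
# Python's unicodedata database, not from std::unicode's own tables.

# Whitespace controls the table adds to group Z: HT, LF, VT, FF, CR, NEL.
_EXTRA_SPACE = frozenset("\t\n\v\f\r\x85")


def _cat(r: str) -> str:
    """Two-letter general category, e.g. "Lu", "Nd", "Zs" ("Cn" if unassigned)."""
    return unicodedata.category(r)


def _in_group(r: str, group: str) -> bool:
    """A general-category group is the first letter of the category code."""
    return _cat(r)[0] == group


def is_letter(r: str) -> bool:
    return _in_group(r, "L")


def is_digit(r: str) -> bool:
    return _cat(r) == "Nd"


def is_number(r: str) -> bool:
    return _in_group(r, "N")  # Nd, Nl, or No


def is_space(r: str) -> bool:
    return _in_group(r, "Z") or r in _EXTRA_SPACE


def is_upper(r: str) -> bool:
    return _cat(r) == "Lu"


def is_lower(r: str) -> bool:
    return _cat(r) == "Ll"


def is_title(r: str) -> bool:
    return _cat(r) == "Lt"


def is_punct(r: str) -> bool:
    return _in_group(r, "P")


def is_symbol(r: str) -> bool:
    return _in_group(r, "S")


def is_mark(r: str) -> bool:
    return _in_group(r, "M")


def is_print(r: str) -> bool:
    return not _in_group(r, "C")  # excludes Cc, Cf, Cs, Co, Cn


def is_graphic(r: str) -> bool:
    return is_print(r) and not is_space(r)


def is_control(r: str) -> bool:
    return _cat(r) == "Cc"


def is_assigned(r: str) -> bool:
    return _cat(r) != "Cn"
```

Under these definitions U+0020 SPACE is printable but not graphic, and U+200B ZERO WIDTH SPACE (category Cf) is neither.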
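Full string mappings can change a string's length. That is why to_upper_str cannot be built by applying the per-rune to_upper to each character. The following Python sketch uses `str.upper` and `str.lower`, which apply the full mappings. It also adds a hypothetical helper, `canonical_caseless_eq`, which follows the canonical caseless matching definition in Chapter 3 of the Unicode Standard. One caveat: Python's `str.casefold` performs full case folding (ß -> ss), while fold_case is described as simple folding, so this helper illustrates caseless comparison in general rather than fold_case itself.

```python
import unicodedata


def to_upper_str(s: str) -> str:
    # str.upper applies full (SpecialCasing) uppercase mappings,
    # so the result can be longer than the input.
    return s.upper()


def to_lower_str(s: str) -> str:
    # str.lower applies full lowercase mappings (e.g. U+0130 -> "i\u0307").
    return s.lower()


def canonical_caseless_eq(a: str, b: str) -> bool:
    # Hypothetical helper: canonical caseless match,
    # NFD(toCasefold(NFD(X))) == NFD(toCasefold(NFD(Y))).
    # str.casefold uses *full* case folding, not simple folding.
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)

    return key(a) == key(b)
```

The outer NFD matters because case folding can produce characters that are no longer in NFD, so both sides are normalized again before comparing.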
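The four normalization forms differ along two axes: canonical versus compatibility decomposition, and whether the result is recomposed. The following Python sketch maps the normalization rows and combining_class onto the standard `unicodedata` module (`is_normalized` requires Python 3.8 or later). It is an illustration of the forms, not std::unicode's implementation, and Python's Unicode version may differ from the library's.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def nfkd(s: str) -> str:
    return unicodedata.normalize("NFKD", s)


def is_nfc(s: str) -> bool:
    return unicodedata.is_normalized("NFC", s)  # Python 3.8+


def is_nfd(s: str) -> bool:
    return unicodedata.is_normalized("NFD", s)


def is_nfkc(s: str) -> bool:
    return unicodedata.is_normalized("NFKC", s)


def is_nfkd(s: str) -> bool:
    return unicodedata.is_normalized("NFKD", s)


def combining_class(r: str) -> int:
    # Canonical_Combining_Class of one rune; 0 for starters such as "a".
    return unicodedata.combining(r)
```

For example, NFC leaves the ligature U+FB01 (ﬁ) unchanged while NFKC replaces it with "fi". NFD places U+0323 COMBINING DOT BELOW (class 220) before U+0301 COMBINING ACUTE ACCENT (class 230) regardless of input order, because canonical ordering sorts adjacent non-starters by combining class.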
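grapheme_count generally differs from a string's code-point count, because a single user-perceived character can span several runes. Python's standard library has no UAX #29 segmentation, so the following sketch uses a hypothetical, deliberately simplified helper, `approx_graphemes`. It only attaches combining marks (general-category group M) to the preceding cluster. It is NOT UAX #29: it ignores CR LF, Hangul syllable sequences, regional-indicator pairs, emoji ZWJ sequences, and prepended characters.

```python
import unicodedata


def approx_graphemes(s: str) -> list[str]:
    # Hypothetical approximation, not UAX #29: each combining mark
    # (category Mn, Mc, or Me) joins the cluster before it.
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch)[0] == "M":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def approx_grapheme_count(s: str) -> int:
    return len(approx_graphemes(s))
```

With this helper, "e\u0301a" has three code points but two clusters. It still diverges from UAX #29 on inputs such as the flag sequence U+1F1EB U+1F1F7, which it counts as two clusters even though UAX #29 treats the regional-indicator pair as one.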