NAME
Unicode::Util - Unicode-aware versions of built-in Perl functions
VERSION
This document describes Unicode::Util version 0.06.
SYNOPSIS
use v5.10; # enables say
use Unicode::Util qw( graph_length code_length byte_length );
# grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";
say graph_length($grapheme); # 1
say code_length($grapheme); # 2
say byte_length($grapheme, 'UTF-8'); # 4
DESCRIPTION
This module provides Unicode-aware versions of Perl’s built-in string
functions. The graph_* functions work on grapheme clusters rather than
code points or bytes, while code_length and byte_length measure a string
in those units explicitly.
FUNCTIONS
Each function may be exported by name. The ":all" tag exports every
function and the ":length" tag exports only the length functions.
graph_length($string)
Returns the length of the given string in grapheme clusters. This is
the closest match to the number of “characters” that most people would
count in a printed string.
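To illustrate what counting grapheme clusters means, here is a minimal sketch using only core Perl. It is not necessarily how Unicode::Util implements graph_length; the helper name sketch_graph_length is hypothetical, and the sketch assumes that the regex escape \X (one extended grapheme cluster) is an acceptable definition of a cluster.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::Util: counts grapheme
# clusters by matching \X repeatedly in list context.
sub sketch_graph_length {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

my $grapheme = "\x{44E}\x{301}";    # yu + combining acute accent
print sketch_graph_length($grapheme), "\n";    # 1 grapheme cluster
print length($grapheme), "\n";                 # 2 code points
```

The built-in length counts code points, which is why it reports 2 for a string that prints as a single accented letter.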
code_length($string)
code_length($string, $normal_form)
Returns the length of the given string in code points. This is
likely the number of “characters” that many programmers and
programming languages would count in a string. If the optional
Unicode normalization form is supplied, the length will be of the
string as if it had been normalized to that form.
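The effect of the optional normalization form can be sketched with the core Unicode::Normalize module. This is an illustration under assumptions, not the module's actual code: sketch_code_length is a hypothetical name, and the sketch simply passes the form name through to normalize().

```perl
use strict;
use warnings;
use Unicode::Normalize qw( normalize );

# Hypothetical helper, not part of Unicode::Util: length in code
# points, optionally after normalizing a copy of the string.
sub sketch_code_length {
    my ($string, $normal_form) = @_;
    $string = normalize($normal_form, $string) if defined $normal_form;
    return length $string;
}

my $e_acute = "e\x{301}";    # e + combining acute accent
print sketch_code_length($e_acute), "\n";           # 2
print sketch_code_length($e_acute, 'NFC'), "\n";    # 1 (composes to U+00E9)
print sketch_code_length($e_acute, 'NFD'), "\n";    # 2
```

The same string therefore has a different code point length depending on the form, which is the reason for accepting the form as an argument.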
Valid normalization forms are "C" or "NFC", "D" or "NFD", "KC" or
"NFKC", and "KD" or "NFKD".
byte_length($string)
byte_length($string, $encoding)
byte_length($string, $encoding, $normal_form)
Returns the length of the given string in bytes, as if it were
encoded using the specified encoding or UTF-8 if no encoding is
supplied. If the optional Unicode normalization form is supplied,
the length will be of the string as if it had been normalized to
that form.
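The byte count depends on both the encoding and the normalization form. The following sketch shows this with the core Encode and Unicode::Normalize modules. It is an approximation of the behavior described above rather than the module's implementation, and sketch_byte_length is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw( encode );
use Unicode::Normalize qw( normalize );

# Hypothetical helper, not part of Unicode::Util: length in bytes of
# a copy of the string after optional normalization and encoding.
sub sketch_byte_length {
    my ($string, $encoding, $normal_form) = @_;
    $encoding = 'UTF-8' unless defined $encoding;    # default from the text above
    $string = normalize($normal_form, $string) if defined $normal_form;
    return length( encode($encoding, $string) );
}

my $grapheme = "\x{44E}\x{301}";
print sketch_byte_length($grapheme), "\n";                # 4
print sketch_byte_length($grapheme, 'UTF-32LE'), "\n";    # 8

my $e_acute = "e\x{301}";
print sketch_byte_length($e_acute, 'UTF-8', 'NFC'), "\n";    # 2
print sketch_byte_length($e_acute, 'UTF-8', 'NFD'), "\n";    # 3
```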
graph_chop($string)
Returns the given string with the last grapheme cluster chopped off.
Does not modify the original value, unlike the built-in "chop".
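A non-destructive chop by grapheme cluster could look roughly like the sketch below, written with core Perl only. The name sketch_graph_chop is hypothetical and the approach (split into \X matches, drop the last, rejoin) is one possible technique, not necessarily the one Unicode::Util uses.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::Util: returns a copy of
# the string without its last grapheme cluster.
sub sketch_graph_chop {
    my ($string) = @_;
    my @clusters = $string =~ /\X/g;
    pop @clusters;
    return join '', @clusters;
}

my $word    = "ab\x{44E}\x{301}";    # a, b, yu + combining acute accent
my $chopped = sketch_graph_chop($word);
print length($chopped), "\n";    # 2 (the accent is removed with its base letter)
print length($word), "\n";       # 4 (the original is unchanged)
```

The built-in chop would instead remove only the combining accent and would modify $word in place.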
graph_reverse($string)
Returns the given string with its grapheme clusters in reverse order.
Each cluster is kept intact, so combining marks stay attached to their
base characters.
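The difference from the built-in reverse can be sketched in core Perl as follows. This is an illustrative approximation, not the module's source, and sketch_graph_reverse and code_points are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::Util: reverses the order
# of grapheme clusters while keeping each cluster intact.
sub sketch_graph_reverse {
    my ($string) = @_;
    return join '', reverse $string =~ /\X/g;
}

# Formats a string as a list of code points for display.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

my $string = "a\x{44E}\x{301}";    # a, then yu + combining acute accent
print code_points( sketch_graph_reverse($string) ), "\n";    # U+044E U+0301 U+0061
print code_points( scalar reverse $string ), "\n";           # U+0301 U+044E U+0061
```

In the second line of output the accent has moved in front of its base letter, which is the problem a grapheme-aware reverse avoids.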
TODO
"graph_substr", "graph_index", "graph_rindex"
SEE ALSO
Unicode::GCString, String::Multibyte, Perl6::Str,
<http://perlcabal.org/syn/S32/Str.html>
AUTHOR
Nick Patch <patch@cpan.org>
COPYRIGHT AND LICENSE
© 2011–2012 Nick Patch
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.