NAME
    Unicode::Util - Unicode-aware versions of built-in Perl functions

VERSION
    This document describes Unicode::Util version 0.06.

SYNOPSIS
        use feature 'say';
        use Unicode::Util qw( graph_length code_length byte_length );

        # grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
        my $grapheme = "\x{44E}\x{301}";

        say graph_length($grapheme);          # 1
        say code_length($grapheme);           # 2
        say byte_length($grapheme, 'UTF-8');  # 4

DESCRIPTION
    This module provides Unicode-aware versions of Perl’s built-in string
    functions. Most of them operate on grapheme clusters rather than code
    points or bytes; the length functions can count any of the three.

FUNCTIONS
    Functions may each be exported explicitly, or by using the ":all" tag
    for everything or the ":length" tag for the length functions.

    graph_length($string)
        Returns the length of the given string in grapheme clusters. This is
        the closest match to the number of “characters” that most people
        would count in a printed string.
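
        As a rough illustration of what counting grapheme clusters means,
        the sketch below uses only core Perl and the "\X" regex escape. The
        name "sketch_graph_length" is hypothetical, and this is not
        necessarily how the module implements "graph_length".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::Util): count extended
# grapheme clusters by collecting every \X match in list context.
sub sketch_graph_length {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

# Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";

print sketch_graph_length($grapheme), "\n";  # 1 (grapheme clusters)
print length($grapheme), "\n";               # 2 (code points)
```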

    code_length($string)
    code_length($string, $normal_form)
        Returns the length of the given string in code points. This is
        likely the number of “characters” that many programmers and
        programming languages would count in a string. If the optional
        Unicode normalization form is supplied, the length will be of the
        string as if it had been normalized to that form.
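
        To show how normalization changes the count, here is a minimal
        sketch built on the core Unicode::Normalize module. The name
        "sketch_code_length" is an assumption made for this example; the
        module's own implementation may differ.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( normalize );

# Hypothetical helper (not part of Unicode::Util): count code points,
# optionally after normalizing to the given form ("NFC", "D", ...).
sub sketch_code_length {
    my ($string, $form) = @_;
    $string = normalize($form, $string) if defined $form;
    return length $string;
}

my $decomposed = "e\x{301}";   # e + combining acute accent
my $ligature   = "\x{FB01}";   # Latin small ligature fi

print sketch_code_length($decomposed), "\n";         # 2
print sketch_code_length($decomposed, 'NFC'), "\n";  # 1
print sketch_code_length($ligature, 'NFKC'), "\n";   # 2
```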

        Valid normalization forms are "C" or "NFC", "D" or "NFD", "KC" or
        "NFKC", and "KD" or "NFKD".

    byte_length($string)
    byte_length($string, $encoding)
    byte_length($string, $encoding, $normal_form)
        Returns the length of the given string in bytes, as if it were
        encoded using the specified encoding, or UTF-8 if no encoding is
        supplied. If the optional Unicode normalization form is supplied,
        the length will be of the string as if it had been normalized to
        that form before encoding.
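
        The byte count depends on both the encoding and the normalization
        form. The following sketch, using the core Encode and
        Unicode::Normalize modules, illustrates the idea; the helper name
        "sketch_byte_length" is hypothetical and the code is not taken from
        the module.

```perl
use strict;
use warnings;
use Encode qw( encode );
use Unicode::Normalize qw( normalize );

# Hypothetical helper (not part of Unicode::Util): normalize if asked,
# encode, then count the resulting bytes. Defaults to UTF-8.
sub sketch_byte_length {
    my ($string, $encoding, $form) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    $string = normalize($form, $string) if defined $form;
    return length( encode($encoding, $string) );
}

my $grapheme = "\x{44E}\x{301}";   # yu + combining acute accent

print sketch_byte_length($grapheme), "\n";                      # 4
print sketch_byte_length($grapheme, 'UTF-32LE'), "\n";          # 8
print sketch_byte_length("e\x{301}", 'UTF-8', 'NFC'), "\n";     # 2
```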

    graph_chop($string)
        Returns the given string with the last grapheme cluster chopped off.
        Does not modify the original value, unlike the built-in "chop".
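
        The difference from the built-in "chop" can be sketched with core
        Perl alone. "sketch_graph_chop" is a hypothetical stand-in written
        for this example, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::Util): remove the final
# grapheme cluster from a copy of the string and return the copy.
sub sketch_graph_chop {
    my ($string) = @_;
    $string =~ s/\X\z//;
    return $string;
}

my $original = "ab\x{44E}\x{301}";   # "ab" + yu + combining acute
my $chopped  = sketch_graph_chop($original);

print length($chopped), "\n";    # 2: the whole cluster is gone
print length($original), "\n";   # 4: the original is unchanged

# The built-in chop removes only the last code point and edits in place.
chop(my $builtin = $original);
print length($builtin), "\n";    # 3: the accent was removed, yu remains
```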

    graph_reverse($string)
        Returns the given string with its grapheme clusters in reverse
        order.
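
        A minimal sketch of cluster-wise reversal in core Perl follows. The
        name "sketch_graph_reverse" is an assumption for illustration, and
        the module may implement "graph_reverse" differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::Util): split the string
# into grapheme clusters with \X, reverse the list, and rejoin.
sub sketch_graph_reverse {
    my ($string) = @_;
    return join '', reverse $string =~ /\X/g;
}

my $string = "a\x{44E}\x{301}b";   # a, yu + combining acute, b

# Cluster-wise reversal keeps the accent attached to its base letter.
print sketch_graph_reverse($string) eq "b\x{44E}\x{301}a"
    ? "clusters intact\n" : "clusters broken\n";

# The built-in scalar reverse works on code points, so the accent ends
# up in front of its base letter.
print scalar(reverse $string) eq "b\x{301}\x{44E}a"
    ? "built-in split the cluster\n" : "unexpected\n";
```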

TODO
    "graph_substr", "graph_index", "graph_rindex"

SEE ALSO
    Unicode::GCString, String::Multibyte, Perl6::Str,
    <http://perlcabal.org/syn/S32/Str.html>

AUTHOR
    Nick Patch <patch@cpan.org>

COPYRIGHT AND LICENSE
    © 2011–2012 Nick Patch

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.