<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" 
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY version SYSTEM "version.xml">
]>
<refentry id="redland-unicode">
<refmeta>
<refentrytitle role="top_of_page">Unicode</refentrytitle>
<manvolnum>3</manvolnum>
<refmiscinfo>REDLAND Library</refmiscinfo>
</refmeta>

<refnamediv>
<refname>Unicode</refname>
<refpurpose>Unicode utility functions.</refpurpose>
<!--[<xref linkend="desc" endterm="desc.title"/>]-->
</refnamediv>

<refsynopsisdiv role="synopsis">
<title role="synopsis.title">Synopsis</title>

<synopsis>



typedef     <link linkend="librdf-unichar">librdf_unichar</link>;
<link linkend="int">int</link>         <link linkend="librdf-unicode-char-to-utf8">librdf_unicode_char_to_utf8</link>     (<link linkend="librdf-unichar">librdf_unichar</link> c,
                                             <link linkend="byte">byte</link> *output,
                                             <link linkend="int">int</link> length);
<link linkend="int">int</link>         <link linkend="librdf-utf8-to-unicode-char">librdf_utf8_to_unicode_char</link>     (<link linkend="librdf-unichar">librdf_unichar</link> *output,
                                             const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length);
<link linkend="byte">byte</link>*       <link linkend="librdf-utf8-to-latin1">librdf_utf8_to_latin1</link>           (const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length,
                                             <link linkend="int">int</link> *output_length);
<link linkend="byte">byte</link>*       <link linkend="librdf-latin1-to-utf8">librdf_latin1_to_utf8</link>           (const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length,
                                             <link linkend="int">int</link> *output_length);
<link linkend="void">void</link>        <link linkend="librdf-utf8-print">librdf_utf8_print</link>               (const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length,
                                             <link linkend="FILE:CAPS">FILE</link> *stream);
</synopsis>
</refsynopsisdiv>









<refsect1 role="desc">
<title role="desc.title">Description</title>
<para>
Utility functions to convert between UTF-8, full Unicode and Latin-1.
Redland uses UTF-8 for all string formats (except where noted) but
these may need to be converted to other Unicode encodings or downgraded
with loss to Latin-1.
</para>
</refsect1>

<refsect1 role="details">
<title role="details.title">Details</title>
<refsect2>
<title><anchor id="librdf-unichar" role="typedef"/>librdf_unichar</title>
<indexterm><primary>librdf_unichar</primary></indexterm><programlisting>typedef u32 librdf_unichar;
</programlisting>
<para>
Unicode codepoint.</para>
<para>

</para></refsect2>
<refsect2>
<title><anchor id="librdf-unicode-char-to-utf8" role="function"/>librdf_unicode_char_to_utf8 ()</title>
<indexterm><primary>librdf_unicode_char_to_utf8</primary></indexterm><programlisting><link linkend="int">int</link>         librdf_unicode_char_to_utf8     (<link linkend="librdf-unichar">librdf_unichar</link> c,
                                             <link linkend="byte">byte</link> *output,
                                             <link linkend="int">int</link> length);</programlisting>
<para>
Convert a Unicode character to UTF-8 encoding.
</para>
<para>
If buffer is NULL, then will calculate the length rather than
perform it.  This can be used by the caller to allocate space
and then re-call this function with the new buffer.</para>
<para>

</para><variablelist role="params">
<varlistentry><term><parameter>c</parameter>&nbsp;:</term>
<listitem><simpara> Unicode character
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>output</parameter>&nbsp;:</term>
<listitem><simpara> UTF-8 string buffer or NULL
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>length</parameter>&nbsp;:</term>
<listitem><simpara> buffer size
</simpara></listitem></varlistentry>
<varlistentry><term><emphasis>Returns</emphasis>&nbsp;:</term><listitem><simpara> bytes written to output buffer or &lt;0 on failure
</simpara></listitem></varlistentry>
</variablelist></refsect2>
<refsect2>
<title><anchor id="librdf-utf8-to-unicode-char" role="function"/>librdf_utf8_to_unicode_char ()</title>
<indexterm><primary>librdf_utf8_to_unicode_char</primary></indexterm><programlisting><link linkend="int">int</link>         librdf_utf8_to_unicode_char     (<link linkend="librdf-unichar">librdf_unichar</link> *output,
                                             const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length);</programlisting>
<para>
Convert an UTF-8 encoded buffer to a Unicode character.
</para>
<para>
If output is NULL, then will calculate the number of bytes that
will be used from the input buffer and not perform the conversion.</para>
<para>

</para><variablelist role="params">
<varlistentry><term><parameter>output</parameter>&nbsp;:</term>
<listitem><simpara> Pointer to the Unicode character or NULL
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>input</parameter>&nbsp;:</term>
<listitem><simpara> UTF-8 string buffer
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>length</parameter>&nbsp;:</term>
<listitem><simpara> buffer size
</simpara></listitem></varlistentry>
<varlistentry><term><emphasis>Returns</emphasis>&nbsp;:</term><listitem><simpara> bytes used from input buffer or &lt;0 on failure
</simpara></listitem></varlistentry>
</variablelist></refsect2>
<refsect2>
<title><anchor id="librdf-utf8-to-latin1" role="function"/>librdf_utf8_to_latin1 ()</title>
<indexterm><primary>librdf_utf8_to_latin1</primary></indexterm><programlisting><link linkend="byte">byte</link>*       librdf_utf8_to_latin1           (const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length,
                                             <link linkend="int">int</link> *output_length);</programlisting>
<para>
Convert a UTF-8 string to ISO Latin-1.
</para>
<para>
Converts the given UTF-8 string to the ISO Latin-1 subset of
Unicode (characters 0x00-0xff), discarding any out of range
characters.
</para>
<para>
If the output_length pointer is not NULL, the returned string
length will be stored there.</para>
<para>

</para><variablelist role="params">
<varlistentry><term><parameter>input</parameter>&nbsp;:</term>
<listitem><simpara> UTF-8 string buffer
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>length</parameter>&nbsp;:</term>
<listitem><simpara> buffer size
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>output_length</parameter>&nbsp;:</term>
<listitem><simpara> Pointer to variable to store resulting string length or NULL
</simpara></listitem></varlistentry>
<varlistentry><term><emphasis>Returns</emphasis>&nbsp;:</term><listitem><simpara> pointer to new ISO Latin-1 string or NULL on failure
</simpara></listitem></varlistentry>
</variablelist></refsect2>
<refsect2>
<title><anchor id="librdf-latin1-to-utf8" role="function"/>librdf_latin1_to_utf8 ()</title>
<indexterm><primary>librdf_latin1_to_utf8</primary></indexterm><programlisting><link linkend="byte">byte</link>*       librdf_latin1_to_utf8           (const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length,
                                             <link linkend="int">int</link> *output_length);</programlisting>
<para>
Convert an ISO Latin-1 encoded string to UTF-8.
</para>
<para>
Converts the given ISO Latin-1 string to an UTF-8 encoded string
representing the same content.  This is lossless.
</para>
<para>
If the output_length pointer is not NULL, the returned string
length will be stored there.</para>
<para>

</para><variablelist role="params">
<varlistentry><term><parameter>input</parameter>&nbsp;:</term>
<listitem><simpara> ISO Latin-1 string buffer
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>length</parameter>&nbsp;:</term>
<listitem><simpara> buffer size
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>output_length</parameter>&nbsp;:</term>
<listitem><simpara> Pointer to variable to store resulting string length or NULL
</simpara></listitem></varlistentry>
<varlistentry><term><emphasis>Returns</emphasis>&nbsp;:</term><listitem><simpara> pointer to new UTF-8 string or NULL on failure
</simpara></listitem></varlistentry>
</variablelist></refsect2>
<refsect2>
<title><anchor id="librdf-utf8-print" role="function"/>librdf_utf8_print ()</title>
<indexterm><primary>librdf_utf8_print</primary></indexterm><programlisting><link linkend="void">void</link>        librdf_utf8_print               (const <link linkend="byte">byte</link> *input,
                                             <link linkend="int">int</link> length,
                                             <link linkend="FILE:CAPS">FILE</link> *stream);</programlisting>
<para>
Print a UTF-8 string to a stream.
</para>
<para>
Pretty prints the UTF-8 string in a pseudo-C character
format like \u<emphasis>hex digits</emphasis> when the characters fail
the <link linkend="isprint"><function>isprint()</function></link> test.</para>
<para>

</para><variablelist role="params">
<varlistentry><term><parameter>input</parameter>&nbsp;:</term>
<listitem><simpara> UTF-8 string buffer
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>length</parameter>&nbsp;:</term>
<listitem><simpara> buffer size
</simpara></listitem></varlistentry>
<varlistentry><term><parameter>stream</parameter>&nbsp;:</term>
<listitem><simpara> FILE* stream
</simpara></listitem></varlistentry>
</variablelist></refsect2>

</refsect1>




</refentry>
