$Id: README.af 13987 2010-05-14 16:03:36Z mjevans $ =for humans This file is written in the POD format, use pod2html or similar to view it. =head1 Unicode patch for DBD::ODBC 1.13 Date: 2006-03-04 =head2 Goals =over 4 =item * Allow an application on Win32 using DBD::ODBC to store and fetch Unicode strings in a MS SQL Server 2000 database. =item * Minimal changes to the existing driver code. =item * Don't break working code. =item * Make the DBD::ODBC "do the right thing" with the UTF8 flag. =item * A first step towards proper Unicode support for DBD:ODBC. =back =head2 Legal disclaimer The Unicode patch for DBD::ODBC was initially written by Alexander Foken (alexander at foken dot de). I wrote this patch during my working hours for a project that needs Unicode support in DBD::ODBC on Windows, and I have the permission of my former employer to publish this patch. You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file. =head2 Applying the Patch =over 4 =item 1. get DBD::ODBC 1.13 from CPAN =item 2. unpack it =item 3. C DBD-ODBC-1.13-Unicode.patch> =item 4. build DBD::ODBC as usual (C, C, C, C) =back =head2 Tested Databases =over 4 =item Microsoft SQL Server 2000 SP3 Works, using the ODBC driver from the MS SQL 2000 CDROM. You need to install SP1, SP2, and SP3 on both client and server, because DBD::ODBC needs a recent set of MDAC libraries. =item Oracle 9.2 Works, using the ODBC driver from the Oracle 9.2 CDROM. You have to set the environment variables C and C (or any other language setting ending with "C<.AL32UTF8>") before loading the patched DBD::ODBC to make Oracle return Unicode data. (See also "Oracle and Unicode" in the POD of DBD::Oracle.) And you have to enable the "C" Workaround in the Oracle ODBC Driver Configuration to make Oracle return Unicode to a non-Unicode application. Alternatively, you can include "C" in your connect string. Better try to use DBD::Oracle to access Oracle with less overhead and better support for Oracle-specific stuff. =item Oracle 8.x Not tested but known not to support Unicode very well. Quoting the DBD::Oracle man page: "I [...] I" =item MS Access 2000 Unicode tests fail because MS Access 2000 seems not to give any hints about Unicode, so all data is treated as non-Unicode. You do not want to use this combination. You may want to try the MSDE, it has the SQL Server engine, but with a lower connection limit and without GUI tools. There are several 3rd party GUIs for the MSDE. =item PostgreSQL 8.0.3 Some tests from the original DBD::ODBC 1.13 fail with PostgreSQL 8.0.3, so you may not want to use DBD::ODBC to connect to PostgreSQL 8.0.3. Unicode tests fail because PostgreSQL seems not to give any hints about Unicode, so all data is treated as non-Unicode. Better try to use DBD::Pg to access PostgreSQL with less overhead and better support for PostgreSQL-specific stuff. DBD::Pg has a driver attribute named C, set it to 1 and you have proper Unicode support. =back =head2 Tested Operating Systems and ODBC Managers =over 4 =item MS Windows 2000 Professional and Server, using the standard ODBC Manager from Microsoft. =back (Yes, this list should be longer.) =head2 Known Problems Perl 5.8.1 or newer is required. Older Perl before 5.8.0 lacked proper Unicode support. Perl 5.8.0 lacks some auxillary functions for Unicode. Unicode is supported only for SQL statement parameters and data returned by the fetch methods, SQL statements are still treated as native encoding. If you need a unicode constant in an SQL statement, you have to pass it as parameter or use SQL functions to convert your constant from native encoding to Unicode. All data passed to the patched DBD::ODBC for C, C, C, and C is treated as Unicode, even if it is not Unicode. F should check the UTF8 flag of the scalar and pass a value different from C as first argument to C. The problem is to know what encoding is used for the data in the scalar. Binding of unicode output parameters is untested (I don't need them) and likely to fail. The patched DBD::ODBC may fail to compile on non-Win32 platforms. It needs a header file named F defining at least the following: =over 4 =item A C data type capable of storing a single Unicode character. Microsoft uses C in F, and C in F. =item C C for wide characters. Microsoft declares this function in both F and F. =item C C for wide characters, returns character count, not bytes. Microsoft declares this function in both F and F. =item WideCharToMultiByte() and MultiByteToWideChar() functions Encoding converter functions. WideChar in this context means the native Unicode representation of the ODBC API (UCS-2 or UTF-16LE for Windows), MultiByte in this context means Perls native Unicode representation (UTF-8). Microsoft declares the two functions in F. The C argument tells the function that the MultiByte string is in UTF-8 encoding. =back =head2 Technical This patch adds support for C, C, C, and C. Strings written to columns that are reported as C, C, or C are automatically converted to 16 bit Unicode using the Windows API function C, return values reported as C, C, C, or C are converted back to 8 bit Unicode (UTF-8) using the Windows API function C and have Perl's UTF8 flag set except for empty strings. =head2 Tests This patch adds two new tests, F and F. Test 40 checks that Unicode strings can be entered as bound parameters and are returned unmodified. Test 41 creates a table, writes and reads unicode data using various bind variants. When using Oracle, the empty string test in F is skipped because Oracle converts empty strings to NULL in this situation. I had to add C, C, C to F, because Oracle in the setup described above returns Unicode more often than expected. I added F, that exports two utility functions for unicode string tests: =over 4 =item C Dumps a string, indicating its Unicode flag, length and all characters in ASCII notation. =item C Compares two strings that may be contain Unicode, and calls C or C. =back =head2 See also =over 4 =item * Microsoft ODBC documentation =item * Microsoft API documentation =item * http://www.unicode.org/ =item * DBI =item * DBD::ODBC =item * DBD::Oracle =item * DBD::Pg =back