=pod

=begin text

+-----------------------------------+
| Lingua::Translit Developer Manual |
+-----------------------------------+

=end text

=head1 Conventions used in this manual

=over 2

=item * Every non-absolute path is relative to the source code's directory.

=back

=head1 Adding a transliteration

If you want to add a new transliteration to L<Lingua::Translit>, just...

=over 2

=item * write an XML file (the "transliteration table")

=item * build a development version containing your table

=item * write and run some tests to check whether your transliteration works as expected

=item * integrate your table into the set of upstream tables and consider contributing it

=back

=head2 Writing a transliteration table

Each XML file consists of meta data and a set of transliteration rules.

The B<meta data> tags cover the name of the transliteration, a short description and whether the transliteration can be used in both directions. For example:

  <name>DIN 1460</name>
  <desc>DIN 1460: Cyrillic to Latin</desc>
  <reverse>true</reverse>

The B<rules> can be simple one-to-one mappings

  <rule>
    <from>X</from>
    <to>Y</to>
  </rule>

...but you can also specify a context that restricts where a rule is applied:

  <rule>
    <from>A</from>
    <to>B</to>
    <context>
      <after>x</after>
      <before>y</before>
    </context>
  </rule>

To get started quickly, copy the file F<xml/template.xml>, rename it to suit your needs and edit it right away.

=begin html

Template: template.xml
Complete example: Common_DEU.xml

=end html

=begin text

Template: xml/template.xml
Complete example: xml/common_deu.xml

=end text

Although editing an XML file is technically easy, a few things have to be kept in mind.

The most important one is that the rules are applied I<in sequence>, one after another. The order of the rules therefore matters whenever you specify a context or transliterate multiple characters.

=head3 Unicode notation

To specify a non-ASCII character, use an entity that represents its Unicode code point in hexadecimal notation and leave a comment naming the character.

  <from>&#x0410;</from> <!-- А -->
  <to>A</to>

This ensures that the correct character is transformed, and that it can still be identified exactly if it is not displayed correctly.

=head3 Specifying a context

The context is evaluated as a Perl regular expression, so regular expression syntax can be used to specify it.

If a character has two mappings depending on the context, the context-sensitive rule must come before the context-free rule. Otherwise the context-free rule replaces every occurrence at once and the context-sensitive rule never matches.

  1. rule
  <rule>
    <from>Γκ</from>
    <to>Gk</to>
    <context>
      <after>\b</after>
    </context>
  </rule>

  2. rule
  <rule>
    <from>Γκ</from>
    <to>Nk</to>
  </rule>

The following pattern matching contexts are available:

=over 4

=item C<E<lt>afterE<gt>>

if the transliteration rule should only be applied after a certain character (corresponds to Perl's lookbehind assertion)

=item C<E<lt>beforeE<gt>>

if the rule should only be applied before a certain character (corresponds to Perl's lookahead assertion)

=item C<E<lt>afterE<gt>> and C<E<lt>beforeE<gt>>

if the rule should only be applied when the character is in between two characters

=back

=head3 Multiple characters

Because all rules are applied in sequence, all rules concerning multiple characters must precede all single-character rules.

  1. rule
  <rule>
    <from>αυ</from>
    <to>au</to>
  </rule>

  2. rule
  <rule>
    <from>α</from>
    <to>a</to>
  </rule>

If you switch the order of the rules in the above example, every single alpha is transliterated first and the digraph pattern never matches.
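
The effect of rule order can be reproduced outside the module. The following Perl sketch is not how L<Lingua::Translit> is implemented. It only assumes that each rule is a global substitution, applied in list order, with the optional contexts expressed as lookbehind and lookahead assertions. The helper C<apply_rules> and the layout of the rule hashes are hypothetical names made up for this illustration.

```perl

  use strict;
  use warnings;

  # Hypothetical helper (not part of Lingua::Translit): applies each rule
  # to the whole text, one rule after another, in the order given.
  sub apply_rules {
      my ($text, @rules) = @_;
      for my $rule (@rules) {
          my $pattern = quotemeta $rule->{from};
          # Assumption: contexts are regular expression fragments that
          # wrap the pattern as lookbehind/lookahead assertions.
          $pattern = "(?<=$rule->{after})${pattern}" if defined $rule->{after};
          $pattern = "${pattern}(?=$rule->{before})" if defined $rule->{before};
          $text =~ s/$pattern/$rule->{to}/g;
      }
      return $text;
  }

  my $alpha   = "\x{03B1}";   # Greek small letter alpha
  my $upsilon = "\x{03C5}";   # Greek small letter upsilon

  # Digraph rule first, single-character rule second.
  my @digraph_first = (
      { from => $alpha . $upsilon, to => 'au' },
      { from => $alpha,            to => 'a'  },
  );
  # The same rules in the wrong order.
  my @single_first = reverse @digraph_first;

  my $word = $alpha . $upsilon;
  my $good = apply_rules($word, @digraph_first);   # 'au'
  my $bad  = apply_rules($word, @single_first);    # 'a' plus an untouched upsilon

  # A context-sensitive rule: 'A' becomes 'B' only between 'x' and 'y'.
  my $ctx = apply_rules('xAy A',
      { from => 'A', to => 'B', after => 'x', before => 'y' });   # 'xBy A'

  binmode STDOUT, ':encoding(UTF-8)';
  print "$good\n$bad\n$ctx\n";

```

With the single-character rule first, the alpha is consumed before the digraph rule runs, so the upsilon is left untransliterated. The last call shows a rule with both contexts leaving the free-standing C<A> alone.
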

=head2 Building a development version

Your new transliteration table has to be converted to a Perl data structure and stored in a dump file (F<tables.dump> in the session below) before it can be used and tested as a development version of L<Lingua::Translit>.

I<xml2dump.pl> is a tool that processes XML transliteration table definitions and converts them to Perl data structures.

Normally, all stable transliteration tables are processed once, stored in such a dump file and included in the L<Lingua::Translit> module at build time.

=head3 Using xml2dump.pl

To accomplish this task, the I<xml2dump.pl> tool comes in handy:

  alinke$ ./xml2dump.pl -v -o tables.dump mytable.xml
  Parsing mytable.xml... (MyTable: rules=2, contexts=1)
  1 transliteration table(s) dumped to tables.dump.

It reads an XML definition, processes it and dumps the resulting data structure to a given file S<(C<-o> switch)>.

Your transliteration table is now ready to be included by L<Lingua::Translit> so it can be tested and evaluated.

=head3 Building a temporary Lingua::Translit

Use the standard toolchain to build a temporary development version of L<Lingua::Translit> that contains nothing but your new transliteration table.

  alinke$ perl Makefile.PL && make

With the resulting development version at hand, it is time to test the transliteration table for completeness and correctness.

=head2 Testing the transliteration table

To verify that your set of transliteration rules works correctly, you need to write some tests using your favorite Perl test framework.

For a simple and complete example of such a test, have a look at:

=begin html

t/11_tr_Common_DEU.t

=end html

L<Lingua::Translit> comes with a ready-to-use test template that you can use as a starting point and adapt to your transliteration's specific needs. It is located at F<t/xx_tr_template.t.pl>. Rename your copy to follow L<Lingua::Translit>'s naming convention (compare F<t/11_tr_Common_DEU.t>).

=begin html

Online version of the template: t/xx_tr_template.t.pl

=end html

=begin text

t/11_tr_Common_DEU.t

=end text

=head3 Hints on what to test

=over 2

=item * If your transliteration is straightforward (only "1:1" mappings), just transliterate a small text and check the result. At best, everything is correct and you are done.

=item * If the transliteration is reversible, check that both directions transliterate correctly.

=item * Test all context-sensitive and multi-character transliterations explicitly to ensure that these error-prone mappings also work as expected.

=back

=head3 Running the Tests

While testing, it is convenient to set the environment variable I<PERL5LIB> (see I<perlrun>(1)) so that the Perl interpreter finds your development version of L<Lingua::Translit>.

The following example session assumes a POSIX-compatible shell:

  alinke$ export PERL5LIB="blib/lib"
  alinke$ perl t/66_tr_mytest.t
  1..2
  ok 1 - MyTable: not reversible
  ok 2 - MyTable: transliteration

If all tests pass and your transliteration table is therefore ready for use, clean up your shell's environment and prepare to integrate your table into the existing set of transliteration tables:

  alinke$ unset PERL5LIB

=head2 Integrating the new table

Change to the directory that contains the XML transliteration tables and let I<make>(1) call I<xml2dump.pl> to build a data structure from all available XML transliteration tables, including yours:

  alinke$ make all-tables

Now, clean up the old files from the development version you used to write your tests. Change to the root of the source directory and run

  alinke$ make distclean && perl Makefile.PL && make

The result is a complete version of L<Lingua::Translit> that contains all upstream tables as well as your own addition.

  alinke$ make test

...ensures everything is all right and ready for installation or packaging. Congratulations!

=head2 Contributing your table

If you would like to contribute your transliteration table under the terms of the GPL/Artistic License, it can be included in the official upstream version.

To do so, create a patch of your changes and send it, along with a description and comments, to the maintainers so it can be part of the next release.

=cut

# vim: sts=2 enc=utf-8 textwidth=72 wrap