+-----------------------------------+
| Lingua::Translit Developer Manual |
+-----------------------------------+

==== Conventions used in this manual ====

: * Every non-absolute path is relative to the source code's directory.

==== Adding a transliteration ====

If you want to add a new transliteration to Lingua::Translit, just...

: * write an XML file (the "transliteration table")
: * build a development version containing your table
: * write and run some tests to check whether your transliteration works as expected
: * integrate your table into the set of upstream tables and consider contributing it

== Writing a transliteration table ==

Each XML file consists of metadata and a set of transliteration rules.

The metadata tags cover the name of the transliteration, a short description, and whether the transliteration can be used in both directions. For example:

    <name>DIN 1460</name>
    <desc>DIN 1460: Cyrillic to Latin</desc>
    <reverse>true</reverse>

The rules can be simple one-to-one mappings

    <rule>
        <from>X</from>
        <to>Y</to>
    </rule>

...but you can also specify a context that restricts where the rule is applied:

    <rule>
        <from>A</from>
        <to>B</to>
        <context>
            <after>x</after>
            <before>y</before>
        </context>
    </rule>

To get started quickly, copy the file xml/template.xml, rename it as needed and edit it right away.

    Template:          xml/template.xml
    Complete example:  xml/common_deu.xml

Although editing an XML file is technically quite easy, a few things have to be observed. The most important one is that the rules are applied *in sequence*, one after another. The order of the rules therefore matters whenever you specify a context or transliterate multiple characters.

= Unicode notation =

To specify a non-ASCII character, use an entity that represents its Unicode code point in hex notation and leave a comment naming the character.

    <rule>
        <from>&#x0410;</from> <!-- CYRILLIC CAPITAL LETTER A -->
        <to>A</to>
    </rule>

This ensures that the correct character is transformed, and that the character can still be identified exactly even if it is not displayed correctly.

= Specifying a context =

The context is evaluated as a Perl regular expression.
The context can therefore be specified using *literal ASCII characters*, *entities* or *meta characters*.

If a character has two mappings depending on the context, the context-sensitive rule must come first, followed by the context-free rule. Otherwise every occurrence is replaced by the context-free rule right away and the context-sensitive rule will never match.

1. rule

    <rule>
        <from>Γκ</from>
        <to>Gk</to>
        <context>
            <after>\b</after>
        </context>
    </rule>

2. rule

    <rule>
        <from>Γκ</from>
        <to>Nk</to>
    </rule>

The following pattern matching contexts are available:

: ``<after>'' if the transliteration rule should only be applied after a certain character (corresponds to Perl's *lookbehind*)
: ``<before>'' if the rule should only be applied before a certain character (corresponds to Perl's *lookahead*)
: ``<after> & <before>'' if the rule should only be applied if the character is in between two characters

= Multiple characters =

Because all rules are applied in sequence, every rule that covers multiple characters must precede all single-character rules.

1. rule

    <rule>
        <from>αυ</from>
        <to>au</to>
    </rule>

2. rule

    <rule>
        <from>α</from>
        <to>a</to>
    </rule>

If you switch the order of the rules in the above example, every single alpha is transliterated first and the digraph pattern will never match.

== Building a development version ==

Your new transliteration table has to be converted to a Perl data structure and stored in xml/tables.dump before it can be used and tested as a development version of Lingua::Translit.

*xml2dump.pl* is a tool that processes XML transliteration table definitions and converts them to Perl data structures.

Normally, all stable transliteration tables are processed once, stored in xml/tables.dump and included in the Lingua::Translit::Tables module at build time.

= Using xml2dump.pl =

To accomplish this task, the *xml2dump.pl* tool comes in handy:

    alinke$ ./xml2dump.pl -v -o tables.dump mytable.xml
    Parsing mytable.xml... (MyTable: rules=2, contexts=1)
    1 transliteration table(s) dumped to tables.dump.

It reads an XML definition, processes it and dumps the resulting data structure to a given file (``-o'' switch).
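
To make the ordering requirements from the sections above more tangible, here is a minimal, self-contained Perl sketch. It is not the code used by Lingua::Translit: the rule lists, the helper apply_rules() and the hash layout are assumptions made for illustration, and the real contents of tables.dump may look different. The sketch applies each rule once, in list order, and maps an ``after'' context to a lookbehind and a ``before'' context to a lookahead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified rule lists (assumption: the real layout of
# tables.dump may differ from this).
# U+03B1 = GREEK SMALL LETTER ALPHA, U+03C5 = GREEK SMALL LETTER UPSILON
my @digraph_first = (
    { from => "\x{03B1}\x{03C5}", to => 'au' },
    { from => "\x{03B1}",         to => 'a'  },
);
my @single_first = reverse @digraph_first;

# U+0393 = GREEK CAPITAL LETTER GAMMA, U+03BA = GREEK SMALL LETTER KAPPA
my @context_first = (
    { from => "\x{0393}\x{03BA}", to => 'Gk', context => { after => '\b' } },
    { from => "\x{0393}\x{03BA}", to => 'Nk' },
);

# Apply every rule once to the whole text, in the order given.
sub apply_rules {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my $ctx = $rule->{context} || {};
        my $re  = quotemeta $rule->{from};
        # "after" becomes a lookbehind, "before" a lookahead
        $re = "(?<=$ctx->{after})$re" if defined $ctx->{after};
        $re = "$re(?=$ctx->{before})" if defined $ctx->{before};
        $text =~ s/$re/$rule->{to}/g;
    }
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
my $word = "\x{03B1}\x{03C5}\x{03B1}";    # alpha, upsilon, alpha

# Digraph rule first: the digraph is consumed before the single alpha rule.
print apply_rules($word, @digraph_first), "\n";    # prints "aua"

# Single-character rule first: the digraph rule never matches,
# so the upsilon is left untouched.
print apply_rules($word, @single_first), "\n";

# Context-sensitive rule first: only the word-initial digraph becomes "Gk".
print apply_rules("\x{0393}\x{03BA}\x{03B1}\x{0393}\x{03BA}", @context_first), "\n";
```

Swapping the two rules in @context_first has the same effect as in the digraph case: the context-free rule would replace every occurrence, leaving nothing for the context-sensitive rule to match.
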
Your transliteration table is now ready to be included by Lingua::Translit::Tables so it can be tested and evaluated.

= Building a temporary Lingua::Translit =

Use the standard toolchain to build a temporary development version of Lingua::Translit that contains nothing but your new transliteration table.

    alinke$ perl Makefile.PL && make

With the resulting development version, it is time to test the transliteration table for completeness and correct functionality.

== Testing the transliteration table ==

To verify that your set of transliteration rules works correctly, you need to write some tests using your favorite Perl test framework.

Lingua::Translit comes with a ready-to-use test template that you can use as a starting point and adapt to your transliteration's specific needs. It is located at t/xx_tr_template.t.pl - to follow Lingua::Translit's naming convention, rename it to NN_tr_NAME.t.

For an easy and complete example that utilizes the Test::More framework, have a look at the following file:

    t/11_tr_Common_DEU.t

= Hints on what to test =

: * If your transliteration is straightforward (only "1:1" mappings), just test a small text and have a look at the result. If everything is correct, you are done.
: * If the transliteration is reversible, you should check that both directions transliterate correctly.
: * All context-sensitive and multi-character transliterations should be tested explicitly to ensure that the error-prone mappings also work as expected.

= Running the Tests =

While testing, it is convenient to define the environment variable *PERL5LIB* (have a look at *perlrun*(1)) so that the Perl interpreter ``knows'' where your development version of Lingua::Translit is located.
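
A test file for a new table could look roughly like the following sketch, which uses Test::More and the new(), can_reverse() and translit() methods of Lingua::Translit. The table name ``MyTable'', the input string and the expected output are placeholders chosen for illustration; replace them with your table's name and real sample data. The test is skipped if no Lingua::Translit can be found, for example because *PERL5LIB* is not set.

```perl
#!/usr/bin/perl
# Sketch of a test file such as t/66_tr_mytest.t (hypothetical content).
use strict;
use warnings;
use Test::More;

# Skip cleanly if Lingua::Translit cannot be loaded via @INC / PERL5LIB.
BEGIN {
    eval { require Lingua::Translit; 1 }
        or plan skip_all => 'Lingua::Translit not available';
}

plan tests => 2;

# Placeholders (assumptions): adjust to your own table and sample data.
my $name     = 'MyTable';
my $input    = 'abc';
my $expected = 'xyz';

my $tr = Lingua::Translit->new($name);

# 1: the table is expected to be declared as non-reversible
ok( !$tr->can_reverse(), "$name: not reversible" );

# 2: the sample text is transliterated as expected
is( $tr->translit($input), $expected, "$name: transliteration" );
```

For a reversible table, a third test comparing translit_reverse() of the expected output with the original input covers the opposite direction.
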
The following example session assumes that you are using *bash*(1) or a similar shell:

    alinke$ export PERL5LIB="blib/lib"
    alinke$ perl t/66_tr_mytest.t
    1..2
    ok 1 - MyTable: not reversible
    ok 2 - MyTable: transliteration

If all tests pass and your transliteration table is therefore ready for use, clean up your shell's environment and prepare to integrate your table into the existing set of transliteration tables:

    alinke$ unset PERL5LIB

== Integrating the new table ==

Change to the xml/ directory and let *make*(1) call *xml2dump.pl* in order to build a data structure ("tables.dump") from all available XML transliteration tables, including yours:

    alinke$ make all-tables

Now clean up the old files from the development version you used to write your tests. Change to the root of the source directory and run

    alinke$ make distclean && perl Makefile.PL && make

The result is a complete version of Lingua::Translit that contains all upstream tables as well as your own addition.

    alinke$ make test

...ensures that everything is all right and ready for installation or packaging. Congratulations!

== Contributing your table ==

If you would like to contribute your transliteration table under the terms of the GPL/Artistic License, it can be included in the official upstream version.

To do so, create a patch of your changes and send it, along with a description and comments, to ``perl@lingua-systems.com'' so it can become part of the next release.