+-----------------------------------+
| Lingua::Translit Developer Manual |
+-----------------------------------+
==== Conventions used in this manual ====
: * Every non-absolute path is relative to the source code's directory.
==== Adding a transliteration ====
If you want to add a new transliteration to Lingua::Translit just...
: * write an XML file (the "transliteration table")
: * build a development version containing your table
: * write and run some tests to check if your transliteration is working as
expected.
: * integrate your table into the set of upstream tables and consider
contributing it
== Writing a transliteration table ==
Each XML file consists of metadata and a set of transliteration rules.
The metadata tags cover the name of the transliteration, a short
description, and whether the transliteration can be used in both
directions. For example:
<name>DIN 1460</name>
<desc>DIN 1460: Cyrillic to Latin</desc>
<reverse>true</reverse>
The rules can be simple one-to-one mappings:
<rule>
    <from>X</from>
    <to>Y</to>
</rule>
...but you can also specify a context so that the rule is applied only
where that context matches:
<rule>
    <from>A</from>
    <to>B</to>
    <context>
        <after>x</after>
        <before>y</before>
    </context>
</rule>
To get an easy start, you can copy the file xml/template.xml, rename it to
your needs and edit it right away.
Template: xml/template.xml
Complete example: xml/common_deu.xml
Although editing an XML file is technically easy, a few points must be
observed. The most important one is that the rules are applied
*in sequence*, one after another. The order of rules therefore matters
whenever you specify a context or transliterate multiple characters.
= Unicode notation =
If a rule involves non-ASCII characters, specify each of them as an
entity that gives the Unicode code point in hexadecimal notation, and add
a comment that names the character.
<from>&#x0410;</from> <!-- CYRILLIC CAPITAL LETTER A -->
<to>A</to>
This ensures that the correct character is transformed and that it can be
identified exactly even if it is not displayed correctly.
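A small helper can take the guesswork out of writing such entities. The
following sketch is not part of Lingua::Translit; the function name
entity_for is made up for illustration. It relies only on Perl's core
charnames module to look up the official character name for the comment.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper (not part of Lingua::Translit): formats a single
# character as a hexadecimal entity followed by an XML comment that
# carries the official Unicode character name.
sub entity_for {
    my ($char) = @_;
    my $code = ord $char;
    my $name = charnames::viacode($code) // 'UNKNOWN';
    return sprintf '&#x%04X; <!-- %s -->', $code, $name;
}

# U+0410 is written as an escape so the script itself stays pure ASCII.
print entity_for("\x{0410}"), "\n";
# prints: &#x0410; <!-- CYRILLIC CAPITAL LETTER A -->
```

The output still has to be wrapped in the appropriate tag by hand.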
= Specifying a context =
The context is evaluated as a Perl regular expression, so
*literal ASCII characters*, *entities* and *meta characters* can all be
used to specify it.
If a character has two mappings depending on the context, the
context-sensitive rule must come before the context-free rule. Otherwise
the context-free rule replaces every occurrence first and the
context-sensitive rule never matches.
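1. rule
<from>&#x0393;&#x03BA;</from> <!-- Γκ -->
<to>Gk</to>
<context>
    <after>\b</after>
</context>
2. rule
<from>&#x0393;&#x03BA;</from> <!-- Γκ -->
<to>Nk</to>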
The following pattern matching contexts are available:
: ``<after>''
if the transliteration rule should only be applied after a certain
character (corresponds to Perl's *lookbehind*)
: ``<before>''
if the rule should only be applied before a certain character
(corresponds to Perl's *lookahead*)
: ``<after> & <before>''
if the rule should only be applied if the character is in between two
characters
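To make the correspondence with Perl's lookbehind and lookahead concrete,
here is a minimal sketch in plain Perl. It is not Lingua::Translit's
actual implementation: the helper apply_rule and the hash keys from, to,
after and before are assumptions chosen for illustration. The source
string is matched literally, while the contexts are used as regular
expressions.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's real code):
# applies one rule to a string. "after" becomes a lookbehind and
# "before" a lookahead around the literal source string.
sub apply_rule {
    my ($text, $rule) = @_;
    my $pattern = quotemeta $rule->{from};
    $pattern = "(?<=$rule->{after})$pattern" if defined $rule->{after};
    $pattern = "$pattern(?=$rule->{before})" if defined $rule->{before};
    $text =~ s/$pattern/$rule->{to}/g;
    return $text;
}

my @rules = (
    { from => 'A', to => 'B', after => 'x', before => 'y' },  # context-sensitive
    { from => 'A', to => 'C' },                               # context-free
);

my $text = 'xAy zAz';
$text = apply_rule($text, $_) for @rules;
print "$text\n";   # prints: xBy zCz
```

With the two rules swapped, the context-free rule would consume every
"A" first and the result would be "xCy zCz".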
= Multiple characters =
As all rules are applied in sequence, and hence the order of rules is
important, all rules concerning multiple characters must precede all
single character rules.
1. rule
<from>&#x03B1;&#x03C5;</from> <!-- αυ -->
<to>au</to>
2. rule
<from>&#x03B1;</from> <!-- α -->
<to>a</to>
If you switched the order of the rules in the above example, every single
alpha would be transliterated first and the digraph pattern would never
match.
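The effect of rule order can be reproduced with a few lines of plain
Perl. This is only a sketch of sequential rule application; the helper
translit_in_order is hypothetical and does not reflect how
Lingua::Translit is implemented internally.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: applies [from, to] pairs one after another,
# each as a literal global substitution.
sub translit_in_order {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($from, $to) = @$rule;
        $text =~ s/\Q$from\E/$to/g;
    }
    return $text;
}

my @digraph_first = ( [ 'αυ', 'au' ], [ 'α', 'a' ] );
my @single_first  = ( [ 'α', 'a' ], [ 'αυ', 'au' ] );

print translit_in_order('αυ', @digraph_first), "\n";   # prints: au
print translit_in_order('αυ', @single_first),  "\n";   # prints: aυ
```

In the second call the alpha is already gone by the time the digraph rule
runs, so the upsilon is left untouched.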
== Building a development version ==
Your new transliteration table has to be converted to a Perl data
structure and stored in xml/tables.dump in order to be put to use and
tested as a development version of Lingua::Translit.
*xml2dump.pl* is a tool that processes XML transliteration table
definitions and converts them to Perl data structures. Normally, all
stable transliteration tables are processed once, stored in
xml/tables.dump, and included in the Lingua::Translit::Tables module
at build time.
= Using xml2dump.pl =
To accomplish this task the *xml2dump.pl* tool comes in handy:
alinke$ ./xml2dump.pl -v -o tables.dump mytable.xml
Parsing mytable.xml... (MyTable: rules=2, contexts=1)
1 transliteration table(s) dumped to tables.dump.
It reads an XML definition, processes it and dumps the resulting data
structure to a given file (``-o'' switch).
Your transliteration table is now ready to be included by
Lingua::Translit::Tables so it can be tested and evaluated.
= Building a temporary Lingua::Translit =
Use the standard toolchain to build a temporary development version of
Lingua::Translit which contains nothing but your new transliteration
table.
alinke$ perl Makefile.PL && make
Given the resulting development version, it's time to test the
transliteration table for completeness and correct functionality.
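Before writing real tests, a quick smoke check can confirm that the
development build actually contains the new table. The sketch below
assumes the table is named "MyTable" and that the development version is
on Perl's module search path (for example via PERL5LIB set to blib/lib);
the helper table_status is made up for illustration. It uses only the
new(), name(), desc() and can_reverse() methods of Lingua::Translit.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether a table is known to the
# Lingua::Translit version found on @INC. Both a missing module and a
# missing table are reported instead of aborting the script.
sub table_status {
    my ($name) = @_;
    eval { require Lingua::Translit; 1 }
        or return 'Lingua::Translit is not available';
    my $tr = eval { Lingua::Translit->new($name) }
        or return "table '$name' is not available";
    return sprintf '%s: %s (reversible: %s)',
        $tr->name(), $tr->desc(), $tr->can_reverse() ? 'yes' : 'no';
}

# "MyTable" is a placeholder for the name given in your XML file.
print table_status('MyTable'), "\n";
```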
== Testing the transliteration table ==
To verify that your set of transliteration rules works correctly, you need
to write some tests using your favorite Perl test framework. For an easy
and complete example that uses the Test::More framework, have a look
at the following file:
t/11_tr_Common_DEU.t
Lingua::Translit comes with a ready-to-use test template that you can
use as a starting point and adapt to your transliteration's specific
needs. It is located at t/xx_tr_template.t.pl. To follow
Lingua::Translit's naming convention, rename it to NN_tr_NAME.t.
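For orientation, a minimal test file might look roughly like the sketch
below. It is not a copy of the shipped template: the table name
"MyTable" and both strings are placeholders that must be replaced with
values matching your own table. The test skips itself if the module or
the table cannot be loaded.

```perl
use strict;
use warnings;
use utf8;
use Test::More;

# Skip cleanly if the development version is not on @INC.
eval { require Lingua::Translit; 1 }
    or plan skip_all => 'Lingua::Translit is not available';

my $name = 'MyTable';    # placeholder: the name from your XML file

my $tr = eval { Lingua::Translit->new($name) }
    or plan skip_all => "table '$name' is not available";

# Placeholders: replace with text in the source script and the
# transliteration you expect for it.
my $input    = 'text in the source script';
my $expected = 'expected transliteration';

plan tests => 2;

# This sketch assumes a table that is not reversible.
ok( !$tr->can_reverse(), "$name: not reversible" );
is( $tr->translit($input), $expected, "$name: transliteration" );
```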
= Hints on what to test =
: * If your transliteration is straightforward (only "1:1" mappings), just
test a small text and have a look at the result. Ideally, everything is
correct and you are done.
: * If the transliteration is reversible, check that both directions
transliterate correctly.
: * Test all context-sensitive and multi-character transliterations
explicitly to ensure that these error-prone mappings also work as
expected.
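The second and third hints can be combined in a table-driven test. The
following sketch assumes a reversible table with the hypothetical name
"MyReversibleTable"; the case list contains placeholders only and should
hold one entry per context-sensitive or multi-character rule. It uses
the translit() and translit_reverse() methods of Lingua::Translit.

```perl
use strict;
use warnings;
use utf8;
use Test::More;

eval { require Lingua::Translit; 1 }
    or plan skip_all => 'Lingua::Translit is not available';

my $name = 'MyReversibleTable';    # hypothetical table name

my $tr = eval { Lingua::Translit->new($name) }
    or plan skip_all => "table '$name' is not available";

# [ source, expected ] pairs. These are placeholders: add one pair for
# every context-sensitive and multi-character rule of your table.
my @cases = (
    [ 'source text 1', 'expected text 1' ],
    [ 'source text 2', 'expected text 2' ],
);

plan tests => 1 + 2 * @cases;

ok( $tr->can_reverse(), "$name: reversible" );

for my $case (@cases) {
    my ($source, $target) = @$case;
    # Check the forward direction, then the way back.
    is( $tr->translit($source), $target, "$name: forward" );
    is( $tr->translit_reverse($target), $source, "$name: reverse" );
}
```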
= Running the Tests =
While testing it is convenient to define the environment variable
*PERL5LIB* (have a look at *perlrun*(1)) so that the Perl interpreter
``knows'' where your development version of Lingua::Translit is located.
The following example session assumes that you are using *bash*(1) or a
similar shell:
alinke$ export PERL5LIB="blib/lib"
alinke$ perl t/66_tr_mytest.t
1..2
ok 1 - MyTable: not reversible
ok 2 - MyTable: transliteration
If all tests pass and your transliteration table is therefore ready for
use, clean up your shell's environment and prepare to integrate your
table into the existing set of transliteration tables:
alinke$ unset PERL5LIB
== Integrating the new table ==
Change to the xml/ directory and let *make*(1) call *xml2dump.pl* in order
to build a data structure ("tables.dump") from all available XML
transliteration tables, including yours:
alinke$ make all-tables
Now, clean up the old files from the development version you used to write
your tests. Change into the source directory's root and run
alinke$ make distclean && perl Makefile.PL && make
The result is a complete version of Lingua::Translit that contains all
upstream tables, as well as your own addition.
alinke$ make test
...ensures that everything works and is ready for installation or packaging.
Congratulations!
== Contributing your table ==
If you would like to contribute your transliteration table under the
terms of the GPL/Artistic License, it can be included in the official
upstream version. To do so, create a patch of your changes and send it
along with a description and comments to ``perl@lingua-systems.com'' so it
can be part of the next release.