The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

=head1 Template-Plugin-Gettext

Localization for the Template Toolkit 2


=head2 Description

This Perl library offers an end-to-end localization and internationalization solution for the Template Toolkit 2.
It consists of a plugin that offers translation functions inside templates
and a string extractor C<xgettext-tt2> that extracts translatable strings
from templates and writes them to PO files (or rather a C<.pot> file in PO
format).  The string extractor C<xgettext-tt2> is fully
customizable and also usable for other i18n plugins or
frameworks for the Template Toolkit.


=head2 Usage

The solution offered by this library is suitable for templates that have 
a lot of markup (normally HTML) compared to text.  If the files contain
a lot of content other solutions are probably more suitable.  One of them
is L<xml2po|https://github.com/mate-desktop/mate-doc-utils/tree/master/xml2po>,
especially if the input format is HTML.

If the input format is Markdown, for example for a static side generator,
a feasible approach may be to simply split the input into paragraphs, and
turn each paragraph into an entry of a PO file.

In the following, we will assume that you have decided to localize
templates with this library.


=head3 Templates

The first step is to mark all translatable strings.  This serves
a double purpose.  Strings are marked, so that the extractor 
C<xgettext-tt2> can find them and write them into a translation file 
in PO format.

The second purpose is that these markers are also valid functions
resp. filters for the template toolkit and will interpolate the
translations for these messages into the output, when rendering the
template.  As a result, your templates remain pretty readable after
localizing them.

In every source file that you want to use translations, you have
to C<USE> the template:

    [% USE gtx = Gettext('com.mydomain.www', 'fr') %]

Do I<not> forget to C<USE> the plug-in in all templates!  The template
toolkit will not warn you, when you forget it but the translation 
mechanism will not work!

The first argument is the so-called I<textdomain>.  This is the
identifier for your message catalogs and also the basename of several
files.  In the example above, the translated message catalog would
be searched as I<C<LOCALEDIR>>C</fr/LC_MESSAGES/com.mydomain.www.mo>. The second parameter is the language.  This will normally come from
a variable instead of a hard-coded string.

A possible third argument (omitted in the example) is the character
set to use, all following arguments are additional directories to
search first for translations.

The default list of directories is:

=over

=item *

C<./locale>


=item *

C</usr/share/locale>


=item *

C</usr/local/share/locale>


=back

The directory C<./locale> is relative to the current working directory
from where you invoke the template processor.


=head4 Simple Translations With C<gettext()>

The simplest and most common way of doing things is:


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 <title>[% gtx.gettext("World Of Themes") %]</title>
     
 <h1>[% "Introduction" | gettext %]
 
 <p>
 [% FILTER gettext %]
 The "World Of Themes" is the ultimate source of templates
 for the Template Toolkit.
 [% END %]
 </p>
This shows three different ways of localizing strings.  You can
use the function C<gtx.gettext()>, the filter C<gettext> with pipe
syntax, or the same filter with block syntax.  The result is always
the same.  The string will be recognized as translatable by 
C<xgettext-tt2> and it will be translated into the selected language,
when rendering the template.


=head4 Interpolating Strings Into Translations

One important thing to understand is that the argument to the
gettext functions or filters is the lookup key into the translation
database, when the template gets rendered.  That implies that this
key has to be invariable and must not use any interpolated variables.


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 [% gtx.gettext("Hello, $firstname $lastname!") %]
This template code is syntactically correct and will also render
correctly.  But C<xgettext-tt2> will bail out on it with an error
message like

    templates.html:3: Illegal variable interpolation at "$"

The function C<gettext()> will receive the interpolated string
as its argument, and that is not the same as the string that
the extractor program C<xgettext-tt2> sees.  And that means that
the translation cannot be found.

The correct way to interpolate strings uses C<xgettext()>:


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 [% gtx.xgettext("Hello, {first} {last}!",
                 first => firstname, last => lastname) %]
 [% "Hello, {first} {last}!" | xgettext(first => firstname, 
                                        last => lastname) %]
 [% FILTER xgettext(first => firstname, last => lastname) %]
 Hello, {first} {last}!
 [% END %]
One additional benefit of this is that the extractor program
C<xgettext-tt2> will also mark these strings with the flag
"perl-brace-format".  When the translation from the C<.po>
file gets compiled into an C<.mo> file, the compiler C<msgfmt>
checks that the translated strings contains exactly the same
placeholders as the original.

One thing that you should also avoid is to assemble strings
in the template source code.  Do I<not>:


 [% gtx.gettext("Please contact") %] [% name %]
 [% gtx.gettext("for help about the") %] [% package %]
 [% gtx.gettext("software.") %]
This will result in three translatable text snippets
"Please contact", "for help about the", and "software." that
are hard to translate without context.  Besides it makes
illegal assumptions about the word order in translated sentences.
Instead, use C<xgettext()> and write in complete sentences with
placeholders.

By the way, the C<x> in the function C<xgettext()> stands for I<eXpand>
while the C<x> in the program C<xgettext-tt2> or GNU Gettext's
C<xgettext> program stands for I<eXtract>.


=head4 Plural Forms

Do I<not> write this:


 [% IF num != 1 %]
 [% gtx.xgettext("{number} documents deleted!", number => num) %]
 [% ELSE %]
 [% gtx.gettext("One document deleted!") %]
 [% END %]
This assumes that every language has one singular and one plural
(and no other forms) and that the condition that selects the correct
form is always C<COUNT != 1>.  But this is wrong for many languages
for example Russian (two plural forms), Chinese (no plural), French
(different condition), and many more.

Write instead:


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 [% gtx.nxgettext("One document deleted.", 
                  "{count} documents deleted."
                  num,
                  count => num) %]
The function C<nxgettext()> receives the singular and plural
form as the first and second argument, followed by the number
of items, followed by an arbitrary number of key/value pairs
for interpolating variables in the strings.

There is also a function C<ngettext()> that does not expand
its first two arguments.  You will find out that you almost
never need that function.

You can also use C<nxgettext()> and C<ngettext()> as filters.
But the necessary code is awkward, and their use is therefore
not recommended.


=head4 Ambiguous Strings (message contexts)

Sometimes an English string has different meanings in other
languages:


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 [% gtx.gettext("State:") %]
 [% IF state == '1' %]
 [% gtx.pgettext("state", "Open") %]
 [% ELSE %]
 [% gtx.gettext("Closed") %]
 [% END %]
 <a href="/action/open">[% gtx.pgettext("action", "Open") %]</a>
The function C<pgettext()> works like gettext but has one 
extra argument preceding the string, the so-called
message context.  The string extractor C<xgettext-tt2> will now
create two distinct messages "Open", one with the context "state",
the other one with the context "action".  The sole purpose of this
context is to disambiguate the string "Open" for languages where the
verb ("to open") and the adjective ("the door is I<open>") has
two distinct translations.

You will normally use this function, when a translator asks you
to do so, but not on your own behalf.

There is also a function C<pxgettext()> that supports placeholder
interpolation, and C<npxgettext()> that has the following semantics:


 npxgettext(CONTEXT, SINGULAR, PLURAL, COUNT,
            KEY1 => VALUE1, KEY2 => VALUE2, ...)

=head4 More Esoteric Functions

The L<API documentation|lib/Template/Plugin/Gettext.pod#user-content-FUNCTIONS> contains
some more functions and filters that are available for completeness.
You will never need them in normal projects.


=head4 Translator Hints

You can add comments to the source code that are copied into the
C<.po> file as hints for the translators.  This will look like
this:


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 <!-- TRANSLATORS: This is the day of the week! -->
 [% gtx.gettext("Sun") %]
In order to make that work, you have to invoke the extractor
program C<xgettext-tt2> like this:

    xgettext-tt2 --add-comments=TRANSLATORS: t1.html t2.html ...


=head4 Modifying Flags

In rare situations, you may need the following:


 [% USE gtx = Gettext('com.mydomain.www', lang) %]
 
 <!-- xgettext:no-perl-brace-format -->
 [% gtx.xgettext("Value: {value}", value => whatever) %]
Normally, the argument of C<xgettext()> will be flagged in
the C<.po> file with "perl-brace-format", and a translation
will fail to compile if the translation does not contain exactly
the same placeholders as the original does.

You can override that default behavior for individual messages
by placing a comment containing the string "xgettext:" directly
in front of the string.


=head3 Translation Workflow

The translation workflow is the standard workflow known from GNU 
Gettext.  All files relevant for translations are conventionally
kept in a subdirectory C<po>.

You can save time if you use the seed project
L<Template-Plugin-Gettext-Seed|https://github.com/gflohr/Template-Plugin-Gettext-Seed>
as a base.  It contains a directory C<po> ready for use,
with --- at your choice --- a Makefile or a script C<po-make.pl>
that automates the entire translation workflow.  It is also
prepared for extracting strings from other sources than
template files.  In that example, these are Perl source files,
but it will work in a similar fashion for other programming
languages.

But rolling your own version is also simple.  Just read on.


=head4 Extracting Strings With C<xgettext-tt2>

Extracting translatable strings from templates for the Template
Toolkit 2 is as easy as:


 $ xgettext-tt2 TEMPLATE....
This will scan all files given as arguments for translatable strings
and create a file C<messages.po> with the strings found.

The normal invocation of C<xgettext-tt2> is normally a little bit more
sophisticated:


 $ xgettext-tt2 --files-from=POTFILES \
     --output=com.mydomain.www.pot \
     --add-comments=TRANSLATORS: --from-code=utf-8 \
     --force-po
You can, of course, write everyting in one line and omit the backslashes.

Specifying all input files as arguments on the command-line can
quickly become unwieldy.  It is more common to put the list of input
files into a text file, each input file on one line, and instruct
C<xgettext-tt2> to read it with the option C<--files-from>.  The name
of the file is by convention C<POTFILES>.

The output file is normally a file C<TEXTDOMAIN.pot>, where 
C<TEXTDOMAION> is the identifier selected in the templates.  The
reverse hostname of the server serving the rendered templates
is a good choice.

If you want to be able to give hints to translators in the source
files, you have to specify the trigger string --- normally
"TRANSLATORS:" --- with the option C<--add-comments>.  Specifying
an empty string (C<--add-comments=''>) instructs C<xgettext-tt2> 
to copy all comments into the C<.pot> file.

If your templates contain characters outside of US-ASCII, you should
specify the character set of the template files with the option
C<--from-code=CODESET>.

The option C<--force-po> instructs C<xgettext-tt2> to write an output
file even if no translatable strings had been found.  But this
is a matter of taste.  Omit the option, if you prefer it.

C<xgettext-tt2> has a lot more options.  They are mostly compatible
with the ones of C<xgettext> from GNU gettext for C, Perl, and
a lot more languages.  See the L<documentation for GNU Gettext's
xgettext|https://www.gnu.org/software/gettext/manual/html_node/xgettext-Invocation.html>
and the documentation for L<Locale::XGettext|https://github.com/gflohr/Locale-XGettext/blob/master/lib/Locale/XGettext.pod>
for more information.

By the way, why is the ouput file a C<.pot> file and not a C<.po>
file?  It is the I<template> for the C<.po> files for the individual
languages.  You never edit that file, but re-generate it, whenever
the source files have changed.  Hence, it only contains strings
in the original, in the base language.


=head4 Creating Translation Files

For each supported language (except for the base language) you
should create a file C<LL.po>, where C<LL> is the two-letter
language code for that language, for example C<fr.po>, C<de.po>,
or C<it.po>.  You can also specify the combination of language
and country like in C<de_DE.po> or C<pt_BR.po>.

One option for that is to simply copy the C<.pot> file and
edit the header accordingly.  It is normally easier to do that
with the program C<msginit>:


 $ msginit --input=com.mydomain.www.pot --locale=fr
Replace C<TEXTDOMAIN.pot> with the name of the C<.pot> file, and
C<fr> with the language in question.  This will prefill a lot
of fields in the C<.po> file.


=head4 Compiling Translation Files

The translated C<.po> files are compiled with the program C<msgfmt>:


 $ msgfmt --check --statistics --verbose -o fr.mo fr.po
 fr.po: 212 translated messages, 1 fuzzy translation, 3 untranslated messages.
This will compile the translation file C<fr.po> into a binary
file C<fr.mo>.  It also checks the translations for formal errors
and print statistics about the number of translated and
untranslated strings.


=head4 Installing Translation Files

The plugin does not use C<.po> files for looking up translations
but the binary C<.mo> files.  But it has to find them.

You have to decide for one of the directories that 
C<Template::Plugin::Gettext> searches for translations.  The
default order is:

=over

=item *

C<@INC/LocaleData>


=item *

C</usr/share/locale>


=item *

C</usr/local/share/locale>


=back

The first line means that every directory C<LocaleDir> inside
Perl's include directories is searched for translation files.
Keep in mind, that for security reasons the current directory
(C<.>) is nowadays often I<not> in Perl's C<@INC>.

Let's assume that C</var/www/lib> is in Perl's @INC.  You would
then install the French translation file C<fr.mo> as C</var/www/lib/LocaleData/fr/LC_MESSAGES/com.mydomain.www.mo>.
C<TEXTDOMAIN> is a placeholder for the textdomain you have
selected (and C<LC_MESSAGES> is I<not> a placeholder but a real 
directory name).

That is good except for the fact that C</var/www/lib> is usually
not in Perl's C<@INC>.  But you can change that where you invoke
the template processor:


 BEGIN {
     unshift @INC, '/var/www/lib';
 }
 
 use Template;
 
 Template->new->process('template.html', $data);
You can completely override the default search order in the
templates:


 [% USE gtx = Gettext('com.mydomain.www', lang, 'utf-8', 
                      '/var/www/locale', '/srv/www/locale')]
Now, the French translation would be searched in 
C</var/www/locale/fr/LC_MESSAGES/com.mydomain.www.mo> and 
C</var/www/locale/fr/LC_MESSAGES/com.mydomain.www.mo>.


=head4 Updating Translation Files

Translations may become obsolete, when the source templates
change.  In this case, you have to merge the new set of 
translatable strings into the existing translation files.
Fortunately, GNU Gettext makes this easy:


 $ xgettext-tt2 --files-from=POTFILES \
     --output=com.mydomain.www.pot \
     --add-comments=TRANSLATORS: --from-code=utf-8 \
     --force-po
 $ cp fr.po fr.old.po
 $ msgmerge fr.old.po com.mydomain.www.pot -o fr.po
 ....... done
You first update the C<.pot> file with C<xgettext-tt2> so that it
contains the current set of translatable strings.  You then
make a backup of each C<.po> file and then invoke the program
C<msgmerge> for merging the current translations from C<fr.old.po>
with the new set of strings from C<com.mydomain.www.pot> into
the updated translation file C<fr.po>.

The file C<fr.po> will now contain the new strings as untranslated
entries.  Strings that have only slightly change will retain their
translations but they will be marked as "fuzzy", so that they
can be reviewed by a translator.  Entries for strings that are
no longer present in the sources are obsoleted.


=head4 Integrating With Other Programming Languages

The GNU Gettext framework is available for a lot of programming
languages and it is not uncommon that two or more of these 
languages are mixed in a project.  It is beneficial in these
cases to use a common translation base for all used 
technologies.

C<xgettext-tt2> is based on L<C<Locale-XGettext>|https://github.com/gflohr/Locale-XGettext>
and therefore not only understands Template Toolkit templates
but also C<.po> and C<.pot> files as input.  GNU Gettext's xgettext
has the same feature.

Accumulating all translatable strings from the different
technologies is therefore very easy.  If you have a project
that uses Template Toolkit for rendering web pages and Perl
for the business logic you first extract strings from your 
Perl files --- as usual --- with C<xgettext> from GNU gettext
into a temporary file, for example C<plfiles.pot>.  Then you
extract the strings from the templates with C<xgettext-tt2> 
from this library, but you specify C<plfiles.pot> as an 
additional input file. And now the output file of C<xgettext-tt2>
contains all the strings from the template files I<plus> those
from the Perl files in C<plfiles.pot>.

Of course, you can also do it the other way round, extract
with C<xgettext-tt2> into C<ttfiles.pot>, and then feed that as
an additional input file to GNU Gettext's C<xgettext>.

You can use the seed project L<Template-Plugin-Gettext-Seed|https://github.com/gflohr/Template-Plugin-Gettext-Seed>
as a fully functional starting point for such setups.


=head2 Bugs

Please report bugs at L<https://github.com/gflohr/Template-Plugin-Gettext/issues|https://github.com/gflohr/Template-Plugin-Gettext/issues>


=head2 Author

Template-Plugin-Gettext was written by L<Guido Flohr|http://www.guido-flohr.net/>.