Continuing my series on Kohana 3 (see Auth here and Validation here), I'm tackling Kohana 3 internationalization in this post.

How Kohana 3 I18n works

Kohana 3's i18n functionality is implemented in two files:

  • /system/classes/i18n.php - Implements the i18n class, which provides all the functionality.
  • /system/base.php - Implements the __() function, which is most commonly used for translating strings.

The Kohana_I18n class provides the following functions:

  • I18n::lang($lang = NULL). Gets or sets the target language for translation. Call it without parameters to get the current target language.
  • I18n::get($string, $lang = NULL). Returns the translation of the source string, optionally specifying the target language. You should not usually call this function directly, although it has some uses (discussed later).
  • I18n::load($lang). Loads the translation table for a given language, caches it and returns it.

The base file provides one translation function, the double underscore:

  • __($string, $values, $lang = 'en-us'). When the current target language is the same as the $lang parameter, returns the string unchanged. When the current target language is something else, returns the translated string if it exists, or the original string if it does not. The $values array determines the replacements made in the string.

__() is the main function you should be using. Just wrap all your strings in it.
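
To make that behavior concrete, here is a minimal standalone sketch of the lookup-then-replace logic described above. It does not use Kohana at all: translate_sketch() and its $target and $table parameters are made-up names that stand in for __(), the current target language and the loaded translation table.

```php
<?php
// Standalone illustration only; translate_sketch() is a hypothetical
// stand-in for __(), not Kohana's actual source code.
function translate_sketch($string, array $values, $target, array $table, $source = 'en-us')
{
    // Only look up a translation when the target language differs
    // from the language the string was written in.
    if ($target !== $source && isset($table[$string])) {
        $string = $table[$string];
    }

    // Nothing to replace when no placeholder values were given.
    return empty($values) ? $string : strtr($string, $values);
}

// Hypothetical Finnish translation table.
$fi = array('Hello, :name' => 'Terve, :name');

// Target differs from source and a translation exists.
echo translate_sketch('Hello, :name', array(':name' => 'Anna'), 'fi', $fi), "\n";    // Terve, Anna
// Target equals source, so the string is returned as written.
echo translate_sketch('Hello, :name', array(':name' => 'Anna'), 'en-us', $fi), "\n"; // Hello, Anna
// No translation available, so the original string is the fallback.
echo translate_sketch('Goodbye', array(), 'fi', $fi), "\n";                           // Goodbye
```

The placeholders are replaced after the lookup, so the translated string must contain the same :name markers as the original.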

Storing and loading the translation strings

As you notice if you look at the source, there are no translation strings here. They are stored under /application/i18n/languagename.php, where languagename is the name of the language you want to use. Kohana uses the string "en-us" as the default target language.

Note how the default language string has two parts: one specifying the language (en for English) and another for the region (us for the US).

Loading is done automatically on demand when you call __(). The loading function I18n::load() searches for files flexibly: it explodes the language string on the "-" character, so the default target language "en-us" results in a search for the following files:

  • /application/i18n/en.php
  • /application/i18n/en/us.php

Note that this order means you can have a single language file for English and then override parts of it. For example, with the target language string "en-gb", Kohana would first load the en.php file and then the /en/gb.php file, which makes it easy to have regional variants for strings ("color" vs. "colour").
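
The search order and the override behavior can be sketched in plain PHP. This is an illustration under assumptions, not Kohana's loader: i18n_candidate_files() and i18n_merge_tables() are hypothetical helpers, the paths are relative to the application directory, and the two tables are in-memory stand-ins for the contents of en.php and en/gb.php.

```php
<?php
// Standalone illustration; these helpers are hypothetical and only mimic
// the search order described above, they are not part of Kohana.

// Turn "en-gb" into the relative paths to look for, least specific first.
function i18n_candidate_files($lang)
{
    $files = array();
    $path = '';
    foreach (explode('-', $lang) as $part) {
        $path = ($path === '') ? $part : $path . '/' . $part;
        $files[] = 'i18n/' . $path . '.php';
    }
    return $files;
}

// Merge tables in the given order so later (more specific) entries win.
function i18n_merge_tables(array $tables)
{
    $merged = array();
    foreach ($tables as $table) {
        foreach ($table as $key => $translation) {
            $merged[$key] = $translation;
        }
    }
    return $merged;
}

print_r(i18n_candidate_files('en-gb')); // i18n/en.php, then i18n/en/gb.php

// In-memory stand-ins for the contents of en.php and en/gb.php.
$en    = array('Color' => 'Color', 'Hello' => 'Hello');
$en_gb = array('Color' => 'Colour');

print_r(i18n_merge_tables(array($en, $en_gb))); // 'Color' => 'Colour', 'Hello' => 'Hello'
```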

What should I store in the i18n language files?

This is an important question, and Kohana does not answer it for you. There are basically two approaches:

Option 1: Storing identifiers

// in View
echo __('about.description');
// in /i18n/latin.php
return array(
    'about.description' => 'Lorem ipsum dolor amet...',
);

Option 2: Storing the translated string itself

// in View
echo __('This is a sample text...');
// in /i18n/latin.php
return array(
    'This is a sample text...' => 'Lorem ipsum dolor amet...',
);

Which one is better? This is a matter of preference, but I strongly advise that you store the translated string itself. I absolutely HATE storing identifiers, having done that and worked with some applications that store identifiers. shadowhand (Kohana's benevolent dictator) agrees, see this discussion and this discussion.

The problem with identifiers is that they make you remember a million different strings and hunt around for the right string to change in your file. Your translators have to remember what "about.description" meant when they translate your files, and you have to remember what is behind "about.description" when you make code changes. This is a maintenance nightmare, and if you forget to translate a string, the user will see a cryptic identifier.

The pros of identifiers are that the translation files are slightly smaller and use slightly less memory, and that you can change the translation files without changing the code. However, these are in my opinion rather meager advantages compared to the maintenance problems that are created.

Storing the string itself is better, because you can see the text in your files and edit it in place. This 1) allows for direct editing, 2) makes it obvious that a new translation is needed, since the English text is shown, and 3) means that if you forget to translate a string, the user will see something in English rather than just an identifier.
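
The difference in what the user sees for a forgotten translation is easy to demonstrate. A minimal sketch, assuming a lookup that falls back to the key when no translation exists (lookup_sketch() is a made-up helper, not a Kohana function):

```php
<?php
// Standalone illustration; lookup_sketch() is a hypothetical helper that
// returns the translation if present and the key itself otherwise.
function lookup_sketch($key, array $table)
{
    return isset($table[$key]) ? $table[$key] : $key;
}

// A translation table where nothing has been translated yet.
$table = array();

// Option 1: the user sees a cryptic identifier.
echo lookup_sketch('about.description', $table), "\n";        // about.description
// Option 2: the user sees readable English text.
echo lookup_sketch('This is a sample text...', $table), "\n"; // This is a sample text...
```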

The nice thing is that if you write your application in the default target language, no I18n calls are performed when you use __()! So you don't even pay the cost of a lookup. You do not need to create a translation file for the default language, because strings in the default language will be returned as-is by __().

TL;DR: Store the translated string itself.

How do I tell Kohana I want to translate a string?

Use __('This is my string'). If the string is in a language other than en-us, then use __('String', null, 'language'); you will then need to have a /application/i18n/en.php file for the English equivalent.

If you need to replace items in the string, do something like:

echo __('Dear :firstname, your username is: :user', array(
    ':firstname' => $user->name,
    ':user' => $user->username,
));

How do I start collecting the i18n strings?

Put a file like this at /application/i18n/languagename.php:

<?php defined('SYSPATH') or die('No direct script access.');

return array(
    'Hello World' => 'Terve maailma',
);

and start writing the English-to-your-language strings.

Now that you know how to do that - don't do it manually! Instead take a look at my automatic I18n string collector, which extends I18n and automatically detects missing translations (for whatever language is currently set as the target), and keeps updating your translation files. It saves some serious amounts of time.
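
To give a rough idea of what automatic collection involves, here is a hypothetical sketch. It is not the collector mentioned above and it does not extend I18n. The class name Missing_String_Collector and its methods are invented for illustration. It records every string that has no translation and can render those strings as the body of a language file.

```php
<?php
// Hypothetical sketch, not the collector mentioned above: a tiny class
// that remembers which strings had no translation in the given table.
class Missing_String_Collector
{
    protected $table;
    protected $missing = array();

    public function __construct(array $table)
    {
        $this->table = $table;
    }

    // Return the translation, recording the string if none exists.
    public function get($string)
    {
        if (isset($this->table[$string])) {
            return $this->table[$string];
        }
        $this->missing[$string] = $string;
        return $string;
    }

    // The untranslated strings seen so far, keyed by the string itself.
    public function missing()
    {
        return $this->missing;
    }

    // Render the missing strings as the body of a language file,
    // ready to be filled in by a translator.
    public function export()
    {
        return "<?php defined('SYSPATH') or die('No direct script access.');\n\n"
            . 'return ' . var_export($this->missing, TRUE) . ";\n";
    }
}

$collector = new Missing_String_Collector(array('Hello World' => 'Terve maailma'));
$collector->get('Hello World'); // translated, not recorded
$collector->get('Log out');     // missing, recorded
echo $collector->export();
```

A real collector would also need to merge the result into the existing file rather than overwrite it.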

How do I allow the user to switch the language dynamically?

Use a cookie to store the user language, then load the value in /application/bootstrap.php:

// default value for the cookie is 'fi' for Finnish
$lang = Cookie::get('lang', 'fi');
if(!in_array($lang, array('fi', 'sv', 'en-us'))) {
   // check the allowed languages, and force the default
   $lang = 'fi';
}
// set the target language
I18n::lang($lang);

Then provide some sort of interface to changing the cookie in a Controller:

function action_change_language($lang) {
   if(!in_array($lang, array('fi', 'sv', 'en-us'))) {
      $lang = 'fi';
   }
   Cookie::set('lang', $lang);
   I18n::lang($lang);
   Request::instance()->redirect('page/index');
}

The link would look something like Html::anchor('/controller/change_language/fi', 'Finnish') to change the language.
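
Both snippets above repeat the same list of allowed languages and the same fallback. One way to keep them in one place is a small helper. This is a sketch under assumptions: normalize_lang() is a hypothetical function, not part of Kohana.

```php
<?php
// Hypothetical helper, not part of Kohana: keep the list of allowed
// languages and the fallback in one place.
function normalize_lang($lang, array $allowed, $default)
{
    // Strict comparison so values like 0 or NULL never match by accident.
    return in_array($lang, $allowed, TRUE) ? $lang : $default;
}

$allowed = array('fi', 'sv', 'en-us');

echo normalize_lang('sv', $allowed, 'fi'), "\n"; // sv
echo normalize_lang('de', $allowed, 'fi'), "\n"; // fi
echo normalize_lang(NULL, $allowed, 'fi'), "\n"; // fi
```

Both the bootstrap and the controller could then pass their input through the helper before calling I18n::lang(), so a new language only has to be added to one array.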

Comments

Alex: I think storing phrases in original language as a key isn't the best practice because it turns changing it into a pain in the ass: it's a key, so it should be changed in all translations. So, it's not just about changing source to change translation: it's about changing code and all other locales.

Mikito Takada: I guess it depends on what kind of pain you want.

Since I usually write the original language version and someone else maintains the translation, I prefer changing the string directly where it is used in the code.

This is easier for code maintenance and a bit harder for translation file maintenance, but translation time is cheaper than coding time.

I think it is good that changing the original language string forces you to also add a new translation for that text in all other languages. If the string such as a help message is changed, then it is different and should have a different translation as well.

Alex: It makes the developer think twice about any phrase being written. It's not possible to just type about.description or so and forward the text work to a copywriter or whoever is writing the texts and whose time (as you correctly mentioned) is cheaper. So the usage of this approach, I think, is very limited. I mean much more limited than storing partially-faceless identifiers.

Just imagine that the native language is not English. That original text could say even less than [usually] English-speaking identifiers, which [usually] store the essential meaning, short but clear. Then, is it a problem to return a string from the default translation, which is [usually] complete? Yeah, the case where it's incomplete brings us to the prime argument of the 'identifiers vs complete strings' discussion.

I sincerely tried to deploy the approach in quite a large project and I failed. The only 'bonus' that stayed unbeaten is that the English version comes 'for free', both in terms of resources spent on translation during request processing and (possibly) translation work by the project's author.

But I believe it has more cons than pros for those developers who speak languages much less widespread than English but still have an intention to make a slightly internationalized website (not internationalized, just translated actually).

So I came to the conclusion that for the described kind of task, using those loathsome identifiers is an appropriate price.

Logan: I'd say overall it's gonna be a pain either way, so you need to decide which method will be best for the project.

Both methods seem to have clear advantages and disadvantages, but the context in which you use either method should be considered. Perhaps if the original phrases can come from a variety of languages it makes more sense to go the key -> value route, but if you are getting a handoff of templates that are already in English with the copy already there, it makes sense to just wrap them in translation functions instead of moving all the copy out into a language file. It probably makes sense to be aware of how either method can be useful.

Artjom Kurapov: I would argue for using identifiers. Although using direct text may look nice and easy, don't forget that a single word can easily have two different meanings, and even more different meanings in other languages depending on context. So using the same text is not an option, and you need identifiers to differentiate the meanings. And let's face it, if you do make a multilingual system, then that means that it's already big and complex enough.

Sure, developers are always rushing to add new translations with as little thinking as possible, but identifiers should be long and well structured so that everyone understands where on the site they are used.

gps: Great tutorial! I'm following your instructions to turn my website into a multi-language website, but I don't know how to make the default language use http://mydomain.com while the others use http://mydomain.com/{lang-code}. Thanks for reading, and please help!