Java Internationalisation (i18n) Character Encoding

Internationalisation (i18n) of Java applications should not be difficult, although dealing with text in languages you don’t understand can be a little confusing! As a developer, you’ll normally be sent a translated version of the text to use in your application. If you’re really lucky, the translator will be able to work directly with a Java properties file and you’ll get back a translated version to drop into your application.

A common scenario is a series of Java properties files along these lines:

  • Messages.properties – key/value pairs in the base language, assumed to be English
  • Messages_XX.properties – key/value pairs in the language with language code XX
  • Messages_XX_YY.properties – key/value pairs in the language with language code XX, as used in the country with country code YY
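
To make the naming scheme concrete, here is a minimal sketch that asks the JDK which file names it would consider for a given locale. The class and method names (BundleNameSketch, candidateFiles) and the ar-EG locale are made up for illustration; ResourceBundle.Control is the standard JDK class. A real ResourceBundle.getBundle call may also try the JVM's default locale before falling back to the base file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper class, not part of any library.
class BundleNameSketch {

    // Lists the properties file names derived from a base name and a locale,
    // from most specific to least specific.
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            files.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return files;
    }

    public static void main(String[] args) {
        // Arabic as used in Egypt: language code "ar", country code "EG".
        Locale arabicEgypt = Locale.forLanguageTag("ar-EG");
        // prints [Messages_ar_EG.properties, Messages_ar.properties, Messages.properties]
        System.out.println(candidateFiles("Messages", arabicEgypt));
    }
}
```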

Take the file Messages_ar.properties, which contains an Arabic translation. Dropping that directly into your application will probably result in garbled text, or in the translation appearing to be ignored; cue head scratching … The issue is that Java properties files are read as ISO-8859-1 (always by Properties.load(InputStream), and by ResourceBundle before Java 9), whereas a file containing Arabic text is probably encoded as UTF-8 (or ISO-8859-6). Sun/Oracle solve this problem with the native2ascii tool, which shipped with the JDK up to Java 8, as follows (rename your original Arabic translation to Messages_ar-UTF8.properties first):

native2ascii -encoding utf8 Messages_ar-UTF8.properties Messages_ar.properties
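
To see why the unconverted file misbehaves, the following sketch loads the same UTF-8 bytes twice: once through Properties.load(InputStream), which decodes as ISO-8859-1, and once through a Reader that decodes as UTF-8. The class name, the "greeting" key and the sample Arabic value are assumptions for illustration, and the code assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of any library.
class EncodingMismatchSketch {

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1.
    static String readAsLatin1(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Properties.load(Reader) uses whatever charset the Reader was built with.
    static String readAsUtf8(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(
                    new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Sample Arabic value ("hello"), written with escapes so that the
        // source file itself stays pure ASCII.
        String arabic = "\u0645\u0631\u062d\u0628\u0627";
        byte[] utf8File = ("greeting=" + arabic).getBytes(StandardCharsets.UTF_8);

        System.out.println(arabic.equals(readAsLatin1(utf8File, "greeting"))); // false: garbled
        System.out.println(arabic.equals(readAsUtf8(utf8File, "greeting")));   // true
    }
}
```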

The resulting Messages_ar.properties file isn’t as readable as the UTF-8 version, because every non-ASCII character has been converted to a \uXXXX Unicode escape, but at least it now works!
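
As a rough illustration of the conversion, the sketch below escapes every non-ASCII character in a value and then checks that Properties reads the escaped line back as the original text. It is not the native2ascii implementation, only an approximation of what it does to each value; the class name, method names and sample value are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class, not part of any library.
class UnicodeEscapeSketch {

    // Replaces each character outside 7-bit ASCII with a backslash-u escape
    // followed by four hex digits; ASCII characters are copied unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Parses a single "key=value" line the way a properties file is parsed.
    static String readBack(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062d\u0628\u0627"; // sample Arabic value ("hello")
        String line = "greeting=" + escapeNonAscii(arabic);

        System.out.println(line);                                      // pure ASCII line
        System.out.println(arabic.equals(readBack(line, "greeting"))); // true
    }
}
```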

If you want to keep your translations in UTF-8 encoded files, you need to be using Java 1.5 or greater along with XML-based property files, which are read with Properties.loadFromXML and declare their own encoding in the XML header.
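
A minimal sketch of the XML route, assuming Java 8 or later for the helper classes used here. The class name, the "greeting" key and the sample value are invented for the example, and the XML is built in memory instead of being read from a file to keep it self-contained. Note that ResourceBundle.getBundle does not pick up XML files by default, so this loads the properties directly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of any library.
class XmlPropertiesSketch {

    // Stand-in for the contents of a UTF-8 encoded XML properties file.
    static String sampleXml() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
             + "<properties>\n"
             + "  <entry key=\"greeting\">\u0645\u0631\u062d\u0628\u0627</entry>\n"
             + "</properties>\n";
    }

    // Parses XML properties; the encoding comes from the XML declaration.
    static Properties loadXml(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String expected = "\u0645\u0631\u062d\u0628\u0627"; // sample Arabic value ("hello")
        Properties props = loadXml(sampleXml());
        System.out.println(expected.equals(props.getProperty("greeting"))); // true
    }
}
```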