Java Internationalisation (i18n) Character Encoding

Internationalisation (i18n) of Java applications should not be difficult, although dealing with text in languages you don’t understand can be a little confusing! As a developer, you’ll normally be sent a translated version of the text to use in your application. If you’re really lucky, the translator will be able to work directly with a Java property file and you’ll get back a translated version to drop into your application.

A common scenario will be a series of Java property files along the lines of:

  • – key/value pairs in base language, assumed to be English
  • – key/value pairs in language with country code XX
  • – key/value pairs in language with country code XX, variant YY
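
To make the naming pattern concrete, here is a minimal sketch that asks the JDK which property file names it would consider for a given locale. The base name "messages" and the ar-EG locale are assumptions chosen purely for illustration, not names from this article, and the sketch only lists the candidates for the requested locale (a real ResourceBundle.getBundle lookup may also fall back to the default locale).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleNameSketch {

    // List the property file names the default lookup would derive for a locale,
    // from most specific to the base file.
    static List<String> candidateFileNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return names;
    }

    public static void main(String[] args) {
        // "messages" is a hypothetical base name; ar-EG is Arabic as used in Egypt.
        Locale arabicEgypt = Locale.forLanguageTag("ar-EG");
        for (String name : candidateFileNames("messages", arabicEgypt)) {
            System.out.println(name);
        }
    }
}
```

Under those assumptions, this prints messages_ar_EG.properties, messages_ar.properties and messages.properties, in that order.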

Take the file which contains an Arabic translation. Dropping that directly into your application will probably result in it being ignored or displayed as garbage, cue head scratching … The issue is that Java property files are read as ISO-8859-1 (this is what Properties.load(InputStream) expects, and what ResourceBundle assumed before Java 9), whereas a file containing Arabic text is probably encoded as UTF-8 (or ISO-8859-6). Sun/Oracle solve this problem with the native2ascii tool, used as follows (rename your original Arabic translation first):

native2ascii -encoding utf8

The resulting file isn’t as readable as the UTF-8 version, as all non-ASCII characters in the values have been converted to Unicode escapes (\uXXXX), but at least it now works!
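
To see why the escaped file works, the sketch below does by hand roughly what native2ascii does to each value: every character outside 7-bit ASCII becomes a \uXXXX escape. It then loads the escaped text through Properties.load as ISO-8859-1 bytes. The class, the method names and the "greeting" key are hypothetical, and this is an approximation of the tool's behaviour rather than its actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class AsciiEscapeSketch {

    // Replace every char outside 7-bit ASCII with a backslash-u escape (four hex digits).
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Load properties text the classic way (ISO-8859-1 bytes) and look up one key.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        byte[] bytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The Arabic word for "hello", written with escapes so the source file stays ASCII.
        String arabic = "\u0645\u0631\u062d\u0628\u0627";
        String line = "greeting=" + toAsciiEscapes(arabic);
        System.out.println(line);
        // Properties.load turns the escapes back into the original characters.
        System.out.println(loadValue(line, "greeting").equals(arabic));
    }
}
```

The first line printed is greeting=\u0645\u0631\u062d\u0628\u0627, which is the kind of content you will find in the converted file. The second is true, confirming that the loader restores the Arabic text.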

If you want to keep your translations in UTF-8 encoded files, you need to be using Java 1.5 or greater along with XML-based property files, which are read with Properties.loadFromXML.
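
As a rough illustration of the XML route, the sketch below reads a UTF-8 encoded XML property document with Properties.loadFromXML. The document is built in memory to keep the example self-contained, and the "greeting" key is made up for the example. In a real application the XML would live in its own file. Later Java versions added other options, such as Properties.load(Reader) in Java 6 and UTF-8 property resource bundles in Java 9, so treat this as one approach among several.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class XmlPropertiesSketch {

    // Parse an XML property document supplied as UTF-8 bytes.
    static Properties loadXml(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // A tiny XML property document; in the UTF-8 file the value appears as readable Arabic.
    static String sampleXml() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <entry key=\"greeting\">\u0645\u0631\u062d\u0628\u0627</entry>\n"
                + "</properties>\n";
    }

    public static void main(String[] args) {
        Properties props = loadXml(sampleXml());
        // Compare against the Arabic word for "hello".
        System.out.println("\u0645\u0631\u062d\u0628\u0627".equals(props.getProperty("greeting")));
    }
}
```

The trade-off is that the translator now has to edit XML rather than simple key/value lines, but the text stays readable in any UTF-8 aware editor.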