CZilla Translator

(HTML version of SOC application)

CZilla Translator is a tool with a user-friendly interface that manages the translation of Firefox, Thunderbird, and their extensions.

Although CZilla is the name of the Czech localization team, I am going to rename this project to "MozLoc" so that the name does not include "zilla".

Wording note: I mainly use the word "translation", but I make no distinction between "translation" and "localization".

(updated, removed the webtool part)

(updated, added "Why XUL and not DHTML")

Motivation

Mozilla Firefox and Mozilla Thunderbird have almost 50 localizations [1]. Both products have a powerful extension system, and hundreds of extensions are available. Localizing Mozilla-based products is a specific task that cannot be handled with common localization tools and scripts. The only existing tool is the old and outdated Mozilla Translator project [2], which does not fully satisfy the needs of Mozilla translators and does not help with the problems of translating extensions at all.

Almost 100 extensions are translated into Czech [3]. Keeping all of these translations up to date with new versions is not a trivial task, and as a result many translations are outdated. As a member of the Czech localization team, I feel a strong need for a tool that solves this problem. The only work translators should have to do is translate new or changed strings; the translation tool should do everything else.

Concept

The CZilla Translator (CZT) will be based on XULRunner. It will manage translation projects, understand the file formats, and provide a UI for translation. There is already a draft of a XULRunner-based L10N tool on the Mozilla wiki [4]. I want to start from it and cooperate with its authors (Zbigniew Braniecki is a member of the Polish localization team; Pavel Franc is from the Czech localization team and is the author of the "CZilla Translator" name and idea). CZT is a tool specifically for Mozilla-based products, but it should be modular, so that it can serve as a platform for building translation tools for other products with their own localization formats.

Project Type

The "project type" is the format from which the strings to translate are read and to which the translated strings are written. New project types should be easy to implement (for example, a Windows RC-based project). For the purposes of SOC, the following project types must be implemented:

  1. Firefox/Thunderbird Windows/Unix binary with JAR files inside
  2. Directory-based project for official localizers of Firefox/Thunderbird who use CVS to store their files. CVS support will be integrated through an external executable.
  3. Extension of Mozilla/Thunderbird
  4. XULRunner-based application, to translate CZT itself :-)
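To illustrate what "easy to implement" could mean for project types, here is a minimal sketch in Python (chosen only for brevity; the real modules would be XPCOM components). The names ProjectType, REGISTRY, register, and DictProject are hypothetical and not part of any existing draft. The idea is that each project type only has to say how strings are read from a location and how translations are written back.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: project type name -> implementing class.
REGISTRY = {}


def register(cls):
    """Class decorator that makes a project type available by name."""
    REGISTRY[cls.name] = cls
    return cls


class ProjectType(ABC):
    """Hypothetical interface every project type would implement."""

    name = "abstract"

    @abstractmethod
    def read_strings(self, location):
        """Return a dict mapping (file, key) to the original string."""

    @abstractmethod
    def write_strings(self, location, translations):
        """Write a dict mapping (file, key) to translated text back."""


@register
class DictProject(ProjectType):
    """Toy project type: 'location' is a dict of file name -> {key: string}."""

    name = "dict"

    def read_strings(self, location):
        return {
            (file, key): text
            for file, strings in location.items()
            for key, text in strings.items()
        }

    def write_strings(self, location, translations):
        for (file, key), text in translations.items():
            location[file][key] = text
```

A JAR-based or directory-based project type would implement the same two methods, with the location being a path on disk instead of a dictionary.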

File Types

CZT must be able to parse and write every file format used in the Mozilla codebase, especially .properties files, DTD files (both with preprocessor directives), (X)HTML files, and any miscellaneous files needed to create a working localized product (such as "install.rdf" in extensions). CZT should provide a simple way to add support for new file types. By default, translated files should be written with strings in the same order as in the original, including metadata such as comments and localization notes.
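The requirement to keep string order and comments can be sketched as follows. This is only an illustration in Python, not the planned C++ parser, and it makes simplifying assumptions: it handles plain key=value lines only, treats lines starting with "#" or "!" as comments attached to the next entry, drops blank lines, and gives no special treatment to line continuations, escapes, or preprocessor directives. The names Entry, parse_properties, and write_properties are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: str
    value: str
    # Comment lines (including localization notes) that precede the entry.
    comments: list = field(default_factory=list)


def parse_properties(text):
    """Parse simple key=value lines, keeping order and preceding comments."""
    entries = []
    pending = []  # comments seen since the last entry
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are dropped in this sketch
        if stripped.startswith("#") or stripped.startswith("!"):
            pending.append(stripped)
            continue
        key, sep, value = stripped.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        entries.append(Entry(key.strip(), value.strip(), pending))
        pending = []
    return entries


def write_properties(entries, translations):
    """Write entries in original order; untranslated keys keep the original."""
    lines = []
    for entry in entries:
        lines.extend(entry.comments)
        lines.append(f"{entry.key}={translations.get(entry.key, entry.value)}")
    return "\n".join(lines) + "\n"
```

Because the writer walks the parsed entries rather than the translation dictionary, the output order always follows the original file, which keeps diffs in CVS small.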

Translation database

CZT will keep translated strings in a local relational database to provide extensible and efficient search support. The translation database must be transparent to the user with respect to the translated files: the user must always be able to edit the translated files by hand, and CZT should import those changes from the files into its database. Thanks to this concept, CZT will be able to continue the translation of a previously translated product. CZT will have no problem writing and updating files controlled by CVS. The local translation database can be synchronized with a server that hosts a global translation database.
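A minimal sketch of how hand-edited files could be imported back into the local database, using SQLite through Python's standard sqlite3 module. The table layout (one row per file and key) and the names open_db and import_translations are assumptions made for illustration; the real schema is not yet designed.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the local translation database (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS strings (
               file        TEXT NOT NULL,
               key         TEXT NOT NULL,
               original    TEXT NOT NULL,
               translation TEXT,
               PRIMARY KEY (file, key)
           )"""
    )
    return db


def import_translations(db, file, translated):
    """Copy translations read from a (possibly hand-edited) file into the
    database. 'translated' maps key -> text. Returns the number of rows
    whose stored translation actually changed."""
    changed = 0
    for key, text in translated.items():
        cursor = db.execute(
            "UPDATE strings SET translation = ? "
            "WHERE file = ? AND key = ? AND translation IS NOT ?",
            (text, file, key, text),
        )
        changed += cursor.rowcount
    db.commit()
    return changed
```

Running the import every time a project is opened would keep the database in step with the files, so the files themselves remain the authoritative copy that CVS tracks.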

UI Specification

The user must be able to view all project files (in a tree) and to view and edit the content of both translated and original files. CZT should provide a UI that shows original and translated strings side by side in one window. When the user translates an individual string, CZT should display a hint if a translation of the same or a similar string or key already exists. Using an external dictionary, CZT should also display a hint with word-by-word translations. There should be "autotranslation" of strings for which CZT can predict the translation. To predict translations, CZT should implement an algorithm that measures string similarity. The user must have the option to review the autotranslated strings.
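The similarity algorithm is not chosen yet. As one possible starting point, the sketch below uses the ratio from Python's standard difflib.SequenceMatcher as a stand-in measure and looks up hints in a dictionary of already translated strings. The function names and the 0.8 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity score between 0.0 (different) and 1.0 (equal)."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(original, memory, threshold=0.8):
    """Find known strings similar to 'original'.

    'memory' maps already translated originals to their translations.
    Returns (score, known_original, translation) tuples, best match first.
    """
    hits = []
    for known, translation in memory.items():
        score = similarity(original, known)
        if score >= threshold:
            hits.append((score, known, translation))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

A score of 1.0 would be a candidate for autotranslation, while lower scores above the threshold would only be shown as hints for the user to review.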

Version Support

CZT must be able to compare an old version of a product and its translation with a new version of the product. This comparison sorts the strings into three groups: strings that did not change, strings that are new, and strings that changed. Changed strings also get penalty points that represent how much they changed. Depending on a penalty threshold, the user can leave changed strings as they are and not translate them again. There must be search support to view (or translate) only a selected group of strings, or to sort strings by penalty points.
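The three groups and the penalty points could be computed roughly as in the sketch below. It assumes each version is available as a dictionary from key to original string, and it defines the penalty as a value from 0 to 100 derived from difflib.SequenceMatcher; both the function name compare_versions and this penalty formula are illustrative assumptions, not a fixed design. Keys that disappeared in the new version are ignored here.

```python
from difflib import SequenceMatcher


def compare_versions(old, new):
    """Sort the keys of 'new' into unchanged, added, and changed.

    'old' and 'new' map key -> original string. Returns a tuple
    (unchanged, added, changed), where 'changed' maps key -> penalty
    points (higher means the string changed more).
    """
    unchanged, added, changed = [], [], {}
    for key, text in new.items():
        if key not in old:
            added.append(key)
        elif old[key] == text:
            unchanged.append(key)
        else:
            ratio = SequenceMatcher(None, old[key], text).ratio()
            changed[key] = round((1 - ratio) * 100)
    return unchanged, added, changed
```

With such a result, a filter like "changed strings with more than 20 penalty points" becomes a simple selection over the changed group.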

Search and Print Support

All strings, keys, and file names in the database must be searchable. Search results must be printable, to provide an easy way to create a printed version of the translation (for manual checking).

Programming Languages and Development

CZT will be a XULRunner application, so the UI will be written in XUL. Low-level parts, such as file parsers and writers, should be implemented in C++ as XPCOM modules. Translations will be stored in a relational database, such as SQLite for the local database and MySQL for the server database.

Why XUL and not DHTML?

First, I will answer another question: why should CZT be a client application and not a server application? The main reason is that the user is translating something that is on their local computer, for example a Firefox extension or the Firefox binary itself. CZT parses strings from files on disk, the user translates these strings, and then CZT writes the translations back to files. The user must then be able to run the translated product on their computer to test it. I think this is clearly a client-side process. The preferred platform for writing client applications is XULRunner (at least in the Mozilla world). XULRunner uses XUL, so I will use XUL, not DHTML. Group work and collaboration can be enabled by using CVS or another version control system on the translated files.

Summary

The main goal is to create a simple, easy-to-use translation tool. Although CZT will have many powerful features, such as version support, a string similarity algorithm, and autotranslation, the user must be able to just start CZT and begin translating.

Why am I the best individual to do this project?

I have good experience with the Mozilla codebase. I am a member of the Czech localization team, which can beta test CZT to ensure its quality and usability.

My experience with the Mozilla codebase:

9/2003
Beginning of collaboration with the CZilla project [5]
5/2004
Technical paper "Customization of Mozilla Installer for Large Organization" [6], in Czech, which won the regional SOC [7] (a Czech high school student competition, similar to Intel ISEF [8])
9/2004
My first translated extension: Mouse Gestures [9]
3/2005
Centrum Turbo toolbar [10], a Mozilla version of the toolbar for a Czech portal (but without an official release)
10/2005
Work on a review of Mozilla technologies for Czech developers on CZilla [11]
12/2006
As a seminar project in C++, I coded several implementations of the nsIDictionary [12] XPCOM interface (binary search tree, AVL tree, 2-3 tree)

Links