Perl Script to convert Chinese Simplified to Traditional or Traditional to Simplified


Just finished watching a movie with Chinese subtitles. I’m more interested in practicing reading traditional characters, so I found a Perl script that converts a file of simplified Chinese characters to traditional. Works like a charm from the command line on my Mac (Darwin). You’re probably thinking this can be done with copy and paste, but the command line is much more convenient, since it won’t affect the text layout, which sometimes happens with copy and paste. The script worked on a .srt subtitle file used for films. Give it a try if you want. Found this script on SourceForge. Thanks VBenito!


[sed] Replace a tab character using sed

Doing some text manipulation with my glossaries lately. This little trick comes in quite handy for those who like to use Linux Terminals. The glossaries that I’ve created were done with OmegaT.

*NIX Tricks

Here’s how to replace all instances of TAB in a file input_file with, say, a comma (,) using sed:

$ sed 's/<TAB>/,/g' input_file

But what is <TAB> above? On Linux systems you may just type \t (the escape sequence for TAB) in place of <TAB>. However, on some other systems (e.g., OS X, with its BSD-derived sed) it does not work. In cases where it doesn’t work, enter a literal <TAB> by hitting Control+v followed by the TAB key. Alternatively, hit Control+v followed by Control+i.
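If typing a literal TAB is awkward (inside a script, for example), one workaround is to let printf produce the character and pass it to sed through a shell variable. This is only a minimal sketch: the TAB variable and the tabs_to_commas function are names made up for this illustration, and the sample words are invented.

```shell
# Hold a literal tab in a variable; printf understands '\t' on both
# GNU/Linux and BSD/OS X, so sed itself never has to interpret \t.
TAB=$(printf '\t')

# Hypothetical helper: replace every tab with a comma.
# Reads the named file(s), or standard input when no file is given.
tabs_to_commas() {
    sed "s/${TAB}/,/g" "$@"
}

# Demo with two invented glossary lines
printf 'apple\tpomme\nhouse\tmaison\n' | tabs_to_commas
```

The demo prints apple,pomme and house,maison on separate lines. To convert a real file, something like tabs_to_commas input_file > output_file should do.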

References: ATOzTOA, Stack Overflow.


Clickable links in OmegaT’s notes and comments

GitHub has so many interesting projects.

Translator's Recipes

Here’s a GitHub project for an OmegaT plugin that converts URLs in notes and comments into clickable items that open the URLs in the default browser. Pretty neat, especially when you’re working in a team project and need to insert references for the editor or another translator.

Clickable links example

To install the plugin, create a folder named LinkBuilder (or whatever sounds good and preferably makes sense) inside the plugins subfolder of either the OmegaT installation folder or the OmegaT settings folder, then download the latest release and unzip it into the newly created LinkBuilder folder. The plugin will be activated when OmegaT restarts (or in a new OmegaT instance).
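For those who prefer the terminal, the folder step can be sketched roughly as follows. The OMEGAT_DIR variable, its "$HOME/.omegat" default, the make_plugin_dir function, and the release.zip file name are all assumptions for illustration; point them at wherever OmegaT and the downloaded archive actually live on your machine.

```shell
# Assumption: OMEGAT_DIR is the OmegaT installation or settings folder.
# "$HOME/.omegat" is only a placeholder default; adjust it for your system.
OMEGAT_DIR="${OMEGAT_DIR:-$HOME/.omegat}"

# Hypothetical helper: create plugins/LinkBuilder under the given
# OmegaT folder and print the path of the new folder.
make_plugin_dir() {
    target="$1/plugins/LinkBuilder"
    mkdir -p "$target" && printf '%s\n' "$target"
}

# Typical use (release.zip stands in for the downloaded archive):
#   dest=$(make_plugin_dir "$OMEGAT_DIR")
#   unzip release.zip -d "$dest"
```

Restart OmegaT afterwards so the plugin is picked up.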

I don’t know who the author of the plugin is (other than that his GitHub username is hiohiohio), but kudos anyway!!!


Using Dictionary Unifier on Mac OS X

Mac OS X Dictionary

After searching and wading through various websites and using alternate search engines, I managed to find a way to add extra dictionaries to Mac OS’s default dictionary. The default languages are English & Japanese. Other languages can, however, be added. Finding these dictionaries is a bit of a hassle, but that depends on the language pair(s) that you work with. This blog post explains how to add Chinese dictionaries.

To do this, you can use DictUnifier, a great little tool that helps set up your dictionaries in Mac OS. It can be downloaded here:


Download it and install it to any location, or simply put it in your Applications directory.





Once the DictUnifier program is set up in your Applications directory, open it and drag in the .ifo file of the dictionary you wish to add to the Mac OS Dictionary.

The version mentioned above works with Mac OS X Lion, but there are older versions that work on older releases of Mac OS X, namely 10.6.


The software then prompts you to give a name to your dictionary. I used shortened names, since I have added quite a few specialized dictionaries (Medical, Chinese Idioms, Fojiao, et cetera).


Click start and the chosen .ifo dictionary is added to the default Dictionary app. It can take some time, depending on the size of the dictionary, so be patient. Again, this is a fine little application. I haven’t tested it yet with French to English dictionaries, but will do so as soon as I can find some decent, modern ones. Hoping to find a Robert-Collins Fr-En dictionary soon.

The next step should be as follows.

Again, this takes time, but the program is running Perl and other scripts in the background.

The next step is to set up your dictionary in the preferences pane.

Test it out!


Chinese Dictionaries for OmegaT

Thanks for sharing this, Weedy! Our discussions, along with James, have been really constructive for me. Thx guys! Learning what school doesn’t seem up to date on teaching. OmegaT for freelance translators! Cheers

People, Places, and Food

by Weedy Tan on January 25, 2014

When I first started using OmegaT, I couldn’t figure out how to find and install a suitable Chinese dictionary. I didn’t care much, since I could use online Chinese <> English dictionaries while still learning OmegaT. However, after reading some old posts and discussions in the OmegaT Yahoo support group, I decided to research this and find a way to install the Chinese dictionaries.

There are, in fact, many resources when it comes to available Chinese <> English dictionaries in both Traditional and Simplified Chinese. However, as a novice OmegaT user, I couldn’t understand the differences amongst those numerous dictionaries. Based on the OmegaT manual, I needed to find a packed or zipped file with a *.tar.bz2 extension. When unpacked, it should contain 3 files with the following extensions:

    1. *
    2. *.idx
    3. *.ifo

And all the above should have…
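The file check described in the excerpt can be sketched in the terminal roughly like this. The has_dictionary_files function name is made up for this illustration, and only the two extensions spelled out in the list (.idx and .ifo) are checked, since the excerpt does not name the third file.

```shell
# Hypothetical helper: succeed only if the folder contains at least one
# .idx file and at least one .ifo file.
has_dictionary_files() {
    dir="$1"
    for ext in idx ifo; do
        # ls fails when no file with this extension exists in the folder
        ls "$dir"/*."$ext" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Typical use (dictionary.tar.bz2 and dict_folder are placeholder names):
#   mkdir -p dict_folder
#   tar -xjf dictionary.tar.bz2 -C dict_folder
#   has_dictionary_files dict_folder && echo "looks complete"
```

If the archive unpacks into a subfolder of its own, run the check on that subfolder instead.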


Sentence Segmentation Rules for Ancient Chinese Buddhist Text

Nicely written post on how to set up Chinese-language segmentation rules for the OmegaT translation software. OmegaT is open source and runs quick and nimble.

People, Places, and Food


by Weedy Tan on January 14, 2014

After I came out with a set of OmegaT sentence segmentation rules for typical Chinese text, there was a request from someone in the Yahoo support group to add segmentation rules for certain non-standard punctuation marks suited to ancient Chinese Buddhist text.

Since this request applies only to this type of ancient Chinese Buddhist text (and possibly some ancient “Classical Chinese” text as well) and not to the punctuation marks mandated by the present governments, I suggested that these should not be included in the typical Chinese segmentation rules. (Both the Chinese and Taiwanese governments came out with their own sets of punctuation marks, though they are very much the same in practical usage. Also, traditional Chinese is in use in Taiwan, while simplified Chinese is the one used in China.)

Instead, I volunteered to make a different set of segmentation rules for this purpose. Whether…


OmegaT Software

I’ve recently started using OmegaT. Unfortunately, most universities are not so keen on teaching us about these kinds of translation tools and instead stick to the more mainstream commercial CAT (Computer Assisted Translation) tools. I’ve picked up how to use tools like Trados Studio and the whole slew of products they promote, but OmegaT works just as well, PLUS it can be run off the web with the Java JRE. A real advantage if you plan on leaving your office but have the TMs stored on your USB drive.

The link below showcases some interesting scripts written for OmegaT by Kos Ivantsov. Thanks for sharing Kos!