Nicely written post on how to set up Chinese-language segmentation rules in the OmegaT translation software. OmegaT is open source, quick, and nimble.
After I came out with a set of OmegaT sentence segmentation rules for typical Chinese text, someone in the Yahoo support group asked me to add rules for certain non-standard punctuation marks used in ancient Chinese Buddhist texts.
These marks apply only to this type of ancient Chinese Buddhist text (and possibly to some ancient “Classical Chinese” text as well). They are not part of the punctuation mandated by present-day governments, so I suggested that they should not be included in the typical Chinese segmentation rules. (Both the Chinese and Taiwanese governments have issued their own sets of punctuation marks, though the two are very much the same in practical usage. Also, traditional Chinese is used in Taiwan, while simplified Chinese is used in China.)
Instead, I volunteered to create a separate set of segmentation rules for this purpose. Whether…
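To make the idea of a separate rule set concrete, here is a minimal sketch (not OmegaT's own code) of SRX-style segmentation, where each rule pairs a "before" regex with an "after" regex and is marked as a break or an exception. It assumes the first matching rule at a position decides the outcome. The names `Rule`, `STANDARD_RULES`, `segment`, and `classical_rules` are hypothetical, and the mark "○" in the usage example is only a placeholder: the actual marks requested for Buddhist texts are not listed in this excerpt.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical SRX-style rule: applies where `before` matches the text
    ending at a position and `after` matches the text starting there."""
    is_break: bool
    before: str
    after: str


# Typical modern Chinese (illustrative): break after a full-width sentence-final
# mark, optionally followed by a closing quote or bracket, but not in front of
# another closing mark.
STANDARD_RULES = [
    Rule(True, r"[。！？][」』”）]?", r"[^」』”）]"),
]


def classical_rules(extra_marks: str) -> list:
    """Hypothetical separate rule set: the standard rules plus breaks after
    extra, non-standard marks supplied by the caller (placeholders here)."""
    return STANDARD_RULES + [Rule(True, "[" + re.escape(extra_marks) + "]", r".")]


def segment(text: str, rules: list) -> list:
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            # Does the text up to `pos` end with `before`, and does the
            # text from `pos` start with `after`?
            if re.search("(?:" + rule.before + r")\Z", text[:pos]) and re.match(
                rule.after, text[pos:]
            ):
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # assumed: the first matching rule decides
    segments.append(text[start:])
    return [s for s in segments if s]


print(segment("你好。他說：「走吧！」然後走了。", STANDARD_RULES))
# ['你好。', '他說：「走吧！」', '然後走了。']
print(segment("如是我聞○一時佛在", classical_rules("○")))
# ['如是我聞○', '一時佛在']
```

Keeping the extra marks in their own rule set, rather than adding them to `STANDARD_RULES`, means modern Chinese text is never split at characters that only carry sentence-ending meaning in classical or Buddhist texts.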