Sentence Segmentation Rules for Ancient Chinese Buddhist Text

A nicely written post on how to set up Chinese-language segmentation rules for the OmegaT translation software. OmegaT is open source, and it is quick and nimble.

People, Places, and Food

OmegaT-ClassicalChinese-segmentation

by Weedy Tan on January 14, 2014

After I released a set of OmegaT sentence segmentation rules for typical Chinese text, someone in the Yahoo support group asked me to add segmentation rules for certain non-standard punctuation marks used in ancient Chinese Buddhist text.

This request applies only to this type of ancient Chinese Buddhist text (and possibly some ancient “Classical Chinese” text as well), not to the punctuation marks mandated by the present governments, so I suggested that these rules should not be included in the typical Chinese segmentation rules. (The Chinese and Taiwanese governments each came out with their own set of punctuation marks, though the two sets are very much the same in practical usage. Traditional Chinese is used in Taiwan, while simplified Chinese is used in China.)

Instead, I volunteered to make a different set of segmentation rules for this purpose. Whether…

