1 Fusionopolis Way #21-01 Connexis, Singapore 138632
Phone
(65) 6408 2771
Email
Research
My research focuses on natural language processing. Topics of particular interest to me include:
Machine translation
Statistical parsing
Chinese language processing
Machine learning
Software
Tranyu: a Statistical Language Translation Tool (Sept. 2007 - Now)
Tranyu combines the English word Translate with the Chinese word Yuyan (语言, language).
In Chinese, Tranyu sounds like Chuanyu (传语), the title of an important role in ancient China:
the person responsible for translating Sanskrit into Chinese.
Tranyu is currently being enhanced with Linguistically Annotated BTG.
Mo-tse: a Dependency-based System for SMT (Sept. 2006 - Jun. 2007)
This system is based on dependency grammars.
The underlying model, called the Dependency Treelet String Correspondence Model (DTSC), maps source dependency structures to target strings.
We learn translation pairs of source treelets and target strings,
together with their word alignments, from a parsed and word-aligned corpus.
Source treelets and target strings may contain variables, so that the model
can generalize to dependency structures that share a head word but have
different modifiers and arguments.
Target strings can also be discontinuous: they may contain gaps
corresponding to nodes that are not covered by the source treelets.
We propose a chart-style decoding algorithm for the model with two basic
operations, substituting and attaching. We argue that the model
supports lexicalization, generalization, and discontinuous phrases,
all of which are highly desirable for machine translation.
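As a rough illustration of the two decoding operations, the toy sketch below treats a target string as a list of tokens in which a variable such as `X1` stands for the translation of a source subtree, and a hypothetical `<gap>` marker stands for an uncovered source node. Substituting replaces a variable with a sub-translation; attaching places the translation of an uncovered modifier into a gap, or at one end of the string if there is no gap. The representation and the function names are assumptions made for illustration, not Mo-tse's actual data structures, and the chart search and model scores are left out entirely.

```python
from typing import List

GAP = "<gap>"  # hypothetical marker for a gap left by an uncovered source node


def substitute(target: List[str], var: str, filler: List[str]) -> List[str]:
    """Replace variable `var` in a target string with the translation
    `filler` of the source subtree that the variable stands for."""
    if var not in target:
        raise ValueError(f"variable {var!r} not in target string")
    i = target.index(var)
    return target[:i] + filler + target[i + 1:]


def attach(target: List[str], filler: List[str], side: str = "right") -> List[str]:
    """Attach the translation of an uncovered modifier: fill the first gap
    if there is one, otherwise concatenate it on the given side."""
    if GAP in target:
        i = target.index(GAP)
        return target[:i] + filler + target[i + 1:]
    if side == "left":
        return filler + target
    return target + filler


# Toy rule for a treelet headed by "likes": subject variable X1, object left as a gap.
partial = substitute(["X1", "likes", GAP], "X1", ["she"])
print(attach(partial, ["apples"]))  # ['she', 'likes', 'apples']
```

Returning new lists rather than mutating keeps each chart item's partial translation intact, which is what a chart decoder needs when the same item is combined in several ways.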
Bruin: a Formal Syntax-based System for SMT (Aug. 2005 - Jun. 2007)
This system is based on the Bracketing Transduction Grammar (BTG), first proposed by Dekai Wu. It uses three kinds of rules: one for lexical translation, and two for merging neighboring blocks in straight or inverted order.
One weakness of BTG is that it provides no effective mechanism
for deciding which merging order, straight or inverted, to use. My contribution is to treat
this reordering decision as a classification problem and to introduce a maximum entropy based reordering model
for it.
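The sketch below shows, under simplifying assumptions, how the BTG merge rules and a maximum entropy reordering classifier fit together. A block is a pair of source and target word lists; `merge` applies the straight or inverted rule, and the classifier scores the two orders with a log-linear model over boundary-word features. The feature set (first source and target word of each block), all names, the toy romanized data, and the plain stochastic gradient training are assumptions for this example, not a description of Bruin's actual features or trainer.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Block = Tuple[List[str], List[str]]  # (source words, target words)
ORDERS = ("straight", "inverted")


def merge(b1: Block, b2: Block, order: str) -> Block:
    """BTG merge rules: the source side is always concatenated in order;
    the target side is concatenated in order (straight) or swapped (inverted)."""
    (s1, t1), (s2, t2) = b1, b2
    return (s1 + s2, t1 + t2) if order == "straight" else (s1 + s2, t2 + t1)


def features(b1: Block, b2: Block) -> List[str]:
    """Assumed features: the first source and target word of each block."""
    (s1, t1), (s2, t2) = b1, b2
    return [f"S1={s1[0]}", f"T1={t1[0]}", f"S2={s2[0]}", f"T2={t2[0]}"]


def probs(w: Dict[Tuple[str, str], float], feats: List[str]) -> Dict[str, float]:
    """Log-linear (maximum entropy) distribution over the two orders."""
    scores = {o: sum(w.get((f, o), 0.0) for f in feats) for o in ORDERS}
    z = sum(math.exp(s) for s in scores.values())
    return {o: math.exp(s) / z for o, s in scores.items()}


def train(examples: List[Tuple[List[str], str]], epochs: int = 50, lr: float = 0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    w: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            p = probs(w, feats)  # computed once, before any update for this example
            for o in ORDERS:
                grad = (1.0 if o == gold else 0.0) - p[o]  # observed - expected
                for f in feats:
                    w[(f, o)] += lr * grad
    return w


# Toy training data: one straight pair and one inverted pair.
a, b = (["wo"], ["I"]), (["xihuan"], ["like"])
c, d = (["zuotian"], ["yesterday"]), (["lai"], ["came"])
w = train([(features(a, b), "straight"), (features(c, d), "inverted")])
order = max(ORDERS, key=lambda o: probs(w, features(c, d))[o])
print(order, merge(c, d, order))  # inverted (['zuotian', 'lai'], ['came', 'yesterday'])
```

Because the classifier only looks at features of the two blocks, it can be queried for any pair of neighboring blocks during decoding, which is what makes framing the straight/inverted choice as classification convenient.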
Bruin participated in several international Chinese-to-English machine translation evaluation campaigns:
Won 4th place in NIST MT Eval 2006 (about 25 teams)
Won 3rd place in TC-STAR MT Eval 2007 (about 8 teams)
Won 2nd place in IWSLT MT Eval 2007 (about 15 teams)
MuskCpars: a Chinese Parser with Semantic Knowledge (Nov. 2003 - Jan. 2005)
MuskCpars is a statistical parser developed for Chinese.
The underlying model is a lexicalized, history-based model in which lexical dependencies
between head words and modifier words are considered very important.
In practice, however, these dependencies are used only sparsely because our
training corpus is small (around 3,000 sentences, with 23 words per sentence on average),
so many head-modifier pairs in the test set never appear in the training set. To address this,
MuskCpars adds a selectional preference sub-model,
which conditions on the dependency between the semantic category of a modifier word and its head word.
Experimental results show that this class-lexicon dependency considerably improves parsing.
MuskCpars achieves an F1 score of 80.52% on CTB release 1.0, with a POS tagging accuracy of 93.5%.
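A toy version of the class-lexicon dependency is sketched below. Alongside the sparse lexical estimate P(modifier | head), it estimates P(class(modifier) | head) by relative frequency, so an unseen head-modifier pair can still receive probability mass when the modifier's semantic category has been seen with that head. The class dictionary, the class name `SelectionalPreference`, and the linear interpolation with weight `lam` are hypothetical; the semantic resource MuskCpars actually uses and the way the sub-model is combined with the history-based parser are not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


class SelectionalPreference:
    """Toy class-lexicon dependency model estimated from (head, modifier) pairs."""

    def __init__(self, pairs: Iterable[Tuple[str, str]], sem_class: Dict[str, str]):
        self.sem_class = sem_class  # hypothetical word -> semantic category dictionary
        self.pair_counts: Counter = Counter()
        self.class_counts: Counter = Counter()
        self.head_counts: Counter = Counter()
        for head, mod in pairs:  # every training modifier must have a category here
            self.pair_counts[(head, mod)] += 1
            self.class_counts[(head, sem_class[mod])] += 1
            self.head_counts[head] += 1

    def p_lexical(self, head: str, mod: str) -> float:
        """Relative-frequency estimate of P(modifier | head); sparse."""
        n = self.head_counts[head]
        return self.pair_counts[(head, mod)] / n if n else 0.0

    def p_class(self, head: str, mod: str) -> float:
        """Relative-frequency estimate of P(class(modifier) | head)."""
        n = self.head_counts[head]
        c = self.sem_class.get(mod)
        if not n or c is None:
            return 0.0
        return self.class_counts[(head, c)] / n

    def score(self, head: str, mod: str, lam: float = 0.7) -> float:
        """Hypothetical linear interpolation of the lexical and class estimates."""
        return lam * self.p_lexical(head, mod) + (1 - lam) * self.p_class(head, mod)


pairs = [("eat", "apple"), ("eat", "rice"), ("drink", "tea")]
classes = {"apple": "food", "rice": "food", "pear": "food", "tea": "beverage"}
sp = SelectionalPreference(pairs, classes)
# "eat pear" never occurs in training, but its class "food" does.
print(sp.p_lexical("eat", "pear"), sp.p_class("eat", "pear"))  # 0.0 1.0
```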
Publications
Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, Yueliang Qian.
Parsing the Penn Chinese Treebank with Semantic Knowledge.
Lecture Notes in Computer Science, Springer-Verlag, Volume 3651, Sep 2005, Pages 70 - 81.
In Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP05),
Jeju Island, Korea.
Deyi Xiong, Qun Liu, Shouxun Lin.
Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates.
Lecture Notes in Computer Science, Springer-Verlag, Volume 3406, Jan 2005, Pages 132 - 141.
In the 6th Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Mexico City, Mexico, 2005.
Deyi Xiong, Qun Liu, Shouxun Lin. 2004. Statistical Chinese Parsing with Rich Linguistic Features (in Chinese).
Journal of Chinese Information Processing, Vol. 19, Pages 61 - 66, March 2005. In the Second Student Workshop on Computational Linguistics, BLCU, Beijing, China. (Received the Excellent Paper Award.)
2003
Huaping Zhang, Hongkui Yu, Deyi Xiong and Qun Liu. 2003. HHMM-based Chinese Lexical Analyzer ICTCLAS. In the
Second SIGHAN Workshop, affiliated with the 41st ACL, Sapporo, Japan.