Deyi Xiong

Research Fellow
Human Language Technology
i2r

Contact Information

Mailing Address   1 Fusionopolis Way #21-01 Connexis, Singapore 138632
Phone   (65) 6408 2771
Email  

Research

My research focuses on natural language processing. Topics of interest include:
  • Machine translation
  • Statistical parsing
  • Chinese language processing
  • Machine learning
Software

  • Tranyu: a Statistical Language Translation Tool (Sept. 2007 - Now)
    The name Tranyu combines the English word Translate and the Chinese word Yuyan (语言, language). In Chinese, Tranyu sounds like the word Chuanyu (传语), the name of a very important role responsible for translating Sanskrit into Chinese in ancient China. Tranyu is currently enhanced with Linguistically Annotated BTG.
  • Mo-tse: a Dependency-based System for SMT (Sept. 2006 - Jun. 2007)
    This system is based on dependency grammars. The underlying model, called the Dependency Treelet String Correspondence Model (DTSC), maps source dependency structures to target strings. We learn translation pairs of source treelets and target strings, together with their word alignments, from a parsed and word-aligned corpus. Source treelets and target strings may contain variables, so the model can generalize to dependency structures that share the same head word but have different modifiers and arguments. Target strings can also be discontinuous, using gaps that correspond to the nodes left uncovered by the source treelets. We propose a chart-style decoding algorithm for the model with two basic operations: substituting and attaching. We argue that the model is capable of lexicalization, generalization, and handling discontinuous phrases, all of which are very desirable for machine translation.

    For more details about it, see A Dependency Treelet String Correspondence Model for Statistical Machine Translation.
  • Bruin: a Formal Syntax-based System for SMT (Aug. 2005 - Jun. 2007)
    This system is based on the Bracketing Transduction Grammar (BTG, first proposed by Dekai Wu). Three kinds of rules are used. One is for lexical translation. The other two merge neighboring blocks, in straight or inverted order respectively. One disadvantage of BTG is that it doesn't provide an effective mechanism for choosing the merging order, straight or inverted. My contribution is to treat this reordering decision as a classification problem and to introduce a maximum entropy based reordering model for it.

    For more details about it, see Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation.

    Bruin participated in several international Chinese-to-English machine translation evaluation campaigns:

    • Won 4th place in NIST MT Eval 2006 (about 25 teams)

    • Won 3rd place in TC-STAR MT Eval 2007 (about 8 teams)

    • Won 2nd place in IWSLT MT Eval 2007 (about 15 teams)

  • MuskCpars: a Chinese Parser with Semantic Knowledge (Nov. 2003 - Jan. 2005)
    MuskCpars is a statistical parser developed for Chinese. The underlying model is a lexicalized, history-based model, in which lexical dependencies between head words and modifier words are generally considered very important. In practice, however, these dependencies are rarely used because our training corpus is small (around 3000 sentences, averaging 23 words per sentence), so many head-modifier pairs in the test set cannot be found in the training set. To address this problem, MuskCpars includes a selectional preference sub-model, conditioned on the dependency between the semantic category of a modifier word and its head word. Experimental results show that this class-lexicon dependency is quite helpful for parsing. The F1 score on CTB release 1.0 is 80.52%, with a POS tagging accuracy of 93.5%.

    For more details about it, see Parsing the Penn Chinese Treebank with Semantic Knowledge and Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates.

    Here is a demo: see MuskCpars Demo.
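
The idea behind the MuskCpars selectional preference sub-model, backing off from sparse word-word dependencies to dependencies between a head word and the semantic category of its modifier, can be illustrated with a minimal Python sketch. This is not the MuskCpars implementation: the toy lexicon SEMANTIC_CLASS, the class DependencyScorer, and the simple "use the class estimate when the word pair is unseen" rule are all assumptions made for illustration, and the scores are relative frequencies rather than a properly normalized model.

```python
from collections import Counter

# Hypothetical toy semantic lexicon (word -> semantic category).
# An assumption for illustration, not the resource used by MuskCpars.
SEMANTIC_CLASS = {
    "apple": "FOOD",
    "rice": "FOOD",
    "noodles": "FOOD",
    "book": "ARTIFACT",
}


class DependencyScorer:
    """Scores head-modifier pairs, backing off from words to semantic classes."""

    def __init__(self, pairs):
        # pairs: iterable of (head_word, modifier_word) taken from parsed text
        self.head_counts = Counter()
        self.word_pair_counts = Counter()
        self.class_pair_counts = Counter()
        for head, modifier in pairs:
            self.head_counts[head] += 1
            self.word_pair_counts[(head, modifier)] += 1
            sem_class = SEMANTIC_CLASS.get(modifier)
            if sem_class is not None:
                self.class_pair_counts[(head, sem_class)] += 1

    def lexical_score(self, head, modifier):
        """Relative frequency of the modifier word given the head word."""
        total = self.head_counts[head]
        if total == 0:
            return 0.0
        return self.word_pair_counts[(head, modifier)] / total

    def class_score(self, head, modifier):
        """Relative frequency of the modifier's semantic class given the head word."""
        total = self.head_counts[head]
        sem_class = SEMANTIC_CLASS.get(modifier)
        if total == 0 or sem_class is None:
            return 0.0
        return self.class_pair_counts[(head, sem_class)] / total

    def score(self, head, modifier):
        """Use the word-level estimate if the pair was seen, else the class-level one."""
        lexical = self.lexical_score(head, modifier)
        return lexical if lexical > 0 else self.class_score(head, modifier)
```

With training pairs such as ("eat", "apple"), ("eat", "rice") and ("read", "book"), the pair ("eat", "noodles") never occurs, so its word-level score is zero, but the class-level estimate still supports it because "noodles" shares the FOOD category with the modifiers observed under "eat".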
Publications


Original : @ Sun Mar 7 22:06 2004 @ Last modified : @ Tue Sep 16 17:30 2008 @