Jian SU (苏俭)

Senior Scientist
Group Leader, Information Extraction and Text Mining
Member, Core Technology Review Panel
Institute for Infocomm Research, Singapore

Member, Technology Advisor Committee of National Centre for Text Mining, UK (2007 - Present);

E-mail: sujian (at) i2r.a-star.edu.sg
Websites: http://www1.i2r.a-star.edu.sg/~sujian, http://nlp.i2r.a-star.edu.sg/
Fax: +65 6776 1378
Tel: +65 6408 2143

Postal Address:
Institute for Infocomm Research
1 Fusionopolis Way, #21 - 01 Connexis
Singapore 138632

Biography

Dr Jian Su received her B.Sc degree in Electronics from Sichuan University, China in 1990, and her M.Sc and PhD degrees in electrical & electronic engineering from South China University of Technology in 1993 and 1996 respectively. She was a Research Assistant at City University of Hong Kong from 1994 to 1995, and an intern student at Centre de Recherche en Informatique de Nancy, France in 1995. She joined the Institute for Infocomm Research (I2R), formerly known as the Institute of System Sciences, where she established herself in the areas of Information Extraction, Coreference Resolution and (Bio)Text Mining. Dr Su has published extensively in natural language processing (NLP) and bioinformatics conferences and journals, including 13 papers in ACL Annual Meetings and one journal article in Computational Linguistics in recent years. Dr Su is active in professional services for the computational linguistics community. She has served as Editor / Member of Editorial Board for three international journals. She has served as program chair of ACL-IJCNLP09, publication chair of ACL 2007 and IJCNLP 2005, program chair of LBM 2007, Workshop Organizer of the LREC2008 Workshop: Building and evaluating resources for biomedical text mining, and PC member of numerous NLP conferences including ACL, IJCNLP, COLING, and EMNLP.

Dr Su has also led her team to top performance in various information extraction / text mining benchmarks such as BioCreAtIve. She was responsible for the effort to establish the largest co-reference annotation corpus, comprising 2000 Medline abstracts and 43 biomedical full papers from the GENIA collection. She has been the Principal Investigator in multiple technology deployments, including BioMedical Information Management, Intelligence Gathering, and Legal / Standard Enforcement.

Research Interests

Information Extraction; Discourse Analysis; Text Mining; Language Resources & Evaluation; Machine Translation; Segmentation, Tagging and Chunking; Machine Learning for Natural Language

Recognitions/Awards

2007 Best performance in BioCreAtIve II (Critical Assessment for Information Extraction in Biology) in the Protein-Protein Interaction Article Selection sub-task in terms of F Score, among 19 international teams including Edinburgh Uni., Uni. of California, Berkeley, and Uni. of Colorado.

2004 Best performance in the closed test of the Bio-Named Entity Recognition task of BioCreAtIve I (Critical Assessment of Information Extraction systems in Biology), among 12 international teams including the 1st runner up, the joint team of Stanford Uni. and Edinburgh Uni.

2004 Best performance in the Bio-NER task of IJNLPBA, CoLing 2004, among 8 international teams including the 1st runner up, the joint team of Stanford Uni. and Edinburgh Uni.

2003 The Enterprise Challenge Award, Singapore

2002 Best Paper Award, Laboratory of Information Technology, Singapore

2000 Best individual system in the English text chunking task at Computational Natural Language Learning (CoNLL-2000)

Professional Services

Editor / Editorial Board Member

            1.        ACM Transactions on Intelligent Systems and Technology (ACM TIST) (2010 -2012);

            2.         International Journal of Computer Processing of Oriental Languages (2007- 2010);

            3.         Journal of Computational Linguistics and Chinese Language Processing (2004-2008);

Invited talks:

            1.         Invited talk at School of Computer Engineering, Nanyang Technological University, Singapore, March 19, 2009

            2.         Keynote Speech at International Conference on Asian Language Processing 2008, Chiang Mai, Thailand, Nov 12-14, 2008

            3.         Invited talk at The National Centre of Text Mining, UK, April 23, 2008

            4.         Invited talk at 5th joint workshop of BioInformatics and Natural Language Processing, Korea, Feb 14-15, 2008

            5.         Invited talk at Industrial Program of European BioInformatics Institute, UK, Sep 14, 2007

            6.         Invited talk at MOE & Microsoft Key Laboratory of Natural Language Processing and Speech, HIT, China, Oct 15, 2005

            7.         Invited talk at The National Centre of Text Mining, UK, April 14-15, 2005

            8.         Keynote Speech at 3rd joint workshop of BioInformatics and Natural Language Processing, Korea, Feb 20-22, 2005

            9.         Invited speech at CJNLP-04 (4th China-Japan Joint Conference to Promote Cooperation in Natural Language Processing), Nov. 10-12, 2004

Member of Conference Organizing Committees

            1.        Program Chair, ACL/IJCNLP09;

            2.         Workshop Organizer, LREC2008 Workshop: Building and evaluating resources for biomedical text mining;

            3.         Program Committee Chair, International Symposium on Languages in Biology and Medicine (LBM 2007);

            4.         Publication Chair, 45th Annual Meeting of the Association for Computational Linguistics (ACL2007);

            5.         Publication Chair, International Joint Conference on Natural Language Processing (IJCNLP05);

            6.         Workshop chair, International Workshop on Language, Semantics, and Web 2003;

Member of Program Committees

            1.        ACL 2010;

            2.         EMNLP 2010, for "Information Extraction" and "Text mining and NLP applications" areas;

            3.         CoLing 2010, for "Discourse and Pragmatics", "Information Extraction" and "Language Resource" areas;

            4.         23rd International Conference on the Computer Processing of Oriental Languages (ICCPOL 2010);

            5.         The Fourth Symposium on Semantic Mining in Biomedicine (SMBM 2010);

            6.         2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2010), LREC 2010;

            7.         New Challenges for NLP Frameworks, a workshop at LREC 2010;

            8.         The 1st International Workshop on Web Information Processing (IWWIP 2010);

            9.         The 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 23);

            10.      3rd International Symposium on Languages in Biology and Medicine (LBM 2009);

            11.      The 22nd International Conference on the Computer Processing of Oriental Languages (ICCPOL'09);

            12.      ACL 2008,  Reviewer for Information Extraction, Co-reference and Topic Models Area;

            13.      CoLing2008, Reviewer for Information Retrieval, Text and Web Mining, Multimedia Area;

            14.      The 3rd International Symposium on Semantic Mining in Biomedicine (SMBM), 2008;

            15.      ACL 2008, BioNLP workshop;

            16.      The Fourth Asian Information Retrieval Symposium (AIRS 2008) for Natural Language Processing and IR Area;

            17.      The 3rd International Joint Conference on Natural Language Processing(IJCNLP-08), for Text Mining and Information Extraction Area;

            18.      ACL2007, for Tagging and Word Segmentation Area;

            19.      The 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), for Information Extraction Area;

            20.      15th Annual International Conference on Intelligent Systems for Molecular Biology & 6th Annual European Conference on Computational Biology (ISMB / ECCB 2007), for Text Mining Area;

            21.      IJCAI 2007;

            22.      HLT / NAACL2007, for Information Extraction: words, concepts and inference Area;

            23.      International Conference on Computational Linguistics and the Association for Computational Linguistics  (CoLing-ACL06), for Lexical Semantics Area;

            24.      EMNLP 2006, for Discourse Area  and Term & Entity Extraction Area;

            25.      The 29th Annual International ACM SIGIR Conference (SIGIR’06);

            26.      11th conference of the European Chapter of Association for Computational Linguistics (EACL 2006), for tagging, and word segmentation Area;

            27.      21st International Conference on the Computer Processing of Oriental Languages (ICCPOL'06);

            28.      Reviewer, The 14th Annual International conference on Intelligent Systems for Molecular Biology (ISMB 2006);

            29.      The 2nd International Symposium on Semantic Mining in Biomedicine (SMBM), 2006;

            30.      Third International Conference on Terminology, Standardization and Technology Transfer, 2006;

            31.      The 19th Pacific Asia Conference on Language, Information and Computation, Dec 1-3, 2005;

            32.      International Symposium on Languages in Biology and Medicine (LBM2005);

            33.      ACL05: for "Machine Learning for NLP" area, "Discourse, Dialogue and Multimodality" area, and Interactive Poster and Demo;

            34.      Advanced Technologies for Oriental Information Processing, 2005;

            35.      International Conference on Chinese Computing(ICCC 2005);

            36.      EMNLP 2004;

            37.      International Joint workshop on Natural Language Processing in Biomedicine and its Applications(IJNLPBA), CoLing'04

            38.      IJCNLP04: for "Information Extraction & Q/A" area, "Word Segmentation" area, "Lexical Semantics" area, "Text Mining in Biomedicine" Thematic Session, Asian Language Resources workshop;

Reviewer of Journals

1.                   Computational Linguistics;

2.                   Journal of Artificial Intelligence Research (JAIR);

3.                   Bioinformatics;

4.                   PLoS Computational Biology;

5.                   BMC BioInformatics;

6.                   Information Processing and Management;

7.                   International Journal of Computational Linguistics & Chinese Language Processing;

8.                   IEEE Information System;

9.                   Computers and Humanities;

10.               Journal of Bioinformatics and Computational Biology;

11.               ACM Transactions on Asian Language Information Processing;

12.               International Journal on Applied Intelligence;

13.               International Journal of Information Technology;

14.               Journal of Chinese Language and Computing;

15.               Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (book);

Others:

            1.        Invitee, Dagstuhl seminar on "Text Mining and Ontologies for Life Sciences", Germany, 2008

            2.         Senior Member, JHU Summer Workshop on Entity Disambiguation 2007, USA

            3.         Professor(Adjunct), Harbin Institute of Technology, China (2005 - 2008);

            4.         Associate Professor(Adjunct), School of Information System, Singapore Management University (2005 - 2007);

 

Publications - Refereed

Leading journals

            1.     Man Lan, Jian Su, Empirical investigations into full-text protein interaction article categorization task (ACT) in the BioCreative II.5 Challenge, accepted by IEEE/ACM Transactions on Computational Biology and Bioinformatics

            2.         Man Lan, Chew Lim Tan, Jian Su, Feature generation and representations for protein-protein interaction classification,  Journal of Biomedical Informatics, Volume 42, Issue 5 (Oct 2009), P866-872, ISSN: 1532-0464;

            3.         Man Lan, Chew Lim Tan, Jian Su, Yue Lu, Supervised and Traditional Term Weighting Methods for Automatic Text Categorization, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 4, April 2009;

            4.         Yang Xiao Feng, Su Jian, Tan Chew Lim, A Twin-Candidate Model for Learning Based Coreference Resolution, Computational Linguistics 34: 2, 2008;

            5.         Haizhou Li, Jin-Shea Kuo, Jian Su, Chih-Lung Lin, Mining Live Transliterations Using Incremental Learning Algorithms, International Journal of Computer Processing of Languages, Vol. 21, No. 2 (2008) 183-203;

            6.         Zhou GuoDong, Zhang Jie, Su Jian, Shen Dan and Tan Chew Lim. Recognizing Names in Biomedical Texts: a Machine Learning Approach. BioInformatics, 20(7):1178-1190. DOI:10.1093/bioinformatics/bth060. 2004. ISSN: 1460-2059 [SCI], 122 citations

            7.         Zhang Jie, Shen Dan, Zhou GuoDong, Su Jian and Tan Chew Lim. Enhancing HMM-based Biomedical Named Entity Recognition by Studying Special Phenomena. Journal of Biomedical Informatics, Special Issue on Natural Language Processing in Biomedicine: Aims, Achievements and Challenge.  37(6). 411-422. 2004. ISSN: 1532-0464 [SCI Expanded], 19 citations

            8.         Zhou GuoDong and Su Jian. Machine Learning-based Named Entity Recognition via Effective Integration of Various Evidences.  Natural Language Engineering (journal), 11(1): 2005, Cambridge Press. ISSN 1469-8110

            9.         Zhou GuoDong, Shen Dan, Zhang Jie, Su Jian, Tan Soon Heng and Tan Chew Lim. Recognition of protein and gene names from text using an ensemble of classifiers and effective abbreviation resolution. BMC Bioinformatics. ISSN: 1471-2105 [SCI Expanded], 40 citations

            10.      Su Jian, K.T. Ng, H. Li, J-P Haton, "Nonparametric distance measures of speaker verification", IEE Electronics Letters, 27th Apr. 1995, Vol. 31, No. 9

            11.      H. Li, Su Jian, J-P Haton, "Short-timed speech dynamics for speaker recognition", IEE Electronics Letters, 17th Aug. 1995, Vol. 31, No. 17

 

Leading conferences

 

1.           WenTing WANG, Jian SU and Chew Lim TAN, Kernel Based Discourse Relation Recognition With Temporal Ordering Information, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 710–719, Uppsala, Sweden, 11-16 July 2010.

2.           Bin Chen, Jian Su and Chew Lim Tan, Resolving Event Noun Phrases to Their Verbal Mentions, accepted by EMNLP 2010

3.           Bin Chen, Jian Su and Chew Lim Tan. A Twin-Candidate Based Approach for Event Pronoun Resolution using Composite Kernel, accepted by Coling 2010 (Oral).

4.           Wei Zhang, Jian Su and Chew Lim Tan. Entity Linking Leveraging Automatically Generated Annotation, accepted by CoLing 2010 (Oral).

5.           Zhi Min Zhou, Man Lan, Yu Xu, Zheng Yu Niu, Jian Su, Chew Lim Tan, Predicting Discourse Connectives for Implicit Discourse Relation Recognition, accepted by CoLing 2010.(poster)

6.            Xiaofeng Yang, Jian Su, Jun Lang, Chew Lim Tan, Ting Liu, Sheng Li, An Entity-Mention Model for Coreference Resolution with Inductive Logic Programming, Proceedings of ACL-08: HLT, P843-851, Columbus, Ohio, USA, June 2008. 17 citations

7.           Chen Bin, Xiaofeng Yang, Jian Su, Chew Lim Tan, Other-Anaphora Resolution in Biomedical Texts with Automatic Mined Patterns, Proceedings of the 22nd International Conference on Computational Linguistics (CoLing 2008), pages 121-128, Manchester, August 2008.

8.           Stanley Yong Wai Keong, Su Jian, An Effective Method of Using Web Based Information for Relation Extraction, Proceedings of 3rd International Joint Conference of Natural Language Processing (IJCNLP2008), P350-357, Hyderabad, India.

9.           Xiaofeng Yang, Jian Su, Coreference Resolution Using Semantic Relatedness Information from Automatically Discovered Patterns, Proceedings of the 45th Annual meeting of the Association for Computational Linguistics, pages 528-535, Prague, Czech Republic, June 2007. 23 citations

10.      Man LAN, Chew Lim TAN, Jian SU, Hwee Boon LOW, Text Representations for Text Categorization: A Case Study in BioMedical Domain, pp 2557-2562, proceedings of International Joint Conference on Neural Networks 2007 (IJCNN 2007), Orlando, FL, USA.

11.      X. Yang, J. Su & C.L. Tan. 2006. Kernel-based pronoun resolution with structured syntactic knowledge, In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 06), Pages 41- 48. Sydney, Australia. 38 citations

12.      GuoDong Zhou, Jian Su and Min Zhang. Modeling Commonality among Related Classes in Relation Extraction. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), Pages 121-128, Sydney, Australia.

13.      Min Zhang, Jie Zhang, Jian Su, Guodong Zhou. A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), Pages 825-832. Sydney, Australia. 73 citations

14.      Min Zhang, Jie Zhang and Jian Su. Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel. In Proceedings of the Human Language Technology conference - North American chapter of the Association for Computational Linguistics annual meeting (HLT-NAACL 2006), New York, USA. 27 citations

15.      Aw AiTi, Zhang Min, Xiao Juan, Su Jian, A phrase based statistical model for SMS texts normalization (poster), Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), Pages 33-40. Sydney, Australia.

16.      Zhou GuoDong, Su Jian, Zhang Jie, Zhang Min, Exploring Various Knowledge in Relation Extraction, Proceedings of the 43rd Annual meeting of the Association for Computational Linguistics (ACL05), Ann Arbor, Michigan, US, pp.427-434. 104 citations

17.      Yang Xiao Feng, Su Jian, Tan Chew Lim, Improving Pronoun Resolution Using Statistics - Based Semantic Compatibility Information, Proceedings of the 43rd Annual meeting of the Association for Computational Linguistics (ACL05), Ann Arbor, Michigan, US, pp.165-172. 35 citations

18.      Min ZHANG, Jian SU, Danmei WANG, Guodong ZHOU, Chew Lim Tan,  Discovering Relations from a Large Raw Corpus Using Tree Similarity - based Clustering, Natural Language Processing - IJCNLP 2005, Second International Joint Conference, Jeju, Korea, October, 2005, proceedings, pp 378-389, LNAI 3615 Springer. 25 citations

19.      Min ZHANG, Haizhou LI, Jian SU and Hendra SETIAWAN, A Phrase-based Context-dependent Joint Probability Model for Named Entity Translation, Natural Language Processing - IJCNLP 2005, Second International Joint Conference, Jeju, Korea, October, 2005, proceedings, pp 600-611, LNAI 3615 Springer.

20.      Xiaofeng Yang, Jian Su and Chew Lim Tan, A Twin-Candidate Model of Coreference Resolution with Non-Anaphor Identification Capability, Natural Language Processing - IJCNLP 2005, Second International Joint Conference, Jeju, Korea, October, 2005, proceedings, pp 719-730, LNAI 3615 Springer. Selected as one of the best papers for publication in ACM Transactions on Asian Language Information Processing.

21.      XiaoFeng Yang, Jian Su, GuoDong Zhou, ChewLim Tan. Improving Pronoun Resolution by Incorporating Coreferential Information of candidates. P128-135, ACL 2004, July 21-26, Barcelona, Spain. 34 citations

22.      Dan Shen, Jie Zhang, Jian Su, Guodong Zhou, Chew-Lim Tan. Multi-Criteria-based Active Learning for Named Entity Recognition. P590-597, ACL 2004, July 21-26, Barcelona, Spain. 71 citations

23.      Li Haizhou, Zhang Min, Su Jian. A Joint Source-Channel Model for Machine Transliteration. P160-167, ACL 2004, July 21-26, Barcelona, Spain. 103 citations

24.      Xiaofeng Yang, Jian Su, Guodong Zhou and Chew Lim Tan. A NP-Cluster Based Approach to Coreference Resolution. P226-232, Proceedings of 20th International Conference on Computational Linguistics (COLING'2004). Aug 23-27, 2004, Geneva, Switzerland. 28 citations

25.      Zhou GuoDong and Su Jian. A high-performance coreference resolution system using a multi-agent strategy. P522-528, Proceedings of 20th International Conference on Computational Linguistics (COLING'2004). Aug 23-27, 2004, Geneva, Switzerland.

26.      Zhang Min, Li Haizhou and Su Jian. Direct Orthographical Mapping for Machine Transliteration. P716-722, Proceedings of 20th International Conference on Computational Linguistics (COLING'2004). Aug 23-27, 2004, Geneva, Switzerland.

27.      XiaoFeng Yang, GuoDong Zhou, Jian Su and Chew-Lim Tan. Improving Noun Phrase Coreference Resolution by Matching Strings. Proceedings of 1st International Joint Conference on Natural Language Processing (IJCNLP'2004), March 22-24, 2004, Sanya, China, pp226-333. 21 citations

28.      Yang XiaoFeng, Zhou GuoDong, Su Jian and Tan ChewLim. Coreference Resolution Using Competition Learning Approach. Proceedings of ACL 2003, Sapporo, Japan, 7-12 July 2003, pp176-183. 98 citations

29.      Zhou GuoDong and Su Jian. Named Entity Recognition Using a HMM-based Chunk Tagger. Proceedings of ACL 2002, Philadelphia, US, July 2002. 192 citations

30.      Zhou GuoDong and SU Jian. Error-driven HMM-based Chunk Tagger with Context-dependent Lexicon. Proceedings of EMNLP/VLC-2000, Hong Kong, Oct 7-8 2000. 36 citations

31.      Zhou GuoDong, Su Jian and Tey TongGuan. Hybrid Text Chunking. Proceedings of CoNLL'2000, Lisbon, Portugal, Sept 11-14 2000. 35 citations

32.      Su Jian, H. Li, K.T. Ng, Speaker time-drifting adaptation using trajectory mixture hidden Markov Models, IEEE ICASSP'96, Atlanta, USA

 

Others

 

            1.     Jian Su and Christopher Baker (Eds.), BMC Bioinformatics Special Issue on Proceedings of the Second International Symposium on Languages in Biology and Medicine(LBM) 2007, 9(Suppl 3):S1, 11 April 2008.

            2.         Robert Dale, Kam-Fai Wong, Jian Su, and Oi Yee Kwong (Eds.), Natural Language Processing - IJCNLP 2005, Second International Joint Conference, Jeju Island, Korea, October 11-13, 2005. Proceedings, Lecture Notes in Artificial Intelligence (LNAI 3651, Springer)

            3.         Man LAN, Chew Lim TAN, Jian SU, A Term Investigation and Majority Voting for Protein Interaction Article Sub-task 1 (IAS), Proceedings of the Second BioCreAtIve Challenge Evaluation Workshop, P183-185, Spain.

            4.         Nie Yu, Yang Lingpeng, Zhang Jie, Su Jian, Ji Dong Hong, I2R at TREC 2006 Genomics Track, The Fifteenth Text REtrieval Conference (TREC 2006) Proceedings.

            5.         Nie Yu, Yang Lingpeng, Ji Donghong, Zhang Jie, Su Jian, Yang Xiaofeng, Soon-Heng Tan, Xiao Juan, Zhou Guodong, TREC 2005 Genomics Track at I2R, The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings.

            6.         Juan Xiao, Jian Su, GuoDong Zhou, Chewlim Tan, Protein-Protein Interaction Extraction: A supervised Learning Approach, Proceedings of Symposium on Semantic Mining in Biomedicine, http://CEUR-WS.org/Vol-148/ , 2005, 33 citations

            7.         Zhou GuoDong, Su Jian, Resolution of Data Sparseness in Named Entity Recognition using Hierarchical Features and Feature Relaxation Principle, Proceedings of 6th International Conference on Intelligent Text Processing and Computational Linguistics, Lecture Notes in Computer Science (LNCS 3406, Springer). pp. 745-756, 13-19 Feb 2005, Mexico City.

            8.         Xiaofeng Yang, Jian Su. Entity-Based Noun Phrase Coreference Resolution, Proceedings of 6th International Conference on Intelligent Text Processing and Computational Linguistics, Lecture Notes in Computer Science (LNCS 3406, Springer), PP 213- 216, 13-19 Feb 2005. Mexico City.

            9.         Zhou GuoDong and Su Jian. Exploring deep knowledge resources in biomedical name recognition. Proceedings of 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004 shared task), P99-102, Aug 28-29, 2004, Geneva, Switzerland. 84 citations

            10.      Zhou GuoDong, Shen Dan, Zhang Jie, Su Jian, Tan Soon Heng and Tan Chew Lim. Recognition of protein and gene names from text using an ensemble of classifiers and effective abbreviation resolution. Proceedings of BioCreAtIvE Workshop, pp 26-30, March 28-31, 2004, Granada, Spain.

            11.      Dan Shen, Jie Zhang, Jian Su, GuoDong Zhou and Chew-Lim Tan. A Collaborative Ability Measurement for Co-training. Proceedings of 1st International Joint Conference on Natural Language Processing (IJCNLP'2004), pp608-613, March 22-24, 2004, Sanya, China.

            12.      GuoDong Zhou and Jian Su. Integrating Various Features in Hidden Markov Model using Constraint Relaxation Algorithm for Recognition of Named Entities without Gazetteers. Proceedings of IEEE International Conference on NLP & KE, pp732-739,  Oct 26-29, 2003, Beijing, China.

            13.      Dan Shen, Jie Zhang, Guodong Zhou, Jian Su, Chew-Lim Tan. Effective Adaptation of Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain. Proceedings of the ACL03 Workshop on Natural Language Processing in Biomedicine, pp49-56, Sapporo, Japan, July 2003. 55 citations

            14.      Zhou GuoDong and Su Jian. A Chinese Efficient Analyzer Integrating Word Segmentation, Part-Of-Speech Tagging, Partial Parsing and Full Parsing. Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing, pp78-83, ACL'03, Sapporo, Japan, 11-12 July 2003.

            15.      Jian Su, Jin Guo, Loong Cheong Tong, "Target Word Selection with Co-occurrence and Translation Information", Machine Translation Summit VII'99, 1999.

            16.      Su Jian, K.T. Ng, Xu Bing Zheng, "Speaker Recognition with discriminative speaker VQ models", ESCA, Eurospeech'95, Madrid, Spain

            17.      H. Li, J-P Haton, Jian Su and Y. Gong, "Speaker recognition with temporal transition models", ESCA, Eurospeech'95, Madrid, Spain,

            18.      Su Jian, Da Beng Liu, H. Li, "Study of discriminative neural networks in speech recognition", IEEE ICNNSP'95, Nanjing, China

 

Recent External Collaboration Projects/Grants (These are beyond the core research and development work in Institute for Infocomm Research)

    1.     Principal Investigator, High Performance Entity Tracking System, 2007-2009;

To develop a high performance Named Entity Recognition and Co-reference Resolution engine for text mining systems on intelligence gathering.

    2.     Principal Investigator (I2R, Singapore), Supervisory Committee Member, Project Management Committee Member, Work Package Leader, EU strategic research project under IST 4 call: BootStrep. 2006 - 2009

The project consortium consists of 6 EU parties and the I2R team. My team will provide co-reference resolution, besides other NLP tools, as part of the human language technology infrastructure for bio-knowledge extraction and bio-ontology acquisition. I will further lead the final work package to validate the bio-lexicon and bio-ontology generated in the first two years with information extraction, information retrieval and multilingual access tasks. This work package involves 5 teams including the UK national text mining centre, European BioInformatics Institute, Jena Univ. (Germany), Freiburg Univ. (Germany), Université de Rennes (France) and I2R.

    3.      Principal Investigator, I2R-Tokyo University joint project "BioCo Full Paper Annotation", 2006 - 2007

This project is to annotate co-reference information on biology full papers. The annotated corpus, BioCo, currently has 24 full biomedical papers annotated and is the largest full-paper annotation. It is an important resource for co-reference resolution in the bio-literature domain for information extraction and text-mining purposes.

     4.      Principal Investigator, I2R-Tokyo University joint project "MedCo Corpus Annotation", 2004 - 2006

This project is to annotate co-reference information on MedLine abstracts from the GENIA corpus, the most popular resource for natural language processing in biology literature, constructed by Tsujii's lab. The annotated corpus, MedCo, with 1999 abstracts, is the largest coreference annotation and an important resource for co-reference resolution in the bio-literature domain for information extraction and text-mining purposes.

    5.      Co-Principal Investigator, "A Personalized and Adaptive Literature Curation System for the Biomedical Science", 2005 - 2008

To build a grid based literature curation system together with National Cancer Centre, Genome Institute of Singapore, National University of Singapore, Bioinformatics Institute, National Grid Office, Singapore.

    6.      Principal Investigator, I2R-SOC NUS joint project "Information Extraction on Biology Literature", July 2003 - June 2007

This project is to develop information extraction technologies and to build information management applications for biology literature. 9 postgraduate / PhD students and 2 Research Fellows have been trained in this project, besides other achievements including a number of publications and benchmark competitions.

    7.      Technical Advisor, Material Safety Document Sheet Knowledge Workbench project, 2003 - 2004

The project is to build a system that extracts material safety information from document sheets and checks its validity against international standards. The system was delivered to a government organization. Instead of only being able to randomly check 5% of a large number of data sheets, the government officer can check 100% of the data sheets with the help of the system. It won The Enterprise Challenge Award 2003.

    8.       Project Manager, TextMining CoT funding project, 2004;

The project is to build SDK tools with the engines built in house to make them ready for commercialization. The tools cover information extraction, text clustering, text classification, text retrieval, summarization and term extraction technologies. The project has led to a number of licenses to SMEs and 5 Polytechnics, and to good publicity for enabling a company to be "ready to run" and go to market fast with a new product.
