Jian SU (苏俭)
Senior Scientist
Group Leader, Information Extraction and Text Mining
Member, Core Technology Review Panel
Institute for Infocomm Research, Singapore
Member, Technology Advisor Committee of National Centre for Text Mining, UK
(2007 - Present)
E-mail: sujian (at) i2r.a-star.edu.sg
Websites: http://www1.i2r.a-star.edu.sg/~sujian, http://nlp.i2r.a-star.edu.sg/
Fax: +65 6776 1378
Tel: +65 6408 2143
Postal Address:
Institute for Infocomm Research
1 Fusionopolis Way, #21 - 01 Connexis
Singapore 138632
Biography
Dr Jian Su received her B.Sc degree in Electronics from Sichuan University,
China in 1990, and her M.Sc and PhD degrees in electrical & electronic
engineering from South China University of Technology in 1993 and 1996
respectively. She was a Research Assistant at City University of Hong Kong from
1994 to 1995, and an intern student at Centre de Recherche en Informatique de
Nancy, France in 1995. She joined the Institute for Infocomm Research (I2R),
formerly known as Institute of System Sciences, where she established herself
in the areas of Information Extraction, Coreference Resolution, and (Bio)Text
Mining. Dr Su has published extensively in natural language processing (NLP)
and bioinformatics conferences and journals, including 12 papers in ACL Annual
Meetings and one journal article in Computational Linguistics in recent years.
Dr Su is active in professional services for the computational linguistics
community. She has served as Editor / Member of Editorial Board for two
international journals. She is program chair of ACL-IJCNLP09, publication chair
of ACL 2007 and IJCNLP 2005, program chair of LBM 2007, Workshop Organizer of
the LREC2008 Workshop: Building and evaluating resources for biomedical text
mining, and PC member of numerous NLP conferences including ACL, IJCNLP,
COLING, and EMNLP.
Dr Su also led her team to achieve top performances in various information
extraction / text mining benchmarks such as BioCreAtIve. She was responsible
for the effort to establish the largest co-reference annotation corpus, which
comprises 2000 Medline abstracts and 24 biomedical full papers from the GENIA
collection. She has been the Principal Investigator in multiple technology
deployments, including BioMedical Information Management, Intelligence
Gathering, and Legal / Standard Enforcement.
Research Interests
Information Extraction; Discourse Analysis; Text Mining; Language Resources
& Evaluation; Machine Translation; Segmentation, Tagging and Chunking;
Machine Learning for Natural Language
Recognitions/Awards
2007 Best performance in BioCreAtIve II (Critical
Assessment for Information Extraction in Biology) in the Protein-Protein
Interaction Article Selection sub-task in terms of F-score among
19 international teams including Edinburgh Uni., Uni. of California,
Berkeley, and Uni. of Colorado.
2004 Best performance in the closed test of the Bio-Named Entity
Recognition task with BioCreAtIve I (Critical Assessment of Information
Extraction systems in Biology), among 12 international teams including the 1st
runner up, the joint team of Stanford Uni. and Edinburgh Uni.
2004 Best Performance in Bio-NER task with IJNLPBA,
CoLing 2004, among 8 international teams including the 1st runner up, the joint
team of Stanford Uni. and Edinburgh Uni.
2003 The Enterprise Challenge Award, Singapore
2002 Best Paper Award, Laboratory of Information
Technology, Singapore
2000 Best individual system with English text chunking
task at Computational Natural Language Learning (CoNLL-2000)
Professional Services
Editor / Editorial Board Member
- ACM Transactions on Intelligent Systems and Technology (ACM TIST)
(2010 - 2012);
- International Journal of Computer Processing of Oriental Languages
(2007 - Present);
- Journal of Computational Linguistics and Chinese Language Processing
(2004 - 2008);
Invited talks:
- Invited talk at School of Computer Engineering, Nanyang Technological
University, Singapore, March 19, 2009
- Keynote Speech at International Conference on Asian Language Processing 2008,
Chiang Mai, Thailand, Nov 12-14, 2008
- Invited talk at The National Centre of Text Mining, UK, April 23, 2008
- Invited talk at 5th joint workshop of BioInformatics and Natural Language
Processing, Korea, Feb 14-15, 2008
- Invited talk at Industrial Program of European Bioinformatics Institute, UK,
Sep 14, 2007
- Invited talk at MOE & Microsoft Key Laboratory of Natural Language Processing
and Speech, HIT, China, Oct 15, 2005
- Invited talk at The National Centre of Text Mining, UK, April 14-15, 2005
- Keynote Speech at 3rd joint workshop of BioInformatics and Natural Language
Processing, Korea, Feb 20-22, 2005
- Invited speech at CJNLP-04 (4th China-Japan Joint Conference to Promote
Cooperation in Natural Language Processing), Nov. 10-12, 2004
Member of Conference Organizing Committees
- Program Chair, ACL/IJCNLP09;
- Workshop Organizer, LREC2008 Workshop: Building and evaluating resources for
biomedical text mining;
- Program Committee Chair, International Symposium on Languages in Biology and
Medicine (LBM 2007);
- Publication Chair, 45th Annual Meeting of the Association for Computational
Linguistics (ACL2007);
- Publication Chair, International Joint Conference on Natural Language
Processing (IJCNLP05);
- Workshop Chair, International Workshop on Language, Semantics, and Web 2003;
Member of Program Committees
- ACL
2010, for "information extraction" area;
- CoLing
2010, for "Discourse and Pragmatics" and "information
extraction" area;
- The
23rd Pacific Asia Conference on Language, Information and Computation
(PACLIC 23);
- 3rd
International Symposium on Languages in Biology and Medicine (LBM 2009);
- The
22nd International Conference on the Computer Processing of Oriental
Languages (ICCPOL'09);
- ACL
2008, Reviewer for Information Extraction, Co-reference and Topic
Models Area;
- CoLing2008, Reviewer for Information Retrieval, Text and Web Mining,
Multimedia Area;
- The 3rd International Symposium on Semantic Mining in Biomedicine (SMBM),
2008;
- ACL
2008, BioNLP workshop;
- The
Fourth Asian Information Retrieval Symposium (AIRS 2008) for Natural Language
Processing and IR Area;
- The 3rd International Joint Conference on Natural Language Processing
(IJCNLP-08), for Text Mining and Information Extraction Area;
- ACL2007,
for Tagging and Word Segmentation Area;
- The
2007 Joint Conference on Empirical Methods in Natural Language Processing
and Computational Natural Language Learning (EMNLP-CoNLL 2007), for
Information Extraction Area;
- 15th Annual International Conference on Intelligent Systems for Molecular
Biology & 6th Annual European Conference on Computational Biology
(ISMB / ECCB 2007), for Text Mining Area;
- IJCAI
2007;
- HLT /
NAACL2007, for Information Extraction: words, concepts and inference Area;
- International
Conference on Computational Linguistics and the Association for
Computational Linguistics (CoLing-ACL06), for Lexical Semantics
Area;
- EMNLP
2006, for Discourse Area and Term & Entity Extraction Area;
- The
29th Annual International ACM SIGIR Conference (SIGIR 2006);
- 11th
conference of the European Chapter of Association for Computational Linguistics
(EACL 2006), for tagging, and word segmentation Area;
- 21st
International Conference on the Computer Processing of Oriental Languages
(ICCPOL 2006);
- Reviewer,
The 14th Annual International conference on Intelligent Systems for
Molecular Biology (ISMB 2006);
- The
2nd International Symposium on Semantic Mining in Biomedicine (SMBM),
2006;
- Third
International Conference on Terminology, Standardization and Technology
Transfer, 2006;
- The
19th Pacific Asia Conference on Language, Information and Computation, Dec
1-3, 2005;
- International
Symposium on Languages in Biology and Medicine (LBM2005);
- ACL05:
for "Machine Learning for NLP" area, "Discourse,
Dialogue and Multimodality" area, and Interactive Poster and Demo;
- Advanced
Technologies for Oriental Information Processing, 2005;
- International Conference on Chinese Computing (ICCC 2005);
- EMNLP 2004;
- International Joint workshop on Natural Language Processing in Biomedicine
and its Applications (IJNLPBA), CoLing 2004;
- IJCNLP04: for "Information Extraction & Q/A" area, "Word Segmentation" area,
"Lexical Semantics" area, "Text Mining in Biomedicine" Thematic Session, Asian
Language Resources workshop;
Reviewer of Journals
- Computational Linguistics;
- Bioinformatics;
- PLoS Computational Biology
- BMC BioInformatics
- Information Processing and Management
- International Journal of Computational Linguistics & Chinese Language
Processing
- IEEE Information System
- Computers and Humanities
- Journal of Bioinformatics and Computational Biology
- ACM Transactions on Asian Language Information Processing
- International Journal on Applied Intelligence
- International Journal of Information Technology
- Journal of Chinese Language and Computing
- Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (book)
Others:
- Invitee, Dagstuhl seminar on "Text Mining and Ontologies for Life
Sciences", Germany, 2008
- Senior Member, JHU Summer Workshop on Entity Disambiguation 2007, USA
- Professor (Adjunct), Harbin Institute of Technology, China (2005 - 2008);
- Associate Professor (Adjunct), School of Information System, Singapore
Management University (2005 - 2007);
Publications - Refereed
Leading journals
- Man
Lan, Chew Lim Tan, Jian Su, Feature generation and representations for
protein-protein interaction classification, accepted by Journal of
Biomedical Informatics, 2009;
- Man Lan, Chew Lim Tan, Jian Su, Yue Lu, Supervised and Traditional Term
Weighting Methods for Automatic Text Categorization, IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 31, No. 4, April 2009;
- Yang
Xiao Feng, Su Jian, Tan Chew Lim, A Twin-Candidate Model for Learning
Based Coreference Resolution, Computational Linguistics 34: 2,
2008;
- Haizhou Li, Jin-Shea Kuo, Jian Su, Chih-Lung Lin, Mining Live
Transliterations Using Incremental Learning Algorithms, International Journal
of Computer Processing of Languages, Vol. 21, No. 2 (2008) 183-203;
- Zhou GuoDong, Zhang Jie, Su Jian, Shen Dan and Tan Chew Lim. Recognizing
Names in Biomedical Texts: a Machine Learning Approach. BioInformatics,
20(7):1178-1190. DOI:10.1093/bioinformatics/bth060. 2004. ISSN: 1460-2059
[SCI], 112 citations
- Zhang Jie, Shen Dan, Zhou GuoDong, Su Jian
and Tan Chew Lim. Enhancing HMM-based Biomedical Named Entity Recognition
by Studying Special Phenomena. Journal of Biomedical Informatics,
Special Issue on Natural Language Processing in Biomedicine: Aims, Achievements
and Challenge. 37(6). 411-422. 2004. ISSN: 1532-0464 [SCI Expanded],
17 citations
- Zhou GuoDong and Su Jian. Machine
Learning-based Named Entity Recognition via Effective Integration of
Various Evidences. Natural Language Engineering (journal),
11(1): 2005, Cambridge Press. ISSN 1469-8110
- Zhou GuoDong, Shen Dan, Zhang Jie, Su Jian,
Tan Soon Heng and Tan Chew Lim. Recognition of protein and gene names from
text using an ensemble of classifiers and effective abbreviation
resolution. BMC Bioinformatics. ISSN: 1471-2105 [SCI Expanded], 33 citations
- Su Jian, K.T. Ng, H. Li, J-P Haton,
"Nonparametric distance measures of speaker verification", IEE
Electronics Letters, 27th Apr. 1995, Vol. 31, No. 9
- H. Li, Su Jian, J-P Haton, "Short-timed
speech dynamics for speaker recognition", IEE Electronics Letters,
17th Aug. 1995, Vol. 31, No. 17
Leading conferences
- Xiaofeng Yang, Jian Su, Jun Lang, Chew Lim
Tan, Ting Liu, Sheng Li, An Entity-Mention Model for Coreference
Resolution with Inductive Logic Programming, Proceedings of ACL-08: HLT,
P843-851, Columbus, Ohio, USA, June 2008.
- Chen Bin, Xiaofeng Yang, Jian Su, Chew Lim
Tan, Other-Anaphora Resolution in Biomedical Texts with Automatic Mined
Patterns, Proceedings of the 22nd International Conference on
Computational Linguistics (CoLing 2008), pages 121-128, Manchester, August
2008.
- Stanley Yong Wai Keong, Su Jian, An
Effective Method of Using Web Based Information for Relation Extraction,
Proceedings of 3rd International Joint Conference of Natural Language
Processing (IJCNLP2008), P350-357, Hyderabad, India.
- Xiaofeng Yang, Jian Su, Coreference
Resolution Using Semantic Relatedness Information from Automatically
Discovered Patterns, Proceedings of the 45th Annual meeting of the
Association for Computational Linguistics, pages 528-535, Prague, Czech
Republic, June 2007. 18 citations
- Man LAN, Chew Lim TAN, Jian SU, Hwee Boon
LOW, Text Representations for Text Categorization: A Case Study in
BioMedical Domain, pp 2557-2562, proceedings of International Joint
Conference on Neural Networks 2007 (IJCNN 2007), Orlando, FL, USA.
- X. Yang, J. Su & C.L. Tan. 2006.
Kernel-based pronoun resolution with structured syntactic knowledge, In
Proceedings of the 21st International Conference on Computational
Linguistics and the 44th Annual Meeting of the Association for
Computational Linguistics (COLING-ACL 06), Pages 41-48. Sydney,
Australia. 27 citations
- GuoDong
Zhou, Jian Su and Min Zhang. Modeling Commonality among Related Classes in
Relation Extraction. In Proceedings of the 21st International Conference
on Computational Linguistics and the 44th Annual Meeting of the
Association for Computational Linguistics (COLING-ACL 2006), Pages
121-128, Sydney, Australia.
- Min
Zhang, Jie Zhang, Jian Su, Guodong Zhou. A Composite Kernel to Extract
Relations between Entities with Both Flat and Structured Features. In
Proceedings of the 21st International Conference on Computational
Linguistics and the 44th Annual Meeting of the Association for
Computational Linguistics (COLING-ACL 2006), Pages 825-832. Sydney, Australia.
50
citations
- Min
Zhang, Jie Zhang and Jian Su. Exploring Syntactic Features for Relation
Extraction using a Convolution Tree Kernel. In Proceedings of the Human
Language Technology conference - North American chapter of the Association
for Computational Linguistics annual meeting (HLT-NAACL 2006), New York,
USA. 19
citations
- Aw
AiTi, Zhang Min, Xiao Juan, Su Jian, A phrase based statistical model for
SMS texts normalization (poster), Annual Meeting of the Association for
Computational Linguistics (COLING-ACL 2006), Pages 33-40. Sydney,
Australia.
- Zhou
GuoDong, Su Jian, Zhang Jie, Zhang Min, Exploring Various Knowledge in
Relation Extraction, Proceedings of the 43rd Annual meeting of the
Association for Computational Linguistics (ACL05), Ann Arbor, Michigan,
US, pp.427-434. 102
citations
- Yang
Xiao Feng, Su Jian, Tan Chew Lim, Improving Pronoun Resolution Using
Statistics-Based Semantic Compatibility Information, Proceedings of
the 43rd Annual meeting of the Association for Computational Linguistics
(ACL05), Ann Arbor, Michigan, US, pp.165-172. 25
citations
- Min
ZHANG, Jian SU, Danmei WANG, Guodong ZHOU, Chew Lim Tan, Discovering
Relations from a Large Raw Corpus Using Tree
Similarity - based Clustering, Natural Language Processing – IJCNLP
2005, Second International Joint Conference, Jeju, Korea, October,
2005, proceedings, pp 378-389, LNAI 3615 Springer.
- Min ZHANG, Haizhou LI, Jian SU and Hendra SETIAWAN, A Phrase-based
Context-dependent Joint Probability Model for Named Entity Translation, Natural
Language Processing – IJCNLP 2005, Second International Joint Conference,
Jeju, Korea, October, 2005, proceedings, pp 600-611, LNAI 3615 Springer.
- Xiaofeng
Yang, Jian Su and Chew Lim Tan, A Twin-Candidate Model of Coreference
Resolution with Non-Anaphor Identification Capability, Natural
Language Processing – IJCNLP 2005, Second International Joint Conference,
Jeju, Korea, October, 2005, proceedings, pp 719-730, LNAI 3615 Springer. Was
selected as one of the best papers for publication on ACM Transaction on
Asian Language Information Processing.
- XiaoFeng Yang, Jian Su, GuoDong Zhou, ChewLim Tan. Improving Pronoun
Resolution by Incorporating Coreferential Information of candidates. P128-135,
ACL 2004, July 21-26, Barcelona, Spain. 30 citations
- Dan Shen, Jie Zhang, Jian Su, Guodong Zhou, Chew-Lim Tan.
Multi-Criteria-based Active Learning for Named Entity Recognition. P590-597,
ACL 2004, July 21-26, Barcelona, Spain. 54 citations
- Li Haizhou, Zhang Min, Su Jian. A Joint Source-Channel Model for Machine
Transliteration. P160-167, ACL 2004, July 21-26, Barcelona, Spain.
86 citations
- Xiaofeng
Yang, Jian Su, Guodong Zhou and Chew Lim Tan. A NP-Cluster Based Approach
to Coreference Resolution. P226-232, Proceedings of 20th International
Conference on Computational Linguistics (COLING'2004). Aug 23-27, 2004,
Geneva, Switzerland. 25
citations
- Zhou
GuoDong and Su Jian. A high-performance coreference resolution system
using a multi-agent strategy. P522-528, Proceedings of 20th International
Conference on Computational Linguistics (COLING'2004). Aug 23-27, 2004,
Geneva, Switzerland.
- Zhang
Min, Li Haizhou and Su Jian. Direct Orthographical Mapping for Machine
Transliteration. P716-722, Proceedings of 20th International Conference on
Computational Linguistics (COLING'2004). Aug 23-27, 2004, Geneva,
Switzerland.
- XiaoFeng
Yang, GuoDong Zhou, Jian Su and Chew-Lim Tan. Improving Noun Phrase
Coreference Resolution by Matching Strings. Proceedings of 1st International
Joint Conference on Natural Language Processing (IJCNLP'2004), March
22-24, 2004, Sanya, China, pp226-333. 18
citations
- Yang
XiaoFeng, Zhou GuoDong, Su Jian and Tan ChewLim. Coreference Resolution
Using Competition Learning Approach. Proceedings of ACL 2003, Sapporo,
Japan, 7-12 July 2003, pp176-183. 79
citations
- Zhou
GuoDong and Su Jian. Named Entity Recognition Using a HMM-based Chunk
Tagger. Proceedings of ACL 2002, Philadelphia, US, July 2002. 171
citations
- Zhou
GuoDong and SU Jian. Error-driven HMM-based Chunk Tagger with
Context-dependent Lexicon. Proceedings of EMNLP/VLC-2000, Hong Kong, Oct
7-8 2000. 33
citations
- Zhou
GuoDong, Su Jian and Tey TongGuan. Hybrid Text Chunking. Proceedings of
CoNLL'2000, Lisbon, Portugal, Sept 11-14 2000. 30
citations
- Su
Jian, H. Li, K.T. Ng, Speaker time-drifting adaptation using trajectory
mixture hidden Markov Models, IEEE ICASSP'96, Atlanta, USA
Others
- Jian
Su and Christopher Baker (Eds.), BMC Bioinformatics Special Issue on
Proceedings of the Second International Symposium on Languages in Biology
and Medicine(LBM) 2007, 9(Suppl 3):S1, 11 April 2008.
- Robert
Dale, Kam-Fai Wong, Jian Su, and Oi Yee Kwong (Eds.), Natural Language
Processing – IJCNLP 2005, Second International Joint Conference, Jeju
Island, Korea, October 11-13, 2005. Proceedings, Lecture Notes in
Artificial Intelligence (LNAI 3651, Springer)
- Man
LAN, Chew Lim TAN, Jian SU, A Term Investigation and Majority Voting for
Protein Interaction Article Sub-task 1 (IAS), Proceedings of the Second
BioCreAtIve Challenge Evaluation Workshop, P183-185, Spain.
- Nie
Yu, Yang Lingpeng, Zhang Jie, Su Jian, Ji Dong Hong, I2R at TREC 2006
Genomics Track, The Fifteenth Text REtrieval Conference (TREC 2006)
Proceedings.
- Nie
Yu, Yang Lingpeng, Ji Donghong, Zhang Jie, Su Jian, Yang Xiaofeng,
Soon-Heng Tan, Xiao Juan, Zhou Guodong, TREC 2005 Genomics Track at I2R,
The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings.
- Juan
Xiao, Jian Su, GuoDong Zhou, Chewlim Tan, Protein-Protein Interaction
Extraction: A supervised Learning Approach, Proceedings of Symposium on
Semantic Mining in Biomedicine, http://CEUR-WS.org/Vol-148/ , 2005, 27
citations
- Zhou
GuoDong, Su Jian, Resolution of Data Sparseness in Named Entity
Recognition using Hierarchical Features and Feature Relaxation Principle, Proceedings
of 6th International Conference on Intelligent Text Processing and
Computational Linguistics, Lecture Notes in Computer Science (LNCS 3406,
Springer). pp. 745-756, 13-19 Feb 2005, Mexico City.
- Xiaofeng
Yang, Jian Su. Entity-Based Noun Phrase Coreference Resolution, Proceedings
of 6th International Conference on Intelligent Text Processing and
Computational Linguistics, Lecture Notes in Computer Science (LNCS 3406,
Springer), PP 213- 216, 13-19 Feb 2005. Mexico City.
- Zhou
GuoDong and Su Jian. Exploring deep knowledge resources in biomedical name
recognition. Proceedings of 2004 Joint Workshop on Natural Language
Processing in Biomedicine and its Applications (JNLPBA'2004 shared task),
P99-102, Aug 28-29, 2004, Geneva, Switzerland. 72
citations
- Zhou
GuoDong, Shen Dan, Zhang Jie, Su Jian, Tan Soon Heng and Tan Chew Lim.
Recognition of protein and gene names from text using an ensemble of
classifiers and effective abbreviation resolution. Proceedings of
BioCreAtIvE Workshop, pp 26-30, March 28-31, 2004, Granada, Spain.
- Dan
Shen, Jie Zhang, Jian Su, GuoDong Zhou and Chew-Lim Tan. A Collaborative
Ability Measurement for Co-training. Proceedings of 1st International Joint
Conference on Natural Language Processing (IJCNLP'2004), pp608-613, March
22-24, 2004, Sanya, China.
- GuoDong
Zhou and Jian Su. Integrating Various Features in Hidden Markov Model
using Constraint Relaxation Algorithm for Recognition of Named Entities
without Gazetteers. Proceedings of IEEE International Conference on NLP
& KE, pp732-739, Oct 26-29, 2003, Beijing, China.
- Dan
Shen, Jie Zhang, Guodong Zhou, Jian Su, Chew-Lim Tan. Effective Adaptation
of Hidden Markov Model-based Named Entity Recognizer for Biomedical
Domain. Proceedings of the ACL03 Workshop on Natural Language Processing
in Biomedicine, pp49-56, Sapporo, Japan, July 2003. 48
citations
- Zhou
GuoDong and Su Jian. A Chinese Efficient Analyzer Integrating Word
Segmentation, Part-Of-Speech Tagging, Partial Parsing and Full Parsing.
Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing,
pp78-83, ACL'03, Sapporo, Japan, 11-12 July 2003.
- Jian
Su, Jin Guo, Loong Cheong Tong, "Target Word Selection with
Co-occurrence and Translation Information", Machine Translation
Summit VII'99, 1999.
- Su
Jian, K.T. Ng, Xu Bing Zheng, "Speaker Recognition with
discriminative speaker VQ models", ESCA, Eurospeech'95, Madrid, Spain
- H.
Li, J-P Haton, Jian Su and Y. Gong, "Speaker recognition with
temporal transition models", ESCA, Eurospeech'95, Madrid, Spain,
- Su
Jian, Da Beng Liu, H. Li, "Study of discriminative neural networks in
speech recognition", IEEE ICNNSP'95, Nanjing, China
Recent Projects/Grants
- Principal Investigator, High Performance Entity Tracking System, 2007-2008;
To develop a high performance Named Entity
Recognition and Co-reference Resolution engine for text mining systems for
intelligence gathering.
- Principal Investigator (I2R, Singapore), Supervisory Committee Member, Project
Management Committee Member, Work Package Leader, EU strategic research
project under IST 4 call: BootStrep. 2006 - 2009
The project consortium consists of 6 EU parties and the
I2R team. My team will provide co-reference resolution, besides other NLP tools,
as part of the human language technology infrastructure for bio-knowledge
extraction and bio-ontology acquisition. I will further lead the final work
package to validate the bio-lexicon and bio-ontology generated in the first two
years with information extraction, information retrieval and multilingual
access tasks. This work package involves 5 teams including the UK National
Centre for Text Mining, European BioInformatics Institute, Jena Univ. (Germany),
Freiburg Univ. (Germany), Université de Rennes (France) and I2R.
- Principal Investigator, I2R-Tokyo University joint project "BioCo Full Paper
Annotation", 2006 - 2007
This project is to annotate co-reference
information on biology full papers. The annotated corpus, BioCo, currently with
24 full biomedical papers annotated, is the largest full paper annotation and an
important resource for co-reference resolution in the bio-literature domain for
information extraction and text-mining purposes.
- Principal Investigator, I2R-Tokyo University joint project "MedCo Corpus
Annotation". 2004 – 2006
This project is to annotate co-reference
information on MedLine abstracts with the GENIA corpus, the most popular
resource for natural language processing in biology literature, constructed by
Tsujii's lab. The annotated corpus, MedCo, with 1999 abstracts, is the largest
coreference annotation and an important resource for co-reference resolution in
the bio-literature domain for information extraction and text-mining purposes.
- Co-Principal
Investigator, "A Personalized and Adaptive Literature Curation System
for the Biomedical Science", 2005 – 2008
To build a grid based literature curation system
together with National Cancer Centre, Genome Institute of Singapore, National
University of Singapore, Bioinformatics Institute, National Grid Office,
Singapore.
- Principal Investigator, I2R-SOC NUS joint project "Information Extraction on
Biology Literature", July 2003 - June 2007
This project is to develop information extraction
technologies and to build information management applications for biology
literature. 9 postgraduate / PhD students and 2 Research Fellows have been
trained in this project, besides other achievements including a number of
publications and benchmark competitions.
- Technical Advisor, Material Safety Document Sheet Knowledge Workbench project,
2003 - 2004
The project is to build a system that extracts
material safety information from document sheets and then checks its
validity against international standards. The system was delivered to a
government organization. Instead of only being able to randomly check 5% of
a large amount of datasheets, the government officer can check 100% of
datasheets with the help of the system. It won The Enterprise Challenge Award
2003.
- Project Manager, TextMining CoT funding project, 2004;
The project is to build SDK tools with the engines
built in house to make them ready for commercialization. The tools cover
information extraction, text clustering, text classification, text retrieval,
summarization and term extraction technologies. The project has led to a number
of licences to SMEs and 5 Polytechnics, and to good publicity for enabling
a company to be "ready to 'run' and go to market fast with a new product".