Work Experience

  Sep. 2005 - Now          Member of Natural Language Processing Group, Nanjing University, supervised by Prof. Jiajun Chen.

  Oct. 2007 - Aug. 2008  Intern in the Natural Language Computing Group, MSRA, working with Long Jiang and Mu Li.

Research Experience and Projects

Currently, I’m focusing on Statistical Machine Translation. My research interests also include Coreference Resolution and other NLP problems. I’m especially interested in using machine learning techniques to solve complicated NLP problems.

  Statistical Machine Translation:                                                                                              Sep. 2008 - Now @NLP.NJU

We started an SMT project that first builds a baseline system with Moses and then extends or modifies it to incorporate more syntactic and semantic information. We plan research and experiments on several parts of the SMT pipeline, such as word alignment, phrase extraction and scoring, and phrase reordering.

Our work starts with word alignment (Publication [2]). Observing that AER scores do not correlate well with BLEU scores, we propose a variation, Error Sensitive AER (ESAER), for evaluating alignment results. ESAER takes the phrase extraction procedure into consideration and penalizes different types of errors according to their effect on the phrase extraction result. Experiments show a substantial improvement in correlation with BLEU over the original AER. We are also trying supervised and semi-supervised methods to improve the alignment result.

As a preprocessing step, we are trying to segment long sentence pairs into shorter ones, which may make the subsequent steps more efficient. See Publication [6] for preliminary results.
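
For reference, the original AER that ESAER builds on can be computed as in the minimal sketch below. It follows the standard definition over sure (S) and possible (P) gold links; the function name and the toy links are illustrative assumptions, and the error-sensitive weighting of ESAER itself is not reproduced here (see Publication [2]).

```python
def alignment_error_rate(predicted, sure, possible):
    """Standard AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    """
    a = set(predicted)
    s = set(sure)
    # By convention every sure link also counts as a possible link.
    p = set(possible) | s
    if not a and not s:
        return 0.0  # nothing predicted and nothing expected: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (hypothetical links, not data from the project).
predicted = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
aer = alignment_error_rate(predicted, sure, possible)
```

A lower AER means the predicted links agree better with the gold links; the sketch treats every wrong link the same way, which is the property ESAER changes.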

  Statistical Machine Translation for evaluation:                                                             Dec. 2008 - Aug. 2009 @NLP.NJU

We participated in two machine translation evaluations: the NIST Open MT 2009 Evaluation (MT09) (system report) and the evaluation of the 5th China Workshop on Machine Translation (CWMT2009) (system report, in Chinese). Our systems were built mainly on Moses. We added some phrase-table features and employed factored models to improve the results.

  Statistical Machine Translation:                                                                                    Apr. 2008 - Aug. 2008 @NLC.MSRA

Collaborating with Mu Li, we tried to improve word alignment by combining the outputs of different systems, for example Giza++ and the Berkeley aligner. Experiments showed an increase of about 0.5 BLEU points after combination.

I was also responsible for the rule extraction part of the SMT system. At first I maintained the code for Hiero rule extraction and ran some rule filtering experiments. Later, I also implemented syntactic rule extraction procedures following papers from ISI, USC.
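
One simple way to combine aligner outputs is link-level voting, sketched below. This is only an illustration under stated assumptions: the combination scheme actually used in the project is not described here, and the function name, the `min_votes` parameter, and the toy links are hypothetical.

```python
from collections import Counter


def combine_alignments(alignments, min_votes):
    """Keep each (source_index, target_index) link proposed by at least
    `min_votes` of the given aligners."""
    # Count each link once per aligner, even if an aligner lists it twice.
    votes = Counter(link for links in alignments for link in set(links))
    return {link for link, count in votes.items() if count >= min_votes}


# Toy outputs standing in for two aligners (hypothetical links).
aligner_a = {(0, 0), (1, 1), (2, 2)}
aligner_b = {(0, 0), (1, 2), (2, 2)}

intersection = combine_alignments([aligner_a, aligner_b], min_votes=2)
union = combine_alignments([aligner_a, aligner_b], min_votes=1)
```

With two aligners, `min_votes=2` gives the intersection (fewer, more reliable links) and `min_votes=1` gives the union (more links, more noise).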

  Chinese Couplet project:                                                                                               Oct. 2007 - Mar. 2008 @NLC.MSRA

This project aims to build automatic second-line and banner generation systems for Chinese couplets. I mainly worked with Long Jiang on the automatic evaluation of the second-line generation system. We considered various evaluation metrics from IR and SMT, and finally adopted BLEU, the standard SMT evaluation metric, building our evaluation framework on pseudo-reference second lines.

  Coreference Resolution:                                                                                                  Jul. 2006 - Jul. 2007 @NLP.NJU

Coreference Resolution aims to identify the different mentions in a document that refer to the same entity. We first worked on unsupervised methods based on graph cut. We then employed supervised learning techniques such as the perceptron and Markov Logic Networks, and also used a clustering technique called correlation clustering in the experiments. Later, we did some research on automatic generation of first-order logic rules and on various clustering techniques. Some of this work is presented in Publications [1][3][5][7].

  Chinese word segmentation and POS tagging:                                                             Nov. 2005 - Apr. 2006 @NLP.NJU

In-house implementation of a Chinese word segmenter and a POS tagger based on HMM.



Publications

2009

7. Yabing Zhang, Junsheng Zhou, Shujian Huang and Jiajun Chen. Combining ILP and MLN for Coreference Resolution. In International Conference on Asian Language Processing (IALP 2009), Singapore, Dec. 7-9, 2009.

6. Biping Meng, Shujian Huang, Xinyu Dai and Jiajun Chen. Segmenting Long Sentence Pairs for Statistical Machine Translation. In International Conference on Asian Language Processing (IALP 2009), Singapore, Dec. 7-9, 2009.

5. Weipeng Liu, Junsheng Zhou, Shujian Huang and Jiajun Chen. Global Optimization Based on Clustering for Coreference Resolution. In The 10th Chinese National Conference on Computational Linguistics (CNCCL-2009), Yantai, China, July 24-26, 2009. (In Chinese, slides)

4. Shujian Huang, Ning Xi, Yinggong Zhao, Xinyu Dai, Jiajun Chen. An Error-Sensitive Metric for Word Alignment in Phrase-based SMT. Journal of Chinese Information Processing, 2009, vol. 23, no. 3. (Revised version of CWMT2008 paper, In Chinese)

3. Shujian Huang, Yabing Zhang, Junsheng Zhou, Jiajun Chen. Coreference Resolution using Markov Logic Networks. In The 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'2009), Mexico City, Mexico, 2009. Research in Computing Science: Advances in Computational Linguistics, Alexander Gelbukh (Ed.), vol. 41, pp. 157-168, ISSN: 1870-4069. (Best Poster Award (1/25), poster)

2008

2. Shujian Huang, Ning Xi, Yinggong Zhao, Xinyu Dai, Jiajun Chen. An Error-Sensitive Metric for Word Alignment in Phrase-based SMT. Presented at The 4th China Workshop on Machine Translation (CWMT'2008), Beijing, China, 2008. (In Chinese, slides)

2007

1. Junsheng Zhou, Shujian Huang, Jiajun Chen and Weiguang Qu. A New Graph Clustering Algorithm for Chinese Noun Phrase Coreference Resolution. Journal of Chinese Information Processing, 2007, vol. 21, no. 2. (In Chinese)

Technical Reports:

Shujian Huang, Yinggong Zhao, Boyuan Li, Qiufeng Wu, Xinyu Dai, Jiajun Chen. Nanjing University's System Report for NIST MT09 Workshop. Included in the materials of NIST Open MT 2009, Ottawa, ON, Canada. Aug 31-Sep 1, 2009.

Shujian Huang, Yinggong Zhao, Boyuan Li, Qiufeng Wu, Xinyu Dai, Jiajun Chen. NJU-NLP's Technique Report for the 5th China Workshop on Machine Translation. Included in the materials of CWMT 2009, Nanjing, China. Oct. 15-16, 2009. (In Chinese)



Last modified: Oct. 13, 2009 by Shu-Jian HUANG