Working Experiences
Sep. 2005 - Now: Member of the Natural Language Processing Group, Nanjing University, supervised by Prof. Jiajun Chen.
Oct. 2007 - Aug. 2008: Intern at the Natural Language Computing Group, MSRA, working with Long Jiang and Mu Li.

Currently, I’m focusing on research in Statistical Machine Translation. My research interests also include Coreference Resolution and other NLP problems. I’m especially interested in using various machine learning techniques to solve complicated NLP problems.
Statistical Machine Translation:
Sep. 2008 - Now @NLP.NJU
We initiated an SMT project that first builds a baseline system with the help of Moses and then adds extensions or modifications to incorporate more syntactic or semantic information. We are planning research and experiments on various parts of the whole SMT procedure, for example word alignment, phrase extraction and scoring, and phrase reordering.
Our work starts with word alignment (Publication [2]). Observing that AER scores do not correlate well with BLEU scores, we propose a variation, Error Sensitive AER (ESAER), for the evaluation of alignment results. ESAER takes the phrase extraction procedure into consideration and penalizes different types of errors according to their effects on the phrase extraction result. Experiments show a large improvement in correlation with BLEU over the original AER.
We are also trying supervised and semi-supervised methods to improve the alignment result. See Publication [8] for our experiments using co-training style algorithms. As a preprocessing step, we are trying to segment long sentence pairs into shorter ones, which may make the following steps more efficient. See Publication [6] for a preliminary result.
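For reference, the standard AER that ESAER varies can be sketched as follows. Given predicted links A, sure gold links S and possible gold links P (with S contained in P), AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). This sketch covers only the standard metric; it does not implement ESAER, whose error weighting is described in Publication [2]. The function name, argument names and toy data are illustrative assumptions.

```python
def aer(predicted, sure, possible):
    """Standard Alignment Error Rate: 1 - (|A&S| + |A&P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # sure links also count as possible links
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (hypothetical data).
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
predicted = {(0, 0), (2, 1), (2, 2)}
score = aer(predicted, sure, possible)
```

A lower score means the predicted alignment is closer to the gold annotation; predicting exactly the sure links gives 0.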
Statistical Machine Translation for evaluation:
Dec. 2008 - Aug. 2009 @NLP.NJU
We participated in two machine translation evaluations: the NIST Open MT 2009 Evaluation (MT09) (system report) and the evaluation of the 5th China Workshop on Machine Translation (CWMT2009) (system report, in Chinese). Our systems were built mainly on Moses. We added some phrase table features and employed a factored model to improve the results.
Statistical Machine Translation:
Apr. 2008 - Aug. 2008 @NLC.MSRA
Collaborating with Mu Li, we tried to improve the performance of word alignment by combining the outputs of different systems, for example GIZA++ and the Berkeley Aligner. Experiments showed an increase of about 0.5 BLEU points after combination.
I was also responsible for the rule extraction part of the SMT system. First, I maintained the code for Hiero rule extraction and ran some rule filtering experiments. Later, I implemented several syntactic rule extraction procedures following papers from ISI, USC.
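The combination method used in the alignment work above is not specified here. As a rough illustration only, one common baseline for combining the outputs of several aligners is link-level voting, sketched below. The function name, the `min_votes` parameter and the toy alignments are assumptions, not the method used in the project.

```python
from collections import Counter


def combine_alignments(alignments, min_votes=2):
    """Keep every link proposed by at least `min_votes` aligners.

    `alignments` is a list of link sets, one per aligner; each link is a
    (source_index, target_index) pair. min_votes=1 yields the union and
    min_votes=len(alignments) yields the intersection.
    """
    votes = Counter(link for links in alignments for link in set(links))
    return {link for link, count in votes.items() if count >= min_votes}


# Toy outputs from three hypothetical aligners for one sentence pair.
sys_a = {(0, 0), (1, 1), (2, 2)}
sys_b = {(0, 0), (1, 2), (2, 2)}
sys_c = {(0, 0), (1, 1)}

majority = combine_alignments([sys_a, sys_b, sys_c], min_votes=2)
union = combine_alignments([sys_a, sys_b, sys_c], min_votes=1)
intersection = combine_alignments([sys_a, sys_b, sys_c], min_votes=3)
```

Raising `min_votes` trades recall for precision, which in turn changes how many phrase pairs can be extracted from the combined alignment.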
Chinese Couplet project:
Oct. 2007 - Mar. 2008 @NLC.MSRA
This project aims at building automatic second-line and banner generation systems for Chinese couplets. I mainly worked with Long Jiang on the automatic evaluation of the second-line generation system. We considered various evaluation metrics from IR and SMT, finally adopted BLEU, an evaluation metric from SMT, and built our evaluation framework on pseudo-reference second lines.
Coreference Resolution:
Jul. 2006 - Jul. 2007 @NLP.NJU
Coreference resolution aims at identifying different mentions in a document that refer to the same entity. We first worked on unsupervised methods based on graph cut. We then employed supervised learning techniques such as the perceptron and Markov Logic Networks, and also used a clustering technique called correlation clustering in the experiments. Later, we did some research on the automatic generation of first-order logic rules and on various clustering techniques. Some of this work is presented in Publications [1], [3], [5] and [7].
Chinese word segmentation and POS tagging:
Nov. 2005 - Apr. 2006 @NLP.NJU
In-house implementation of a Chinese word segmenter and a POS tagger based on HMM.

2010
8. Shujian Huang, Kangxi Li, Xinyu Dai and Jiajun Chen. Improving Word Alignment by Semi-supervised Ensemble. To appear in CoNLL 2010.

2009
7. Yabing Zhang, Junsheng Zhou, Shujian Huang and Jiajun Chen. Combining ILP and MLN for Coreference Resolution. In International Conference on Asian Language Processing (IALP 2009), Singapore, Dec 7-9, 2009.
6. Biping Meng, Shujian Huang, Xinyu Dai and Jiajun Chen. Segmenting Long Sentence Pairs for Statistical Machine Translation. In International Conference on Asian Language Processing (IALP 2009), Singapore, Dec 7-9, 2009.
5. Weipeng Liu, Junsheng Zhou, Shujian Huang and Jiajun Chen. Global Optimization Based On Clustering for Coreference Resolution. In The 10th Chinese National Conference on Computational Linguistics (CNCCL-2009), Yantai, China, July 24-26, 2009. (In Chinese, slides)
4. Shujian Huang, Ning Xi, Yinggong Zhao, Xinyu Dai and Jiajun Chen. An Error-Sensitive Metric for Word Alignment in Phrase-based SMT. Journal of Chinese Information Processing, 2009, vol. 23, no. 3. (Revised version of CWMT2008 paper, in Chinese)
3. Shujian Huang, Yabing Zhang, Junsheng Zhou and Jiajun Chen. Coreference Resolution using Markov Logic Networks. Best Poster Award (1/25) at The 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'2009), Mexico City, Mexico, 2009 (poster). In Research in Computing Science: Advances in Computational Linguistics, Alexander Gelbukh Ed., vol. 41, pages 157-168, ISSN: 1870-4069.

2008
2. Shujian Huang, Ning Xi, Yinggong Zhao, Xinyu Dai and Jiajun Chen. An Error-Sensitive Metric for Word Alignment in Phrase-based SMT. Presented at The 4th China Workshop on Machine Translation, CWMT'2008, Beijing, China, 2008. (In Chinese, slides)

2007
1. Junsheng Zhou, Shujian Huang, Jiajun Chen and Weiguang Qu. A New Graph Clustering Algorithm for Chinese Noun Phrase Coreference Resolution. Journal of Chinese Information Processing, 2007, vol. 21, no. 2. (In Chinese)

Technical Reports:
Shujian Huang, Yinggong Zhao, Boyuan Li, Qiufeng Wu, Xinyu Dai and Jiajun Chen. Nanjing University's System Report for NIST MT09 Workshop. Included in the materials of NIST Open MT 2009, Ottawa, ON, Canada, Aug 31-Sep 1, 2009.
Shujian Huang, Yinggong Zhao, Boyuan Li, Qiufeng Wu, Xinyu Dai and Jiajun Chen. NJU-NLP's Technique Report for the 5th China Workshop on Machine Translation. Included in the materials of CWMT 2009, Nanjing, China, Oct. 15-16, 2009. (In Chinese)