Research on Machine Translation and Generation

We investigate various methods to improve multilingual capability: designing novel MT architectures with the help of both task-specific models and LLMs, evaluating and estimating MT quality, bringing humans into the learning loop, and improving translation performance for low-resource language pairs or domains.

Alongside MT, we also model a range of natural language generation tasks, e.g., summarization, paraphrasing, text style transfer, text simplification, question answering in natural text, and dialog. Each generation task has its own controlling factors to consider, which brings interesting challenges. Other generation topics include evaluation, factual consistency, and explainability.

Presentations

(mostly in Chinese, hosted on bilibili.com)

Research Talks (of our own research)

Introduction (for newcomers)


Selected Publications

A More Complete List on Google Scholar

* marks corresponding author(s).

2025

TRANS-ZERO: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data.
Wei Zou, Sen Yang, Yu Bao, Shujian Huang*, Jiajun Chen, Shanbo Cheng*.
Findings of ACL 2025. (arxiv:2504.14669, code)

Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation.
Xiang Geng, Zhejian Lai, Jiajun Chen, Hao Yang, Shujian Huang*.
ACL 2025. (arxiv:2502.19941, code)

Self-Evolution Knowledge Distillation for LLM-based Machine Translation.
Yuncheng Song, Liang Ding, Changtong Zan, Shujian Huang*.
COLING 2025. (arxiv:2412.15303)

Enforcing Paraphrase Generation via Controllable Latent Diffusion.
Wei Zou, Ziyuan Zhuang, Shujian Huang*, Jia Liu, Jiajun Chen.
Frontiers of Computer Science. (arxiv:2404.08938, code)

2024

The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights.
Wenhao Zhu, Shujian Huang*, Fei Yuan, Jiajun Chen, Alexandra Birch.
preprint. (arxiv:2405.01345, code)

Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation.
Xu Huang, Zhirui Zhang*, Xiang Geng, Yichao Du, Jiajun Chen, Shujian Huang*.
Findings of ACL 2024. (arxiv:2401.06568, code)

MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation.
Jiahuan Li, Shanbo Cheng, Shujian Huang*, Jiajun Chen.
NAACL 2024. (arxiv:2403.09522, code)

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis.
Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Shujian Huang*, Lingpeng Kong, Jiajun Chen, Lei Li.
Findings of NAACL 2024. (arxiv:2304.04675, video, code)

kNN-BOX: A Unified Framework for Nearest Neighbor Generation.
Wenhao Zhu, Qianfeng Zhao, Yunzhe Lv, Shujian Huang*, Siheng Zhao, Sizhe Liu, Jiajun Chen.
EACL 2024 System Demonstrations. (arxiv:2302.13574, video, code)

Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions.
Jiahuan Li, Hao Zhou, Shujian Huang*, Shanbo Cheng, Jiajun Chen.
TACL 2024. (arxiv:2305.15083, video, code)

2023

Dictionary Definition Augmented Neural Machine Translation for Ancient Chinese Text.
Jiahuan Li, Ruochun Wu, Wenjing Hu, Jixuan Chen, Weilu Xu, Shujian Huang*, Jiajun Chen.
CCMT 2023 (in Chinese). Best Paper Award.

IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems.
Xu Huang, Zhirui Zhang, Ruize Gao, Yichao Du, Lemao Liu, Guoping Huang, Shuming Shi, Jiajun Chen, Shujian Huang*.
EMNLP 2023. (code)

Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search.
Xiang Geng, Yu Zhang, Zhejian Lai, Shuaijie She, Wei Zou, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang*.
EMNLP 2023.

Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation.
Zihan Liu, Zewei Sun, Shanbo Cheng, Shujian Huang, Mingxuan Wang.
IJCNLP-AACL 2023.

What Knowledge Is Needed? Towards Explainable Memory for kNN-MT Domain Adaptation.
Wenhao Zhu, Shujian Huang*, Yunzhe Lv, Xin Zheng and Jiajun Chen.
Findings of ACL 2023. (code)

INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation.
Wenhao Zhu, Jingjing Xu, Shujian Huang*, Lingpeng Kong and Jiajun Chen.
ACL 2023. (code)

BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training.
Yiming Yan, Tao Wang, Chengqi Zhao, Shujian Huang*, Jiajun Chen and Mingxuan Wang.
ACL 2023. (video, code)

Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation.
Min Liu, Yu Bao, Chengqi Zhao, Shujian Huang*.
AAAI 2023. (code)

CoP: Factual Inconsistency Detection by Controlling the Preference.
Shuaijie She, Xiang Geng, Shujian Huang*, Jiajun Chen.
AAAI 2023. (video, code)

Denoising Pre-Training for Machine Translation Quality Estimation with Curriculum Learning.
Xiang Geng, Yu Zhang, Jiahuan Li, Shujian Huang*, Hao Yang, Shimin Tao, Yimeng Chen, Ning Xie, Jiajun Chen.
AAAI 2023. (video, code)

2022

Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation.
Jiahuan Li, Shanbo Cheng, Zewei Sun, Mingxuan Wang, Shujian Huang*.
preprint. (arxiv:2212.08822)

Unsupervised Paraphrasing via Syntactic Template Sampling.
Yu Bao, Shujian Huang*, Hao Zhou, Lei Li, Xinyu Dai, Jiajun Chen.
SCIENTIA SINICA Informationis 2022. (in Chinese)

Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators.
Xinyou Wang, Zaixiang Zheng*, Shujian Huang*.
EMNLP 2022. (short paper)

FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation.
Wenhao Zhu, Shujian Huang*, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen.
LREC 2022.

BiTIIMT: A Bilingual Text-infilling Method for Interactive Machine Translation.
Yanling Xiao, Lemao Liu*, Guoping Huang, Qu Cui, Shujian Huang*, Shuming Shi, Jiajun Chen.
ACL 2022.

latent-GLAT: Glancing at Latent Variables for Parallel Text Generation.
Yu Bao, Hao Zhou, Shujian Huang*, Dongqi Wang, Lihua Qian, Xinyu Dai, Jiajun Chen, Lei Li.
ACL 2022.

Non-Parametric Online Learning from Human Feedback for Neural Machine Translation.
Dongqi Wang, Haoran Wei, Zhirui Zhang, Shujian Huang*, Jun Xie, Jiajun Chen.
AAAI 2022.

2021

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation.
Zaixiang Zheng, Hao Zhou*, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li.
NeurIPS 2021.

Learning Kernel-Smoothed Machine Translation with Retrieved Examples.
Qingnan Jiang, Mingxuan Wang, Jun Cao, Shanbo Cheng, Shujian Huang*, Lei Li.
EMNLP 2021.

Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation.
Xin Zheng, Zhirui Zhang, Shujian Huang*, Boxing Chen, Jun Xie, Weihua Luo, Jiajun Chen.
Findings of EMNLP 2021. (short paper)

Adaptive Nearest Neighbor Machine Translation.
Xin Zheng, Zhirui Zhang, Junliang Guo, Shujian Huang*, Boxing Chen, Weihua Luo, Jiajun Chen.
ACL 2021. (short paper)

When is Char Better Than Subword: A Systematic Study of Segmentation Algorithms for Neural Machine Translation.
Jiahuan Li, Yutong Shen, Shujian Huang*, Xin-Yu Dai, Jiajun Chen.
ACL 2021. (short paper)

Non-Autoregressive Translation by Learning Target Categorical Codes.
Yu Bao, Shujian Huang*, Tong Xiao, Dongqi Wang, Xin-Yu Dai, Jiajun Chen.
NAACL 2021.

DirectQE: Direct Pretraining for Machine Translation Quality Estimation.
Qu Cui, Shujian Huang*, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen.
AAAI 2021.

2020

Toward Making the Most of Context in Neural Machine Translation.
Zaixiang Zheng, Xiang Yue, Shujian Huang*, Jiajun Chen, Alexandra Birch.
IJCAI 2020.

Improving Self-Attention Networks with Sequential Relations.
Zaixiang Zheng, Shujian Huang*, Rongxiang Weng, Xinyu Dai, Jiajun Chen.
IEEE/ACM Transactions on Audio, Speech, and Language Processing 2020.

Mirror-Generative Neural Machine Translation.
Zaixiang Zheng, Hao Zhou, Shujian Huang*, Lei Li, Xin-Yu Dai, Jiajun Chen.
ICLR 2020. (selected for oral presentation, with the highest ratings from all reviewers)

A Reinforced Generation of Adversarial Examples for Neural Machine Translation.
Wei Zou, Shujian Huang*, Jun Xie, Xinyu Dai, Jiajun Chen.
ACL 2020.

Explicit Semantic Decomposition for Definition Generation.
Jiahuan Li, Yu Bao, Shujian Huang*, Xinyu Dai, Jiajun Chen.
ACL 2020.

RPD: A Distance Function Between Word Embeddings.
Xuhui Zhou, Zaixiang Zheng, Shujian Huang.
ACL Student Research Workshop 2020.

GRET: Global Representation Enhanced Transformer.
Rongxiang Weng, Shujian Huang*, Hao-Ran Wei, Heng Yu, Weihua Luo, Lidong Bing, Jiajun Chen.
AAAI 2020.

Generating Diverse Translation by Manipulating Multi-Head Attention.
Zewei Sun, Shujian Huang*, Hao-Ran Wei, Xin-yu Dai, Jiajun Chen.
AAAI 2020.

Acquiring Knowledge from Pre-trained Model to Neural Machine Translation.
Rongxiang Weng, Heng Yu, Shujian Huang*, Shanbo Cheng, Weihua Luo.
AAAI 2020.

2019

Improving Bilingual Lexicon Induction on Distant Language Pairs.
Wenhao Zhu, Zhihao Zhou, Shujian Huang*, Zhenya Lin, Xiangsheng Zhou, Yaofeng Tu, Jiajun Chen.
CCMT 2019. Best English Paper Award.

Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation.
Huiyun Yang, Shujian Huang*, Xinyu Dai, Jiajun Chen.
EMNLP-IJCNLP 2019.

Dynamic Past and Future for Neural Machine Translation.
Zaixiang Zheng, Shujian Huang*, Zhaopeng Tu, Xin-Yu Dai, Jiajun Chen.
EMNLP-IJCNLP 2019.

Learning Representation Mapping for Relation Detection in Knowledge Base Question Answering.
Peng Wu, Shujian Huang*, Rongxiang Weng, Zaixiang Zheng, Jianbing Zhang, Xiaohui Yan and Jiajun Chen.
ACL 2019.

Generating Sentences from Disentangled Syntactic and Semantic Spaces.
Yu Bao, Hao Zhou, Shujian Huang*, Lei Li, Lili Mou, Olga Vechtomova, Xin-Yu Dai and Jiajun Chen.
ACL 2019.

Utilizing Non-Parallel Text for Style Transfer by Making Partial Comparisons.
Di Yin, Shujian Huang*, Xin-Yu Dai and Jiajun Chen.
IJCAI 2019.

Correct-and-Memorize: Learning to Translate from Interactive Revisions.
Rongxiang Weng, Hao Zhou, Shujian Huang*, Lei Li, Yifan Xia and Jiajun Chen.
IJCAI 2019.

Online Distilling from Checkpoints for Neural Machine Translation.
Hao-Ran Wei, Shujian Huang*, Boxing Chen, Ran Wang, Xin-Yu Dai and Jiajun Chen.
NAACL-HLT 2019.

2018

Modeling Past and Future for Neural Machine Translation.
Zaixiang Zheng, Hao Zhou, Shujian Huang*, Lili Mou, Xinyu Dai, Jiajun Chen, and Zhaopeng Tu.
TACL 2018.

Combining Character and Word Information in Neural Machine Translation Using a Multi-Level Attention.
Huadong Chen, Shujian Huang*, David Chiang, Xinyu Dai, and Jiajun Chen.
NAACL 2018.

Learning to Discriminate Noises for Incorporating External Information in Neural Machine Translation.
Zaixiang Zheng, Shujian Huang*, Zewei Sun, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen.
preprint. (arxiv:1810.10317)

Controlling the Transition of Hidden States for Neural Machine Translation.
Zaixiang Zheng, Shujian Huang*, Xin-Yu Dai, Jiajun Chen.
CWMT 2018.

2017

Rgraph: Generating Reference Graphs for Better Machine Translation Evaluation.
Hongjie Ji, Shujian Huang*, Qi Hou, Cunyan Yin, and Jiajun Chen.
CWMT 2017.

Compressing Neural Networks by Applying Frequent Item-Set Mining.
Zi-Yi Dou, Shu-Jian Huang*, and Yi-Fan Su.
ICANN 2017.

Neural Machine Translation with Word Predictions.
Rongxiang Weng, Shujian Huang*, Zaixiang Zheng, Xinyu Dai and Jiajun Chen.
EMNLP 2017.

Top-rank Enhanced Listwise Optimization for Statistical Machine Translation.
Huadong Chen, Shujian Huang*, David Chiang, Xin-Yu Dai and Jiajun Chen.
CoNLL 2017.

AGRA: An Analysis-Generation-Ranking Framework for Automatic Abbreviation from Paper Titles.
Jianbing Zhang, Yixin Sun, Shujian Huang*, Cam-Tu Nguyen, Xiaoliang Wang, Xinyu Dai, Jiajun Chen, Yang Yu.
IJCAI 2017.

Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder.
Huadong Chen, Shujian Huang*, David Chiang, Jiajun Chen.
ACL 2017.

Chunk-based Bi-Scale Decoder for Neural Machine Translation.
Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li and Jiajun Chen.
ACL 2017. (short paper)

A Neural Probabilistic Structured-Prediction Method for Transition-Based Natural Language Processing.
Hao Zhou, Yue Zhang*, Chuan Chen, Shujian Huang*, Xin-Yu Dai, and Jiajun Chen.
JAIR 2017.

2016

A Search-Based Dynamic Reranking Model for Dependency Parsing.
Hao Zhou, Yue Zhang, Shujian Huang, Junsheng Zhou, Xin-Yu Dai and Jiajun Chen.
ACL 2016.

Tree-state based Rule Selection Models for Hierarchical Phrase-based Machine Translation.
Shujian Huang, Huifeng Sun, Chengqi Zhao, Jinsong Su, Xinyu Dai and Jiajun Chen.
IJCAI 2016.

PRIMT: A Pick-Revise Framework for Interactive Machine Translation.
Shanbo Cheng, Shujian Huang*, Huadong Chen, Xinyu Dai and Jiajun Chen.
NAACL-HLT 2016.

Evaluating a Deterministic Shift-Reduce Neural Parser for Constituent Parsing.
Hao Zhou, Yue Zhang, Shujian Huang, Xin-Yu Dai, and Jiajun Chen.
LREC 2016.

Adaptation of Language Models for SMT Using Neural Networks with Topic Information.
Yinggong Zhao, Shujian Huang*, Xinyu Dai, and Jiajun Chen.
ACM TALLIP 2016.

Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model.
Hao Zhou, Shujian Huang*, Junsheng Zhou, Yue Zhang, Huadong Chen, Xinyu Dai, Chuan Cheng, Jiajun Chen.
ACM TALLIP 2016.

2015

Resolving Coordinate Structures for Chinese Constituent Parsing.
Yichu Zhou, Shujian Huang*, Xinyu Dai, Jiajun Chen.
NLPCC 2015.

Non-linear Learning for Statistical Machine Translation.
Shujian Huang, Huadong Chen, Xinyu Dai, Jiajun Chen.
ACL 2015.

A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing.
Hao Zhou, Yue Zhang, Shujian Huang, Jiajun Chen.
ACL 2015.

2014

Learning Word Embeddings from Dependency Relations.
Yinggong Zhao, Shujian Huang, Xinyu Dai, Jianbing Zhang, Jiajun Chen.
IALP 2014.

An Investigation on Statistical Machine Translation with Neural Language Models.
Yinggong Zhao, Shujian Huang, Huadong Chen, and Jiajun Chen.
CCL and NLP-NABD 2014.

2013

Hypothesis Pruning in Learning Word Alignment.
Shujian Huang, Xinyu Dai, Jiajun Chen.
Chinese Journal of Electronics 2013.

2012

Enhancing Statistical Machine Translation with Character Alignment.
Ning Xi, Guangchao Tang, Xinyu Dai, Shujian Huang, Jiajun Chen.
ACL 2012. (short paper)

2011

Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment.
Shujian Huang, Stephan Vogel and Jiajun Chen.
ACL 2011. (short paper)
