
New: Slides are available as PDF files (updated daily).

Week 1           Week 2
 1. 19-09-05      6. 26-09-05
 2. 20-09-05      7. 27-09-05
 3. 21-09-05      8. 28-09-05
 4. 22-09-05      9. 29-09-05
 5. 23-09-05     10. 30-09-05

Machine Translation -- Classical and Statistical Approaches
 
Lecturer: 
Jonas Kuhn

Course Description:
The first week of this course covers traditional rule-based machine
translation (MT), the approach underlying most current commercial MT
systems.  We discuss linguistic translation challenges and the ways in
which they are addressed in various classical MT architectures:
transfer-based MT, interlingual MT, and term-rewriting MT.

In the second week we address the data-driven statistical MT (SMT)
approach, which was first proposed in the 1990s and is receiving more
and more attention in computational linguistics.  We discuss the
components of an SMT system (language model, translation model,
decoder) and more recent developments, e.g., moving from a word-based
approach to a "phrase"-based approach.

Throughout the course, we will look at original research papers.  The
course also puts some emphasis on hands-on experience in MT system
development.  In lab exercises, we develop small MT systems and look
at freely available tools that support the development process.

Lecture Plan:

Week 1
  • Overview and historical background
  • Transfer-based MT: syntactic transfer
  • Transfer-based MT: transfer by LFG projection
  • Interlingual MT
  • Term-rewriting MT

Week 2
  • Statistical MT: Overview and evaluation methodology
  • Sentence alignment, word alignment, and "phrase"-based word alignment
  • Language models and decoding
  • Syntax-based statistical MT
  • Other uses of word alignments and wrap-up

Main Texts:
(Copies of these papers will be included in a reader available at the Fall
School!)

Trujillo, Arturo (1999). Translation Engines: Techniques for Machine
Translation. London/Berlin: Springer-Verlag.  (Section 6.1 on
syntactic transfer)

Kaplan, Ronald M.; Klaus Netter; Jürgen Wedekind; and Annie Zaenen
(1989).  Translation by Structural Correspondences.  In: Proceedings
of EACL 1989, 272-281.

Dorr, Bonnie J. (1994). Machine Translation Divergences: A Formal
Description and Solution. In: Computational Linguistics 20(4),
597-633.

Emele, Martin C. and Michael Dorna (1998). Ambiguity Preserving
Machine Translation using Packed Representations.  In: Proceedings of
COLING/ACL 1998.

Brown, Peter F.; John Cocke; Stephen A. Della Pietra; Vincent J. Della
Pietra; Fredrick Jelinek; John D. Lafferty; Robert L. Mercer; and Paul
S. Roossin (1990). A Statistical Approach to Machine Translation.  In:
Computational Linguistics 16(2), 79-85.

Knight, Kevin (1999). A Statistical MT Tutorial Workbook. Unpublished
Manuscript.

Koehn, Philipp; Franz Josef Och; and Daniel Marcu (2003).
Statistical Phrase-Based Translation. In: Proceedings of the 2003
Meeting of the North American Chapter of the Association for
Computational Linguistics (NAACL-03), Edmonton, Alberta.

Koehn, Philipp (2004).  Pharaoh: a Beam Search Decoder for
Phrase-Based Statistical Machine Translation Models.  In: Proceedings
of AMTA 2004.

Yarowsky, David and Grace Ngai (2001). Inducing Multilingual POS
Taggers and NP Bracketers via Robust Projection across Aligned
Corpora.  In: Proceedings of the 2001 Meeting of the North American
Chapter of the Association for Computational Linguistics (NAACL-01),
200-207.

Additional Readings:
(These will be made available in electronic form.)

Knight, Kevin (1997). Automating Knowledge Acquisition for Machine
Translation.  In: AI Magazine 18(4), 81-96.

Papineni, Kishore; Salim Roukos; Todd Ward; and Wei-Jing Zhu
(2002). BLEU: a Method for Automatic Evaluation of Machine
Translation.  In: Proceedings of the 40th Annual Meeting of the
Association for Computational Linguistics (ACL), 311-318.

Brown, Peter F.; Vincent J. Della Pietra; Stephen A. Della Pietra; and
Robert L. Mercer (1993).  The Mathematics of Statistical Machine
Translation: Parameter Estimation.  In: Computational Linguistics
19(2), 263-311.

Wu, Dekai and Hongsing Wong (1998). Machine Translation with a
Stochastic Grammatical Channel. In: Proceedings of COLING-ACL'98.

Yamada, Kenji and Kevin Knight (2001).  A Syntax-Based Statistical
Translation Model. In: Proceedings of the Conference of the
Association for Computational Linguistics (ACL).
