???????
? ?
??????????? lee5110@263.net
???????
????????????????????Translational English Corpus????????????parallel corpus????????multilingual corpus????????comparable corpus??????????????????????????????????????????????????co-occurrence??????????????????????????????????????????????????????????????????
1) ????????????12???
2) ?????????????24???
3) ???????????????24???
4) ???????????????????????48???
????H.J.Vermeer????????????????????????????????????Baker, 1995:238?????????????????????????????????????????????simplification??????explicitation??????conventionalization?????????????????data-driven??????????????????????????????????????????????????????????????????????????
???????????? . ????????? . ?????2001 (5).?
??????????????????????????????????????????????????????????????????????????Translational English Corpus????????????????????????????????????????????2001????????????2000??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
----???. ???????????? ????????.???????????2002?3?
???????????CTIS?????????????????????---????????TEC?????????????????????????????????????????
??????????????????????????????2002(6)?
What is a corpus?
A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.
A computer corpus is a corpus which is encoded in a standardised and homogenous way for open-ended retrieval tasks. Its constituent pieces of language are documented as to their origins and provenance. (EAGLES, Preliminary Recommendations on Corpus Typology)
What is TEC?
TEC is a computerised collection of contemporary translational English text. It is freely available to the research community, with a set of software tools to allow scholars to investigate the language of translated English. The corpus is continually being enlarged and the software tools refined and made more versatile and user-friendly.
TEC is a corpus of contemporary translational English: it consists of written texts translated into English from a variety of source languages, European and non-European. It was set up and is currently managed by Professor Mona Baker at the Centre for Translation & Intercultural Studies (CTIS), University of Manchester. The custom-made software for processing the corpus, which can be downloaded from the web, was designed by Dr. Saturnino Luz of Trinity College Dublin, who is also in charge of maintaining the corpus.
What does TEC consist of?
TEC consists of four subcorpora: fiction, biography, news and inflight magazines. The overall size of the corpus is currently (2003) around 10 million words. It can be accessed freely via the web, using a custom-built concordancer designed by Dr. Saturnino Luz.
TEC is meticulously documented in terms of extralinguistic features such as the gender, nationality and occupation of the translator, the direction of translation, the source language and the publisher of the translated text. This information is held in a separate header file for each text. The concordancing software is designed to make the information in the header file available to the researcher at a glance.
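TEC's own concordancer is custom-built and its internals are not described here. As a rough illustration of how a concordancer can combine keyword-in-context (KWIC) lines with header information, here is a minimal Python sketch. It assumes the texts are already loaded into memory with their header fields in a dictionary; the `Text` class, the field names (`subcorpus`, `translator_gender`), the file names and the sample sentences are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Text:
    """One corpus text plus selected header fields (hypothetical field names)."""
    filename: str
    body: str
    header: dict = field(default_factory=dict)


def kwic(texts, node, width=30, **header_filter):
    """Return (filename, left, node, right) tuples for every whole-word,
    case-insensitive match of `node`, keeping only texts whose header
    matches all key=value pairs given in header_filter."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        if any(text.header.get(k) != v for k, v in header_filter.items()):
            continue  # skip texts outside the requested slice of the corpus
        for m in pattern.finditer(text.body):
            left = text.body[max(0, m.start() - width):m.start()]
            right = text.body[m.end():m.end() + width]
            lines.append((text.filename, left.rjust(width), m.group(0), right.ljust(width)))
    return lines


# Toy data, invented for illustration only.
texts = [
    Text("text_a.txt", "She said that the house was quiet. He said nothing.",
         {"subcorpus": "Fiction", "translator_gender": "female"}),
    Text("text_b.txt", "They said it was late.",
         {"subcorpus": "Fiction", "translator_gender": "male"}),
]

for filename, left, node, right in kwic(texts, "said", width=15, translator_gender="female"):
    print(f"{filename}  {left} [{node}] {right}")
```

Filtering on header fields before matching is what lets a researcher restrict a search to, say, texts by female translators or a single subcorpus.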
What type of research does TEC support?
TEC has supported a broad range of studies in two main areas: the way in which the patterning of translated text might be different from that of non-translated text in the same language, and stylistic variation across individual translators. Examples of both types of study can be found in the Selected Bibliography attached to this document.
TEC files
1. Subcorpus: Inflight magazines
Lufthansa Bordbuch and Blue Wings
2. Subcorpus: Newspapers
The Guardian and The European
3. Subcorpus: Biography
4. Subcorpus: Fiction
Sample Header File
TITLE :
Filename: fn000009.txt
Subcorpus: Fiction
Collection: Memoirs of Leticia Valle
TRANSLATOR
Name: Carol Maier
Gender: female
Nationality: American
Employment: Lecturer
TRANSLATION
Mode: written
Extent: 55179
Publisher:
Place:
Date: 1994
Copyright:
Comments: Title in European Women Writers Series
AUTHOR
Name: Rosa Chacel
Gender: female
Nationality: Spanish
SOURCE TEXT
Language: Spanish
Mode: written
Status: original
Place:
Date: 1945
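To show how header information of this kind can be turned into structured metadata, here is a minimal Python sketch that parses an abridged copy of the sample header above into a nested dictionary. It assumes the plain layout shown in the sample (all-caps section lines followed by `Field: value` lines); the real TEC header files may be stored in a different format, and `parse_header` is a hypothetical helper, not part of the TEC software.

```python
SAMPLE_HEADER = """\
TITLE :
Filename: fn000009.txt
Subcorpus: Fiction
Collection: Memoirs of Leticia Valle
TRANSLATOR
Name: Carol Maier
Gender: female
Nationality: American
Employment: Lecturer
TRANSLATION
Mode: written
Extent: 55179
Publisher:
Date: 1994
AUTHOR
Name: Rosa Chacel
SOURCE TEXT
Language: Spanish
Date: 1945
"""


def parse_header(text):
    """Parse section lines and 'Field: value' lines into
    {section: {field: value}}; empty values become None."""
    header = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        label = line.rstrip(" :")
        if label.isupper():
            # An all-caps line such as "TRANSLATOR" or "SOURCE TEXT" opens a section.
            section = label
            header[section] = {}
        elif ":" in line and section is not None:
            key, _, value = line.partition(":")
            header[section][key.strip()] = value.strip() or None
    return header


header = parse_header(SAMPLE_HEADER)
print(header["TRANSLATOR"]["Name"], header["SOURCE TEXT"]["Language"])
```

Keeping fields grouped by section matters because some field names (`Name`, `Date`, `Mode`) recur under TRANSLATOR, AUTHOR, TRANSLATION and SOURCE TEXT with different meanings.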
Basic Methodologies
Comparable: two corpora in the same language, one consisting of translated and the other of non-translated texts;
Parallel: corpora of source texts and their translations;
Multilingual: corpora of non-translated texts in two or more languages, from the same domain, time period, etc.
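The key practical difference between a parallel corpus and the other two designs is alignment: each source unit is linked to its translation(s). A minimal Python sketch of a bilingual search over a parallel corpus follows. It assumes sentence alignment has already been done (alignment itself is a separate step not shown); `parallel_concordance` is a hypothetical helper and the sentences are invented for illustration.

```python
import re


def parallel_concordance(corpus, term):
    """Return (source, translations) entries whose source sentence contains
    `term` as a whole word (case-insensitive).

    Each corpus entry pairs one source sentence with a list of its aligned
    translations, so a source text with several translations is supported.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(src, tgts) for src, tgts in corpus if pattern.search(src)]


# Toy aligned data, invented for illustration only.
corpus = [
    ("La casa estaba en silencio.", ["The house was silent.", "The house lay quiet."]),
    ("Dijo que vendría.", ["She said she would come.", "She said that she would come."]),
]

for src, tgts in parallel_concordance(corpus, "dijo"):
    print(src)
    for i, tgt in enumerate(tgts, start=1):
        print(f"  translation {i}: {tgt}")
```

A comparable corpus, by contrast, would simply be two independent collections of texts with no links between their sentences.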
Examples of corpus studies
Comparable corpora (corpora of translated and non-translated texts in the same language, and in similar domains) e.g. Olohan & Baker (based on TEC and BNC)
Parallel corpora (corpora of source texts and their single or multiple translations) e.g. Bosseaux (Virginia Woolf's The Waves and two French translations)
Parallel corpora, with monolingual reference corpus in the language of the translated subcorpus e.g. Wallace (ECPC; IT & popular science English texts and two sets of Chinese translations, plus SINICA Chinese reference corpus)
Parallel corpora, with a monolingual reference corpus in each language (translated and non-translated) e.g. Kenny (GEPCOLT; experimental German literary texts and single English translations, plus BNC & Mannheim Reference Corpora)
Features Investigated (translated vs. non-translated texts)
Broad features (universals?): explicitation, simplification, normalisation, levelling out
Specific features (syntactic, lexical, literary): zero/that variation; contractions; split infinitives; use of idioms; recurrent lexical patterns; reformulation markers; marked collocations; point of view; deixis, etc. (Mona Baker)
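Zero/that variation is a good example of a specific feature that can be counted across a comparable corpus (compare the TEC/BNC design of Olohan & Baker). The Python sketch below is a deliberately crude approximation, not the procedure of any published study: it looks only at forms of SAY, and it treats "SAY + subject pronoun" as a proxy for the zero connective, so it will both miss and miscount cases that a parsed corpus would handle correctly. The pronoun list and the two toy texts are assumptions for illustration.

```python
import re

# Forms of SAY only; TELL and other reporting verbs are left out for brevity.
REPORTING = r"\b(?:say|says|said|saying)\s+"
WITH_THAT = re.compile(REPORTING + r"that\b", re.IGNORECASE)
# Crude proxy for the zero connective: the verb is followed directly by a
# subject pronoun, as in "said she was tired".
ZERO = re.compile(REPORTING + r"(?:i|you|he|she|it|we|they)\b", re.IGNORECASE)


def that_rate(text):
    """Return (with_that, zero, proportion of reporting clauses with 'that')."""
    with_that = len(WITH_THAT.findall(text))
    zero = len(ZERO.findall(text))
    total = with_that + zero
    return with_that, zero, with_that / total if total else 0.0


# Toy texts standing in for a translated and a non-translated subcorpus.
translated = "She said that she was tired. He said that it was late. They said we could go."
original = "He said he was tired. She said it was fine. I said that we should leave."

for name, text in [("translated", translated), ("non-translated", original)]:
    w, z, p = that_rate(text)
    print(f"{name}: that={w} zero={z} proportion={p:.2f}")
```

Comparing the proportion across the two subcorpora, rather than raw counts, keeps the comparison independent of corpus size.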
Laviosa?1998b??????????????????????????????????????????????????????????????????????????????????????????????????????????????????Baker 1993, 1998?????????????Øverås, 1998?????????????????Kenny 1998??????????????????????????????????????????????Frawley?1984?????????????????????????????????? (Zhonghua Xiao)
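Broad features such as simplification are usually operationalised through surface measures computed over the translated and non-translated components of a comparable corpus. The Python sketch below computes three measures of that kind: a standardised type/token ratio (mean TTR over consecutive fixed-size chunks; 1,000 words is a common chunk size but is an assumption here), mean sentence length, and lexical density. The function-word list is a tiny hypothetical stand-in for a proper stop list, and the tokeniser and sentence splitter are naive.

```python
import re

# Hypothetical, deliberately tiny function-word list; real studies use a full list.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
                  "is", "was", "were", "be", "he", "she", "it", "they", "we", "i",
                  "you", "that", "this", "with", "for", "as", "by", "his", "her"}


def tokens(text):
    """Lower-cased word tokens; apostrophe forms like "don't" stay whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def sttr(toks, chunk=1000):
    """Standardised type/token ratio: mean TTR over consecutive full chunks.
    Falls back to plain TTR when the text is shorter than one chunk."""
    ratios = [len(set(toks[i:i + chunk])) / chunk
              for i in range(0, len(toks) - chunk + 1, chunk)]
    if not ratios:
        return len(set(toks)) / len(toks) if toks else 0.0
    return sum(ratios) / len(ratios)


def mean_sentence_length(text):
    """Mean number of tokens per sentence, splitting naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokens(s)]
    return sum(len(tokens(s)) for s in sentences) / len(sentences) if sentences else 0.0


def lexical_density(toks):
    """Share of tokens that are not in the function-word list."""
    return sum(t not in FUNCTION_WORDS for t in toks) / len(toks) if toks else 0.0


def profile(text, chunk=1000):
    toks = tokens(text)
    return {"sttr": sttr(toks, chunk),
            "mean_sentence_length": mean_sentence_length(text),
            "lexical_density": lexical_density(toks)}


print(profile("The cat sat. The cat ran away."))
```

In practice one would compute `profile` separately for the translated and the non-translated subcorpus and compare the figures, keeping the chunk size and word lists identical on both sides.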
????????????????????
1. ??????????????????????????????????????????
2. ??????????????????????????????????????
3. ????????????????????
4. ??????????????????????? (Aijmer & Altenberg 1996: 12, cited in Zhonghua Xiao)
??-???????
http://icl.pku.edu.cn/project/parallel/default.htm ???????????
http://www.ling.lancs.ac.uk/corplang/babel/babel.htm
(The
????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????150??/??170??/??100??/??130??/??????????40%??????60%????????????55%?45%?????? 2003??
???????
??BNC????????????BNC???http://info.ox.ac.uk/bnc/index.html ???????ftp?ftp://sable.ox.ac.uk/pub/ota/BNC/SARA/?????????????SARA??????????20??
COBUILD?????????????????COBUILD?????????http://titania.cobuild.collins.co.uk/index.html???????????????????????????50??
TeCCon??????????????????????Manchester?????http://www.art.man.ac.uk/SML/ctis/research/tec.htm ???????????????????????????????????????????????????????????????????????????????????????TEC Browser?http://ronaldo.cs.tcd.ie/tec/jnlp/ ????????????????????
References
??. ?????????????????????????2002(6).
????????????????????, 2000 (5).
??? . ????????? . ????, 2001 (5).
???. ????????????????2000?2003?
???. ????????????????????,??????,2003 (1).
???. ???????????? ????????. ?????????, 2002 (3).
Baker, Mona. Corpus-based Translation Studies (Lecture Handout), 2004.
http://www.monabaker.com/tsresources/TranslationalEnglishCorpus.htm
http://www.art.man.ac.uk/SML/ctis/research/tec.htm