Publication:
Thematic patterning in English and Spanish: contrastive annotation of a bilingual newspaper corpus for linguistic and computational applications

Publication Date
2016-10-25
Publisher
Universidad Complutense de Madrid
Abstract
Different linguistic schools recognize thematization as a fundamental phenomenon in the construction of messages and texts. Thematic position within a text privileges the elements that guide the reader in orienting and interpreting discourse at different levels. Placing a linguistic unit in the initial position of a clause, paragraph, or text confers a special status on it: it signals the organizational strategy that characterizes different text types and acts as a variable in distinguishing registers, text types and genres. However, despite the importance of thematization for message and text structuring, no linguistic study to date has undertaken the task of validating its aspects comparatively, for either linguistic or computational purposes. This study therefore fills a research gap by implementing a methodology based on contrastive corpus annotation, which makes it possible to validate aspects of Thematization in English and Spanish empirically. It also seeks to develop a bilingual English-Spanish comparable corpus of newspaper texts automatically annotated with thematic features at the clausal and discourse levels. The empirically validated categories (Thematic Field and its elements: Textual Theme, Interpersonal Theme, PreHead and Head) are used to annotate the thematic choices in a larger corpus of three newspaper genres: news reports, editorials and letters to the editor. This characterization reveals interesting results, such as the use of genre-specific strategies in thematic position. In addition, the thesis investigates the possibility of automating the annotation of thematic features in the bilingual corpus by developing a set of JAVA rules implemented in GATE. It also shows the efficacy of this method in comparison with the manual annotation results...
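The abstract compares automatic annotation against manual annotation but does not say which measure is used. The sketch below shows one common way such a comparison can be made: exact-match precision, recall and F1 per category, with the manual annotation treated as the gold standard. The span format `(start, end, label)`, the function name `evaluate`, and the choice of metric are all illustrative assumptions, not the thesis's actual evaluation procedure or GATE's API.

```python
# Hypothetical span format: (start_offset, end_offset, label),
# e.g. label in {"TextualTheme", "InterpersonalTheme", "PreHead", "Head"}.
Span = tuple[int, int, str]


def evaluate(manual: set[Span], automatic: set[Span]) -> dict[str, dict[str, float]]:
    """Exact-match precision, recall and F1 per label.

    Manual spans are treated as the gold standard (an assumption);
    a predicted span counts only if offsets and label all match.
    """
    labels = {span[2] for span in manual | automatic}
    scores: dict[str, dict[str, float]] = {}
    for label in sorted(labels):
        gold = {s for s in manual if s[2] == label}
        pred = {s for s in automatic if s[2] == label}
        true_pos = len(gold & pred)
        precision = true_pos / len(pred) if pred else 0.0
        recall = true_pos / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


if __name__ == "__main__":
    # Toy data: the automatic Head span ends too early, so it gets no credit.
    manual = {(0, 3, "TextualTheme"), (4, 12, "Head")}
    automatic = {(0, 3, "TextualTheme"), (4, 9, "Head")}
    for label, s in evaluate(manual, automatic).items():
        print(label, s)
```

Exact matching is strict. A partial-overlap criterion would credit the shortened Head span, and which criterion is appropriate depends on how the thematic categories are delimited.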
Description
Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Filología, Departamento de Filología Inglesa, defended on 04-12-2015