Completed Research Projects

Fluency in ENL, ESL and EFL: A contrastive corpus-based study of English as a first, second, and foreign language [2017-2020]

This project (funded by the German Research Foundation: DFG GO 1760/4-1 & WO 2224/1-1) takes a holistic approach to investigate the fluency of speakers of English as a first language (ENL), a second language (ESL) and as a foreign language (EFL). As a database for the corpus analysis, we will use several components of the International Corpus of English (ICE; Nelson 1996) and the Louvain International Database of Spoken English Interlanguage (LINDSEI; Gilquin et al. 2010). In order to systematically assess how speakers of these three different types of Englishes establish fluency, these corpora will be analyzed for the linguistic variables that can potentially have an effect on a speaker’s fluency (i.e. ‘fluencemes’; Götz 2013: 8-9). For this purpose, an integrated, fluenceme-based taxonomy will be used to make it possible to describe fluency on three different levels: (1) the temporal variables in speech production (e.g. the length of runs, pause ratio, speech rate, etc.) as well as speakers’ use of fluency-enhancing strategies (e.g. discourse markers or prefabricated units), (2) the (potential combinations of) different/several fluencemes to overcome planning difficulties (fluenceme chunking) as well as their positions in the utterance (fluenceme positioning), and (3) correlations of fluencemes with extralinguistic variables (e.g. gender, age) that can predict the type and use of fluencemes in different types of Englishes (fluenceme preferencing).
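The temporal variables named under (1) can be illustrated with a minimal sketch. The definitions used here are assumptions made for illustration only (speech rate as words per minute of total time, pause ratio as the share of total time spent pausing, length of runs as the number of words between two pauses); they are not necessarily the operationalisations used in the project, and the `Segment` type and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of a transcribed turn (hypothetical data model)."""
    kind: str          # "speech" or "pause"
    duration: float    # length in seconds
    words: int = 0     # word count; stays 0 for pauses


def temporal_measures(segments):
    """Compute three simple temporal fluency measures (assumed definitions)."""
    total_time = sum(s.duration for s in segments)
    pause_time = sum(s.duration for s in segments if s.kind == "pause")
    runs = [s.words for s in segments if s.kind == "speech"]
    if total_time <= 0 or not runs:
        raise ValueError("need at least one speech segment with positive duration")
    total_words = sum(runs)
    return {
        # words per minute, pauses included in the time base
        "speech_rate_wpm": 60.0 * total_words / total_time,
        # share of the total time spent in pauses
        "pause_ratio": pause_time / total_time,
        # average number of words between two pauses
        "mean_length_of_runs": total_words / len(runs),
    }


# Invented toy turn: three runs separated by two pauses.
sample = [
    Segment("speech", 2.0, 6),
    Segment("pause", 0.5),
    Segment("speech", 3.0, 9),
    Segment("pause", 0.5),
    Segment("speech", 2.0, 5),
]
measures = temporal_measures(sample)
```

For the toy turn above, `measures` holds a speech rate of 150 words per minute and a pause ratio of 0.125.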

For further information, please visit the project homepage on the DFG's GEPRIS website.

Pragmatic nativisation in spoken Sri Lankan English: a corpus-based study [2017-2020]

This research project (funded by the German Research Foundation: DFG BE 5812/2-1) aims at the empirical description of pragmatic routines in spoken Sri Lankan English. It thus advances our understanding of spoken Sri Lankan English, which is also of relevance for other postcolonial Englishes. In comparison to British English, the historical ancestor of Sri Lankan English, and Indian English, the largest variety of English in South Asia and a direct geographical neighbour to Sri Lankan English, this project offers corpus-based descriptions of pragmatic nativisation of spoken Sri Lankan English with regard to organisational and actional pragmatic routines under consideration of sociobiographic speaker information. To ground said analyses in a valid database, the first representative corpus of spoken Sri Lankan English comprising many contexts of use from informal chats between friends to formal political speeches is completed and published as the Sri Lankan component of the International Corpus of English.

Please visit the project homepage on the DFG's GEPRIS website.

International Corpus of English: ICE-Sri Lanka

The Sri Lankan component of the International Corpus of English is based at the University of Giessen. The written component of the ICE-SL corpus is now available, in SGML format and in a POS-tagged version.

Please visit the official ICE websites of ICE-Sri Lanka.

Transforming European Learner Language into Learning Opportunities (TELL-OP) [2014-2017]

How can adult learners use their own output to further acquire language skills? How can adult learners use their critical thinking, analysis and awareness skills to improve their communicative competence across different CEFR levels?

TELL-OP is an ERASMUS+ strategic partnership that seeks to bring together the knowledge and expertise of European stakeholders from Belgium, Germany, Spain, Turkey and the UK in the fields of language education, corpus and applied linguistics, e-learning and knowledge engineering in order to promote the personalized e-learning of languages in the contexts of higher and adult education, in particular, through mobile devices.

Please visit the project website.

Academic Trans- and Multiliteracy and the Challenges of English-Medium Instruction (EMI)

The PORTT research group at the Department of English and the Centre for Competence Development (ZfbK) of Justus Liebig University, Giessen/Germany, investigates the development of academic literacy, transliteracy and multiliteracy in different educational contexts. Whereas the term ‘literacy’ refers to the ability to read texts, to compose texts and to learn from textual material in one language, the concept of ‘transliteracy’ (Gentil 2005) takes into account that, in academic writing, which is always a material-based process, the language(s) of the material drawn on may differ from that of texts to be composed. Transliteracy requires translation competence in a functionalist sense. ‘Multiliteracy’ encompasses transliteracy and refers to (full) literacy in more than one language. To gain insight into the development of these forms of literacy, the members of the PORTT research group investigate cognitive processes of text reception, text production and translation using empirical methods such as think aloud, keystroke logging, screen recording and eye tracking. Its research focuses on the development of L1 (German) and L2 (English) academic writing skills and their interdependence, as well as translation competence development from the novice stage up to expert performance. Findings are being used to continuously optimize the teaching of these competences in various disciplines and degree programs and to develop best-practice approaches to be adopted in forms of English-medium instruction which make full use of students’ entire language resources.

Gentil, Guillaume (2005): “Commitments to academic biliteracy: Case studies of Francophone university writers.” Written Communication 22.4: 421–471.

The Old Bailey Corpus - Spoken English in the 18th and 19th centuries [2008-2012]

The Proceedings of the Old Bailey, London's central criminal court, were published from 1674 to 1913 and constitute a large body of texts from the beginning of Late Modern English. The Proceedings contain over 200,000 trials, totalling ca. 134 million words, and their verbatim passages are arguably as near as we can get to the spoken word of the period. The material thus offers the rare opportunity of analyzing everyday language in a period that has been neglected both with regard to the compilation of primary linguistic data and the description of the structure, variability, and change of English. The Old Bailey Corpus (OBC) is based on the Proceedings and documents spoken English from the 1720s onward.

Please visit the project website of the Old Bailey Corpus or the Old Bailey Proceedings Online.

Integrating the Old Bailey Corpus into the CLARIN-D infrastructure [2015-2016]

The Old Bailey Corpus 1720-1913 (OBC) is a corpus of 18th- and 19th-century spoken English and consists of selected Proceedings of the Old Bailey, London's central criminal court. The OBC currently has c. 750,000 words of direct speech per decade between 1720 and 1913, amounting to about 13.9 million words of spoken English. Every speaker turn is annotated for sociobiographical (gender, social class, age), pragmatic (role in the court proceeding) and textual variables (the shorthand scribe, printer and publisher of individual Proceedings). The aim of this project is to integrate the OBC into the German section of the Common Language Resources and Technology Infrastructure (CLARIN-D) to achieve sustainability of this resource (persistent storage and access). The project is funded by the Federal Ministry of Education and Research. The OBC will be hosted at the CLARIN-D Service Centre of Saarland University.

Verb Complementation in South Asian Englishes [2008-2011]

The project aims to take into account all complementation patterns that ditransitive verbs such as 'to give' can take in order to evaluate whether, and to what extent, differences in verb complementation between South Asian Englishes and British English are dependent on various lexicogrammatical factors (e.g. variation in collocational profiles; pronominality, animacy, or syntactic complexity of the arguments; influences from indigenous languages). More information can be found on the project's website in the DFG's research database.


TransComp is a process-oriented longitudinal study which explores the development of translation competence in 12 students of translation over a period of 3 years and compares it to that of 10 professional translators. It aims to make an important contribution to the development of the methodology and model building in process-oriented translation studies by overcoming a number of shortcomings of previous studies. The insight into the components which make up translation competence and into its development gained in the project will be utilized for translation pedagogy and the improvement of curricula for translator training.

Please visit the project website.

The Automated Similarity Judgment Program

The ASJP project aims at achieving a computerized lexicostatistical analysis of ideally all the world’s languages. The two main purposes are to provide a classification of all languages by a single, consistent and objective (if perhaps not ideal) method and to perform various statistical analyses regarding the historical and areal behavior of lexical items.

It is a computerized lexicostatistical analysis in that it uses an algorithm taken from biogenetics in order to arrive at an unbiased classification of the languages of the world. The input data are computer-readable transcriptions of the 40 most stable items of the Swadesh word list. At present the database comprises more than 5300 languages.
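How two transcribed word lists might be compared automatically can be sketched as follows. The text above does not name the distance measure, so this sketch assumes a length-normalised edit (Levenshtein) distance averaged over the meanings two lists share; it is an illustration under that assumption, not the project's actual procedure. The function names and the toy word lists are hypothetical and do not come from the ASJP database.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (range 0 to 1)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def wordlist_distance(list_a, list_b):
    """Mean normalised distance over the meanings present in both lists."""
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the two word lists share no meanings")
    return sum(normalized_distance(list_a[m], list_b[m]) for m in shared) / len(shared)


# Invented toy lists (meaning -> simplified transcription), for illustration only.
lang_a = {"hand": "hand", "water": "water", "night": "nait"}
lang_b = {"hand": "hant", "water": "vaser", "night": "naxt"}
distance = wordlist_distance(lang_a, lang_b)
```

A matrix of such pairwise distances is the kind of input that tree-building algorithms of the sort used in biology take, which is one way a classification could be derived from the word lists.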

Please visit the project website.

You can still contribute to the ASJP project: wordlists for ca. 3000 languages are still missing.