
Pre-conference Workshops

There will be three pre-conference workshops (on Wed 26th May, 2.00 - 6.15 pm). You will find all relevant information (e.g. workshop programmes) on this website.

 



Workshop I: "Investigating Earlier Spoken English. Papers based on the Old Bailey Corpus"
Convenor: Magnus Huber (Justus Liebig University Giessen) (Wed 26th May, 2.00 - 6.15 pm)

(Programme workshop I)

 

This ICAME 2010 pre-conference workshop will focus on the Old Bailey Corpus, which is being compiled at the Department of English in Giessen.

 

The Proceedings of the Old Bailey, London's central criminal court, were published from 1674 to 1913 and constitute a large body of texts from the beginning of Present Day English. They contain over 200,000 trials, totalling ca. 134 million words, and the verbatim passages are arguably as near as we can get to the spoken word of the period. The material thus offers the rare opportunity of analyzing everyday language in a period that has been neglected with regard to both the compilation of primary linguistic data and the description of the structure, variability, and change of English. The Old Bailey Corpus (OBC) is based on these Proceedings and documents spoken English from the 1720s onward.

For an overview of the corpus see Huber, Magnus. 2007. "The Old Bailey Proceedings, 1674-1834. Evaluating and annotating a corpus of 18th- and 19th-century spoken English". Meurman-Solin, Anneli & Nurmi, Arja (eds.) Annotating Variation and Change (Studies in Variation, Contacts and Change in English 1).

http://www.helsinki.fi/varieng/journal/volumes/01/huber/

For detailed background information on the Old Bailey and the publication history of the Proceedings consult the excellent Old Bailey Proceedings Online at http://www.oldbaileyonline.org

A first version of the OBC (OBC 0.1) was made available to the presenters. This version contains ca. 700,000 words per decade (of which ca. 600,000 are direct speech) from 1720 to 1913. Thus, OBC 0.1 contains over 10 million words of spoken English, and every single utterance is XML-tagged and annotated with the following socio-biographical speaker attributes, where available:

  • sex
  • age
  • role in the courtroom (witness, defendant, plaintiff, lawyer, etc.)
  • occupation according to HISCO (Historical International Standard Classification of Occupations)
  • crime scene
  • workplace
  • place of residence

Additional attributes identify the scribe who took down the court proceedings in shorthand and the respective publisher of the Proceedings. This makes it possible to investigate the influence of scribal idiosyncrasies or the publisher's house style on the representation of spoken language.
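As a rough illustration of how such utterance-level annotation can be queried, the following Python sketch counts words of direct speech per attribute value. It is not based on the actual OBC tag set: the element name `u`, the attribute names (`sex`, `role`, `scribe`, `publisher`), the sample trial, and the helper `words_by_attribute` are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: the element name <u>, the attribute names and the
# sample utterances are illustrative assumptions, not the real OBC tag set.
SAMPLE = """
<trial year="1785">
  <u sex="m" role="witness" scribe="S1" publisher="P1">I saw the prisoner take the watch.</u>
  <u sex="f" role="defendant" scribe="S1" publisher="P1">I never was in the shop.</u>
  <u sex="m" role="witness" scribe="S2" publisher="P1">He ran down the lane.</u>
</trial>
"""


def words_by_attribute(xml_text, attribute):
    """Count whitespace-separated words of direct speech per attribute value."""
    counts = Counter()
    for utterance in ET.fromstring(xml_text).iter("u"):
        # Attributes are only present "where available", so fall back to "unknown".
        value = utterance.get(attribute, "unknown")
        counts[value] += len((utterance.text or "").split())
    return counts


if __name__ == "__main__":
    print(words_by_attribute(SAMPLE, "role"))
    print(words_by_attribute(SAMPLE, "scribe"))
```

Grouping by `scribe` or `publisher` instead of a speaker attribute is the kind of query that would help separate scribal or house-style effects from genuine speaker variation.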

I am looking forward to seeing you at the workshop,

Magnus Huber

magnus.huber@anglistik.uni-giessen.de


Workshop II: “News, (new) media, and corpora: from methodology to theory”
Convenor: Roberta Facchinetti (University of Verona) (Wed 26th May, 2.30 - 6.15 pm)

(Programme workshop II)

The focus of this pre-conference workshop will be on a set of topical issues pertaining to corpus-based studies on the language of news, including both printed and broadcast news.
So far, corpus studies in this context have focused mainly on written media and on the language of newspapers in particular. While not disregarding these studies, the workshop is intended to also address the interplay of different media in the actualization of news on television, radio and on the Internet. For example, we will also look at blogs, podcasts, vodcasts, and video sharing from a corpus-linguistic perspective.


Papers (20 mins + 10 mins discussion) focus mainly on methodological and theoretical issues concerning two lead questions:

  • How can corpora and corpus-linguistic methods be applied to the study of news in old and new media, including the wide range of Internet-based communication?
  • How do corpora and corpus-linguistic methods have to change to come to grips with this new multi-modal scenario?

The deadline for the submission of abstracts (max 400 words) for the “News and Media” workshop was 20 December 2009.

I am looking forward to seeing you at the workshop.
Roberta Facchinetti

roberta.facchinetti@univr.it


Workshop III: "Corpus Linguistics on the Web: Introducing the WebCorp Linguist's Search Engine"
Convenor: Antoinette Renouf (Birmingham City University) (Wed 26th May, 2.00 - 6.15 pm)

(Programme workshop III)

This ICAME 2010 pre-conference workshop will introduce the WebCorp Linguist’s Search Engine (WebCorpLSE) and the new possibilities it opens up for web-scale corpus-based study.

The current publicly available version of WebCorp was first launched a decade ago (http://www.webcorp.org.uk). This system relies on standard web search engines such as Google, adding layers of refinement specifically for linguistic analysis.

WebCorpLSE is designed to bypass the commercial search engines upon which WebCorp relies as gatekeepers to the web. WebCorpLSE is crawling and processing the web to build a 10 billion word (7 terabyte) corpus, including a multi-terabyte ‘mini-web’, designed to act as a microcosm of the web itself. In addition to the mini-web, WebCorpLSE has built a newspaper sub-corpus, containing daily issues of UK broadsheets from 1984 to the present and recent issues of other UK and international newspapers. We have also worked with our university colleagues to build collections to assist in their research and teaching, including sub-corpora of blogs, science fiction and major English literary works.

The new architecture has allowed us to enhance the sentence boundary detection, date identification, 'junk' (or 'boilerplate') removal, collocation and other statistical analysis options currently available in WebCorp. Additional pre-processing includes grammatical tagging and language detection, and searches support full pattern matching and wildcards.

In this workshop, the developers of WebCorpLSE will first introduce its new features and demonstrate how these can be used. There will then be papers from other contributors on its various applications. The contribution of papers to this workshop is by invitation only, though all ICAME delegates will be free to attend.

We look forward to seeing you at the workshop.
Antoinette Renouf, Andrew Kehoe & Matt Gee