There will be three pre-conference workshops (on Wed 26th May, 2.00 - 6.15 pm). You will find all relevant information (e.g. workshop programmes) on this website.
- Workshop I: "Investigating Earlier Spoken English. Papers based on the Old Bailey Corpus"
(Programme workshop I)
- Workshop III: "Corpus Linguistics on the Web: Introducing the WebCorp Linguist’s Search Engine"
(Programme workshop III)
Workshop I: "Investigating Earlier Spoken English. Papers based on the Old Bailey Corpus"
Convenor: Magnus Huber (Justus Liebig University Giessen) (Wed 26th May, 2.00 - 6.15 pm)
This ICAME 2010 pre-conference workshop will focus on the Old Bailey Corpus, which is being compiled at the Department of English in Giessen.
The proceedings of the Old Bailey, London's central criminal court, were published from 1674 to 1913 and constitute a large body of texts from the beginning of Present Day English. They contain over 200,000 trials, totalling ca. 134 million words, and the verbatim passages are arguably as near as we can get to the spoken word of the period. The material thus offers the rare opportunity of analyzing everyday language in a period that has been neglected both with regard to the compilation of primary linguistic data and the description of the structure, variability, and change of English. The Old Bailey Corpus (OBC) is based on these Proceedings and documents spoken English from the 1720s onward.
For an overview of the corpus see Huber, Magnus. 2007. "The Old Bailey Proceedings, 1674-1834. Evaluating and annotating a corpus of 18th- and 19th-century spoken English". Meurman-Solin, Anneli & Nurmi, Arja (eds.) Annotating Variation and Change (Studies in Variation, Contacts and Change in English 1).
For detailed background information on the Old Bailey and the publication history of the Proceedings, consult the excellent Old Bailey Proceedings Online at http://www.oldbaileyonline.org
A first version of the OBC (OBC 0.1) was made available to the presenters. This version contains ca. 700,000 words per decade (of which ca. 600,000 are direct speech) from 1720 to 1913. OBC 0.1 thus contains over 10 million words of spoken English, and every utterance is XML-tagged and annotated with the following socio-biographical speaker attributes, where available:
- role in the courtroom (witness, defendant, plaintiff, lawyer, etc.)
- occupation according to HISCO (Historical International Standard Classification of Occupations)
- crime scene
- place of residence
Additional attributes identify the scribe who took down the court proceedings in shorthand and the respective publisher of the Proceedings. This makes it possible to investigate the influence of scribal idiosyncrasies or the publisher's house style on the representation of spoken language.
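To give a rough idea of how utterance-level annotation of this kind can be queried, here is a minimal Python sketch. It is not based on the actual OBC tag set: the element name `u`, the attribute names (`role`, `residence`, `scribe`, `publisher`) and the sample sentences are all invented for illustration. It simply counts the words of direct speech per value of a chosen speaker attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: element and attribute names are assumptions made
# for this sketch and do not reproduce the real OBC annotation scheme.
SAMPLE = """
<trial year="1785">
  <u role="witness" residence="London" scribe="S1" publisher="P1">I saw the prisoner take the watch</u>
  <u role="defendant" scribe="S1" publisher="P1">I never was near the place</u>
  <u role="witness" scribe="S1" publisher="P1">He ran away directly</u>
</trial>
"""

def words_by_attribute(xml_text, attribute):
    """Count whitespace-separated words of speech per attribute value."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for utterance in root.iter("u"):
        # Attributes are only present "where available", so fall back
        # to an explicit "unknown" bucket when one is missing.
        key = utterance.get(attribute, "unknown")
        counts[key] += len((utterance.text or "").split())
    return counts

by_role = words_by_attribute(SAMPLE, "role")
by_residence = words_by_attribute(SAMPLE, "residence")
```

The same function can be pointed at the scribe or publisher attribute to compare, for instance, how much speech each scribe recorded; a real study would of course need the corpus documentation for the actual attribute names.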
I am looking forward to seeing you at the workshop.
Workshop II: “News, (new) media, and corpora: from methodology to theory”
Convenor: Roberta Facchinetti (University of Verona) (Wed 26th May, 2.30 - 6.15 pm)
The focus of this pre-conference workshop will be on a set of topical issues pertaining to corpus-based studies on the language of news, including both printed and broadcast news.
So far, corpus studies in this context have focused mainly on written media and on the language of newspapers in particular. While not disregarding these studies, the workshop is intended to also address the interplay of different media in the actualization of news on television, radio and on the Internet. For example, we will also look at blogs, podcasts, vodcasts, and video sharing from a corpus-linguistic perspective.
Papers (20 mins + 10 mins discussion) focus mainly on methodological and theoretical issues concerning two lead questions:
- How can corpora and corpus-linguistic methods be applied to the study of news in old and new media, including the wide range of Internet-based communication?
- How do corpora and corpus-linguistic methods have to change to come to grips with this new multi-modal scenario?
The deadline for the submission of abstracts (max 400 words) for the “News and Media” workshop was 20 December 2009.
I am looking forward to seeing you at the workshop.
Workshop III: "Corpus Linguistics on the Web: Introducing the WebCorp Linguist's Search Engine"
Convenor: Antoinette Renouf (Birmingham City University) (Wed 26th May, 2.00 - 6.15 pm)
This ICAME 2010 pre-conference workshop will introduce the WebCorp Linguist’s Search Engine (WebCorpLSE) and the new possibilities it opens up for web-scale corpus-based study.
The current publicly-available version of WebCorp was first launched a decade ago (http://www.webcorp.org.uk). This system relies on standard web search engines such as Google, adding layers of refinement specifically for linguistic analysis.
WebCorpLSE is designed to bypass the commercial search engines upon which WebCorp relied as gatekeepers to the web. WebCorpLSE is crawling and processing the web to build a 10 billion word (7 terabyte) corpus, including a multi-terabyte ‘mini-web’, designed to act as a microcosm of the web itself. In addition to the mini-web, WebCorpLSE has built a newspaper sub-corpus, containing daily issues of UK broadsheets from 1984-present and recent issues of other UK and international newspapers. We have also worked with our university colleagues to build collections to assist in their research and teaching, including sub-corpora of blogs, science fiction and major English literary works.
The new architecture has allowed us to enhance the sentence boundary detection, date identification, 'junk' (or 'boilerplate') removal, collocation and other statistical analysis options currently available in WebCorp. Additional pre-processing includes grammatical tagging and language detection, and the system also supports full pattern matching and wildcard search.
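For readers unfamiliar with wildcard search and collocation, the following Python sketch shows the general idea on a toy text. It is an illustration only and makes no claim about how WebCorpLSE implements these features or what its query syntax looks like: the function names, the shell-style `*` wildcard and the fixed word span are assumptions of this example, and sentence boundaries are ignored for simplicity.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, pattern, span=2):
    """Count words occurring within `span` tokens of any token that
    matches the shell-style wildcard `pattern` (e.g. 'witness*')."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if fnmatchcase(token, pattern):
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

text = "The court heard the witness. The witnesses left the court."
tokens = tokenize(text)
result = collocates(tokens, "witness*", span=1)
```

Here the pattern `witness*` matches both "witness" and "witnesses", and `result` holds the raw frequency of each neighbouring word. A statistical collocation measure would go on to compare these counts with the overall frequency of each word in the corpus.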
In this workshop, the developers of WebCorpLSE will first introduce its new features and demonstrate how these can be used. There will then be papers from other contributors on its various applications. The contribution of papers to this workshop is by invitation only, though all ICAME delegates will be free to attend.
We look forward to seeing you at the workshop.
Antoinette Renouf, Andrew Kehoe & Matt Gee