There will be three pre-conference workshops (on Wed 26th May, 2.00 - 6.15 pm). You will find all relevant information (e.g. workshop programmes) on this website.
Workshop I: "Investigating Earlier Spoken English. Papers based on the Old Bailey Corpus"
(Programme workshop I)
Convenor: Magnus Huber (Justus Liebig University Giessen) (Wed 26th May, 2.00 - 6.15 pm)
This ICAME 2010 pre-conference
workshop will focus on the Old Bailey Corpus, which is being compiled at the Department
of English in Giessen.
The proceedings of the
Old Bailey, London's central criminal court, were published from 1674 to 1913
and constitute a large body of texts from the beginning of Present Day English.
They contain over 200,000 trials, totalling ca. 134 million words, and the verbatim
passages are arguably as near as we can get to the spoken word of the period.
The material thus offers the rare opportunity of analyzing everyday language
in a period that has been neglected with regard to both the compilation of primary
linguistic data and the description of the structure, variability, and change
of English. The Old Bailey Corpus (OBC) is based on these Proceedings and documents
spoken English from the 1720s onward.
For an overview of the corpus,
see Huber, Magnus. 2007. "The Old Bailey Proceedings, 1674-1834. Evaluating
and annotating a corpus of 18th- and 19th-century spoken English". In Meurman-Solin,
Anneli & Nurmi, Arja (eds.) Annotating Variation and Change (Studies in Variation,
Contacts and Change in English 1).
For detailed background
information on the Old Bailey and the publication history of the Proceedings,
consult the excellent
Old Bailey Proceedings Online at http://www.oldbaileyonline.org
A first version of the OBC (OBC 0.1) was made available to the presenters. This version contains ca. 700,000 words
per decade (of which ca. 600,000 are direct speech) from 1720 to 1913. OBC 0.1
thus contains over 10 million words of spoken English, and every
utterance is XML-tagged and annotated with the following socio-biographical
speaker attributes, where available:
- role in the courtroom (witness, defendant, plaintiff, lawyer, etc.)
- occupation according to HISCO (Historical International Standard Classification of Occupations)
- crime scene
- place of residence
Additional attributes identify
the scribe, who took down the court proceedings in shorthand, and the respective
publisher of the Proceedings. This makes it possible to investigate the influence
of scribal idiosyncrasies or the publisher's house style on the representation
of spoken language.
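As a rough illustration of how such speaker annotation can be queried, the following Python sketch filters utterances by attribute. The element name `u`, the attribute names (`role`, `hisco`, `residence`), the attribute values and the sample sentences are all invented for this example; they are assumptions and do not reproduce the actual OBC markup or any text from the corpus.

```python
import xml.etree.ElementTree as ET

# Invented sample. The element <u> and the attributes role/hisco/residence
# are illustrative assumptions, NOT the real OBC schema; the sentences and
# the code "12345" are made up and are not corpus data.
SAMPLE = """
<trial>
  <u role="witness" hisco="12345" residence="Holborn">I saw the prisoner take the watch.</u>
  <u role="defendant">I never was near the place.</u>
  <u role="witness">He came into my shop about nine.</u>
</trial>
"""


def utterances_by(root, **wanted):
    """Return the text of every <u> whose attributes match all of `wanted`."""
    return [
        u.text.strip()
        for u in root.iter("u")
        if all(u.get(name) == value for name, value in wanted.items())
    ]


root = ET.fromstring(SAMPLE.strip())

# Collect everything said by witnesses and count the words.
witness_speech = utterances_by(root, role="witness")
word_count = sum(len(text.split()) for text in witness_speech)
print(len(witness_speech), "witness utterances,", word_count, "words")
```

Because attributes are recorded only "where available", the filter simply skips utterances that lack a requested attribute: `utterances_by(root, role="witness", residence="Holborn")` returns only the first sample utterance.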
I am looking forward to
seeing you at the workshop.
(Programme workshop II)
The focus of this pre-conference workshop will be on a set of topical issues
pertaining to corpus-based studies on the language of news, including both printed
and broadcast news.
So far, corpus studies in this context have focused mainly on written media
and on the language of newspapers in particular. While not disregarding these
studies, the workshop is intended to also address the interplay of different
media in the actualization of news on television, radio and on the Internet.
For example, we will also look at blogs, podcasts, vodcasts, and video sharing from a corpus-linguistic perspective.
Papers (20 mins + 10 mins discussion) will focus mainly on methodological
and theoretical issues concerning two lead questions:
- How can corpora and
corpus-linguistic methods be applied to the study of news in old and new media,
including the wide range of Internet-based communication?
- How do corpora and
corpus-linguistic methods have to change to come to grips with this new multimodality?
The deadline for the submission
of abstracts (max 400 words) for the “News and Media” workshop was
20 December 2009.
I am looking forward to
seeing you at the workshop.
(Programme workshop III)
This ICAME 2010 pre-conference workshop will introduce the WebCorp Linguist’s Search
Engine (WebCorpLSE) and the new web-scale possibilities it opens up.
The current publicly available version of WebCorp was first launched a decade ago (http://www.webcorp.org.uk). This system
relies on standard web search engines such as Google, adding layers of
refinement specifically for linguistic analysis.
WebCorpLSE is designed to bypass the commercial search engines upon which WebCorp relied
as gatekeepers to the web. WebCorpLSE is crawling and processing the web to
build a 10 billion word (7 terabyte) corpus, including a multi-terabyte
‘mini-web’, designed to act as a microcosm of the web itself. In addition to
the mini-web, WebCorpLSE has built a newspaper sub-corpus, containing daily
issues of UK broadsheets from 1984 to the present and recent issues of other UK and
international newspapers. We have also worked with our university colleagues to
build collections to assist in their research and teaching, including
sub-corpora of blogs, science fiction and major English literary works.
The new architecture has allowed us to enhance the sentence boundary detection,
date identification, 'junk' (or 'boilerplate') removal, collocation and other
statistical analysis options currently available in WebCorp. Additional pre-processing
includes grammatical tagging and language detection, and searches support full
pattern matching and wildcards.
In this workshop, the
developers of WebCorpLSE will first introduce its new features and demonstrate
how these can be used. There will then be papers from other contributors on its
various applications. Papers for this workshop are by
invitation only, though all ICAME delegates will be free to attend.
We look forward to seeing you at the workshop.
Antoinette Renouf, Andrew Kehoe & Matt Gee