The following speakers have agreed to give plenaries at the conference:
- Edgar W. Schneider (University of Regensburg), Thu 27 May, 9-10 am:
"Tracking the evolution of vernaculars: Corpus linguistics and Earlier Southern US Englishes"
Given that language is, as Kretzschmar (2009) recently posited, a complex self-organizing system, it is widely accepted that the mechanisms of language change are most directly observable in vernacular rather than standard varieties (and in vernacular rather than standard documents, for that matter). Corpus linguists have worked towards identifying and digitizing historical sources that come close to representing natural speech (e.g. Nevalainen & Raumolin-Brunberg 1996; Huber 2007; Nurmi et al. 2009), but a number of problems clearly remain, and little effort has been devoted to building historical corpora of dialects in the narrow sense. In this paper I address these issues from a general perspective and apply the methodology to corpus-based investigations of earlier black and white dialects from the Southern United States.
In the first part, I address some theoretical and methodological issues involved in attempting to track vernacular speech forms in electronic corpora, with special emphasis on semi-literate writings as evidence. These include data identification and assessment in terms of validity and reliability, steps and decisions in corpus compilation, and factors that appear to restrict the value of the corpus-driven approach, such as missing data, orthographic variability and the problem of zero forms. I then introduce the object domain investigated here: debates on the evolution of both African American and white dialects in the southern United States. Widely discussed issues in this context include the British or African/creole origins of African American Vernacular English (AAVE); the question of whether black and white dialects have been diverging from each other over the last century or so; and Guy Bailey's controversial claim that white southern speech, the best-known and most highly stigmatized non-ethnic dialect of American English, originated only after the Civil War, in the Reconstruction period (i.e., much later than previously assumed).
My evidence builds on three pertinent corpora whose compilation I have directed at the University of Regensburg over the last decade; each will be introduced and briefly characterized in turn: SPOC, the "Southern Plantation Overseers' Corpus", which consists of 537 letters written by 55 white plantation overseers between 1794 and 1876 (Schneider & Montgomery 2001); BLUR ("Blues Lyrics compiled at the University of Regensburg"), which consists of 7,431 transcribed lyrics of blues songs recorded early in the 20th century, a total of roughly 1.5 million words; and COAAL, the "Corpus of Older African American Letters", which is currently being completed and consists of ca. 1,300 letters written by semi-literate African American writers in the 18th and 19th centuries. The three types of sources will be compared to each other and assessed with respect to their methodological potential and limitations. Finally, I will use the three corpora in a comparative analysis of selected phenomena of the southern vernaculars, especially from the domain of verbal morphology.
Huber, Magnus. 2007. "The Old Bailey Proceedings, 1674-1834: Evaluating and annotating a corpus of 18th- and 19th-century spoken English." In Anneli Meurman-Solin and Arja Nurmi, eds., Annotating Variation and Change. eVARIENG: Studies in Variation, Contact and Change in English, Volume 1. <http://www.helsinki.fi/varieng/journal/volumes/01>.
Kretzschmar, William R., Jr. 2009. The Linguistics of Speech. Cambridge: Cambridge University Press.
Nevalainen, Terttu, and Helena Raumolin-Brunberg, eds. 1996. Sociolinguistics and Language History. Studies Based on the Corpus of Early English Correspondence. Amsterdam: Rodopi.
Nurmi, Arja, Minna Nevala, and Minna Palander-Collin, eds. 2009. The Language of Daily Life in England (1400-1800). Amsterdam, Philadelphia: Benjamins.
Schneider, Edgar W., and Michael Montgomery. 2001. "On the trail of early nonstandard grammar: An electronic corpus of Southern U.S. antebellum overseers' letters." American Speech 76: 388-410.
- Miriam Meyerhoff (University of Edinburgh), Thu 27 May, 5.30-6.30 pm:
"Finding your mark: Uncovering hidden constraints in corpora"
The study of post-colonial Englishes presents some empirical challenges for corpus linguists. It is well known that English takes on local colour wherever it comes into contact with other languages. But to study this properly, two kinds of corpora are needed: solid corpora of English in its post-colonial context, and usable corpora of the languages it has come into contact with. Both kinds of corpora are often in short supply, hampering our understanding of how linguistic features are mapped between languages. The consequences are unfortunate: researchers may generalise from limited samples or from the intuitions of a few informants. With creole varieties of English, this problem is perhaps most acute: English is generally in contact with lesser-known (possibly now endangered) varieties that are not thoroughly documented.
But all is not lost. In this talk, I will suggest that it is possible to use relatively modest corpora of lesser-known languages to explore the complexity and dynamics of contact-induced variation and change. I will draw on data from a corpus of spoken, conversational Bislama (the English-lexified creole spoken in Vanuatu, SW Pacific) and a corpus of narratives recorded in Tamambo (the Oceanic language spoken on Malo Island, NW Vanuatu). Even small corpora of spontaneous speech provide considerable data on variable patterns that lie below the surface and that cannot be directly
I use some standard variationist tools to explore the patterns underlying the variable presence of pronominal subjects and objects in Bislama. While most work on substrate transfer in creoles focuses on features that are obligatory in the substrate language, an interesting property of subject and object presence in Bislama is that both the input (in this case, the Tamambo norms) and the output (the Bislama norms) are variable. How do speakers deal with variable input? Do they replicate the patterns of one language in the other? Do they regularise the variable input? Or do they innovate and create wholly
The answers from the Tamambo and Bislama corpora suggest a surprising mix of persistence and innovation. To the extent that these answers converge with other research on variation in dialect contact (and with our ongoing work on language contact in the UK), I will suggest that they point to some fundamental cross-linguistic constraints on the replication of variation.
- Stefan Th. Gries (University of California, Santa Barbara), Fri 28 May, 9-10 am:
"Corpus Linguistics, linguistic theory, and (psycho)linguistic models: Some comments plus implications for the study of variation in corpora"
Recent discussions on the CORPORA list suggest that the field of corpus linguistics is more divided than a superficial glance at the assumptions shared by corpus linguists might indicate. In the first part of this talk, I will critically discuss some views on corpus linguistics and its relation to linguistic theory in general and to one linguistic theory in particular. Specifically, I will express some concerns about (aspects of) corpus-driven linguistics, the rule-governedness and psychological generalizability of corpus-linguistic findings, and some corpus linguists' views of adjacent fields.
In the second part of the talk, I will discuss a particular psycholinguistic model of language acquisition, representation, and processing, which has not only proven extremely versatile and powerful in a wide variety of linguistic subdisciplines, but is also highly compatible with corpus-linguistic approaches and in fact even relies on them. I will introduce and exemplify the main assumptions of this model and highlight its characteristics and benefits, especially with regard to how it handles variation data, such as morphological and syntactic alternations, and change over time (in developmental and diachronic studies).
- Michaela Mahlberg (University of Nottingham), Sat 29 May, 9-10 am:
"The corpus stylistic analysis of fiction – or the fiction of corpus stylistics?"
Interest in corpus approaches to the analysis of literature seems to be growing, and the term ‘corpus stylistics’ is becoming more and more popular. In this paper I want to look at challenges and opportunities for the field of corpus stylistics. Corpus stylistics draws on the potential that comparative data provides for the analysis of individual texts. With a focus on discourse-level analysis, corpus stylistics needs to link quantitative findings with interpretation and literary criticism. With examples drawn mainly from works by Charles Dickens and other 19th-century fiction, I will look at linguistic elements of textual worlds. The examples of textual building blocks of fictional worlds concern the description of characters as well as the narrator’s involvement in the story. A corpus approach to textual worlds can be seen as complementing cognitive approaches such as text world theory (cf. Gavins 2007). The underlying corpus methodology deals with the retrieval of ‘clusters’, i.e. repeated sequences of words such as ‘I am glad to hear’, ‘with a shriek and a’ or ‘all that sort of thing’. The main literary critical approach that will be discussed in view of corpus findings focuses on the ‘externalisation’ of characters in Dickens (cf. John 2001). The paper also argues that the questions we ask in corpus stylistics should not be dictated by the functionality of the computer tools available to us; rather, tools need to be developed that help us investigate the questions suggested by features of a text. With the help of an XML corpus of texts annotated with information relating to quoted speech, examples of fictional speech will be discussed. The paper aims to show the value of combining corpus-driven methods with literary stylistics.
Gavins, J. 2007. Text World Theory. An Introduction. Edinburgh: EUP.
John, J. 2001. Dickens’s Villains. Melodrama, Character, Popular Culture. Oxford: OUP.
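The ‘cluster’ retrieval mentioned in the Mahlberg abstract can be illustrated with a minimal sketch: count every sequence of n consecutive words and keep those that recur. The tokenizer (lowercased letters and apostrophes), the cluster length and the frequency threshold below are assumptions chosen for illustration; this is not the procedure of any particular concordancer or of the speaker's own study.

```python
import re
from collections import Counter


def clusters(text: str, n: int = 3, min_freq: int = 2) -> list[tuple[str, int]]:
    """Return word n-grams ("clusters") that occur at least min_freq times.

    Assumptions for this sketch: words are runs of lowercase letters and
    apostrophes, and clusters may cross sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count every window of n consecutive tokens.
    grams = Counter(
        " ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)
    )
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(
        ((gram, count) for gram, count in grams.items() if count >= min_freq),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    sample = (
        "I am glad to hear it. I am glad to hear that you are well. "
        "All that sort of thing, and all that sort of thing."
    )
    for cluster, freq in clusters(sample, n=4):
        print(freq, cluster)
```

Run on the toy sample with n=4, this reports four recurring clusters, among them "am glad to hear" and "that sort of thing". Working tools typically add options such as respecting sentence boundaries or restricting search to quoted speech; those refinements are left out here.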
- Elizabeth C. Traugott (Stanford University), Sun 30 May, 9-10 am:
"The persistence of linguistic contexts over time: Implications for corpus research"
Most instances of grammaticalization have been shown to arise in restrictive contexts (cf. Bybee, Perkins, and Pagliuca 1994). The persistence over time of linguistic contexts ("co-texts" broadly defined to include prior discourse) raises theoretical and methodological issues for historical corpus research. What is the appropriate unit of linguistic context? How long do contexts remain relevant in the history of specific constructions? In quantitative work, should "bridging contexts" (Heine 2002) and "critical contexts" (Diewald 2002) for grammaticalization be counted after grammaticalization has set in? I argue that bridging contexts should be counted (contra Eckardt 2006) because they persist over time, but that critical contexts cannot be, since they do not persist. The persistence of linguistic contexts for morphosyntactic developments suggests that prior context should be considered an integral component of corpus research. Examples will be drawn from a variety of corpora, most especially the Proceedings of the Old Bailey 1674-1834 (see Huber 2007).
Bybee, Joan, Revere Perkins & William Pagliuca (1994): The Evolution of Grammar: Tense, Aspect, and Modality in the Languages of the World. Chicago: University of Chicago Press.
Diewald, Gabriele (2002): "A model for relevant types of contexts in grammaticalization", New Reflections on Grammaticalization, eds. Ilse Wischer & Gabriele Diewald. Amsterdam: Benjamins. 103-120.
Eckardt, Regine (2006): Meaning Change in Grammaticalization: An Enquiry into Semantic Reanalysis. Oxford: Oxford University Press.
Heine, Bernd (2002): "On the role of context in grammaticalization", New Reflections on Grammaticalization, eds. Ilse Wischer & Gabriele Diewald. Amsterdam: Benjamins. 83-101.
Huber, Magnus (2007): "The Old Bailey Proceedings, 1674-1834: Evaluating and annotating a corpus of eighteenth- and nineteenth-century spoken English", Annotating Variation and Change. eVARIENG: Studies in Variation, Contact and Change in English, Volume 1, eds. Anneli Meurman-Solin & Arja Nurmi, <http://www.helsinki.fi/varieng/journal/volumes/01>.
The Proceedings of the Old Bailey 1674-1913. <http://www.oldbaileyonline.org/>.