Resources on the web
- surveys of corpora and text archives: Looking for a corpus containing data on a specific variety or from a particular period? Interested in what corpora or text collections are available "out there"? Have a look at these lists/link collections!
- Martin Weisser's Bookmarks for Corpus-based Linguists: ca. 1,000 categorized and annotated links to resources (corpora, text archives,...) for corpus-based linguists.
- Corpus Resource Database (CoRD): an open-access online resource where academic corpus compilers can publish descriptions of their corpora (offered and maintained by the Research Unit for Variation, Contacts and Change in English, University of Helsinki); currently lists 55 English-language corpora, subcorpora and databases; includes links and a Corpus Finder tool
- the companion website to Corpus-based Language Studies: An Advanced Resource Book (2005; London: Routledge) by Tony McEnery, Richard Xiao and Yukio Tono: includes a Corpus Survey and links to further resources
- List of texts, text centres and web resources (maintained by the International Computer Archive of Modern and Medieval English)
- a list of more than 100 learner corpora, maintained by the Centre for English Corpus Linguistics, Université de Louvain
Please note: You will not be able to access all of the corpora linked to in these databases. The extent to which corpus data can be accessed varies greatly: in some cases you may download the corpus in its entirety, while other websites offer access to the data via a web interface only (you may also need to register as a user). Many corpora are not available unless a license fee is paid (in such cases, please check whether the Linguistics section owns a license).
- information on corpus analysis software
- Laurence Anthony's AntConc: AntConc for download, video tutorials that teach you how to use the software, links to online help (including a discussion forum for questions), documentation/manual, books/papers related to AntConc
- WordSmith Tools (Oxford University Press & Lexical Analysis Software; Mike Scott): the Support section includes 'get-started guides' in a number of languages, answers to FAQs, online help and a link to the online WordSmith discussion group
- help with quantitative data/statistics
- Log-likelihood calculator (University of Lancaster)
- Information on using statistics in "Einführung in die Korpuslinguistik: Praktische Grundlagen und Werkzeuge" [German]
- Sample Size Calculator: helps you determine how large your sample needs to be in order to represent the corresponding population with a given level of precision.
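To make the two calculators above less of a black box, here is a minimal Python sketch of the kind of computation they typically perform. It is an illustration under stated assumptions, not the code behind the linked tools: the log-likelihood function uses the common two-corpus G2 formula for comparing word frequencies, and the sample size function uses Cochran's formula with a finite population correction. The function names, parameter names and default values (95% confidence, 5% margin of error, proportion 0.5) are chosen here for illustration only.

```python
import math


def log_likelihood(freq1, freq2, size1, size2):
    """G2 for an item occurring freq1 times in corpus 1 (size1 tokens)
    and freq2 times in corpus 2 (size2 tokens)."""
    total = freq1 + freq2
    # Expected frequencies if the item were equally common in both corpora
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero frequency contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def sample_size(population, margin=0.05, z=1.96, proportion=0.5):
    """Assumed approach: Cochran's formula with finite population correction."""
    n0 = z**2 * proportion * (1 - proportion) / margin**2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    # Hypothetical counts: 120 hits in 1 million words vs. 80 hits in 1 million words
    print(round(log_likelihood(120, 80, 1_000_000, 1_000_000), 2))
    # Hypothetical population of 10,000 texts
    print(sample_size(10_000))
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom. Always check the documentation of the calculator you actually use, since the tools linked above may apply different formulas or corrections.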
Some free, web-accessible corpora
- AusNC - The Australian National Corpus (also includes ICE-Australia). Requires an account. Limited access to some of the corpora (e.g. Monash, ICE-AUS).
- BNC - The British National Corpus. Requires an account.
- CMSW - Corpus of Modern Scottish Writing, and SCOTS - Scottish Corpus of Texts and Speech. Free download and full access. No registration required.
- EEBO - Early English Books Online (British and American books published between 1475 and 1700). Full access granted from the JLU intranet.
- FALKO - Fehlerannotiertes Lernerkorpus (can be searched using the ANNiS³ web interface). Free full access without registration.
- The Old Bailey Corpus (spoken English in the 18th and 19th centuries). Full access with JLU login.
- SBC - The Santa Barbara Corpus of Spoken American English. Free download, full access, no registration.
- VOICE - The Vienna-Oxford International Corpus of English (1 million words of spoken English used as a lingua franca). Free download. Web-interface requires registration.
- various corpora at corpus.byu.edu (site maintained by Mark Davies), e.g. the Corpus of Contemporary American English (COCA), the Corpus of Historical American English (COHA), the Corpus of Global Web-Based English (GloWbE), the Corpus of American Soap Operas, etc. Free users are limited to a certain number of queries per day and face other restrictions. JLU does not have a commercial licence for the BYU web interfaces.