FAQ

Please read the FAQ carefully before seeking advice in the office hour for corpus-linguistic projects; many of your questions are answered on this page.

"How can I access the corpus material/introductory material provided on the homepage?"

Accessing some of the material on this homepage requires a password that is made available only to students of English linguistics at JLU. If you are among this group, you can request it via e-mail (contact ). Please make sure to use your "…@uni-giessen.de" e-mail address so your status as a student of English at JLU can be verified.

"What is a corpus? What is a concordancer, etc.?"

We will not answer basic questions about corpus linguistics via e-mail or in the office hour. It is your task to inform yourself about basic corpus-linguistic concepts. In order to do this, you should take a look at the introductory material and references provided on this homepage, which will answer some of these basic questions, and look for corpus-linguistic books in the library.

"Can you send me the ... corpus?"

No, due to copyright issues we cannot send you linguistic corpora. If you want them, you have to pick them up during office hours in B 408 (details and times here).

"I want to use WordSmith Tools - Do I have to buy a licence?"

If you decide that you want to use WordSmith Tools for your thesis/project/Examen, please come see me during my office hour. I can help you get a licence.

"I cannot come to the corpus-linguistic office hour (korpuslinguistische Sprechstunde) at the scheduled times. What can I do?"

In that case just send an e-mail so that an individual appointment can be arranged.

"I want to write my thesis about a very specific feature of language and I haven’t found a corpus on the English linguistics homepage which reflects it. What can I do?"

First of all, the Internet is your friend. Before consulting us, you should see whether such a specific corpus already exists and, if it does, check whether it is freely accessible. (Our collection of links can help you get started!) If it exists but is not accessible, you can ask us and we can check whether we have a licence for it. If what you need does not exist, you might consider compiling your own corpus. In that case you can consult us and we will provide advice on what to bear in mind in the compilation process.

"Some frequently used corpora (FROWN, FLOB, BNC) are from the 1990s; do they still reflect current language use?"

True, many English reference corpora were compiled in the pre-Internet era. Bear in mind, though, that language change is a slow process, so corpora from the early 1990s still reflect current language use to a large extent. As long as you state in your paper that they are not fully up to date (which in turn can influence the results of your study), there should be no major problems.

"I want to work with the BNCweb. How can I access it?"

In principle, anyone can access the BNC. Just go to the BNC homepage and register with your "...uni-giessen.de" e-mail address. The homepage offers a very detailed tutorial on how the BNC can be used. For more detailed information, have a look at Hoffmann et al. (2008): Corpus linguistics with BNCweb – a practical guide. Frankfurt: Peter Lang (the book is also on our list of references).

"I have decided to create my own corpus. How big does it need to be?"

That really depends on what you want to analyze. As a rule of thumb, the more specific (and therefore the rarer) the feature you analyze, the bigger your corpus needs to be. It also depends on the sources for your corpus: depending on your topic, it may be very hard to gain access to texts. Many standard linguistic corpora contain about 1 million words. However, sometimes it is just not possible to create huge corpora due to limited sources or copyright restrictions. In other cases a much smaller corpus suffices to make your point. In any case, as long as you always state the size of your corpus and explain the compilation process, you should be on the safe side.

You might also want to consider first compiling a pilot corpus, so that you can check beforehand how many instances you can expect to find in your corpus. Then you can decide on the size of your final corpus.
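The pilot-corpus check described above comes down to a simple extrapolation. The following is only a minimal sketch, assuming that your feature occurs at roughly the same rate in the final corpus as in the pilot corpus; the function names and all figures are hypothetical and serve as illustration only.

```python
import math


def expected_hits(pilot_hits: int, pilot_tokens: int, target_tokens: int) -> float:
    """Extrapolate the pilot hit rate linearly to a corpus of target_tokens words."""
    if pilot_tokens <= 0:
        raise ValueError("pilot_tokens must be positive")
    return pilot_hits * target_tokens / pilot_tokens


def tokens_needed(pilot_hits: int, pilot_tokens: int, desired_hits: int) -> int:
    """Estimate the corpus size (in words) needed to expect desired_hits instances."""
    if pilot_hits <= 0:
        raise ValueError("no hits in the pilot corpus; the rate cannot be estimated")
    return math.ceil(desired_hits * pilot_tokens / pilot_hits)


# Hypothetical pilot study: 12 instances found in 50,000 words.
print(expected_hits(12, 50_000, 1_000_000))  # 240.0 instances expected in 1 million words
print(tokens_needed(12, 50_000, 200))        # 833334 words needed for about 200 instances
```

Treat such figures as rough guidance only: a small pilot corpus gives an unstable estimate, especially for rare features.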

"Can I discuss theoretical/conceptual aspects of my thesis with you? Is my research question acceptable? Can I write my thesis about XYZ etc.?"

Unfortunately, I cannot help you with that. The CL Help is intended to help you with methodological, specifically corpus linguistic, issues. Topics like your choice of theoretical frameworks, the acceptability of your research question/hypothesis etc. have to be discussed with your advisor. I strictly cannot give you advice on these issues.

"Do I always have to use raw AND normalized frequencies in my project?"

Usually it is enough to mention your raw frequencies once in a data table. In most cases, there is no need to visualize them in a chart. After mentioning the raw frequencies once, you should instead give normalized frequencies throughout: unlike raw frequencies, they are comparable across corpora of different sizes and should therefore form the basis of your analysis.

You also don't need to compare or discuss differences between your raw and normalized frequencies. As a general rule of thumb, always try to avoid unnecessary redundancies. 
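To illustrate why normalized frequencies are comparable, here is a minimal sketch of the calculation. It assumes normalization per 1 million words, which is a common basis but not the only one (smaller corpora are often normalized per 10,000 or 100,000 words); the function name and the corpus figures are hypothetical.

```python
def normalized_frequency(raw_count: int, corpus_tokens: int, per: int = 1_000_000) -> float:
    """Return the frequency of an item per `per` words of running text."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * per / corpus_tokens


# Hypothetical example: the same feature counted in two corpora of different sizes.
corpus_a = normalized_frequency(150, 1_000_000)  # 150 hits in 1,000,000 words
corpus_b = normalized_frequency(90, 400_000)     # 90 hits in 400,000 words

print(f"Corpus A: {corpus_a:.1f} per million words")  # Corpus A: 150.0 per million words
print(f"Corpus B: {corpus_b:.1f} per million words")  # Corpus B: 225.0 per million words
```

In this made-up example the raw counts suggest that the feature is more common in corpus A (150 vs. 90 hits), whereas the normalized figures show that it is relatively more frequent in corpus B.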

"How do I cite corpora and corpus software in my list of references?"

There are different ways to cite corpora in your list of references, depending on the style sheet you are using. Below, you can find some examples. Please note that you should use the full name of the corpus and not only an abbreviation such as BNC or GloWbE.

In your list of references, the corpus should appear under the name of the editor. Only if no editor is available should the corpus be listed under its own name.

 - Davies, Mark (2013): Corpus of Global Web-based English: 1.9 billion words from speakers in 20 countries. Available online at: <URL>.

 - The British National Corpus, version 2 (BNC World), 2001. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Available online at: <URL>.

Corpus software can be indicated as in the following example:

 - Anthony, L. (YEAR OF RELEASE). AntConc (Version VERSION NUMBER) [Computer software]. Tokyo, Japan: Waseda University. Available from <URL>.