Workshop 4: Introduction to statistics for linguistics with R
Participants are required to bring their own laptops.
Statistical analysis of data is crucial to many branches of linguistic research, especially corpus-linguistic and experimental approaches. In the last ten years, the statistical programming language R has emerged as the preferred tool for analyzing language data, as it is both flexible and powerful. Despite many excellent textbooks (Baayen 2008, Johnson 2008, Gries 2013, Levshina 2015), statistical analysis in general, and programming it in R in particular, remains rather opaque to many scholars.
This short course offers an introduction to central issues in statistical data analysis, the basics of using R, and the implementation of basic hypothesis tests. We will cover:
- the most common types of data in linguistic analysis: proportions, frequencies/counts, and numerical values
- the logic behind statistical inference
- R fundamentals: how to load, work with, and export data
- descriptive summary statistics
- frequency association measures
- correlation and regression analyses
- common pitfalls in statistical analysis
- basic visualization techniques
After this class, participants will be able to perform simple analyses in R and will be prepared for the more advanced material found in the textbooks mentioned above.
Baayen, R. Harald. 2008. Analyzing linguistic data: a practical introduction to statistics using R. Cambridge, UK & New York: Cambridge University Press.
Gries, Stefan Th. 2013. Statistics for linguistics with R. 2nd edn. Berlin & New York: De Gruyter Mouton.
Johnson, Keith. 2008. Quantitative methods in linguistics. Malden, MA: Blackwell.
Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins.