# Linear models and their application in R

## June 7-25, 9-15.30 (daily)

**CONTENT:** Linear models represent a flexible framework for analysing the effects of one or several (quantitative or qualitative) predictors on a single response (which can be, e.g., continuous, a count, or binary). As such they encompass, for instance, linear regression, t-tests, ANOVA, ANCOVA, the Generalized Linear Model (e.g., logistic, Poisson, zero-inflated, or negative binomial models), and Mixed (a.k.a. multi-level) Models. Hence, linear models make it possible to address a huge variety of questions with various types of data, using a unified conceptual and statistical framework.

In the course I treat all of the above, that is, linear models from simple regression to the Generalized Linear Mixed Model (GLMM). I begin with simple linear regression and then explain how this concept can be extended to model the impact of multiple predictors, categorical predictors, interactions, and certain non-linear relationships (i.e., the 'general linear model'). I then show how the general linear model can be expanded to the 'Generalized Linear Model' (e.g., logistic, Poisson, zero-inflated, or negative binomial regression). Finally, I treat the (Generalized) Linear Mixed Model (i.e., models allowing the inclusion of grouping variables or 'random effects'). Further lessons will be devoted to a brief introduction to non-linear models, to how to formulate scientifically meaningful models, and perhaps to information-theory-based and multi-model inference.
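As a rough orientation, the progression described above can be sketched in equations. The notation is an assumption made for illustration and is not taken from the course material: $y$ is the response, $x$ a predictor, $\beta$ the coefficients, $g$ a link function (for instance the logit for a binary response or the log for a count), and $b_j$ a random intercept for group $j$.

```latex
% Simple linear regression: one quantitative predictor x
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% General linear model: several predictors and an interaction
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \varepsilon_i

% Generalized Linear Model: link function g applied to the expected response
g\big(\mathrm{E}[y_i]\big) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}

% Generalized Linear Mixed Model: random intercept b_j for group j
g\big(\mathrm{E}[y_{ij} \mid b_j]\big) = \beta_0 + b_j + \beta_1 x_{ij}, \qquad b_j \sim \mathcal{N}(0, \sigma_b^2)
```

Each step keeps the same linear predictor on the right-hand side and changes only what is assumed about the response or about the grouping of the data.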

Throughout the course I put much emphasis on the conceptual meaning and interpretation of the models rather than on their 'mechanics' (i.e., the mathematical background). Practically, this means that we shall devote quite some time to understanding what such models reveal about 'life' (i.e., the process investigated) and particularly to understanding and interpreting interactions. In fact, I consider it an important component of the course to teach how models and 'life' are linked, i.e., how one can put hypotheses and questions about life into models and what these then can (and cannot) reveal about it.

The course is mainly centred around a null-hypothesis significance testing framework, largely because this is still by far the most frequently used approach. However, if time allows I shall also explain the concept of information-theory-based inference (and we might also apply it in practice). Furthermore, the models themselves, i.e., how they are set up with regard to, for instance, interactions, fixed and random effects, random slopes, error and link function, their meaning, interpretation (and limitations), are unaffected by the philosophy used to draw statistical inference.

**STRUCTURE:** The course consists of roughly 50% theory and 50% practical applications (regularly interspersed), during which we shall work our way through various models. As part of that, participants will also learn how to plot the results of the models treated and how to describe them in the methods and results sections of a paper. Finally, I put much emphasis on assumptions and model diagnostics and how to evaluate them.

**TARGETED AUDIENCE:** The course is not limited to students; faculty members, postdocs, etc. are also welcome to attend.

**REQUIREMENTS:** The course requires some familiarity with general ideas/concepts of statistics and also the basic concepts of R. Regarding the former, you should have some experience with applied statistics and be somewhat familiar with things like null-hypothesis significance testing, 'error level', etc. Regarding the latter, you should have some experience with R, for instance, knowing how to read a file into it, run some simple tests (e.g., t-test, ANOVA, or non-parametric tests), and create simple plots. A couple of weeks before the course begins I'll make available two tutorials, one giving a general introduction to R and one an introduction to plotting in R; participants are expected to work through these seriously (ca. 100 pages in total) before the course begins.

The individual lessons of the course build heavily upon one another. Hence, it is a requirement that every participant attends all lessons in full (missing even just a few hours may make it very hard to catch up later). It is also well worth investing extra time to go through the material covered and any exercises I may provide. Hence, I strongly advise keeping these three weeks as free of other obligations as possible (particularly participants who are not somewhat familiar with linear models should consider this). Given the potentially limited space and that participants need to prepare in advance, it is mandatory to sign up for this course (see below). **When signing up, please be aware that we consider this a binding statement that you plan to participate**.

**MATERIAL PROVIDED:** The course is accompanied by plenty of handouts, which will be made available during the course.

**WHEN & WHERE**: The course will be given online and takes place during the three weeks from **Monday June 7, 2021, to Friday June 25, 2021**, with lessons taking place each workday (Monday to Friday) from **09:00 (sharp) to ca. 15:30 CEST** with a couple of short breaks in between and also a longer lunch break of ca. 45 minutes.

**CREDITS:** Certificates of participation are accepted to fulfil credit requirements of the PhD programme Behavior and Cognition as well as the GGNB programmes (for GGNB, 4.5C are given). Regular attendance is required. Students of other PhD programmes should inquire with the heads of their study programmes regarding credits.

**LANGUAGE:** The course language will be English.

**SIGNING UP:** Sign up by sending an email to **statistics_teaching(at)dpz.eu**; **when registering, include your full name, affiliation, status (faculty member/PostDoc/PhD/Master) and study programme (if applicable)**. You can register between **April 12 and May 2, 2021**. People signing up later cannot be considered. Due to the priority regulations (see below), signing up does not imply that you can participate. Everyone who signed up will receive an email soon after May 2, stating whether they can participate.

**BEFORE YOU SIGN UP: please be aware that the course requires a more or less full-time commitment for three entire weeks. Please sign up only if you will definitely be able to participate (barring major unexpected events). 'No shows' and late sign-outs will have reduced priority of access in future rounds of the course; GGNB and GAUSS doctoral students will be banned from all skills and methods courses and funding options for 12 months.**

**PRIORITY OF ACCESS:** 80 slots are available. In case more people sign up for the course than can participate, priority of access is given first to members of the Leibniz ScienceCampus Primate Cognition, second to staff, members, and students of the University of Goettingen and the German Primate Center, and third to everyone else. Within each group, priority of access is given in the order in which people have signed up. Finally, in future rounds of the course, people who signed up for this round but then signed out again or just didn't appear might get lower priority.

# Contact

Dr. Roger Mundry, Biostatistician, +49-551-3851-478

# Registration

| | |
|---|---|
| Registration period | April 12-May 2, 2021 |
| Registration by | email to statistics_teaching@dpz.eu |
| Required information | full name, affiliation, status (faculty member/PostDoc/PhD/Master), study programme (if applicable) |

**Note: The content and topics of the workshop are the basis for a workshop on Data Simulation planned for autumn 2021**
