1  What is R ?

R is a free, user developed, object-oriented statistical programming language. It originates in the ‘S’ and ‘S Plus’ languages developed in the 1970s and 1980s. It has been widely used in science, statistics and data sciences and is increasingly being adopted in the social sciences for teaching and research purposes.

R can be downloaded from the Comprehensive R Archive Network (CRAN) website. Installation instructions as well as guides, tutorials and FAQ are available on the CRAN website. A considerable amount of R-related resources is available online.

Anyone can install and use R without charge, and to some extent contribute and amend the existing program itself. R is particularly favoured by users who want to develop their own statistical functions or implement technical advances that are not yet available in commercial packages. The existence of a vast number of user written packages (21132 at the time of writing this guide) is one of the strengths of R. Users interested in publishing their own packages on CRAN should be aware that a minimum set of rules need to be followed by contributors.

Although R can perform most of the analyses available in traditional statistical software such as Stata, SPSS, or SAS, it has broader applications used for mapping, data mining or machine learning. Its flexibility as a language allows users to carry out analyses in multiple ways, each with distinct advantages and disadvantages. Also, users can easily produce publication quality output in R using Markdown (now Quarto) and LaTeX document presentation systems. In addition, R graphs can easily be imported into html, MS Word or LibreOffice documents.

By contrast with other statistical software, the R interface is minimal and consists of a terminal. Similar to languages such as Python or C, R users can access it via a programming interface or Integrated Development Environment (IDE). In this guide we will use R Studio, one of the most common IDEs for R.
The data used in this introduction is the British Social Attitudes Survey, 2017, Environment and Politics: Open Access Teaching Dataset, which can be downloaded from the UK Data Service website without registration. The website also has instructions on how to acquire and download large-scale survey datasets. Links and further information about the other training resources available online are provided at the end of this document.

Although R has advantages over other statistical analysis software, it also has a few downsides, both of which are summarised below. Users should be reminded that as open-source software, R and its packages are mainly developed by volunteers, which makes it a very flexible and dynamic project, but at the same time reliant on developers’ free time and goodwill.

Advantages and downsides of R
Pros Cons
R is free and allows users to perform almost any analysis they want. The learning curve may be steep for users who do not have a prior background in statistics or programming.
R puts statistical analysis closer to the reach of lay people rather than specialists.
Transparency of use and programming of the software and its routines, which improves the peer-reviewing and quality control of the software.
Very flexible. Problem solving (for both advanced users and beginners) may be time-consuming, depending on how common the problem encountered, and may lead to more time spent solving technical rather than substantive issues.
Availability of a wide range of advanced techniques not provided in other statistical software Many people who design R packages are, or will become busy academics. Packages can stop being maintained without notice.
A very large user base provides abundant documentation, tutorials, and web pages.

There are several (sometimes many) ways of achieving a particular result in R. This can be confusing for novice researchers, but at the same time will allow users to tightly adjust their programmes to their needs.