1.9 Introduction to R
Installing the Software
In this course, we will be using R, a free statistical platform that is now commonly used by businesses and universities worldwide. First developed at the University of Auckland in New Zealand, R is a powerful, open-source language for statistical computing.
To begin, go to the Comprehensive R Archive Network (CRAN) website. Download and install the appropriate version of R for your computer system. R works on Windows, Mac, and Linux operating systems. Then download and install RStudio from the RStudio website.
RStudio is a free integrated development environment (IDE) that makes the raw R program more user-friendly. Once you have installed both R and RStudio, you are ready to begin. If you are using a Mac, you may need to install XQuartz from the XQuartz Project website, as noted in the R materials, and you may also need to install Xcode from the App Store.
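To confirm that the installation worked, one quick check (a minimal sketch, not part of the official installation steps) is to open RStudio and type the following into the console. R.version.string and sessionInfo() are both built into R; the version number and details printed will depend on what you installed.

```r
# Print the version of R that RStudio is using
print(R.version.string)

# Show more detail: operating system and currently loaded packages
print(sessionInfo())
```

If both commands print information rather than an error, R and RStudio are talking to each other.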
Inputting Commands
R has a command line interface. This means that if you know what you want done, R is there to do it for you; you just have to command it. The advantage of this kind of interface is that you have direct access to the command functions. R was made by statisticians for ease of analysis, which means that you can often run a wide variety of analyses by changing just part of the code. On the other hand, it means that you will need to know what the possible commands are, as there are no buttons or other visible guides built in for you to try in order to see what they do.
Teaching you the commands that are most common and most useful is the purpose of this textbook, so let's start with something simple. Built into R are a number of datasets. To access the one we want to see, open RStudio and type (or copy and paste) the following command into the console: View(attitude). Then follow the instructions below to try out different commands.
- Push Enter, and you should see a dataset in a spreadsheet that has "rating" as the first variable and "advance" as the last one.
- The "View" command follows a logic shared by many R commands: the command name comes first (you may notice that as you begin to type "View," RStudio offers command options to choose from), and the object you want the command to act on goes in parentheses. To understand what the data refer to, type in ?attitude and you should see a help file called "The Chatterjee–Price Attitude Data," which comes from a survey of the clerical employees of a large financial organization.
- Take a moment to look over the information.
- The help file gives a description of the data, usage, format, and source, and even provides sample code that can be used. Try typing ?View to see the R help file for the "View" command.
- Since there are many datasets included in R, type attach(attitude). This adds the dataset to R's search path, so that you can refer to its variables (such as rating) directly by name in the commands that follow.
- R remembers what you type in. Try pushing your "Up" arrow key to see the code you just entered. You can push the "Down" arrow to move forward again to more recent commands. This comes in very handy because many analyses require only a small change (for example, if you are making a number of graphs, you may just want to change the name of the variable you are graphing), and the command history saves you from retyping the whole command.
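The steps above can be collected into a short script. This is a sketch rather than required course code: head(), names(), mean(), hist(), and detach() are base R functions that the steps do not mention, chosen here as illustrative companions to View() and attach(), and the choice of the rating and complaints variables is arbitrary.

```r
# Preview the first six rows in the console
# (View(attitude) opens the same data in RStudio's spreadsheet viewer)
head(attitude)

# List the variable names: "rating" is first and "advance" is last
variable_names <- names(attitude)
print(variable_names)

# Open the help files (uncomment when working interactively)
# ?attitude
# ?View

# Refer to a variable with the dataset$variable form ...
mean_rating <- mean(attitude$rating)

# ... or attach the dataset and use the variable name on its own
attach(attitude)
mean_rating_attached <- mean(rating)
detach(attitude)  # undo attach() when finished

# Changing only the variable name gives a new graph
hist(attitude$rating)
hist(attitude$complaints)
```

The last two lines show why the "Up" arrow is useful: recall the first hist() command, change the variable name, and push Enter.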
R Packages
R is built around an idea similar to that of a wheel and spokes. At the hub is the base R installation that you have already (we hope!) installed and examined. This forms the core code that runs R and allows for a few simple analyses and graphs. The power in R, however, is the packages, of which there are almost 20,000 (as of early 2021). Packages are extensions that can do everything from producing publication-quality graphics for your data (e.g., ggplot2) to running the latest analytics for social media (e.g., SocialMediaMineR), along with many more possibilities. For the latest number of packages, see the Contributed Packages web page. Each of these packages extends the ability of R and allows you to tailor R to your needs and to the tasks you most commonly perform.
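As a minimal sketch of how a package is added to R, the following uses ggplot2 (mentioned above) as the example. install.packages(), requireNamespace(), and library() are standard R functions; the scatterplot of complaints against rating from the built-in attitude dataset is only an illustrative choice, and installing a package requires an internet connection.

```r
# Check whether ggplot2 is already installed on this computer
ggplot2_installed <- requireNamespace("ggplot2", quietly = TRUE)

if (!ggplot2_installed) {
  # Installing is done once per computer and needs an internet connection:
  #   install.packages("ggplot2")
  message("ggplot2 is not installed yet; run install.packages(\"ggplot2\") first.")
} else {
  # Loading is done once per R session
  library(ggplot2)

  # Illustrative plot: overall rating against handling of complaints
  scatter <- ggplot(attitude, aes(x = complaints, y = rating)) +
    geom_point()
  print(scatter)
}
```

Installing and loading are separate steps: a package is installed once, but it must be loaded with library() in every new R session that uses it.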
The power of R is that, with so many packages, and with so many of them regularly updated, you have perhaps the greatest collection of analytic tools ever assembled. Best of all, it is free to use. As a result, no matter what your income, business size, or profit margins, you can use R; all you need is a modern laptop or desktop computer. For this class, we have carefully chosen and curated the packages that we feel best produce the results you will need and also give you the easiest learning curve for the coding involved.
This abundance is also the complexity of R: with so many packages, it can be difficult to determine which of the many available is the best to use. Also, a command in one package sometimes shares its name with a command in another package that does something completely different. In practice, some of the packages may be extraordinary, but finding the right one can be a challenge.
Even though there are downsides, R is compelling to many because it is free and open source. You can make R a platform that does what you need it to do instead of a statistical software package made for a generic user. R code can also be integrated into a C++, Java, or Python environment, so your coders will be able to use it more extensively than they could a simple table of results or other output. Because R is also a coding language, you can find packages created for a variety of analyses, and, for the brave or for those who can hire a programmer, you can even have a package created to your own specifications to fill your own company's needs. Alternatively, if you have the expertise and the desire, you can make your own R packages and submit them to CRAN (the Comprehensive R Archive Network). Then others who have similar interests can use your package. Though we aren't going to teach you how to do that in this book, as it is far beyond the scope of an introductory text, the possibilities are exciting!
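The point that the same command name can mean different things in different packages can be illustrated with filter(). The sketch below is our own example, not one drawn from the course packages: it assumes the stats package that ships with R and, optionally, the dplyr package, which is used only if it happens to be installed. Writing package::command tells R exactly which version you mean.

```r
# stats::filter() comes with R and applies a linear filter to a series.
# Here it computes a three-point moving average of the numbers 1 to 5.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
print(smoothed)

# dplyr::filter(), from the dplyr package, instead keeps the rows of a
# data frame that meet a condition. If dplyr is installed, the same
# command name does a completely different job:
if (requireNamespace("dplyr", quietly = TRUE)) {
  high_ratings <- dplyr::filter(attitude, rating > 70)
  print(high_ratings)
}
```

When two loaded packages share a command name, the package::command form removes any doubt about which one runs.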
The easiest way to find a package in R is to use a search engine. The website RSeek, developed by Sasha Goodman of Stanford University, is popular because it aggregates results across a number of R-centric sites. Alternatively, you could simply Google the name of the analysis you want and follow it with "R" to see what the current options are, as well as a list of sites that examine R. There are lots of options!
The R Community
The R community consists of those who maintain base R itself (the R Core Team, which distributes it through the Comprehensive R Archive Network), those who create the extension packages (listed on the Contributed Packages web page), bloggers (R-bloggers), forum participants (the R mailing list archive and forum and the R forums on StackExchange), and others. The strength of the R community is one of the most attractive parts of using R. It is the community that creates, updates, improves, teaches, and integrates R. For example, looking at the statistical packages alone, a typical statistics software program has a company behind it that decides what statistics it wants to make available and how it wants those shown, then assigns programmers and schedules releases based on profits, customer expectations and demands, and other factors. The most popular or cost-effective analyses are included, but many less popular ones may never be included. While you could petition the maker of your favorite program to include new features, the company would look at each statistical feature as a business decision. Will creating it take away from the company's best-selling features? Does the company have the human resources to do so? How will it be marketed?
As an open-source program, R has a very different model. Except for the few analyses included in base R, it is the community that creates the packages, finds the datasets, writes the directions and help files, and, essentially, makes R attractive as a program. Imagine trying to hire someone out of an Ivy League university to program analytics for you. In R, many such people have already done that work, and free of charge. Imagine hearing of a fairly obscure analysis you would like to run. Again, it may already be there, or you could join the community and create it yourself. If you need tutoring on how to code, advice on what statistics are best for the task you have in mind, or help with a myriad of other possibilities, the R community is there. In some respects, it is like a university with "campuses" anywhere someone accesses R, be it in the programs or on the internet itself. In the area of business you study, it is likely that other individuals, groups, and businesses are trying to solve the same kinds of problems using the same statistical analyses. By joining forums and groups where ideas are discussed, you can stay abreast of the latest trends and analyses.
A question that might arise is, if R is free and open source, why do so many contribute to it? The answer is complex and might be related to self-interest, as creating R packages is a very marketable skill. Showing you can not only use but also create R packages might help with your salary negotiations (just knowing R can get you a job). On the other hand, for many, there is a sense of helping and serving that accompanies contributing. Knowing that many are looking for your analysis and you have made it easy for them to use can be a source of great satisfaction. Likewise, a professional or a professor might want their ideas used by others, and they use R as the best tool to share and disseminate their knowledge and skill. Among the many reasons the R community exists is the reality that it works so well and has accomplished so much.