About Software Tools

There are many software tools designed to facilitate data mining and analytics. However, these are often expensive and complicated to install, configure, administer, and use. When the first edition of this text was published back in 2012, there were very few software options that combined the affordability, capability, and ease of use needed for learning the basics of data mining and analytics. One product that did exist, however, was RapidMiner, a visual design studio that produces powerful data models within a drag-and-drop user interface. This software was selected as the platform for the first edition of Data Mining for the Masses. RapidMiner is now a product of the Altair company, which continues to develop the software’s capabilities while still offering free educational use.

In the years since the first edition of this book, two other free and powerful platforms have also gained wide adoption for analytics: R and Python. These software products are now also included in this book. The text also occasionally refers to the use of spreadsheet software, such as Microsoft Excel. However, the book’s use of spreadsheets is primarily limited to data examination and basic preparation tasks.

The inclusion of all of these tools is intended to help you see and experience a broad set of software options that can help you analyze data. It is important to keep in mind that this book does not attempt to teach all of the capabilities of any of these software packages. They’re just good, widely available, and freely accessible tools that you can use to get started in data mining and analytics. Individual instructors may choose to include or exclude any of these software products from their own courses, and the book has been designed so that each product’s sections can be included or omitted. If you use the book with all three software sections included in every chapter, you will notice some redundancy. This is intentional, to ensure that regardless of how you mix and match software tools with the book’s content, the examples and instructions are consistent and cohesive.

All examples used in this book will be illustrated in a Microsoft Windows environment. With some slight variation in user interface, you can complete all examples and exercises by running any of the software products on Macintosh or Linux systems as well. In the case of R and Python, we will use in-browser versions of software tools to write and execute code. For R, this text will use an in-browser version of RStudio, provided by the company Posit. For Python, the book will use Jupyter notebooks hosted on Google Colab. If you are using RapidMiner, the software will need to be downloaded and installed on your local computer. It is recommended that you set up and configure the relevant software packages now so that you can work along with the examples in the book.

  • RapidMiner can be downloaded from the RapidMiner website.

  • RStudio can be set up on Posit Cloud. Click “Get Started” to create a free account, which will be sufficient for all examples in this book.

  • Jupyter Notebook can be set up on Google Colab. You can sign in with an existing Google account (such as a Gmail account) if you have one, or you can create a new account for free using Google’s website.

See the videos below for a short tutorial on how to download and set up each software environment.

As with all software, versions change over time. Such changes may affect how closely this book’s content matches your experience. Some screenshots throughout this text may differ slightly from what you see if the software providers have updated their user interfaces or released new software versions. Readers have also sometimes reported that their analytics results are slightly different from those shown in the book. Most of the time, this is simply because the algorithms implemented in the various software products have been tuned or improved since the book was published. Generally, what you see in the book will match what you see on your computer if you complete all of the steps as described in the text.
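
If you are working in Python and your results differ from the book, it can help to note which software versions you are running. The following is a minimal sketch, not part of the book's official materials: the function name describe_environment is hypothetical, and the package names passed to it (pandas and scikit-learn) are only examples of libraries you might have installed, not a list of what this book requires. It uses only Python's standard library.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict of version strings for Python, the OS, and each package.

    A package that is not installed is reported as None.
    """
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    # Example package names only (an assumption); replace them with
    # whichever libraries your chapter actually uses.
    for name, version in describe_environment(["pandas", "scikit-learn"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in a notebook cell prints one line per item. Keeping a copy of that output alongside your work makes it easier to tell whether a difference from the book comes from a newer software version or from a missed step.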