1.2 About Software Tools
There are many software tools designed to facilitate data mining and analytics. However, these are often expensive and complicated to install, configure, administer, and use. When the first edition of this text was published in 2012, there were very few software options that offered a combination of affordability, capability, and ease of use that could facilitate learning the basics of data mining and analytics. One product that did exist, however, was RapidMiner, a no-code visual design studio that produced powerful data models within a drag-and-drop user interface. RapidMiner was developed in the early 2000s and grew to become a popular and widely used software tool for data science, data mining, analytics, and machine learning. The first several editions of Data Mining for the Masses used RapidMiner to teach analytics techniques.
As with most successful technology products, RapidMiner has seen rapid and frequent change. The company that developed RapidMiner was acquired by Altair Corporation in late 2023, and in the spring of 2024, Altair rebranded the software as Altair AI Studio. Altair committed more than $100 million to the development of the AI Studio platform, with added emphasis on visual design of machine learning and analytics processes for artificial intelligence.
During the summer of 2025, Altair Corporation was acquired by Siemens, a German multinational technology company that traces its roots all the way back to 1847. As a result of the Siemens acquisition, the future of the RapidMiner/AI Studio software product is somewhat unclear. With thousands of organizations all over the world using either RapidMiner or AI Studio, it is expected that the software will be maintained in its current form for the foreseeable future. Data Mining for the Masses is being updated regularly; however, you may notice some references to both the old and new product names. Please be aware that any references to “RapidMiner” are synonymous with AI Studio. These software tools are one and the same. To simplify your access and use, installation files for the final release of RapidMiner (version 10.3) and the most recent release of AI Studio (version 2025.1.1) have been included with this learning resource. Your instructor can make these files available as needed. They are available for both Windows and macOS. AI Studio has separate installers for Intel-based Macs and Apple Silicon (ARM)-based Macs. For all of these installer files, you can acquire an educational license by filling out the form located here: Altair RapidMiner Registration.
When signing up for a license, be sure to select the “Educational Purposes” option on the form.
In the years since the first edition of this book, two other free and powerful analytics software platforms have also become widely used: R and Python. These software products are now also included in this book. The text also occasionally refers to the use of spreadsheet software, such as Microsoft Excel. However, the book’s use of spreadsheets is primarily limited to data examination and basic preparation tasks. The inclusion of all of these tools is intended to help you see and experience a broad set of software options that can help you analyze data. It is important to keep in mind that this book does not attempt to teach all of the capabilities of any of these software packages. They’re just good, widely available, and freely accessible tools that you can use to get started in data mining and analytics. Individual instructors may choose to include or exclude any of these software products from their own courses, and the book has been designed to allow for that customization. If you use the book with all three software sections included in every chapter, you will notice some redundancy. This is intentional: it ensures that regardless of how you mix and match software tools with the book’s content, the examples and instructions are consistent and cohesive.
The examples provided in this book are primarily illustrated in a Microsoft Windows environment. With some slight variation in user interface, you can complete all examples and exercises by running any of the software products on Macintosh or Linux systems as well. In the case of R and Python, we use in-browser versions of software tools to write and execute code. For R, this text uses an in-browser version of RStudio, provided by the company Posit. For Python, the book uses Jupyter Notebook provided by Google Colab. When using RapidMiner/AI Studio, the software must be downloaded and installed on your local computer. It is recommended that you set up and configure the software on your computer now so that you can work along with the examples in the book if you would like.
- RapidMiner or AI Studio can be downloaded via MyEducator, as mentioned above.
- RStudio can be set up on Posit Cloud. Click “Get Started” to create a free account, which will be sufficient for all examples in this book.
- Jupyter Notebook can be set up on Google Colab. You can sign in with an existing Gmail account if you have one, or you can set up a new Gmail account for free using Google’s website.
See the videos below for a short tutorial on how to download and set up each software environment.
As with all software, versions change over time, and such changes may cause your experience to differ from this book’s content. Some screenshots of the software throughout this text may vary slightly if the software providers update their user interfaces or release new software versions. Readers have also sometimes reported that their analytics results are slightly different from the screenshots in the book. Most of the time, this is simply because the algorithms implemented in the various software products have been tuned or improved since the book was last updated. Generally, what you see in the book will match what you see on your computer if you complete all of the steps as described in the text.
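Because results can depend on software versions, it may help to record which versions you are running before you work through the Python examples. The following is a minimal sketch and is not part of the book’s own materials. The function name `environment_report` is hypothetical, and the package names listed (pandas, numpy, scikit-learn) are assumptions about what you might use, so adjust them to match your course.

```python
import platform
from importlib import metadata


def environment_report(packages=("pandas", "numpy", "scikit-learn")):
    """Return a dict mapping each component name to its version string.

    The default package names are illustrative assumptions only.
    """
    # Always record the Python interpreter version itself.
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            # Look up the installed version of the package, if any.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Print one line per component, e.g. for pasting into your notes.
    for component, version in environment_report().items():
        print(f"{component}: {version}")
```

Running this in a notebook cell prints one line per component. Keeping that output alongside your work can make it easier to explain small differences between your results and the book’s screenshots.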