What to expect in this book

The remainder of the book is organized as follows:

  • Introduce and define key terms and concepts

    • 1. Introduction to Data Mining

  • Database essentials. For audiences with a background in information systems, information technology, or relational databases, these three chapters can be skipped. Check with your instructor to be sure.

    • 2. Relational Databases

    • 3. Data Storage (ERD)

    • 4. Data Retrieval (SQL)

  • Introduce the framework and methodology for data mining projects, which is the primary topic of the course

    • 5. CRISP-DM

  • CRISP-DM Phase 1: Business Understanding

    • 5. CRISP-DM (this phase is covered along with the introduction to CRISP-DM in Chapter 5 because it is not the primary focus of this course)

  • CRISP-DM Phase 2: Data Understanding

    • 6. Data Understanding: Visualization

    • 7. Data Understanding: Statistics

  • CRISP-DM Phase 3: Data Preparation

    • 8. Data Preparation: Cleaning and Feature Engineering

  • CRISP-DM Phase 4: Modeling; CRISP-DM Phase 5: Evaluation (these phases are performed iteratively and in tandem)

    • 9. Modeling and Evaluation: Introduction to MLR (Multiple Linear Regression)

    • 10. Modeling and Evaluation: Feature Selection

    • 11. Modeling and Evaluation: Algorithm Selection

    • 12. Modeling and Evaluation: Optimizing Model Fit

  • CRISP-DM Phase 6: Deployment

    • 13. Deployment: Prediction Calculators

  • Up to this point, the book has covered the entire CRISP-DM methodology, but only in the context of prediction calculators and traditional features (categorical and numeric). However, prediction calculators are just one type of model (and the simplest). There are many more advanced modeling topics (e.g., anomaly detection and recommendation) and feature engineering topics (e.g., text hashing) that are well worth learning. The following chapters extend the concepts and skills from the previous chapters into more advanced and useful business contexts. Check with your instructor to see which of these topics your course will cover:

    • 14. Anomaly Detection

    • 15. Text Analytics

    • 16. Recommendation Engines

    • 17. Deployment of Recommendation Engines

  • Final Project to integrate all of the skills learned in this course: a simple deployment of a machine learning model into an Excel-based dashboard

    • 18. Final Project