From Data Collection to Model Deployment
“Data are just summaries of thousands of stories—tell a few of those stories to help make the data meaningful.” —Chip and Dan Heath
Data science techniques change rapidly, so academic curricula must evolve constantly to stay relevant. Printed textbooks full of text-based instruction cannot keep up with that rate of change. The purpose of this online book is therefore to teach, through practice-based video tutorials, the latest and most common techniques for both descriptive and predictive data analytics.
We currently use an industry-leading tool, Tableau, to teach dashboard design and storytelling, which describe the current state of an organization based on measurable data. However, the greatest value of data lies in its ability to predict the future, which is also the most difficult and risky task. We therefore begin by teaching basic methods in Excel for multiple regression, along with the assumptions of linear regression. Afterward, the bulk of the course covers more advanced algorithms and techniques using an industry-leading tool for predictive analysis: Microsoft Azure Machine Learning Studio. We chose these tools first because they are mainstream tools that you are likely to use across a variety of industries, and second because both come with free versions for students.