21.4 EDA and Basic Visualization
Exploratory Data Analysis (EDA)
Exploratory Data Analysis, usually referred to as EDA, is the process of familiarizing yourself with the data before modeling. It involves understanding the variables in the data set, identifying possible outliers, creating basic visualizations, and finding patterns and relationships in the data. EDA can help you determine which modeling strategies and methods are best suited to a specific data set.
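As a rough illustration of the first steps described above, the sketch below summarizes a single numeric variable and flags possible outliers using the common 1.5 × IQR rule. The sample data and the helper names summarize and iqr_outliers are hypothetical, made up for this example. It uses only Python's standard library to stay self-contained; in practice a library such as pandas would typically be used for this kind of work.

```python
import statistics

# Hypothetical sample: daily order counts, with one suspicious value.
orders = [12, 15, 14, 13, 16, 15, 14, 98, 13, 15]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


summary = summarize(orders)
outliers = iqr_outliers(orders)

print(summary)
print("Possible outliers:", outliers)
```

With this sample, only the value 98 falls outside the IQR fences. A flagged value is a candidate for closer inspection, not necessarily an error; whether to keep, correct, or drop it depends on what you know about the data.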
Data Visualization
Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports. Well-designed data graphics are usually the simplest and, at the same time, the most powerful way to understand your dataset and gain insights from it.
Python has libraries for practically every data visualization need. Some are specialized for particular kinds of analysis, while others are general-purpose and can be used in any field. The following are some of the most popular libraries:
Matplotlib - One of the oldest and by far the most popular of the InfoVis libraries, released in 2003, with a very extensive range of 2D plot types and output formats
Seaborn - This library harnesses the power of matplotlib to create beautiful charts in a few lines of code. The key difference is Seaborn’s default styles and color palettes, which are designed to be more aesthetically pleasing and modern
Pygal - Offers interactive plots that can be embedded in the web browser. Its prime differentiator is the ability to output charts as SVGs
Plotly - Its strength is making interactive plots, but it also offers some charts you won’t find in most libraries, like contour plots, dendrograms, and 3D charts
The following script is a basic example of how we can use Python to explore and visualize our dataset: