The new data science title "Hands-On Exploratory Data Analysis with Python," by Suresh Kumar Mukhiya and Usman Ahmed, from Packt Publishing, is a welcome addition to the growing list of books aimed at helping newbie data scientists improve their skills. I'm always on the lookout for texts that can help my students find their way along the challenging path toward becoming a data scientist, and I think this book fills a void in Exploratory Data Analysis (EDA) learning resources. But as I'll discuss, the book goes beyond just EDA and is perhaps mistitled: it's really an introduction to data science and machine learning using the Python language.
The book includes important EDA topics like Descriptive Statistics (Chapter 5), Grouping Datasets (Chapter 6), Correlation (Chapter 7), Time Series Analysis (Chapter 8), and Hypothesis Testing (first part of Chapter 9). These are all critical pieces of the data science process, and lucid discussions along with clear and simple code examples help the reader get moving. The publisher provides all the Python code from the book so the reader can hit the ground running.
My favorite part of the book is Chapter 4 on Data Transformation (also known as data munging or data wrangling). This is a very important area that often accounts for the majority of a project's time and budget, and the examples in this chapter cover the tasks most commonly needed in a typical data science project (e.g. missing data handling, discretization, random sampling). Strictly speaking, data transformation isn't part of EDA, but I welcome the discussion as it broadens the scope of the book.
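For readers who want a taste of what these tasks look like in practice, here is a minimal pandas sketch of the three examples named above: handling missing values, discretizing a numeric column, and drawing a random sample. It is not taken from the book; the toy DataFrame, its column names, and the bin edges are hypothetical, and median imputation is just one of several reasonable strategies for missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the book), with a few missing values.
df = pd.DataFrame({
    "age": [23, 35, np.nan, 51, 44, np.nan, 29, 62],
    "income": [32000, 58000, 41000, np.nan, 72000, 39000, 45000, 88000],
})

# 1. Missing data handling: count NaNs per column, then fill each
#    numeric column with its own median (one common, simple strategy).
missing_counts = df.isna().sum()
filled = df.fillna(df.median(numeric_only=True))

# 2. Discretization: bin the continuous "age" column into labeled ranges.
#    Bins are right-inclusive: (0, 30], (30, 45], (45, 100].
filled["age_group"] = pd.cut(
    filled["age"], bins=[0, 30, 45, 100], labels=["young", "middle", "senior"]
)

# 3. Random sampling: draw half of the rows; random_state makes it reproducible.
sample = filled.sample(frac=0.5, random_state=42)

print(missing_counts)
print(filled)
print(sample)
```

Filling with the median rather than the mean keeps the imputed values less sensitive to outliers, which is one reason it is a popular default for skewed columns such as income.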
Chapter 2 on data visualization is a nice adjunct to the EDA discussions, because these two areas typically go hand in hand. Chapter 3 offers an interesting use case that demonstrates data access, data transformation, EDA, and data viz together. The example centers on reading in all the emails from your Google account and performing a useful analysis on them. Nice touch!
Finally, the book also enters the realm of supervised machine learning, starting with the last part of Chapter 9 on regression models. Then Chapter 10 is a short introduction to various machine learning techniques. This chapter, however, is too brief to be a standalone learning resource, but it does kick-start the reader into thinking about this important topic.
The presumed goal of the last chapter, Chapter 11, is to offer a comprehensive data science example using the well-known Wine Quality data set from the UCI Machine Learning Repository. I've used this data set in my own class materials many times, and it's well suited for this purpose. My only caveat about this chapter is that it's too simplistic and too short. But it does give a correct feel for the steps in the data science process, culminating in the use of a number of common ML algorithms and the interpretation of their results.
I would say Hands-On Exploratory Data Analysis with Python is a good addition to the library of a newbie data scientist as it contains many of the most common techniques for putting together a solid machine learning solution. I will be adding this title to my data science bibliography given out to my Introduction to Data Science students.
Contributed by Daniel D. Gutierrez, Editor-in-Chief and Resident Data Scientist for insideAI News. In addition to being a tech journalist, Daniel is also a data science consultant, author, and educator, and sits on a number of advisory boards for various start-up companies.