Becoming a Data Head
How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning

1st edition, June 2021
272 pages, softcover
Wiley & Sons Ltd
"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful."
Thomas H. Davenport, Research Fellow, Author of Competing on Analytics, Big Data @ Work, and The AI Advantage
You've heard the hype around data--now get the facts.
In Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it.
You'll learn how to:
* Think statistically and understand the role variation plays in your life and decision making
* Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace
* Understand what's really going on with machine learning, text analytics, deep learning, and artificial intelligence
* Avoid common pitfalls when working with and interpreting data
Becoming a Data Head is a complete guide to data science in the workplace, covering everything from the personalities you'll work with to the math behind the algorithms. The authors have spent years in the data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head--an active participant in data science, statistics, and machine learning. Whether you're a business professional, engineer, executive, or aspiring data scientist, this book is for you.
Foreword
Introduction
Part One Thinking Like a Data Head
Chapter 1 What Is the Problem?
Questions a Data Head Should Ask
Why Is This Problem Important?
Who Does This Problem Affect?
What If We Don't Have the Right Data?
When Is the Project Over?
What If We Don't Like the Results?
Understanding Why Data Projects Fail
Customer Perception
Discussion
Working on Problems That Matter
Chapter Summary
Chapter 2 What Is Data?
Data vs. Information
An Example Dataset
Data Types
How Data Is Collected and Structured
Observational vs. Experimental Data
Structured vs. Unstructured Data
Basic Summary Statistics
Chapter Summary
Chapter 3 Prepare to Think Statistically
Ask Questions
There Is Variation in All Things
Scenario: Customer Perception (The Sequel)
Case Study: Kidney-Cancer Rates
Probabilities and Statistics
Probability vs. Intuition
Discovery with Statistics
Chapter Summary
Part Two Speaking Like a Data Head
Chapter 4 Argue with the Data
What Would You Do?
Missing Data Disaster
Tell Me the Data Origin Story
Who Collected the Data?
How Was the Data Collected?
Is the Data Representative?
Is There Sampling Bias?
What Did You Do with Outliers?
What Data Am I Not Seeing?
How Did You Deal with Missing Values?
Can the Data Measure What You Want It to Measure?
Argue with Data of All Sizes
Chapter Summary
Chapter 5 Explore the Data
Exploratory Data Analysis and You
Embracing the Exploratory Mindset
Questions to Guide You
The Setup
Can the Data Answer the Question?
Set Expectations and Use Common Sense
Do the Values Make Intuitive Sense?
Watch Out: Outliers and Missing Values
Did You Discover Any Relationships?
Understanding Correlation
Watch Out: Misinterpreting Correlation
Watch Out: Correlation Does Not Imply Causation
Did You Find New Opportunities in the Data?
Chapter Summary
Chapter 6 Examine the Probabilities
Take a Guess
The Rules of the Game
Notation
Conditional Probability and Independent Events
The Probability of Multiple Events
Two Things That Happen Together
One Thing or the Other
Probability Thought Exercise
Next Steps
Be Careful Assuming Independence
Don't Fall for the Gambler's Fallacy
All Probabilities Are Conditional
Don't Swap Dependencies
Bayes' Theorem
Ensure the Probabilities Have Meaning
Calibration
Rare Events Can, and Do, Happen
Chapter Summary
Chapter 7 Challenge the Statistics
Quick Lessons on Inference
Give Yourself Some Wiggle Room
More Data, More Evidence
Challenge the Status Quo
Evidence to the Contrary
Balance Decision Errors
The Process of Statistical Inference
The Questions You Should Ask to Challenge the Statistics
What Is the Context for These Statistics?
What Is the Sample Size?
What Are You Testing?
What Is the Null Hypothesis?
Assuming Equivalence
What Is the Significance Level?
How Many Tests Are You Doing?
Can I See the Confidence Intervals?
Is This Practically Significant?
Are You Assuming Causality?
Chapter Summary
Part Three Understanding the Data Scientist's Toolbox
Chapter 8 Search for Hidden Groups
Unsupervised Learning
Dimensionality Reduction
Creating Composite Features
Principal Component Analysis
Principal Components in Athletic Ability
PCA Summary
Potential Traps
Clustering
k-Means Clustering
Clustering Retail Locations
Potential Traps
Chapter Summary
Chapter 9 Understand the Regression Model
Supervised Learning
Linear Regression: What It Does
Least Squares Regression: Not Just a Clever Name
Linear Regression: What It Gives You
Extending to Many Features
Linear Regression: What Confusion It Causes
Omitted Variables
Multicollinearity
Data Leakage
Extrapolation Failures
Many Relationships Aren't Linear
Are You Explaining or Predicting?
Regression Performance
Other Regression Models
Chapter Summary
Chapter 10 Understand the Classification Model
Introduction to Classification
What You'll Learn
Classification Problem Setup
Logistic Regression
Logistic Regression: So What?
Decision Trees
Ensemble Methods
Random Forests
Gradient Boosted Trees
Interpretability of Ensemble Models
Watch Out for Pitfalls
Misapplication of the Problem
Data Leakage
Not Splitting Your Data
Choosing the Right Decision Threshold
Misunderstanding Accuracy
Confusion Matrices
Chapter Summary
Chapter 11 Understand Text Analytics
Expectations of Text Analytics
How Text Becomes Numbers
A Big Bag of Words
N-Grams
Word Embeddings
Topic Modeling
Text Classification
Naïve Bayes
Sentiment Analysis
Practical Considerations When Working with Text
Big Tech Has the Upper Hand
Chapter Summary
Chapter 12 Conceptualize Deep Learning
Neural Networks
How Are Neural Networks Like the Brain?
A Simple Neural Network
How a Neural Network Learns
A Slightly More Complex Neural Network
Applications of Deep Learning
The Benefits of Deep Learning
How Computers "See" Images
Convolutional Neural Networks
Deep Learning on Language and Sequences
Deep Learning in Practice
Do You Have Data?
Is Your Data Structured?
What Will the Network Look Like?
Artificial Intelligence and You
Big Tech Has the Upper Hand
Ethics in Deep Learning
Chapter Summary
Part Four Ensuring Success
Chapter 13 Watch Out for Pitfalls
Biases and Weird Phenomena in Data
Survivorship Bias
Regression to the Mean
Simpson's Paradox
Confirmation Bias
Effort Bias (aka the "Sunk Cost Fallacy")
Algorithmic Bias
Uncategorized Bias
The Big List of Pitfalls
Statistical and Machine Learning Pitfalls
Project Pitfalls
Chapter Summary
Chapter 14 Know the People and Personalities
Seven Scenes of Communication Breakdowns
The Postmortem
Storytime
The Telephone Game
Into the Weeds
The Reality Check
The Takeover
The Blowhard
Data Personalities
Data Enthusiasts
Data Cynics
Data Heads
Chapter Summary
Chapter 15 What's Next?
Index
- Milen Mahadevan, President of 84.51°
"What I love about this book is its remarkable breadth of topics covered, while maintaining a healthy depth in the content presented for each topic. I believe in the pedagogical concept of 'Talking the Walk,' which means being able to explain the hard stuff in terms that broad audiences can grasp. Too many data science books are either too specialized in taking you down the deep paths of mathematics and coding ('Walking the Walk') or too shallow in over-hyping the content with a plethora of shallow buzzwords ('Talking the Talk'). You can take a great walk down the pathways of the data field in Alex and Jordan's book without fear of falling off the path. The journey and destination are well worth the trip, and the talk."
- Kirk Borne, Data Scientist and Top Worldwide Influencer in Data Science
"The most clear, concise, and practical characterization of working in corporate analytics that I've seen. If you want to be a killer analyst and ask the right questions, this is for you."
- Kristen Kehrer, Data Moves Me, LLC and LinkedIn Top Voices in Data Science & Analytics
"THE book that business and technology leaders need to read to fully understand the potential, power, AND limitations of data science."
- Jennifer L. L. Morgan, PhD, Analytical Chemist at Procter and Gamble
"You've heard it before: 'We need to be doing more machine learning. Why aren't we doing more sophisticated data science work?' Data science isn't the magic unicorn that will solve all of your company's problems. Becoming a Data Head brings this idea to life by highlighting when data science is (and isn't) the right approach and the common pitfalls to watch out for, explaining it all in a way that a data novice can understand. This book will be my new 'pocket reference' when communicating complicated concepts to non-technically trained leaders."
- Sandy Steiger, Director, Center for Analytics and Data Science at Miami University
"Individuals and organizations want to be data driven. They say they are data driven. Becoming a Data Head shows them how to actually become data driven, without the assumption of a statistics or data background. This book is for anyone, or any organization, asking how to bring a data mindset to the whole company, not just those trained in the space."
- Eric Weber, Head of Experimentation & Metrics Research, Yelp
"What is keeping data science from reaching its true potential? It is not slow algorithms, lack of data, lack of computing power, or even lack of data scientists. Becoming a Data Head tackles the biggest impediment to data science success, the communication gap between the data scientist and the executive. Gutman and Goldmeier provide creative explanations of data science techniques and how they are used with clear everyday relatable examples. Managers and executives, and anyone wanting to better understand data science will learn a lot from this book. Likewise, data scientists who find it challenging to explain what they are doing will also find great value in Becoming a Data Head."
- Jeffrey D. Camm, PhD, Center for Analytics Impact, Wake Forest University
"Becoming a Data Head raises the level of education and knowledge in an industry desperate for clarity in thinking. A must read for those working with and within the growing field of data science and analytics."
- Dr. Stephen Chambal, VP for Corporate Growth at Perduco (DoD Analytics Company)
"Gutman and Goldmeier filter through much of the noise to break down complex data and statistical concepts we hear today into basic examples and analogies that stick. Becoming a Data Head has enabled me to translate my team's data needs into more tangible business requirements that make sense for our organization. A great read if you want to communicate your data more effectively to drive your business and data science team forward!"
- Justin Maurer, Engineering and Data Science Manager at Google
"As an aerospace engineer with nearly 15 years' experience, Becoming a Data Head made me aware of not only what I personally want to learn about data science, but also what I need to know professionally to operate in a data-rich environment. This book further discusses how to filter through often overused terms like artificial intelligence. This is a book for every mid-level program manager learning how to navigate the inevitable future of data science."
- Josh Keener, Aerospace Engineer and Program Manager
"A must read for an in-depth understanding of data science for senior executives."
- Cade Saie, Chief Data Officer
"Gutman and Goldmeier offer practical advice for asking the right questions, challenging assumptions, and avoiding common pitfalls. They strike a nice balance between thoroughly explaining concepts of data science while not getting lost in the weeds. This book is a useful addition to the toolbox of any analyst, data scientist, manager, executive, or anyone else who wants to become more comfortable with data science."
- Jeff Bialac, Senior Supply Chain Analyst at Kroger
"Gutman and Goldmeier have written a book that is as useful for applied statisticians and data scientists as it is for business leaders and technical professionals. In demystifying these complex statistical topics, they have also created a common language that bridges the longstanding communication divide that has -- until now -- separated data work from business value."
- Kathleen Maley, Chief Analytics Officer at datazuum
JORDAN GOLDMEIER is a Data Scientist, author, speaker, and community leader. He is a seven-time recipient of the Microsoft Most Valuable Professional Award and he has taught analytics to members of the Pentagon and Fortune 500 companies.