Module A Preface
Even if you have never seen or done any statistics or computational science before this class (or even if you don’t know what those words mean), you have definitely encountered them in your daily life.
Maybe you’ve heard of Nate Silver, whose blog FiveThirtyEight (named after the number of electors in the US Electoral College) correctly predicted the winner in 49 of the 50 states in the 2008 US presidential election and in all 50 states in the 2012 election. Maybe you’ve read the book Hidden Figures or seen the movie of the same name, a true story that follows Katherine Johnson, Mary Jackson, and Dorothy Vaughan, three groundbreaking engineers and mathematicians, through their careers at NASA at a time when the agency was racially segregated and gender discrimination was pervasive.
Or maybe you’ve read about how pathologists are using machine learning to detect cancer in real time, or how DeepMind is using neural networks to train computers to beat the best Go players in the world. Maybe you’ve seen Moneyball or read the non-fiction book of the same name, in which Billy Beane, the general manager of Major League Baseball’s Oakland A’s, uses “sabermetrics” (essentially just statistics applied to baseball) to assemble a division-winning team with a low budget and overlooked players.
All of these examples fall within the realm of data science, which encompasses statistics, decision theory, machine learning, artificial intelligence, and so much more. But data science doesn’t have to be as complicated as sending astronauts to the moon or predicting elections! It can be as simple as looking up the weather before deciding what to wear, or checking the price of a t-shirt at multiple shops before deciding where to buy it, or reading some reviews of a hit new movie before deciding whether or not to see it.
The more informed you are about data, the better you can use it to your advantage. Only by understanding how data is used can you find ways to fix the flaws that exist in data collection and usage today. For some examples of problems with data today, check out this article. Like the examples above, these all involve using data to inform predictions or decisions. The purpose of this class is to give you the tools to use data to make more informed decisions, whether in your classes, at work, or in your day-to-day life.
In this course, you will learn the following key skills:
Understand the kinds of questions data science can answer, and learn to formulate and answer your own research questions using a data-driven approach.
Generate your own data through surveys, and import data from and export data to databases.
Interpret, assess, and handle data, including learning to gather and manage data ethically with respect to bias and privacy.
Understand the basics of using Google Sheets to manipulate data, including sorting, refining, and grouping your data.
Visualize your data effectively and summarize your findings in different forms of reports.
This course is designed to prepare you for further studies and for future careers. Since data is present in all fields of study and careers, the skills in this textbook prepare you for using and understanding data in any capacity.