Section 2.5 Test Yourself: Identifying Data Problems

Checkpoint 2.5.1.

According to the text, what must a data scientist do to successfully identify a data problem in a specific domain like farming?
  • Sit in front of a computer all day and work with a program like R.
  • No, the text explicitly states that this impression is a mistake.
  • Focus only on reading books and watching videos about the domain.
  • No, the text says the best way to gain domain knowledge is to ask subject matter experts.
  • Immediately begin verifying stories told by professionals.
  • No, the first step is to get the professional to tell the stories; verification comes later.
  • Become immersed in the problem domain to think like a subject matter expert.
  • Correct! The text emphasizes that a data scientist must learn to think like a farmer to identify a farmer’s data problems.

Checkpoint 2.5.2.

What is the first step in identifying an anomaly?
  • Look for the exception cases that are far from the center.
  • No, this is what you do after you have defined the center or typical case.
  • Focus attention on a small grouping of trees that lost more fruit.
  • No, this is an example of a potential anomaly, not the first step in the process of identifying one.
  • Define the central or most typical occurrences.
  • Right! You first need a yardstick for what is "typical" before you can identify what is "unusual".
  • Perform a systematic count of lost fruit under a random sample of trees.
  • No, a systematic count is one way to measure what is typical, but deciding that you need to define "typical" comes first.
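
The two-step idea in this checkpoint (define what is typical, then look for cases far from it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree names and fruit counts are made up, the function name `find_anomalies` is hypothetical, and using the median with the median absolute deviation and a cutoff of 3 is one common choice, not a method taken from the text.

```python
from statistics import median

def find_anomalies(counts, threshold=3.0):
    """Return (center, spread, anomalies) for a dict of tree -> lost-fruit count.

    Hypothetical helper: the median/MAD rule and the default threshold
    are assumptions made for this sketch.
    """
    values = list(counts.values())
    # Step 1: define the typical case (the center) and the usual variation.
    center = median(values)
    spread = median(abs(v - center) for v in values)  # median absolute deviation
    if spread == 0:
        # Simplification: with no variation, this sketch reports no anomalies.
        return center, spread, {}
    # Step 2: only now look for the exception cases far from the center.
    anomalies = {tree: v for tree, v in counts.items()
                 if abs(v - center) / spread > threshold}
    return center, spread, anomalies

# Made-up counts of fallen fruit under a sample of trees.
lost_fruit = {"tree_01": 4, "tree_02": 6, "tree_03": 5, "tree_04": 7,
              "tree_05": 5, "tree_06": 21, "tree_07": 6, "tree_08": 4}

center, spread, anomalies = find_anomalies(lost_fruit)
print("typical loss:", center)
print("unusual trees:", anomalies)
```

With these made-up numbers the typical loss is 5.5 fruit per tree, and only `tree_06` stands out. Without the first step there would be no yardstick for calling 21 lost fruit unusual.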