
Section 3.1 Investigation 1.1: Friend or Foe?

In this first investigation you will be introduced to a basic statistical investigation as well as some ideas and terminology that you will use throughout the course. You will combine ideas from the preliminary investigations, examining distributions of data and simulating models of random processes, to help judge how unusual an observation would be under a particular probability model.

Exercises 3.1.1 The Study

In a study reported in the November 2007 issue of Nature, researchers investigated whether infants take into account an individual’s actions towards others in evaluating that individual as appealing or aversive, perhaps laying the foundation for social interaction (Hamlin, Wynn, and Bloom, 2007). In other words, do children who aren’t even yet talking still form impressions as to someone’s friendliness based on their actions?
In one component of the study, sixteen 10-month-old infants were shown a “climber” character (a piece of wood with “googly” eyes glued onto it) that could not make it up a hill in two tries.
Then the infants were shown two scenarios for the climber’s next try: one where the climber was pushed to the top of the hill by another character (the “helper” toy) and one where the climber was pushed back down the hill by another character (the “hinderer” toy). Each infant was shown these two scenarios alternately, several times.
Then the child was presented with both pieces of wood (the helper and the hinderer characters) and asked to pick one to play with.
Infant choosing between helper and hinderer toys

Aside: Video Demonstrations.

Collecting the Data.

Definitions: Sample and Sample Size.
A sample is a collection of observed outcomes generated by repeated realizations of a random process. The set of observations should reflect the typical behavior and the variability inherent in that process. A study’s sample size is the number of outcomes observed.
1. Identify the Sample.
Identify the sample in this study.
Hint.
The sample consists of the observational units from which data were collected.
Solution.
The sample is the observations from the 16 infants.
2. Assess Independence Assumption.
Do you think it is reasonable to model these observations as independent realizations of the random process under “identical conditions”? Explain.
Hint.
Consider whether the infants can be viewed as interchangeable and whether each infant’s choice was measured separately.
Solution.
Opinions will vary, but if we consider the infants as interchangeable (no differences between them) and the infants’ choices were all measured separately, then this modeling assumption seems appropriate. In particular, we need to be willing to model each infant as having the same probability of picking the helper toy.
3. Explain Experimental Controls.
Why is it important that the researchers varied the colors and shapes of the wooden characters and even on which side the toys were presented to the infants?
Hint.
Think about other factors besides the helping/hindering behavior that might influence an infant’s choice.
Solution.
This is to control for (or at least balance out) any other factors that could be influencing the infants’ choices.
Definition: Variables.
The measurements we are taking define the variable. We classify the type of variable as categorical (assigning each observational unit to a category) or quantitative (assigning each observational unit a numerical measurement). A special type of categorical variable is a binary variable, which has just two possible outcomes (often labeled “success” and “failure”).
4. Identify the Variable.
What is the variable we are measuring about each observation?
Hint.
Think about what information is recorded for each infant.
Solution.
Variable = which toy the infant chooses to play with.
5. Classify Variable Type.
Is this variable quantitative or categorical?
Hint.
Think about whether the data is a number or a category.
  • Categorical
  • Correct! The variable records which category (Helper or Hinderer) each infant chose, making it a categorical variable.
  • Quantitative
  • Not quite. A quantitative variable would be a numerical measurement. Here we’re recording which toy was chosen, which is a category.
Definition: Research Question.
A research question often looks for patterns in a single variable, compares a variable across different groups, or looks for a relationship between variables.
6. State Research Question.
What research question is of interest here?
Hint.
What question are the researchers trying to answer about infant behavior?
Solution.
The research question is whether infants in general (assuming identical infants from a random process) are more likely to pick the helper toy than the hinderer toy in the long run.

Summarizing the Observed Data.

To summarize the distribution of a categorical variable, we can count how many observations fall in each category and display the counts in a bar graph: one bar for each outcome, with bar heights representing the number of observations in each category and gaps between the bars indicating distinct categories.
7. Create Bar Graph.
The “raw data” can be found on the course webpage as a txt file (InfantData.txt). How many do you see of each possible outcome? Sketch the bar graph. Give your graph an “active title,” a concise sentence stating the main message or key takeaway of the graph. What is your title?
Hint.
Count how many infants chose each toy. A bar graph should have bars for each category (Helper and Hinderer) with heights representing the counts.
Solution.
Helper: 14 infants; Hinderer: 2 infants (14/16 = 0.875 of the infants chose the helper toy).
Active title: A majority of infants preferred the helper toy.
Bar graph showing 14 infants chose Helper toy and 2 chose Hinderer toy
8. Create Bar Graph with Technology.
Use technology (R or JMP) to create a bar graph of these data. Choose one of the sets of instructions below. See also Using Technology with This Book.
Hint 1. R Instructions
Load data from GitHub and create bar graph
Hint 2. JMP Instructions
  • Choose File > New > Data Table. Open the InfantData.txt (raw data) link, select all the observations and the variable name (e.g., Ctrl-A), and copy them to the clipboard (e.g., Ctrl-C).
JMP Tabulate window showing how to drag the variable
JMP Tabulate window with Show Chart option
Solution.
JMP "thumbnail" graph:
A bar chart for the infant data showing 14 Helper and 2 Hinderer choices
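For readers working outside R or JMP, the counting step can be sketched in Python. This is a minimal sketch, not the book's R code: it assumes one recorded choice per infant (the labels "helper" and "hinderer" are our assumption about how InfantData.txt is coded), and it rebuilds the data from the counts reported above (14 helper, 2 hinderer) rather than reading the file.

```python
from collections import Counter

# Hypothetical raw data: one recorded choice per infant, rebuilt from the
# reported counts (14 helper, 2 hinderer); the real InfantData.txt layout may differ.
choices = ["helper"] * 14 + ["hinderer"] * 2

def tally(observations):
    """Count how many observations fall in each category."""
    return Counter(observations)

def text_bar_graph(counts, symbol="#"):
    """Return a simple text bar graph: one bar per category, bar length = count."""
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        lines.append(f"{label:<{width}} | {symbol * count} ({count})")
    return "\n".join(lines)

counts = tally(choices)
print(text_bar_graph(counts))
print(f"Proportion choosing helper: {counts['helper'] / len(choices):.3f}")
```

A text bar graph keeps the sketch dependency-free; a plotting library would produce the separated bars described above.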

Drawing Conclusions Beyond the Sample.

Clearly, a majority (more than half) of the infants in this sample of 16 chose the helper toy. But does that convince us that infants in general are more likely to pick the helper toy in the long run? In other words, what can we say about the probability that an infant will choose the helper toy?
Model assumption: Note that we are assuming each infant has the same probability of picking the helper toy; we just don’t know the value of that probability.
9. Researchers’ Hypothesis.
What do the researchers think is true about the value of this probability (e.g., do they think it is larger or smaller than 0.50)?
Hint.
Consider what the research hypothesis is about infant preferences for the helper toy.
  • Larger than 0.50
  • Correct! The researchers hypothesize that infants prefer the helper toy, which would mean the probability of choosing the helper is greater than 0.50.
  • Smaller than 0.50
  • Not quite. If infants preferred the hinderer toy, the probability would be less than 0.50, but that’s not what the researchers think.
  • Equal to 0.50
  • Not quite. A probability of 0.50 would mean infants choose equally between the two toys, which is not the researchers’ hypothesis.
10. Consider Chance Explanation.
Is it possible that in the long run infants just choose equally between the two toys (i.e., the probability an infant will choose the helper toy is 0.5) and we just happened to see more than half choose the helper toy in our sample?
  • Yes
  • Correct! It’s possible that the true probability is 0.50 and we just observed an unusual sample by chance.
  • No
  • Actually, it is possible. Random samples can vary, and we might see more than half choose the helper toy even if the true probability is 0.50.
11. Rule Out Color Preference.
Is it plausible that the observed majority occurred because infants just prefer the color blue?
Hint.
Recall why the researchers varied colors, shapes, and positions.
  • Yes
  • Not quite. The researchers varied the colors, shapes, and positions of the toys to balance out these factors, so color preference is not a plausible explanation.
  • No
  • Correct! We are not considering color, shape, or position as the explanation because these factors were balanced in the design of the study.
So that leaves us with two explanations for the majority we observed:
  1. There is something to the theory that infants are genuinely more likely to pick the helper toy (for some reason).
  2. Infants choose equally between the two toys in the long run, and we happened to get “lucky” with an unusual sample in which most of the infants picked the helper toy.
12. Choose Between Explanations.
So for the two possibilities we are still considering, how might you choose between them? In particular, how might you convince someone whether or not option (2) is plausible based on this study?
Hint.
Think about what makes an outcome unusual or typical when choices are made randomly.
Solution.
We would need to convince someone that if these results were just happening "randomly," it would be unusual to get 14 infants picking the helper toy.
Our analysis approach is going to be to assume the second explanation is true (similar to how in a legal trial we assume a defendant is innocent), and then see whether our data are consistent or inconsistent with that assumption. To do this, we need to investigate the values we expect to see for the number choosing the helper toy when 16 infants are equally choosing between the two toys. As you saw with the Random Babies (Investigation B), we can simulate the outcomes of a random process to help us determine which outcomes are more or less likely to occur.
13. Design a Simulation.
Suggest a method for carrying out a simulation of 16 infants picking equally between the two toys.
Hint.
Think about a simple physical randomization device that gives two equally likely outcomes.
Solution.
We could toss a coin for each infant, letting heads represent choosing the helper toy and tails represent choosing the hinderer toy. This makes the two choices equally likely on each toss. Then use 16 coins or toss one coin 16 times to represent the 16 infants. (We are assuming these are equivalent, that the observational units are identical.) These results will help us assess the variability in the outcomes of 16 infants "just by chance." This will help us decide whether 14 is a typical outcome or an unusual outcome when we know for a fact that the "infants" choose equally (in the long run) between the two toys.

Simulation.

For a 50-50 simulation model, we can flip a fair coin. We can arbitrarily define “heads” to be choosing the helper toy and “tails” to be choosing the hinderer toy. We will repeat the random process 16 times to represent the 16 infants, and we will count how many times we flip heads, representing an infant choosing the helper toy. The chart below shows this mapping between the real world, of which we saw one instance, and the simulation model, which we can easily repeat many times. Keep in mind that in the simulation model, we know the probability of heads is 0.50.
Mapping real world to simulation model
Element | Real world | Simulation model
One observation | Infant choice | Coin toss
Sample | 16 infants | 16 coin tosses
Success | Picks helper toy | Lands heads
Probability of “success” | Unknown | 0.50
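One repetition of the simulation model in the table above (16 tosses of a fair coin, counting heads) can be sketched in Python. This is an illustrative sketch only: the function name `one_repetition` and the use of Python's `random` module are our choices, not part of the applet or the book's R code. Fixing the seed makes a run reproducible; omit it to get a fresh “could have been” outcome each time.

```python
import random

def one_repetition(n_infants=16, p_helper=0.5, rng=random):
    """Simulate one set of infants under the 'no preference' model.

    Each infant is one coin toss: heads (probability p_helper) means the
    infant picks the helper toy. Returns the number of heads.
    """
    return sum(1 for _ in range(n_infants) if rng.random() < p_helper)

rng = random.Random(2007)  # arbitrary seed, used only for reproducibility
heads = one_repetition(rng=rng)
print(f"Heads (helper toy): {heads}")
print(f"Tails (hinderer toy): {16 - heads}")
```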
14. Conduct Coin Toss Simulation.
Flip a coin 16 times, representing the 16 infants in the study (one repetition of this random process). Tally the results below and count how many of the 16 chose the helper toy:
“Could have been” outcomes
Heads (helper toy):
Tails (hinderer toy):
Total number of heads in 16 tosses:
Hint.
Make sure your numbers of heads and tails sum to 16.
15. Combine Class Results.
Combine your simulation results for each repetition with your classmates’ on the scale below. Create a dotplot by placing a dot above the numerical result found by each person’s set of 16 tosses.
Hint.
Each person in class should contribute one dot to the class dotplot, placed above their number of heads out of 16.
Solution.
Results will vary by class. Below is one possible set of results:
Dotplot showing distribution of class simulation results for number of heads in 16 coin tosses
16. Describe Simulation Variability.
Did everyone get the same number of heads every time? What is an average or typical number of heads in a set of 16 tosses? Is this what you expected? Explain.
Hint.
Look at the center of the dotplot. What value appears most frequently or is in the middle of the distribution?
Solution.
No, there will be variability across the sets of 16 tosses, but about 8 heads (half of the 16 tosses) is an average or typical number of heads.
17. Assess Unusualness.
Does 14 heads appear to be an unusual outcome for 16 observations from a process where heads should appear 50% of the time in the long run?
Hint.
Look at your class dotplot. How often did values as extreme as 14 (or more) occur? Is 14 in the "tail" or center of the distribution?
Solution.
Answers will vary, but 14 does appear to be somewhat unusual, not occurring very often, in the "tail" of the distribution.
We really need to simulate this hypothetical random selection process hundreds, preferably thousands, of times. This would be very tedious and time-consuming with coins, so let’s turn to technology.
Simulate Using Technology
Use the One Proportion Inference applet to simulate these 16 infants making this helper/hinderer choice, still assuming that infants have no real preference and so are equally likely to choose either toy.
  • Keep the Probability of heads set to 0.5.
  • Set the Number of Tosses to 16.
  • Keep the Number of repetitions at 1 for now.
  • Press Draw Samples.
18. Report the Number of Heads.
Report the number of heads (i.e., the number of infants who choose the helper toy) for this “could have been” outcome (under the assumption of no preference).
Number of heads:
Hint.
The applet will simulate flipping 16 coins and count the number of heads for you automatically.
  • In the applet, uncheck the Show animation box and press Draw Samples four more times, each time recording the number of the 16 infants who choose the helper toy.
19. Repeat Simulation Multiple Times.
Did you get the same number of heads all five times?
Hint.
Each repetition simulates a new set of 16 coin flips. Think about whether random processes produce identical results every time.
  • Yes
  • Actually, random processes typically produce different results each time. You should see variation in the number of heads across the five repetitions.
  • No
  • Correct! There should be variation in the results across the repetitions. This variability is a natural characteristic of random processes.
  • Now change the Number of repetitions to 1995 and press Draw Samples to produce a total of 2,000 repetitions of this random process of tossing a coin 16 times.
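The applet's 2,000 repetitions can be mimicked with a short Python sketch. It repeats the 16-toss random process many times, tallies how often each number of heads occurs (the information shown in the dotplot), and reports the proportion of repetitions with 14 or more heads. This is a sketch under the same 50-50 model, not the applet's implementation; the function names and seed are illustrative assumptions, and the simulated proportion will vary slightly from run to run and from the applet.

```python
import random
from collections import Counter

def simulate_counts(n_infants=16, p_helper=0.5, reps=2000, seed=None):
    """Repeat the coin-tossing process `reps` times; return the list of head counts."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_helper for _ in range(n_infants)) for _ in range(reps)]

def text_dotplot(results, n_infants=16, per_symbol=10):
    """Print one row per possible number of heads.

    Each '*' represents up to `per_symbol` repetitions (rounded up).
    """
    tally = Counter(results)
    for k in range(n_infants + 1):
        stars = "*" * -(-tally[k] // per_symbol)  # ceiling division
        print(f"{k:>2} | {stars} ({tally[k]})")

results = simulate_counts(seed=1)
text_dotplot(results)

observed = 14
tail = sum(r >= observed for r in results) / len(results)
print(f"Proportion of repetitions with {observed} or more heads: {tail:.4f}")
```

Changing `n_infants` and `observed` lets the same sketch describe other studies that fit a coin-tossing model.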
20. Describe Distribution.
For the dotplot you have created, what does each dot represent (i.e., what would you need to do to add another dot to the graph)?
Hint.
Think about what you did to create one dot in the physical coin-flipping activity.
Solution.
Each dot in the dotplot represents the number of heads in 16 coin tosses (representing the choices of a set of 16 infants).
21. Classify Graph Variable.
Is the graph variable (number of heads) quantitative or categorical?
Hint.
What does the horizontal axis represent? Is it counting something or assigning categories?
  • Quantitative
  • Correct! The variable "number of heads" is quantitative because it represents a numerical count that can be measured and has meaningful numerical values.
  • Categorical
  • Not quite. The number of heads is a count (0, 1, 2, ..., 16), which makes it quantitative rather than categorical. Categorical variables assign observations to categories, not numerical values.
22. Draw Conclusion.
Now that we have a better picture of the long-run behavior of this process, discuss whether you consider option 2 from the list before Question 12 (“Infants choose equally between the two toys in the long run and we happened to get ‘lucky’ and find most of the infants in our sample picking the helper toy”) to be a plausible explanation for this study’s results. Explain your reasoning as if to a skeptic.
Hint.
Look at how often 14 or more heads occurred in your 2,000 simulated repetitions. Is this common or rare? What does this tell you about the plausibility of the "no preference" assumption?
Solution.
Because it is very unlikely for us to have seen results at least as extreme as what we observed (14 successes) under the assumption of 50-50 chance, we have evidence against this claim and instead in favor of the claim that there is something other than random chance at play in this sample.
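As a check on the simulation, the long-run proportion it estimates can also be computed exactly. Under the 50-50 model each of the 2^16 = 65,536 equally likely sequences of 16 tosses has probability 1/65,536, and the number of sequences with exactly k heads is C(16, k). Summing over k = 14, 15, 16 gives (120 + 16 + 1)/65,536 = 137/65,536, about 0.0021. This binomial calculation goes slightly beyond the simulation approach used here; the Python sketch below (using `math.comb`, available in Python 3.8+) simply reproduces the arithmetic, and the function name is our own.

```python
from fractions import Fraction
from math import comb

def exact_upper_tail(n, observed, p=Fraction(1, 2)):
    """P(X >= observed), where X counts successes in n independent trials with success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed, n + 1))

prob = exact_upper_tail(16, 14)
print(prob, float(prob))  # 137/65536 and its decimal value (about 0.0021)
```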

Discussion.

Returning to our legal trial analogy, if you decide that the observed “data” is unlikely to occur by chance alone, you are going to “reject” the assumption of “innocence” (and say we have evidence the defendant is guilty). If you decide the data/evidence is not unusual by chance alone, then you “fail to reject” that assumption (and we say we don’t have evidence the defendant is guilty; we aren’t proving the defendant innocent, just that the evidence is not inconsistent with that assumption, “not guilty”).
So based on these simulation results, we would say the data (14 helper choices out of 16 trials) is unusual under the assumption that infants genuinely have no preference and are choosing blindly when presented with the toys. This evidence convinces us that “There is something to the theory that infants are genuinely more likely to pick the helper toy (for some reason)” is the more believable explanation for why so many of the infants in this study picked the helper toy over the hinderer toy. We haven’t proven this is true, but based on the strong majority these researchers saw, even with this small sample size of 16, we would consider the evidence convincing (“beyond a reasonable doubt”) that, in the long run, the probability of choosing the helper toy in this random process is greater than 0.5. Because the researchers controlled for other possible explanations for the observed preference, such as color and handedness, we conclude that there is convincing evidence that infants really do have a genuine preference for the helper toy over the hinderer toy.

Study Conclusions.

In a study of “social evaluation,” researchers explored whether pre-verbal infants have a preference for a “helping” toy over a “hindering” toy. Treating the 16 infants as identical observations from a random process with equal probability of success/failure, we find that getting 14 infants choosing the helper toy is not consistent with the types of values we expect to see when we have “infants” choosing equally between the two toys. This means that the researchers’ data provide strong statistical evidence to reject this “no preference” model and conclude that the infants’ choices are actually governed by a process where there is a genuine preference for the helper toy (or at least that it’s more complicated than each infant flipping a coin to decide). Of course, this conclusion depends on the assumption of “identical infants” and that these 16 infants’ choices are representative of the larger process of viewing the videos and selecting a toy. Also keep in mind that not all infants had a clear preference for either object.

Subsection 3.1.2 Practice Problem 1.1A

In a second experiment, the same events were repeated but the object climbing the hill no longer had the googly eyes attached. The researchers wanted to see whether the preference was made based on a social evaluation more than a perceptual preference. Suppose 8 of 12 (different) infants chose the push-up toy.

Checkpoint 3.1.23. Determine Sample Size for Simulation.

If you were to use a coin to carry out a simulation analysis to evaluate these results, how many times would you flip the coin for one repetition: 6, 8, 10, 12, 16, or 1000?
Hint.
How many infants participated in this second experiment? Each coin flip represents one infant’s choice.
  • 6
  • Not quite. Think about how many infants were in this experiment.
  • 8
  • Close, but this is the number who chose the push-up toy, not the total number of infants.
  • 10
  • Not quite. Check the problem statement for the total number of infants in the experiment.
  • 12
  • Correct! You need to flip the coin 12 times to represent the 12 different infants in this experiment, just like we flipped 16 times to represent the 16 infants in the original study.
  • 16
  • Not quite. That was the sample size in the original study, but this experiment has a different number of infants.
  • 1000
  • Not quite. 1000 would be the number of repetitions we might do, not the number of coin flips per repetition.
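The same simulation idea adapts to the second experiment by changing the number of tosses per repetition to 12 and the observed count to 8. A minimal, self-contained Python sketch is below; the function name, number of repetitions, and seed are illustrative assumptions, and interpreting the resulting proportion is left for Checkpoint 3.1.24.

```python
import random

def simulated_tail_proportion(n, observed, p=0.5, reps=10_000, seed=None):
    """Estimate P(count >= observed) under a coin-tossing model with success probability p.

    Each repetition tosses the 'coin' n times (one toss per infant) and counts successes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        if successes >= observed:
            hits += 1
    return hits / reps

# Second experiment (no googly eyes): 12 infants, 8 chose the push-up toy.
estimate = simulated_tail_proportion(n=12, observed=8, seed=42)
print(f"Estimated proportion of repetitions with 8 or more of 12: {estimate:.3f}")
```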

Checkpoint 3.1.24. Evaluate Evidence Without Eyes.

Subsection 3.1.3 Practice Problem 1.1B

In 2019, the home team won 54 of the first 88 games of the Premier Soccer League season. Consider these games as a sample from a random process (all games that could have occurred in the first three months).

Checkpoint 3.1.25. Model the Soccer Process.

Could we use a coin tossing simulation to model this random process? What would each coin toss represent? What are we assuming about the process? How many times would we toss the coin for one repetition? Define what is meant by “probability of success” in this context.

Checkpoint 3.1.26. Test Home Field Advantage.

Checkpoint 3.1.27. Test Home Advantage Without Fans.

At the beginning of the 2020 season, fans were not allowed at the games due to the Coronavirus pandemic. For the first three months of this season, the home team won 40 of 87 matches. Decide whether these data provide convincing statistical evidence that the home team is more likely than the visiting team to win when no fans are present. Justify your conclusion.