Investigation 1.1: Friend or Foe?

Section 3.1 Investigation 1.1: Friend or Foe?

In this first investigation you will be introduced to a basic statistical investigation as well as some ideas and terminology that you will use throughout the course. You will combine ideas from the preliminary investigations, examining distributions of data and simulating models of random processes, to help judge how unusual an observation would be for a particular probability model.

Exercises 3.1.1 The Study

In a study reported in the November 2007 issue of Nature, researchers investigated whether infants take into account an individual’s actions towards others in evaluating that individual as appealing or aversive, perhaps laying the foundation for social interaction (Hamlin, Wynn, and Bloom, 2007). In other words, do children who aren’t even yet talking still form impressions as to someone’s friendliness based on their actions?

In one component of the study, sixteen 10-month-old infants were shown a “climber” character (a piece of wood with “googly” eyes glued onto it) that could not make it up a hill in two tries.

Then the infants were shown two scenarios for the climber’s next try, one where the climber was pushed to the top of the hill by another character (the “helper” toy) and one where the climber was pushed back down the hill by another character (the “hinderer” toy). The infant was alternately shown these two scenarios several times.

Then the child was presented with both pieces of wood (the helper and the hinderer characters) and asked to pick one to play with.

Infant choosing between helper and hinderer toys

Aside: Video Demonstrations.

Collecting the Data.

Definitions: Sample and Sample Size.

A sample is a collection of observed outcomes generated by repeated realizations of a random process. The set of observations should reflect both the typical behavior of that process and its inherent variability. A study’s sample size is the number of outcomes observed.

1. Identify the Sample.

Identify the sample in this study.

Hint.

The sample consists of the observational units from which data were collected.

Solution.

The sample is the observations from the 16 infants.

2. Assess Independence Assumption.

Do you think it is reasonable to model these observations as independent realizations of the random process under “identical conditions”? Explain.

Hint.

Consider whether the infants can be viewed as interchangeable and whether each infant’s choice was measured separately.

Solution.

Opinions will vary, but if we consider the infants as interchangeable (no differences between them) and the infants’ choices were all measured separately, then this modeling assumption seems appropriate. In particular, we need to be willing to model each infant as having the same probability of picking the helper toy.

3. Explain Experimental Controls.

Why is it important that the researchers varied the colors and shapes of the wooden characters and even on which side the toys were presented to the infants?

Hint.

Think about other factors besides the helping/hindering behavior that might influence an infant’s choice.

Solution.

This is to control for (or at least balance out) any other factors that could be influencing the infants’ choices.

Definition: Variables.

The measurements we are taking define the variable. We classify the type of variable as categorical (assigning each observational unit to a category) or quantitative (assigning each observational unit a numerical measurement). A special type of categorical variable is a binary variable, which has just two possible outcomes (often labeled “success” and “failure”).

4. Identify the Variable.

What is the variable we are measuring about each observation?

Hint.

Think about what information is recorded for each infant.

Solution.

Variable = which toy the infant chooses to play with (helper or hinderer).

5. Classify Variable Type.

Is this variable quantitative or categorical?

Hint.

Think about whether each recorded value is a number or a category.

Categorical
Correct! The variable records which category (Helper or Hinderer) each infant chose, making it a categorical variable.
Quantitative
Not quite. A quantitative variable would be a numerical measurement. Here we’re recording which toy was chosen, which is a category.

Definition: Research Question.

A research question often looks for patterns in a variable or compares a variable across different groups or looks for a relationship between variables.

6. State Research Question.

What research question is of interest here?

Hint.

What question are the researchers trying to answer about infant behavior?

Solution.

The research question is whether infants in general (assuming identical infants from a random process) are more likely to pick the helper toy than the hinderer toy in the long run.

Summarizing the Observed Data.

To summarize the distribution of a categorical variable, we can simply count how many are in each category and make a bar graph to display the results, one bar for each outcome, with heights representing the number of observations in each category, separating the bars to indicate distinct categories.

7. Create Bar Graph.

The “raw data” can be found on the course webpage as a txt file (InfantData.txt). How many do you see of each possible outcome? Sketch the bar graph. Give your graph an “active title,” a concise sentence stating the main message/key takeaways from the graph. What is your title?

Hint.

Count how many infants chose each toy. A bar graph should have bars for each category (Helper and Hinderer) with heights representing the counts.

Solution.

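14 infants chose the helper toy and 2 chose the hinderer toy, so the sample proportion choosing the helper toy is 14/16 = 0.875.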

Active title: A majority of infants preferred the helper toy.

Bar graph showing 14 infants chose Helper toy and 2 chose Hinderer toy
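
The tally and bar graph described in this solution can also be sketched in a few lines of code. The Python below is only an illustration, not part of the course materials (which use R, JMP, and the applet). It assumes the 16 choices are rebuilt from the reported counts rather than read from InfantData.txt, and the labels "helper" and "hinderer" and the function name text_bar_graph are hypothetical.

```python
from collections import Counter

# Assumption: the 16 choices are reconstructed from the counts reported in
# the solution (14 helper, 2 hinderer). The order is arbitrary and does not
# reflect the order of the rows in InfantData.txt.
choices = ["helper"] * 14 + ["hinderer"] * 2

counts = Counter(choices)   # number of observations in each category
n = len(choices)            # sample size


def text_bar_graph(counts):
    """Return one text line per category, with one '#' per observation."""
    width = max(len(label) for label in counts)
    return [
        f"{label:<{width}} | {'#' * count} ({count})"
        for label, count in sorted(counts.items())
    ]


for line in text_bar_graph(counts):
    print(line)

proportion_helper = counts["helper"] / n
print(f"Proportion choosing the helper toy: {proportion_helper}")
```

Each category gets its own separate bar, and the bar length is the count, which matches the description of a bar graph for a categorical variable.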

8. Create Bar Graph with Technology.

Use technology (R or JMP) to create a bar graph of these data. Choose one set of instructions by clicking on one of the hints below. See also Using Technology with This Book.

Hint 1. R Instructions

Load data from GitHub and create bar graph

Click the "Evaluate (R)" button to run the code.

Hint 2. JMP Instructions

Choose File > New > Data table. Open the InfantData.txt (raw data) link and select all the observations and the variable name (e.g., ctrl-A) and copy into the clipboard (e.g., ctrl-C).

Return to the data table in JMP and select Edit > Paste with Column Names.

Choose Analyze > Tabulate.
Drag the choice column to the Drop Zone for rows.
Use the red triangle ("hot spot") menu and select Show Chart.

JMP Tabulate window showing how to drag the variable

JMP Tabulate window with Show Chart option

Solution.

JMP "thumbnail" graph:

A bar chart for the infant data showing 14 Helper and 2 Hinderer choices

Drawing Conclusions Beyond the Sample.

Clearly a majority/more than half of the infants chose the helper toy in this sample of 16 infants. But does that convince us that infants in general are more likely to pick the helper toy in the long run? In other words, what is the probability that an infant will choose the helper toy?

Model assumption: Note that we are assuming each infant has the same probability of picking the helper toy; we just don’t know the value of that probability.

9. Researchers’ Hypothesis.

What do the researchers think is true about the value of this probability (e.g., do they think it is larger or smaller than 0.50)?

Hint.

Consider what the research hypothesis is about infant preferences for the helper toy.

Larger than 0.50
Correct! The researchers hypothesize that infants prefer the helper toy, which would mean the probability of choosing the helper is greater than 0.50.
Smaller than 0.50
Not quite. If infants preferred the hinderer toy, the probability would be less than 0.50, but that’s not what the researchers think.
Equal to 0.50
Not quite. A probability of 0.50 would mean infants choose equally between the two toys, which is not the researchers’ hypothesis.

10. Consider Chance Explanation.

Is it possible that in the long run infants just choose equally between the two toys (i.e., the probability an infant will choose the helper toy is 0.5) and we just happened to see more than half choose the helper toy in our sample?

Yes
Correct! It’s possible that the true probability is 0.50 and we just observed an unusual sample by chance.
No
Actually, it is possible. Random samples can vary, and we might see more than half choose the helper toy even if the true probability is 0.50.

11. Rule Out Color Preference.

Is it plausible that the observed majority occurred because infants just prefer the color blue?

Hint.

Recall why the researchers varied colors, shapes, and positions.

Yes
Not quite. The researchers varied the colors, shapes, and positions of the toys to balance out these factors, so color preference is not a plausible explanation.
No
Correct! We are not considering color, shape, or position as the explanation because these factors were balanced in the design of the study.

So that leaves us with two explanations for the majority we observed:

(1) There is something to the theory that infants are genuinely more likely to pick the helper toy (for some reason).
(2) Infants choose equally between the two toys in the long run and we happened to get “lucky” and find most of the infants in our sample picking the helper toy.

12. Choose Between Explanations.

So for the two possibilities we are still considering, how might you choose between them? In particular, how might you convince someone whether or not option (2) is plausible based on this study?

Hint.

Think about what makes an outcome unusual or typical when choices are made randomly.

Solution.

We would need to convince someone that if these results were just happening "randomly," it would be unusual to get 14 infants picking the helper toy.

Our analysis approach is going to be to assume the second explanation is true (similar to how in a legal trial we assume a defendant is innocent), and then see whether our data are consistent or inconsistent with that assumption. To do this, we need to investigate the values we expect to see for the number choosing the helper toy when 16 infants are choosing equally between the two toys. As you saw with the Random Babies (Investigation B), we can simulate the outcomes of a random process to help us determine which outcomes are more or less likely to occur.

13. Design a Simulation.

Suggest a method for carrying out a simulation of 16 infants picking equally between the two toys.

Hint.

Think about a simple physical randomization device that gives two equally likely outcomes.

Solution.

We could toss a coin for each infant, letting heads represent choosing the helper toy and tails represent choosing the hinderer toy. This makes the two choices equally likely on each toss. Then use 16 coins or toss one coin 16 times to represent the 16 infants. (We are assuming these are equivalent, that the observational units are identical.) These results will help us assess the variability in the outcomes of 16 infants "just by chance." This will help us decide whether 14 is a typical outcome or an unusual outcome when we know for a fact that the "infants" choose equally (in the long run) between the two toys.

Simulation.

For a 50-50 simulation model, we can flip a fair coin. We can arbitrarily define “heads” to be choosing the helper toy and “tails” to be choosing the hinderer toy. We will repeat the random process 16 times to represent the 16 infants, and we will count how many times we flip heads, representing an infant choosing the helper toy. The chart below shows this mapping of the real world, which we saw one instance of, and the simulation model, which we can easily repeat many times. Keep in mind that in the simulation model, we know the probability of heads is 0.50.

Mapping real world to simulation model

Element	Real world	Simulation model
One observation	Infant choice	Coin toss
Sample	16 infants	16 coin tosses
Success	Picks helper toy	Lands heads
Probability of “success”	Unknown	0.50
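
The mapping in this table can be written out as a short program that performs one repetition of the simulation model. This is a minimal Python sketch rather than the course's own tool: the function name simulate_one_repetition, its parameter names, and the seed value are all hypothetical choices made for illustration.

```python
import random


def simulate_one_repetition(n_infants=16, p_helper=0.5, rng=None):
    """Toss one 'coin' per infant; return the tosses and the number of heads.

    Heads ("H") stands for choosing the helper toy and tails ("T") for
    choosing the hinderer toy, as in the mapping table.
    """
    if rng is None:
        rng = random.Random()
    tosses = ["H" if rng.random() < p_helper else "T" for _ in range(n_infants)]
    return tosses, tosses.count("H")


# The seed is arbitrary; it only makes this one run repeatable.
rng = random.Random(2007)
tosses, n_heads = simulate_one_repetition(rng=rng)

print("Tosses:", "".join(tosses))
print("Heads (helper toy):", n_heads)
print("Tails (hinderer toy):", len(tosses) - n_heads)
```

In the simulation model the probability of "success" is known to be 0.50, which is why p_helper is set by us rather than estimated from the data.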

14. Conduct Coin Toss Simulation.

Flip a coin 16 times, representing the 16 infants in the study (one repetition of this random process). Tally the results below and count how many of the 16 chose the helper toy:

“Could have been” outcomes

Heads (helper toy):

Tails (hinderer toy):

Total number of heads in 16 tosses:

Hint.

Make sure your numbers of heads and tails sum to 16.

15. Combine Class Results.

Combine your simulation results for each repetition with your classmates’ on the scale below. Create a dotplot by placing a dot above the numerical result found by each person’s set of 16 tosses.

Hint.

Each person in class should contribute one dot to the class dotplot, placed above their number of heads out of 16.

Solution.

Results will vary by class. Below is one possible set of results:

Dotplot showing distribution of class simulation results for number of heads in 16 coin tosses

16. Describe Simulation Variability.

Did everyone get the same number of heads every time? What is an average or typical number of heads in a set of 16 tosses? Is this what you expected? Explain.

Hint.

Look at the center of the dotplot. What value appears most frequently or is in the middle of the distribution?

Solution.

No, there will be variability across the sets of 16 tosses, but 8 heads is an average or typical number of heads.

17. Assess Unusualness.

Does 14 heads appear to be an unusual outcome for 16 observations from a process where heads should appear 50% of the time in the long run?

Hint.

Look at your class dotplot. How often did values as extreme as 14 (or more) occur? Is 14 in the "tail" or center of the distribution?

Solution.

Answers will vary, but 14 does appear to be somewhat unusual, not occurring very often, in the "tail" of the distribution.

We really need to simulate this hypothetical random selection process hundreds, preferably thousands of times. This would be very tedious and time-consuming with coins, so let’s turn to technology.

Simulate Using Technology

Use the One Proportion Inference applet to simulate these 16 infants making this helper/hinderer choice, still assuming that infants have no real preference and so are equally likely to choose either toy.

Keep the Probability of heads set to 0.5.
Set the Number of Tosses to 16.
Keep the Number of repetitions at 1 for now.
Press Draw Samples.

18. Report the number of heads.

Report the number of heads (i.e., the number of infants who choose the helper toy) for this “could have been” (under the assumption of no preference) outcome.

Number of heads:

Hint.

The applet will simulate flipping 16 coins and count the number of heads for you automatically.

In the applet, uncheck the Show animation box and press Draw Samples four more times, each time recording the number of the 16 infants who choose the helper toy.

19. Repeat Simulation Multiple Times.

Did you get the same number of heads all five times?

Hint.

Each repetition simulates a new set of 16 coin flips. Think about whether random processes produce identical results every time.

Yes
Actually, random processes typically produce different results each time. You should see variation in the number of heads across the five repetitions.
No
Correct! There should be variation in the results across the repetitions. This variability is a natural characteristic of random processes.

Now change the Number of repetitions to 1995 and press Draw Samples, to produce a total of 2,000 repetitions of this random process of tossing a coin 16 times.
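
For readers who want to see what the applet is doing behind the scenes, the 2,000 repetitions can be approximated with the Python sketch below. It is not the applet's actual code: the function name simulate_head_counts, the seed, and the text "dotplot" (one symbol per 20 repetitions) are assumptions made for illustration, and the counts it produces will differ from the applet's because the random draws differ.

```python
import random
from collections import Counter


def simulate_head_counts(n_tosses=16, p_heads=0.5, repetitions=2000, seed=None):
    """Return a list with the number of heads in each repetition."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_heads for _ in range(n_tosses))
        for _ in range(repetitions)
    ]


# The seed is arbitrary; it only makes this run repeatable.
head_counts = simulate_head_counts(seed=1)
distribution = Counter(head_counts)

# Rough text version of the dotplot: one '*' per 20 repetitions.
for k in range(17):
    print(f"{k:2d} | {'*' * round(distribution[k] / 20)} ({distribution[k]})")

# How often did a repetition give a result at least as extreme as 14 heads?
observed = 14
at_least_observed = sum(1 for count in head_counts if count >= observed)
print("Repetitions with 14 or more heads:", at_least_observed)
print("Proportion:", at_least_observed / len(head_counts))
```

Each entry of head_counts plays the role of one dot: it is the number of heads in one set of 16 tosses under the "no preference" model.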

20. Describe Distribution.

For the dotplot you have created, what does each dot represent (i.e., what would you need to do to add another dot to the graph)?

Hint.

Think about what you did to create one dot in the physical coin-flipping activity.

Solution.

Each dot in the dotplot represents the number of heads in 16 coin tosses (representing the choices of a set of 16 infants).

21. Classify Graph Variable.

Is the graph variable (number of heads) quantitative or categorical?

Hint.

What does the horizontal axis represent? Is it counting something or assigning categories?

Quantitative
Correct! The variable "number of heads" is quantitative because it represents a numerical count that can be measured and has meaningful numerical values.
Categorical
Not quite. The number of heads is a count (0, 1, 2, ..., 16), which makes it quantitative rather than categorical. Categorical variables assign observations to categories, not numerical values.

22. Draw Conclusion.

Now that we have a better picture of the long-run behavior of this process, discuss whether you would consider the second explanation listed before Question 12 (“Infants choose equally between the two toys in the long run and we happened to get ‘lucky’ and find most of the infants in our sample picking the helper toy”) to be a plausible explanation for this study. Explain your reasoning as if to a skeptic.

Hint.

Look at how often 14 or more heads occurred in your 2,000 simulated repetitions. Is this common or rare? What does this tell you about the plausibility of the "no preference" assumption?

Solution.

Because it is very unlikely for us to have seen results at least as extreme as what we observed (14 successes) under the assumption of 50-50 chance, we have evidence against this claim and instead in favor of the claim that there is something other than random chance at play in this sample.
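
As a cross-check on the simulation, the chance of 14 or more heads in 16 fair tosses can also be counted exactly. This calculation is not part of the investigation, which relies on simulation; it is a supplementary sketch that assumes the same 50-50 model, under which all 2^16 sequences of heads and tails are equally likely.

```python
from math import comb

n = 16          # number of tosses (infants)
observed = 14   # observed number of heads (helper choices)

# Count the head/tail sequences that contain 14, 15, or 16 heads.
favorable = sum(comb(n, k) for k in range(observed, n + 1))

# Under the 50-50 model every one of the 2**16 sequences is equally likely.
total = 2 ** n
probability = favorable / total

print(favorable, "of", total, "sequences")   # 137 of 65536 sequences
print(round(probability, 4))                 # 0.0021
```

A value near 0.002 agrees with what the simulation suggests: out of 2,000 repetitions, only a handful should show 14 or more heads.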

Discussion.

Returning to our legal trial analogy, if you decide that the observed “data” is unlikely to occur by chance alone, you are going to “reject” the assumption of “innocence” (and say we have evidence the defendant is guilty). If you decide the data/evidence is not unusual by chance alone, then you “fail to reject” that assumption (and we say we don’t have evidence the defendant is guilty — we aren’t proving the defendant innocent, just that the evidence is not inconsistent with that assumption, “not guilty”).

So based on these simulation results, we would say the data (14 helper choices out of 16 trials) is unusual under the assumption that infants genuinely have no preference and are choosing blindly when presented with the toys. This evidence convinces us that “There is something to the theory that infants are genuinely more likely to pick the helper toy (for some reason)” is the more believable explanation for why so many of the infants in this study picked the helper toy over the hinderer toy. We haven’t proven this is true, but based on the strong majority these researchers saw, even for this small sample size of 16, we would consider the evidence convincing (“beyond a reasonable doubt”) that, in the long run, the probability of choosing the helper toy in this random process is greater than 0.5. Because the researchers controlled for other possible explanations for the observed preference results like color and handedness, we will conclude that there is convincing evidence that infants really do have a genuine preference for the helper toy over the hinderer toy.

Study Conclusions.

In a study of “social evaluation,” researchers explored whether pre-verbal infants have a preference for a “helping” toy over a “hindering” toy. Treating the 16 infants as identical observations from a random process with equal probability of success/failure, we find that getting 14 infants choosing the helper toy is not consistent with the types of values we expect to see when we have “infants” choosing equally between the two toys. This means that the researchers’ data provide strong statistical evidence to reject this “no preference” model and conclude that the infants’ choices are actually governed by a process where there is a genuine preference for the helper toy (or at least that it’s more complicated than each infant flipping a coin to decide). Of course, this conclusion depends on the assumption of “identical infants” and that these 16 infants’ choices are representative of the larger process of viewing the videos and selecting a toy. Also keep in mind that not all infants had a clear preference for either object.

Subsection 3.1.2 Practice Problem 1.1A

In a second experiment, the same events were repeated but the object climbing the hill no longer had the googly eyes attached. The researchers wanted to see whether the preference was made based on a social evaluation more than a perceptual preference. Suppose 8 of 12 (different) infants chose the push-up toy.

Checkpoint 3.1.23. Determine Sample Size for Simulation.

If you were to use a coin to carry out a simulation analysis to evaluate these results: how many times would you flip the coin for one repetition — 6, 8, 10, 12, 16, or 1000?

Hint.

How many infants participated in this second experiment? Each coin flip represents one infant’s choice.

6
Not quite. Think about how many infants were in this experiment.
8
Close, but this is the number who chose the push-up toy, not the total number of infants.
10
Not quite. Check the problem statement for the total number of infants in the experiment.
12
Correct! You need to flip the coin 12 times to represent the 12 different infants in this experiment, just like we flipped 16 times to represent the 16 infants in the original study.
16
Not quite. That was the sample size in the original study, but this experiment has a different number of infants.
1000
Not quite. 1000 would be the number of repetitions we might do, not the number of coin flips per repetition.

Checkpoint 3.1.24. Evaluate Evidence Without Eyes.

Use the One Proportion Inference applet to decide whether it is plausible that when the googly eyes are removed infants do not have a genuine preference between the two toys. What do you conclude?

Subsection 3.1.3 Practice Problem 1.1B

In 2019, the home team won 54 of the first 88 games of the Premier Soccer League season. Consider these games as a sample from a random process (all games that could have occurred in the first 3 months).

Checkpoint 3.1.25. Model the Soccer Process.

Could we use a coin tossing simulation to model this random process? What would each coin toss represent? What are we assuming about the process? How many times would we toss the coin for one repetition? Define what is meant by “probability of success” in this context.

Checkpoint 3.1.26. Test Home Field Advantage.

Use the One Proportion Inference applet to decide whether these data provide convincing statistical evidence that the home team is more likely than the visiting team to win in the long run. Justify your conclusion.

Checkpoint 3.1.27. Test Home Advantage Without Fans.

At the beginning of the 2020 season, fans were not allowed at the games due to the Coronavirus pandemic. For the first three months of this season, the home team won 40 of 87 matches. Decide whether these data provide convincing statistical evidence that the home team is more likely than the visiting team to win when no fans are present. Justify your conclusion.
