Suppose you, like the 17th-century French nobleman Chevalier de Mere, liked gambling on the outcome of rolling fair 6-sided dice (each numbered 1 to 6). Would you bet him that he couldn't roll at least one 6 in four rolls of a single die? What about betting that he couldn't roll at least one double-6 in 24 rolls of both dice?
Here is a Python script that can help you get a feel for the questions above. You can switch between the two questions by commenting and uncommenting the appropriate lines (lines that start with a # are comments). See how lucky you are!
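One minimal way such a script might look (the 10,000 trials and the variable names here are just one possible choice):

```python
import random

num_trials = 10000   # how many times to make the bet
wins = 0             # how many bets you would win

for _ in range(num_trials):
    # Question 1: at least one 6 in four rolls of a single die?
    rolls = [random.randint(1, 6) for _ in range(4)]
    success = 6 in rolls

    # Question 2: at least one double-6 in 24 rolls of two dice?
    # (Comment out the two lines above and uncomment these two to switch.)
    # rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(24)]
    # success = (6, 6) in rolls

    if success:
        wins += 1

print("Won", wins, "out of", num_trials, "bets:", wins / num_trials)
```

Run it a few times for each question and compare how often you win.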
We can get a feel for probability empirically by observing how frequently events occur when an experiment is repeated many times. It often happens, as it did with the Chevalier de Mere, that our intuition about probability is not quite right. Using the counting techniques we have studied, we can explain why our intuition is off and what the true probabilities are.
Most of the questions about counting we have considered in this chapter can also be asked as a question about probability. For example: How many passwords of length 8 can you make using just lower-case letters? What is the probability that randomly selecting 8 lower-case letters will give you your password?
While the subject of probability is vast and complex, the basics of discrete probability are little more than counting. So here we will take a brief look at how our study of counting can help us understand probability.
In a room of 30 people, what is the probability that at least two of them share a birthday? Assume that all days are equally likely and that nobody was born on February 29th. Would you believe the answer is more than 25%? More than 50%? More than 70%??? Let's find the answer.
Among the 30 people, either they all have different birthdays or at least two share a birthday. Since this is certain, its probability is 1. So what is the probability that at least two people (out of the 30) share a birthday?
Think about how we use the language of probability in our everyday lives. We might say that tossing a coin has a 50% chance of coming up heads. Or that when rolling two dice, having the sum of the dice result in a 7 is more likely than having the sum be a 2. Casinos certainly rely on certain pairs of cards being consistently more likely than others when setting payouts for Blackjack. All of this assumes that there is some randomness to events, and that even in this randomness, there is some consistency to what can happen. We will assume this model of reality.
The things we can assign probabilities to are called random experiments. These can have different possible outcomes. We will call the (finite) set of possible outcomes of a random experiment the sample space (we will usually denote this set as $S$). By definition, performing a random experiment will always result in exactly one outcome from the sample space.
Throughout this section, we will always assume the uniform probability distribution, which means that we insist that each outcome in the sample space is equally likely. Then the probability of any particular outcome in the sample space is exactly $\frac{1}{|S|}$.
The uniform probability distribution is a common and reasonable assumption to make, but it does preclude us from asking some questions. For example, where a dart lands on a dartboard is not uniformly distributed, and neither is a roll of a weighted die. What is the probability that a thumbtack lands point up? How would we even start to answer these questions? We would have to make some assumptions about what the probabilities of the outcomes actually are (perhaps based on repeated experiments).
Suppose you flip a fair coin twice. What is the probability that you get two heads? The sample space is the set of all possible outcomes of the experiment, which in this case is the set $\{HH, HT, TH, TT\}$. The probability of getting two heads is then $\frac{1}{4}$. In fact, every outcome has probability $\frac{1}{4}$ since there are 4 outcomes in the sample space.
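If you like, you can check this by listing the sample space in Python (a small sketch; the H/T strings are just one way to encode the outcomes):

```python
from itertools import product

# The sample space for two coin flips: all pairs of H/T.
sample_space = list(product("HT", repeat=2))
print(sample_space)   # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# The event "two heads" contains just the outcome ('H', 'H').
event = [outcome for outcome in sample_space if outcome == ("H", "H")]
print(len(event) / len(sample_space))   # 0.25
```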
Finding probabilities of outcomes really is this easy. Where things get more fun is if we look for the probability of an event: a subset of the sample space. For a particular random experiment, there might be lots of different events we ask about, and they do not need to be mutually exclusive. An event can also be a set containing just a single outcome or might contain no outcomes.
For example, suppose you roll a fair 6-sided die. The sample space contains six outcomes: $\{1, 2, 3, 4, 5, 6\}$. Some events we might care about include rolling an even number (the subset $\{2, 4, 6\}$), rolling a number less than $3$ (the set $\{1, 2\}$), or rolling a number less than $7$ (the subset $\{1, 2, 3, 4, 5, 6\}$). In fact, we now know that there are exactly $2^6 = 64$ different events we could ask about, since there are $2^6$ subsets of the sample space.
What does our intuition suggest about the example events described above? Rolling an even number should be just as likely as rolling an odd number, so we hope that the probability of rolling an even number is $\frac{1}{2}$. Similarly, the probability of rolling a number less than $3$ should be $\frac{2}{6} = \frac{1}{3}$, since a third of the possible outcomes are less than 3. What about rolling a number less than $7$? Well, this must happen, so its probability should be $\frac{6}{6}$, which as a fraction is just $1$.
Suppose a random experiment has sample space $S$. The probability of an event $E \subseteq S$ is the number of outcomes in $E$ divided by the number of outcomes in $S$. We write this as $P(E) = \frac{|E|}{|S|}$.
We have spent a lot of effort learning how to count the size of sets. We can then use this to compute probabilities by counting the size of the sample space (set) and the size of the event (set).
For example, what is the probability that a 5-card poker hand (dealt from a standard 52-card deck) is a 4-of-a-kind? First, let's count the sample space, which will consist of all 5-card hands. The order of the cards in a hand is not important, so we will just count 5-element subsets of the 52 cards. The sample space therefore contains $\binom{52}{5}$ elements. (This number is just under 2.6 million: 2,598,960 to be exact.)
Now, how many of those will be 4-of-a-kind? One way we could count this would be to first select which of the 13 values will be the 4-of-a-kind, which can be done in $13$ ways. What about the other card in the hand? Well, there are 48 other cards it could be, so the number of 4-of-a-kind hands is $13 \cdot 48 = 624$.
This makes the probability of getting 4-of-a-kind,

$$P(\text{4-of-a-kind}) = \frac{13 \cdot 48}{\binom{52}{5}} = \frac{624}{2598960} \approx 0.00024.$$
An important subtlety: Whenever counting the size of the sample space and the event, we must make sure that we are really counting the number of elements of the sample space that are in the event. In particular, if we count subsets of cards in the sample space (using a combination instead of using a permutation to count sequences of cards) then we must count the number of subsets of cards in the event.
Interestingly, we can find the probability of getting 4-of-a-kind using permutations too: The number of 5-card sequences is $P(52, 5) = 52 \cdot 51 \cdot 50 \cdot 49 \cdot 48$. Finding the number of 4-of-a-kind sequences is a little more complicated. There are 13 possible values for the 4-of-a-kind, and 48 remaining cards for the fifth card. But those five cards can be arranged in $5!$ different ways. So the number of 4-of-a-kind sequences is $13 \cdot 48 \cdot 5!$. This gives,

$$P(\text{4-of-a-kind}) = \frac{13 \cdot 48 \cdot 5!}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}.$$
Is this close to the same answer we had before? It is exactly the same (we can verify this by noticing the extra factor of $5!$ in both the numerator and denominator).
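If you want a sanity check, a couple of lines of Python reproduce both computations (math.comb and math.perm require Python 3.8 or later):

```python
from math import comb, perm

# Counting unordered hands: 13 values for the 4-of-a-kind, 48 choices for the fifth card.
p_hands = 13 * 48 / comb(52, 5)

# Counting ordered sequences: the same cards can be arranged in 5! ways.
p_sequences = 13 * 48 * perm(5, 5) / perm(52, 5)

print(p_hands, p_sequences)   # both print about 0.00024
```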
While picking combinations or permutations (as long as you make the same choice for both the sample space and the event) gave the same probability here, this is not always true, as you are asked to explore in some of the additional exercises.
Here are a few basic probability facts that follow easily from our definition of probability and understanding of counting. While we are still under the assumption that the outcomes in the sample space are equally likely (the uniform probability distribution), these rules will hold for all probability distributions.
First, we often are interested in the probability that an event does not occur. We call this the complement of the event. Remember, events are subsets of the sample space, and not being "in" the event means you are in the complement of that subset. Using the same notation we have for sets, the complement of an event $E$ will be written $\bar{E}$. Here is the relationship between the probability of an event and its complement: $P(\bar{E}) = 1 - P(E)$.
Remember that $P(E) = \frac{|E|}{|S|}$, the number of outcomes in the event divided by the total number of outcomes. But how many outcomes are not in $E$? All the others, so $|\bar{E}| = |S| - |E|$. That is,

$$P(\bar{E}) = \frac{|S| - |E|}{|S|} = 1 - \frac{|E|}{|S|} = 1 - P(E).$$
Suppose you flip a fair coin 10 times. What is the probability that you get at least one head? There are lots of ways you can get at least one head, but only one way you can get no heads (that is, get all tails). So it makes sense to compute the requested probability using the complement: the probability of getting no heads is much easier to find.
The sample space here is the set of all 10-toss sequences. How many are there? For each term in the sequence, it could be a head (H) or tail (T), so there are $2^{10} = 1024$ possible sequences.
We want to find the probability of getting at least one head. Let's think of this just as a counting question: How many 10-toss sequences have at least one head? All of them, except the one all-tails sequence. So there are $2^{10} - 1 = 1023$ sequences with at least one head. Thus the probability of getting at least one head is $\frac{1023}{1024}$.
Wait, did we use Theorem 3.7.6? Not explicitly, but essentially we have. Using the theorem, we would have said that the probability of getting at least one head is

$$P(\text{at least one head}) = 1 - P(\text{all tails}) = 1 - \frac{1}{1024}.$$

Note that this calculation required subtracting fractions:

$$1 - \frac{1}{1024} = \frac{1024}{1024} - \frac{1}{1024} = \frac{1023}{1024}.$$
So whether we do the subtraction to calculate the size of the complement, or use the complement formula and subtract fractions, we get the same answer.
The complementary event is rolling a die four times and never getting a 6. Of the $6^4$ possible sequences of rolls, there are $5^4$ that contain no 6. So the probability of getting at least one 6 in four rolls is

$$P(\text{at least one 6}) = 1 - P(\text{no 6}) = 1 - \frac{5^4}{6^4} = 1 - \frac{625}{1296} \approx 0.518.$$
For the double-6 in 24 rolls variant, we use the complementary event as well: what is the probability of not getting any double 6s? That means on every one of the 24 rolls you get one of the 35 other pairs. So

$$P(\text{at least one double 6}) = 1 - P(\text{no double 6}) = 1 - \frac{35^{24}}{36^{24}} \approx 0.491.$$
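To see how close each bet is to fair, we can compute these probabilities exactly, for example with Python's Fraction type:

```python
from fractions import Fraction

# At least one 6 in four rolls of one die.
p_one_six = 1 - Fraction(5, 6) ** 4
print(p_one_six, float(p_one_six))   # 671/1296, about 0.518

# At least one double-6 in 24 rolls of two dice.
p_double_six = 1 - Fraction(35, 36) ** 24
print(float(p_double_six))           # about 0.491
```

The first game is slightly favorable to the person betting that a 6 will appear; the second is slightly unfavorable.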
Indeed, the Chevalier de Mere noticed that when playing the game with two dice, he tended to lose money in the long run. Who did he turn to for help? Blaise Pascal, of course!
A probability of 1 means the event is certain, so perhaps we should think of this as giving the probability that the event either happened or didn't happen. This is exactly what we want to mean by adding probabilities.
For example, what is the probability that when you roll a 6-sided die you get an even number or a number less than 3? We don't need a theorem to answer this. The sample space is $\{1, 2, 3, 4, 5, 6\}$ and the event is the subset $\{1, 2, 4, 6\}$. So the probability is $\frac{4}{6} = \frac{2}{3}$.
To see where the $\frac{2}{3}$ comes from, let $A$ be the event of rolling an even number (so $A = \{2, 4, 6\}$) and $B$ be the event of rolling a number less than 3 (so $B = \{1, 2\}$). Notice that the notation $P(A \cup B)$ makes sense, since as sets, we really do have $A \cup B = \{1, 2, 4, 6\}$.
If we go back to the definition of the probability of an event, we have,

$$P(A \cup B) = \frac{|A \cup B|}{|S|}.$$
We must find the size of the set $A \cup B$. But we know how to find the size of the union of non-disjoint sets: use PIE! So $|A \cup B| = |A| + |B| - |A \cap B| = 3 + 2 - 1 = 4$.
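Dividing that count by $|S| = 6$ expresses the same computation entirely in terms of probabilities:

$$P(A \cup B) = \frac{|A| + |B| - |A \cap B|}{|S|} = \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3}.$$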
As the example demonstrates, we have basically translated the sum principle into the language of probability. Can we do the same for the product principle?
We use the product principle to find the number of ways two events can both happen, one after the other. Many probability questions ask for the probability of such compound events. Letβs consider an example to see what is going on.
Suppose you roll a 6-sided die and flip a coin. What is the probability that you roll an even number and flip heads? First we will find the probability directly from the definition. The sample space consists of all pairs of outcomes from the die and the coin, so $S = \{(1,H), (1,T), (2,H), (2,T), \ldots, (6,H), (6,T)\}$. Without listing these, we could have calculated the size of the sample space using the product principle: $6 \cdot 2 = 12$. The event we are interested in is the set of outcomes $\{(2,H), (4,H), (6,H)\}$. Obviously that is size $3$, which we could have also found as $3 \cdot 1$. So the probability of this event is $\frac{3}{12} = \frac{1}{4}$.
Now consider the two events separately. Say $A$ is rolling an even number, and $B$ is flipping the coin and getting heads. The probability of the first event is $P(A) = \frac{1}{2}$. The probability of the second event is $P(B) = \frac{1}{2}$. It appears that the correct way to combine these probabilities is to multiply them:

$$P(A \text{ and } B) = P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.$$
How convenient that multiplying fractions is done by multiplying the numerators and denominators separately, and this is the same as applying the product principle to the numerator and denominator of the fraction.
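Here is a quick enumeration that checks both computations agree (a sketch; representing outcomes as (die, coin) tuples is just one choice):

```python
from itertools import product
from fractions import Fraction

# Sample space: all (die, coin) pairs.
sample_space = list(product(range(1, 7), "HT"))

# Event: an even number on the die AND heads on the coin.
event = [(d, c) for (d, c) in sample_space if d % 2 == 0 and c == "H"]

p_event = Fraction(len(event), len(sample_space))
p_even_times_p_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_event, p_even_times_p_heads)   # both are 1/4
```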
The reason the above example worked out was that the events were independent. Intuitively, this means that the outcome of the first event has no influence on the outcome of the second event. Actually, we use this multiplication rule, much like the product principle, to define independence.
Given two events $A$ and $B$, we say that they are independent provided the probability of both events happening is the product of the probabilities of each event happening:

$$P(A \cap B) = P(A) \cdot P(B).$$
Notice that in the definition we describe the event that both $A$ and $B$ happen as the intersection $A \cap B$. Since events are sets, it makes sense to take an intersection. The intersection of two sets contains all the elements that are in both sets, which is exactly what we want here.
This shines a light on a key difference between this definition and the product principle. We use the product principle to construct a new set of outcomes by combining the outcomes in two sets. This creates new sorts of outcomes. For example, the product principle would combine the sets $\{1, 2, 3, 4, 5, 6\}$ and $\{H, T\}$ to create outcomes like $(2, H)$.
This is not the intersection of two sets (it is actually the Cartesian product: $\{1, 2, 3, 4, 5, 6\} \times \{H, T\}$). The definition of independence involves probabilities relative to a fixed set of outcomes. So the elements in $A$ and in $B$ in the definition of independence are already sequences like those we would have created using the product principle.
If we are more careful in Example 3.7.11 where we rolled a die and flipped a coin, we should describe the event of first rolling an even number as the set

$$A = \{(2,H), (2,T), (4,H), (4,T), (6,H), (6,T)\}.$$
For example, suppose you roll a fair 12-sided die (numbered 1 to 12). Let $A$ be the event that the number rolled is a multiple of 3, $B$ the event that it is a multiple of 4, and $C$ the event that it is less than 7. The sample space is the set $S = \{1, 2, 3, \ldots, 12\}$. The event $A$ is the set $\{3, 6, 9, 12\}$, $B$ is the set $\{4, 8, 12\}$, and $C$ is the set $\{1, 2, 3, 4, 5, 6\}$. Thus the probabilities for each of these events are

$$P(A) = \frac{4}{12} = \frac{1}{3}, \qquad P(B) = \frac{3}{12} = \frac{1}{4}, \qquad P(C) = \frac{6}{12} = \frac{1}{2}.$$
To decide whether events $A$ and $B$ are independent, we find $P(A \cap B)$. The intersection of events $A$ and $B$ (meaning the number rolled is both a multiple of 3 and a multiple of 4) is $\{12\}$ (the only element of the intersection is 12), so $P(A \cap B) = \frac{1}{12}$. We compare this to $P(A) \cdot P(B) = \frac{1}{3} \cdot \frac{1}{4} = \frac{1}{12}$. Since these are equal, the events are independent.
Events $B$ and $C$ are not independent though. Since $B \cap C = \{4\}$, we have $P(B \cap C) = \frac{1}{12}$. But $P(B) \cdot P(C) = \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{8}$. Since these are not equal, the events are not independent. This makes sense since fewer of the multiples of 4 are less than 7 than not (only 4 is, while 8 and 12 are not).
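Assuming the 12-sided die described above, we can verify both claims by brute force:

```python
from fractions import Fraction

S = range(1, 13)                    # outcomes of the 12-sided die
A = {n for n in S if n % 3 == 0}    # multiples of 3
B = {n for n in S if n % 4 == 0}    # multiples of 4
C = {n for n in S if n < 7}         # numbers less than 7

def P(event):
    return Fraction(len(event), len(S))

print(P(A & B) == P(A) * P(B))   # True:  A and B are independent
print(P(B & C) == P(B) * P(C))   # False: B and C are not
```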
When events are not independent, we get a new interesting question we can ask: What is the probability of one event given that another event has occurred? This is called conditional probability.
The famous probability problem, known as the Monty Hall problem, presents the following conundrum. You are on the game show Let's Make a Deal and will win whatever is behind one of three doors you decide to open. Behind one door is a car; behind the other two are goats. You pick a door, but before opening it, the host (Monty Hall) reveals one of the other doors that has a goat behind it. You then have the opportunity to switch doors. Should you switch? What is the probability of getting the car if you do?
You might be tempted to say that the probability of getting the car when you switch is $\frac{1}{2}$. After all, there are two doors left, and the car is behind one of them. However, we must ask what the probability of getting the car is given that Monty has revealed a goat behind another, unpicked door.
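Before working this out carefully, you could estimate the answer with a simulation like the following sketch (the 10,000 games and the door numbering are arbitrary choices):

```python
import random

num_games = 10000
wins_if_switch = 0

for _ in range(num_games):
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a door that is neither your pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    # Switching means taking the one remaining unopened door.
    switched = next(d for d in doors if d != pick and d != opened)
    if switched == car:
        wins_if_switch += 1

print("Win rate when switching:", wins_if_switch / num_games)   # about 2/3
```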
The conditional probability of $A$ given $B$, written $P(A \mid B)$, is defined as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. Does this definition agree with our intuition for what conditional probability should mean? Let's think about the sample space. We want to know the chances of $A$ occurring under the assumption that $B$ has already occurred. In other words, we only care about the elements of the sample space that belong to $B$.
If $B$ becomes the sample space, then the only outcomes from $A$ that can possibly occur are the outcomes that are in both $A$ and $B$. So perhaps the definition of conditional probability really should be,

$$P(A \mid B) = \frac{|A \cap B|}{|B|}.$$

Dividing the numerator and denominator by $|S|$ shows that this agrees with the definition above: $\frac{|A \cap B|/|S|}{|B|/|S|} = \frac{P(A \cap B)}{P(B)}$.
Suppose you roll two 6-sided dice with your eyes closed. Your friend says, "Hey look, at least one of your dice is a 4." What is the probability that you rolled a sum of 7?
First note that the probability of rolling a sum of 7 is $\frac{6}{36} = \frac{1}{6}$, since of the 36 pairs of numbers that can appear, there are six pairs that sum to 7: $(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)$. However, we are now in a situation where at least one die is a 4. This limits the sample space to the 11 pairs that contain a 4 (we can count this using PIE as $6 + 6 - 1 = 11$). Of these, only two sum to 7: $(3,4)$ and $(4,3)$. So the probability of rolling a sum of 7 given that one die is a 4 is $\frac{2}{11}$.
If we use the definition of conditional probability, we would compute this slightly differently but arrive at the same answer. We have events $A$ (the sum is 7) and $B$ (at least one die is a 4). Then $P(A \cap B) = \frac{2}{36}$ and $P(B) = \frac{11}{36}$. So $P(A \mid B) = \frac{2/36}{11/36} = \frac{2}{11}$.
Notice that the probability of rolling a sum of 7 given that the red die is a 4 (say they are different colors) will be different! That would be $\frac{1}{6}$, since we are really just asking for the probability that the other die is a 3.
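Both conditional probabilities are easy to confirm by listing all 36 rolls (a sketch; treating each roll as an ordered pair (red, other) is an assumption of the encoding):

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))    # all 36 ordered pairs (red, other)

at_least_one_4 = [r for r in rolls if 4 in r]   # 11 rolls
red_is_4 = [r for r in rolls if r[0] == 4]      # 6 rolls

def p_sum_is_7(given):
    return Fraction(sum(1 for r in given if sum(r) == 7), len(given))

print(p_sum_is_7(at_least_one_4))   # 2/11
print(p_sum_is_7(red_is_4))         # 1/6
```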
Suppose you draw two cards from a standard deck of 52 cards. What is the probability that the second card is a face card given that the first card is red?
We have a sample space consisting of the sequences of two cards. Event $A$ will be those pairs that have a red card as the first in the sequence. Event $B$ will be those pairs that have a face card second in the sequence.
We are looking for both $P(B \mid A)$ and $P(A \mid B)$, so we will need to find $P(A)$, $P(B)$, and $P(A \cap B)$. There are $26 \cdot 51$ pairs in $A$ (select one of the 26 red cards, and then any of the remaining 51 cards), so

$$P(A) = \frac{26 \cdot 51}{52 \cdot 51} = \frac{26}{52} = \frac{1}{2}.$$
There are $12 \cdot 51$ pairs in $B$ (select one of the 12 face cards for the second card, and then any of the remaining 51 cards for the first), so

$$P(B) = \frac{12 \cdot 51}{52 \cdot 51} = \frac{12}{52} = \frac{3}{13}.$$
Finding the size of the intersection is a little more challenging (we did so in the subsection Combining Principles). There are $20 \cdot 12$ pairs that start with a red, non-face card, and end with a face card, and another $6 \cdot 11$ pairs that start with a red face card and end with a face card. So

$$P(A \cap B) = \frac{20 \cdot 12 + 6 \cdot 11}{52 \cdot 51} = \frac{306}{2652} = \frac{3}{26}.$$
So the probability that the second card is a face card given that the first card is red is,

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{3/26}{1/2} = \frac{3}{13}.$$
The probability that the first card is red given that the second card is a face card is,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/26}{3/13} = \frac{1}{2}.$$
Wait a second! What? The probability that the first card is red given that the second card is a face card is the same as the probability that the first card is red?? It seems that $P(A \mid B) = P(A)$, and that $P(B \mid A) = P(B)$. What could that mean?
Notice that the definition of conditional probability can be rearranged as $P(A \cap B) = P(A \mid B) \cdot P(B)$. This looks almost like the definition of events being independent, except that instead of $P(A)$ in the product we now have $P(A \mid B)$. But what does $P(A \mid B)$ even mean if $A$ and $B$ are independent? If the events are independent, then it should be no more or less likely that $A$ occurs given that $B$ has occurred. So we should have $P(A \mid B) = P(A)$. This is exactly what we saw in the last example.
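If this still feels surprising, a brute-force check over all $52 \cdot 51$ ordered pairs of cards agrees with the computations above (a sketch; the suit/rank encoding of the deck is my own):

```python
from itertools import permutations
from fractions import Fraction

suits = ["hearts", "diamonds", "spades", "clubs"]   # hearts and diamonds are red
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(suit, rank) for suit in suits for rank in ranks]

pairs = list(permutations(deck, 2))    # all 52 * 51 ordered two-card draws

a = [p for p in pairs if p[0][0] in ("hearts", "diamonds")]   # first card red
b = [p for p in pairs if p[1][1] in ("J", "Q", "K")]          # second card a face card
ab = [p for p in a if p[1][1] in ("J", "Q", "K")]             # both

print(Fraction(len(a), len(pairs)), Fraction(len(ab), len(b)))   # P(A) and P(A|B): both 1/2
print(Fraction(len(b), len(pairs)), Fraction(len(ab), len(a)))   # P(B) and P(B|A): both 3/13
```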
What probability do you get for this same event if you use the fact that the probability of an event is 1 minus the probability of the complement of the event?
What is the probability of getting exactly 3 heads OR getting exactly 15 heads? Compute this by finding the size of the event, divided by the size of the sample space.
Suppose you have three dice: a 4-sided die, a 6-sided die, and an 8-sided die. Each die is fair and numbered from 1 to the number of sides it has. You roll all three dice.
Is it possible for the dice to show neither all the same nor all different numbers? If not, the probability of this happening would be 0. What is the probability?
When playing 5-card poker, a full house is a hand that contains three cards of one rank and two cards of another rank. For example, you could have three 7s and two 4s.
Assume the numbers come out of the generator in a sequence, so that the sample space is the set of all such sequences.
Assume the numbers come out as a multiset, or equivalently, that the numbers must appear in non-decreasing order. You will want to use sticks and stones to count the size of the sample space.
Are the two answers the same? Why does this make sense?
Each of 10 friends has a deck of cards that they shuffle thoroughly. Each friend draws a card from their deck. What is the probability that at least one pair of friends draw a matching card?
How many people do you need to have in a room to have a 50% chance that at least two people share the same birthday (day of the year)? Assume that all birthdays are equally likely, and that nobody is born on Leap Day (February 29th).
At your 20th high school reunion, you meet an old friend you hadn't heard from in years. You talk about pets, specifically cats and dogs. She tells you that she has two pets, and that at least one of them is a cat. What is the probability that she has two cats? (Assume that having a cat or a dog is equally likely.)
Another old friend overhears your pet conversation and says that he also has two pets, and that the one he has had the longest is a cat. What is the probability that he has two cats? And why is this answer different from the previous question?
You are playing a shell game with three cups. Under one cup are two green balls, under another cup are two red balls, and under the third cup are one green and one red ball. You close your eyes, and your friend rearranges the cups. You then open your eyes and pick a cup at random. You see that it contains a green ball. What is the probability that the other ball under that cup is also green? Explain your answer in terms of conditional probability.