The birthday problem/paradox
The birthday problem (or birthday paradox) refers to the probability that, in a group of N people, there is at least one pair of people having the same birthday.
Surprisingly, you need just 23 people to have more than 50% chances to find at least a pair!
Yes, only 23! Not, as you could think, 183!
There are a short and a long versions of this post.
The short one (aka what you have to remember to impress your friends) is:
In a room with 23 people there are more than 50% chances (50.73% to be precise) to have a at least a couple sharing their birthday.
With 50 people, the probability is 97.04 % and with 80 people we arrive to 99.99% changes to find such a couple!
How is that possible? The rest of the post (the long version) proves the previous numbers.
First of all, we have to make some basic assumptions:
We have no twins in the room.
No one is born on February, 29th! (we do not consider leap years)
Every day is as good as all the others.
In probability, the thing "to have at least a pair with the same birthday" is called an event, and its probability is indicated as P(A) where A is the name I chose for this event.
For practical reasons, it's easier to calculate the opposite event (probability of having NO pair with the same birthday), that I name event B.
To make everything, let's calculate the chances that a certain number of people WILL NOT share a birthday.
If we have someone in a room and we add another person, this extra person will not share the same birthday as the first one with a probability:
P(the second person will not share his/her birthday) = 364/365=99.73%
With just two persons, we have 99.72% chances that they have different birthdays.
The above formula is easy to understand if you think to a black bag containing red and white balls. If you have, e.g., 3 red balls and 5 white balls, then the probability to pickup (without seeing!) a red ball is
P(pickup a red ball) = 3/8 = 37.5%
3 is the number of favourable cases (the number of red balls) and 8 is the total number of possible cases (i.e. the total number of balls). With the two persons it pretty much the same thing.
365 are the number of days in a year, but when you have already someone in the room, the second one will have only 365-1=364 days, different from the first one.
Imagine now a third person. The probability that he/she will not share his/her birthday with the other two will be:
P(the third person will not share the birthday with the other two) = 363/365 = 99.45%
At this point, shouldn't be a surprise for you if I tell you that:
The probability that one person will not share a birthday when NO ONE else is present is:
P(the first (and only one!) person will not share his/her birthday) = 365/365 = 100%.
Being the only one, will give us the certainty that this person will not share his/her birthday!
It should be also clear, that if we have already n-1 persons and we add another one (the n-th one!), the probability that the new entry will not share his/her birthday will be:
P(the n-th person will not share his/her birthday) = (365 - (n-1) )/ 365
If you put n=1 or 2 or 3 you have the previous cases.
Now you need another element to calculate that N persons will not share their birthdays (this is another EVENT!)
All the probabilities we have calculated are referred to independent events. This means each new person you put in the room, will have no relations whatsoever with the other ones (we have no twins remember?) This is important because it allows us to multiply the previous probabilities to get "the combined event". I will try to clarify this point with a couple of examples:
Situation: two persons.
P(that these two persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second person will not share his/her birthday) = 1.0*0.9972= 99.72%
Situation: three persons.
P(that these three persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second person will not share his/her birthday) * P( the third person will not share his/her birthday) = 1.0*0.9973*0.9945= 99.18%
Using the previous expression valid for N persons will have:
P(no shared birthday among N persons) = 365 x 364 x 363 x ... x(365 - N+1) / 365^N
If you know what a factorials and binomial coefficients are, the previous expression can be simplified to:
Are we done? Not yet! Just another tiny thing!
What we have calculated so far is the probability for the event called "no shared birthday among N persons" is this what I wanted at the beginning of the post? Not exactly!
We want the probability the among N persons there is at least one shared birthday. This event is not that difficult to calculate since it is the opposite event of what we have with the previous formula.
Going back to the example of the bag full of balls, the opposite event of "picking up a red ball" is "not picking up a red ball" i.e. picking up a white ball!
When we have already the probability for a certain event A (and we do!), to calculate the probability of the opposite event B we have the formula:
P(B) = 1- P(A)
That's it!
So, for the event "picking up a white ball" we have:
P(picking up a white ball)= 1 - P(picking up a red ball) = 1 -3/8 = 5/8 = 62.5 %
and for the event "at least a shared birthday among N persons"
P(at least a shared birthday among N persons)=1 - P(no shared birthday among N persons) =1 - 365 x 364 x 363 x ... x(365 - N+1) / 365^N
or using the compact formula
If you use this formula you can get back the number at the beginning of the post.
Try it with excel or openoffice/libreoffice and you'll get these values (first column N, second column = probability )!
Surprisingly, you need just 23 people to have more than 50% chances to find at least a pair!
Yes, only 23! Not, as you could think, 183!
There are a short and a long versions of this post.
The short one (aka what you have to remember to impress your friends) is:
In a room with 23 people there are more than 50% chances (50.73% to be precise) to have a at least a couple sharing their birthday.
With 50 people, the probability is 97.04 % and with 80 people we arrive to 99.99% changes to find such a couple!
How is that possible? The rest of the post (the long version) proves the previous numbers.
First of all, we have to make some basic assumptions:
We have no twins in the room.
No one is born on February, 29th! (we do not consider leap years)
Every day is as good as all the others.
In probability, the thing "to have at least a pair with the same birthday" is called an event, and its probability is indicated as P(A) where A is the name I chose for this event.
For practical reasons, it's easier to calculate the opposite event (probability of having NO pair with the same birthday), that I name event B.
To make everything, let's calculate the chances that a certain number of people WILL NOT share a birthday.
If we have someone in a room and we add another person, this extra person will not share the same birthday as the first one with a probability:
P(the second person will not share his/her birthday) = 364/365=99.73%
With just two persons, we have 99.72% chances that they have different birthdays.
The above formula is easy to understand if you think to a black bag containing red and white balls. If you have, e.g., 3 red balls and 5 white balls, then the probability to pickup (without seeing!) a red ball is
P(pickup a red ball) = 3/8 = 37.5%
3 is the number of favourable cases (the number of red balls) and 8 is the total number of possible cases (i.e. the total number of balls). With the two persons it pretty much the same thing.
365 are the number of days in a year, but when you have already someone in the room, the second one will have only 365-1=364 days, different from the first one.
Imagine now a third person. The probability that he/she will not share his/her birthday with the other two will be:
P(the third person will not share the birthday with the other two) = 363/365 = 99.45%
At this point, shouldn't be a surprise for you if I tell you that:
The probability that one person will not share a birthday when NO ONE else is present is:
P(the first (and only one!) person will not share his/her birthday) = 365/365 = 100%.
Being the only one, will give us the certainty that this person will not share his/her birthday!
It should be also clear, that if we have already n-1 persons and we add another one (the n-th one!), the probability that the new entry will not share his/her birthday will be:
P(the n-th person will not share his/her birthday) = (365 - (n-1) )/ 365
If you put n=1 or 2 or 3 you have the previous cases.
Now you need another element to calculate that N persons will not share their birthdays (this is another EVENT!)
All the probabilities we have calculated are referred to independent events. This means each new person you put in the room, will have no relations whatsoever with the other ones (we have no twins remember?) This is important because it allows us to multiply the previous probabilities to get "the combined event". I will try to clarify this point with a couple of examples:
Situation: two persons.
P(that these two persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second person will not share his/her birthday) = 1.0*0.9972= 99.72%
Situation: three persons.
P(that these three persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second person will not share his/her birthday) * P( the third person will not share his/her birthday) = 1.0*0.9973*0.9945= 99.18%
Using the previous expression valid for N persons will have:
P(no shared birthday among N persons) = 365 x 364 x 363 x ... x(365 - N+1) / 365^N
If you know what a factorials and binomial coefficients are, the previous expression can be simplified to:
Are we done? Not yet! Just another tiny thing!
What we have calculated so far is the probability for the event called "no shared birthday among N persons" is this what I wanted at the beginning of the post? Not exactly!
We want the probability the among N persons there is at least one shared birthday. This event is not that difficult to calculate since it is the opposite event of what we have with the previous formula.
Going back to the example of the bag full of balls, the opposite event of "picking up a red ball" is "not picking up a red ball" i.e. picking up a white ball!
When we have already the probability for a certain event A (and we do!), to calculate the probability of the opposite event B we have the formula:
P(B) = 1- P(A)
That's it!
So, for the event "picking up a white ball" we have:
P(picking up a white ball)= 1 - P(picking up a red ball) = 1 -3/8 = 5/8 = 62.5 %
and for the event "at least a shared birthday among N persons"
P(at least a shared birthday among N persons)=1 - P(no shared birthday among N persons) =1 - 365 x 364 x 363 x ... x(365 - N+1) / 365^N
or using the compact formula
If you use this formula you can get back the number at the beginning of the post.
Try it with excel or openoffice/libreoffice and you'll get these values (first column N, second column = probability )!
1 | 0 |
2 | 0.002739726 |
3 | 0.008204166 |
4 | 0.016355912 |
5 | 0.027135574 |
6 | 0.040462484 |
7 | 0.056235703 |
8 | 0.074335292 |
9 | 0.094623834 |
10 | 0.116948178 |
11 | 0.141141378 |
12 | 0.167024789 |
13 | 0.194410275 |
14 | 0.223102512 |
15 | 0.25290132 |
16 | 0.283604005 |
17 | 0.315007665 |
18 | 0.346911418 |
19 | 0.379118526 |
20 | 0.411438384 |
21 | 0.443688335 |
22 | 0.475695308 |
23 | 0.507297234 |
24 | 0.538344258 |
25 | 0.568699704 |
26 | 0.59824082 |
27 | 0.626859282 |
28 | 0.654461472 |
29 | 0.680968537 |
30 | 0.706316243 |
31 | 0.730454634 |
32 | 0.753347528 |
33 | 0.774971854 |
34 | 0.795316865 |
35 | 0.814383239 |
36 | 0.832182106 |
37 | 0.848734008 |
38 | 0.864067821 |
39 | 0.878219664 |
40 | 0.89123181 |
41 | 0.903151611 |
42 | 0.914030472 |
43 | 0.923922856 |
44 | 0.932885369 |
45 | 0.940975899 |
46 | 0.948252843 |
47 | 0.954774403 |
48 | 0.960597973 |
49 | 0.965779609 |
50 | 0.97037358 |
51 | 0.974431993 |
52 | 0.978004509 |
53 | 0.981138113 |
54 | 0.983876963 |
55 | 0.986262289 |
56 | 0.988332355 |
57 | 0.990122459 |
58 | 0.991664979 |
59 | 0.992989448 |
60 | 0.994122661 |
61 | 0.995088799 |
62 | 0.995909575 |
63 | 0.996604387 |
64 | 0.997190479 |
65 | 0.997683107 |
66 | 0.998095705 |
67 | 0.998440043 |
68 | 0.998726391 |
69 | 0.998963666 |
70 | 0.999159576 |
71 | 0.999320753 |
72 | 0.999452881 |
73 | 0.999560806 |
74 | 0.999648644 |
75 | 0.999719878 |
76 | 0.999777437 |
77 | 0.999823779 |
78 | 0.999860955 |
79 | 0.999890668 |
80 | 0.999914332 |
81 | 0.999933109 |
Comments