Monday, March 31, 2014

The birthday problem/paradox

The birthday problem (or birthday paradox) refers to the probability that, in a group of N people, there is at least one pair of people having the same birthday.
Surprisingly, you need just 23 people to have more than 50% chances to find at least a pair!
Yes, only 23! Not, as you could think, 183!

There are a short and a long versions of this post.
The short one (aka what you have to remember to impress your friends) is:

In a room with 23 people there are more than 50% chances (50.73% to be precise) to have a at least a couple sharing their birthday. 

With 50 people, the probability is 97.04 % and with 80 people we arrive to 99.99% changes to find such  a couple!


How is that possible? The rest of the post (the long version) proves the previous numbers.

First of all, we have to make some basic assumptions:

We have no twins in the room. 
No one is born on February, 29th! (we do not consider leap years)
Every day is as good as all the others.
 
In probability, the thing "to have at least a pair with the same birthday" is called an event, and its probability is indicated as P(A) where A is the name I chose for this event.

For practical reasons, it's easier to calculate the opposite event (probability of having NO pair with the same birthday), that I name event B.

To make everything, let's calculate the chances that a certain number of people WILL NOT share a birthday. 

If we have someone in a room and we add another person, this extra person will not share the same birthday as the first one with a probability: 

P(the second person will not share his/her birthday) = 364/365=99.73%

With just two persons, we have 99.72% chances that they have different birthdays.
The above formula is easy to understand if you think to a black bag containing red and white balls. If you have, e.g., 3 red balls and 5 white balls, then the probability to pickup (without seeing!) a red ball is

P(pickup a red ball) = 3/8 = 37.5%

3 is the number of favourable cases (the number of red balls) and 8 is the total number of possible cases (i.e. the total number of balls).  With the two persons it pretty much the same thing.
365 are the number of days in a year, but when you have already someone in the room, the second one will have only 365-1=364 days, different from the first one.
Imagine now a third person. The probability that he/she will not share his/her birthday with the other two will be:

P(the third person will not share the birthday with the other two) = 363/365 =   99.45%

At this point, shouldn't be a surprise for you if I tell you that:
The probability that one person will not share a birthday when NO ONE else is present is:

P(the first (and only one!) person will not share his/her birthday) = 365/365 = 100%.

Being the only one, will give us the certainty  that this person will not share his/her birthday!
It should be also clear, that if we have already n-1 persons and we add another one (the n-th one!), the probability that the new entry will not share his/her birthday will be:

P(the n-th person will not share his/her birthday) = (365 - (n-1) )/ 365

If you put n=1 or 2 or 3 you have the previous cases.


Now you need another element to calculate that N persons will not share their birthdays (this is another EVENT!)

All the probabilities we have calculated are referred to independent events. This means each new person you put in the room, will have no relations whatsoever with the other ones (we have no twins remember?) This is important because it allows us to multiply the previous probabilities to get "the combined event". I will try to clarify this point with a couple of examples:

Situation: two persons.
P(that these two persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second  person will not share his/her birthday) = 1.0*0.9972= 99.72%


Situation: three persons.
P(that these three persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second  person will not share his/her birthday) *  P( the third person will not share his/her birthday) = 1.0*0.9973*0.9945= 99.18%

Using the previous expression valid for N persons will have:

P(no shared birthday among N persons) = 365 x 364 x 363 x ... x(365 - N+1) / 365^N

If you know what a factorials and binomial coefficients are, the previous expression can be simplified to:



Are we done? Not yet! Just another tiny thing!

What we have calculated so far is the probability for the event called "no shared birthday among N persons"  is this what I wanted at the beginning of the post? Not exactly!
We want the probability the among N persons there is at least one shared birthday. This event is not that difficult to calculate since it is the opposite event of what we have with the previous formula.

Going back to the example of the bag full of balls, the opposite event of "picking up a red ball" is "not picking up a red ball" i.e. picking up a white ball!

When we have already the probability for a certain event A (and we do!), to calculate the probability of the opposite event  B we have the formula:


P(B) = 1- P(A)

That's it!

So, for the event "picking up a white ball" we have:

P(picking up a white ball)= 1 - P(picking up a red ball) = 1 -3/8 = 5/8 = 62.5 %

and for the event "at least a shared birthday among N persons" 

 
P(at least a shared birthday among N persons)=1 - P(no shared birthday among N persons) =1 - 365 x 364 x 363 x ... x(365 - N+1) / 365^N

or using the compact formula



If you use  this formula you can get back the number at the beginning of the post.
Try it with excel or openoffice/libreoffice and you'll get these values (first column N,  second column = probability )!


1 0
2 0.002739726
3 0.008204166
4 0.016355912
5 0.027135574
6 0.040462484
7 0.056235703
8 0.074335292
9 0.094623834
10 0.116948178
11 0.141141378
12 0.167024789
13 0.194410275
14 0.223102512
15 0.25290132
16 0.283604005
17 0.315007665
18 0.346911418
19 0.379118526
20 0.411438384
21 0.443688335
22 0.475695308
23 0.507297234
24 0.538344258
25 0.568699704
26 0.59824082
27 0.626859282
28 0.654461472
29 0.680968537
30 0.706316243
31 0.730454634
32 0.753347528
33 0.774971854
34 0.795316865
35 0.814383239
36 0.832182106
37 0.848734008
38 0.864067821
39 0.878219664
40 0.89123181
41 0.903151611
42 0.914030472
43 0.923922856
44 0.932885369
45 0.940975899
46 0.948252843
47 0.954774403
48 0.960597973
49 0.965779609
50 0.97037358
51 0.974431993
52 0.978004509
53 0.981138113
54 0.983876963
55 0.986262289
56 0.988332355
57 0.990122459
58 0.991664979
59 0.992989448
60 0.994122661
61 0.995088799
62 0.995909575
63 0.996604387
64 0.997190479
65 0.997683107
66 0.998095705
67 0.998440043
68 0.998726391
69 0.998963666
70 0.999159576
71 0.999320753
72 0.999452881
73 0.999560806
74 0.999648644
75 0.999719878
76 0.999777437
77 0.999823779
78 0.999860955
79 0.999890668
80 0.999914332
81 0.999933109

No comments: