## Monday, March 31, 2014

### The birthday problem/paradox

The birthday problem (or birthday paradox) refers to the probability that, in a group of N people, there is at least one pair of people having the same birthday.
Surprisingly, you need just 23 people to have more than 50% chances to find at least a pair!
Yes, only 23! Not, as you could think, 183!

There are a short and a long versions of this post.
The short one (aka what you have to remember to impress your friends) is:

In a room with 23 people there are more than 50% chances (50.73% to be precise) to have a at least a couple sharing their birthday.

With 50 people, the probability is 97.04 % and with 80 people we arrive to 99.99% changes to find such  a couple!

How is that possible? The rest of the post (the long version) proves the previous numbers.

First of all, we have to make some basic assumptions:

We have no twins in the room.
No one is born on February, 29th! (we do not consider leap years)
Every day is as good as all the others.

In probability, the thing "to have at least a pair with the same birthday" is called an event, and its probability is indicated as P(A) where A is the name I chose for this event.

For practical reasons, it's easier to calculate the opposite event (probability of having NO pair with the same birthday), that I name event B.

To make everything, let's calculate the chances that a certain number of people WILL NOT share a birthday.

If we have someone in a room and we add another person, this extra person will not share the same birthday as the first one with a probability:

P(the second person will not share his/her birthday) = 364/365=99.73%

With just two persons, we have 99.72% chances that they have different birthdays.
The above formula is easy to understand if you think to a black bag containing red and white balls. If you have, e.g., 3 red balls and 5 white balls, then the probability to pickup (without seeing!) a red ball is

P(pickup a red ball) = 3/8 = 37.5%

3 is the number of favourable cases (the number of red balls) and 8 is the total number of possible cases (i.e. the total number of balls).  With the two persons it pretty much the same thing.
365 are the number of days in a year, but when you have already someone in the room, the second one will have only 365-1=364 days, different from the first one.
Imagine now a third person. The probability that he/she will not share his/her birthday with the other two will be:

P(the third person will not share the birthday with the other two) = 363/365 =   99.45%

At this point, shouldn't be a surprise for you if I tell you that:
The probability that one person will not share a birthday when NO ONE else is present is:

P(the first (and only one!) person will not share his/her birthday) = 365/365 = 100%.

Being the only one, will give us the certainty  that this person will not share his/her birthday!
It should be also clear, that if we have already n-1 persons and we add another one (the n-th one!), the probability that the new entry will not share his/her birthday will be:

P(the n-th person will not share his/her birthday) = (365 - (n-1) )/ 365

If you put n=1 or 2 or 3 you have the previous cases.

Now you need another element to calculate that N persons will not share their birthdays (this is another EVENT!)

All the probabilities we have calculated are referred to independent events. This means each new person you put in the room, will have no relations whatsoever with the other ones (we have no twins remember?) This is important because it allows us to multiply the previous probabilities to get "the combined event". I will try to clarify this point with a couple of examples:

Situation: two persons.
P(that these two persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second  person will not share his/her birthday) = 1.0*0.9972= 99.72%

Situation: three persons.
P(that these three persons will NOT share their birthdays) = P(that first person will not share his/her birthday) * P( the second  person will not share his/her birthday) *  P( the third person will not share his/her birthday) = 1.0*0.9973*0.9945= 99.18%

Using the previous expression valid for N persons will have:

P(no shared birthday among N persons) = 365 x 364 x 363 x ... x(365 - N+1) / 365^N

If you know what a factorials and binomial coefficients are, the previous expression can be simplified to:

Are we done? Not yet! Just another tiny thing!

What we have calculated so far is the probability for the event called "no shared birthday among N persons"  is this what I wanted at the beginning of the post? Not exactly!
We want the probability the among N persons there is at least one shared birthday. This event is not that difficult to calculate since it is the opposite event of what we have with the previous formula.

Going back to the example of the bag full of balls, the opposite event of "picking up a red ball" is "not picking up a red ball" i.e. picking up a white ball!

When we have already the probability for a certain event A (and we do!), to calculate the probability of the opposite event  B we have the formula:

P(B) = 1- P(A)

That's it!

So, for the event "picking up a white ball" we have:

P(picking up a white ball)= 1 - P(picking up a red ball) = 1 -3/8 = 5/8 = 62.5 %

and for the event "at least a shared birthday among N persons"

P(at least a shared birthday among N persons)=1 - P(no shared birthday among N persons) =1 - 365 x 364 x 363 x ... x(365 - N+1) / 365^N

or using the compact formula

If you use  this formula you can get back the number at the beginning of the post.
Try it with excel or openoffice/libreoffice and you'll get these values (first column N,  second column = probability )!
 1 0 2 0.002739726 3 0.008204166 4 0.016355912 5 0.027135574 6 0.040462484 7 0.056235703 8 0.074335292 9 0.094623834 10 0.116948178 11 0.141141378 12 0.167024789 13 0.194410275 14 0.223102512 15 0.25290132 16 0.283604005 17 0.315007665 18 0.346911418 19 0.379118526 20 0.411438384 21 0.443688335 22 0.475695308 23 0.507297234 24 0.538344258 25 0.568699704 26 0.59824082 27 0.626859282 28 0.654461472 29 0.680968537 30 0.706316243 31 0.730454634 32 0.753347528 33 0.774971854 34 0.795316865 35 0.814383239 36 0.832182106 37 0.848734008 38 0.864067821 39 0.878219664 40 0.89123181 41 0.903151611 42 0.914030472 43 0.923922856 44 0.932885369 45 0.940975899 46 0.948252843 47 0.954774403 48 0.960597973 49 0.965779609 50 0.97037358 51 0.974431993 52 0.978004509 53 0.981138113 54 0.983876963 55 0.986262289 56 0.988332355 57 0.990122459 58 0.991664979 59 0.992989448 60 0.994122661 61 0.995088799 62 0.995909575 63 0.996604387 64 0.997190479 65 0.997683107 66 0.998095705 67 0.998440043 68 0.998726391 69 0.998963666 70 0.999159576 71 0.999320753 72 0.999452881 73 0.999560806 74 0.999648644 75 0.999719878 76 0.999777437 77 0.999823779 78 0.999860955 79 0.999890668 80 0.999914332 81 0.999933109