Tuesday, October 15, 2013

Age one

This month the blog turns one.  It's great to see a larger-than-usual number of cumulative page views (50,000), along with many hundreds of followers.  This has been possible because of generous endorsements from a number of trendsetters with worldwide influence.

The purpose of this blog has been to show statistics through a variety of real-world settings.  The goal is not only to create the type of research that people want to read, but also the type of research that people actually discuss with their friends.  There are no distracting commercials here.  So if you have had as much fun with this blog as I have had in putting it together, then please let me know your thoughts and also share this gift of knowledge with your friends (subscription is a snap here).

The past year saw much blog research on topics in financial economics.  We also had occasional posts on theoretical math, law, and politics.  But in order to expand the applications of statistical methods, and in keeping with our overall mission, over the next year we will shift slightly towards topics related to science, medicine, and actuarial mathematics.

Thanks very much for your interest, and I hope to have even more excitement during the year ahead.

Salil


P.S.  You didn't think we'd just send this nice message, without any fun probability problem in it, did you?  So we won't disappoint.

We'll start with a variation of a birthday probability problem, which the 20th-century statistician Frederick Mosteller reintroduced.  Our special problem here is: how many random blogs would we need to sample in order to have a 50% chance that at least one of them also has an anniversary today (it need not be the first anniversary, as is the case with this blog)?

So instead of asking whether any two people share the same anniversary, this problem specifically seeks the probability that another person has the same anniversary as you.

It is tempting to quickly guess 183, or about half of the 365 days in a year.  What's wrong with this answer is that it incorrectly assumes sampling without replacement.  If we sampled without replacement in these calendar problems, then the 183 sampled blogs would all have different anniversary dates.  And 50% of the calendar would therefore be covered by the time we had sampled 183 (we ignore leap-year complications).

In reality, however, we can randomly sample two other blogs that both share the same anniversary date, one that is not today.  When sampling with replacement, the probability of any one sampled blog not having an anniversary today is q=364/365.  Then the probability that none of n sampled blogs has today as its anniversary is q^n.

Or 100%-q^n is the probability that at least one sampled blog will have today as its anniversary.  Let's lay out the formula below.

50%  = 100% - q^n
q^n  = 50%
n    = log base q of 50%
     = (ln 50%) / (ln q)

Don't reach for the calculator, a tool which didn't exist back when these types of probability problems were first introduced.  Even Frederick Mosteller would solve these problems by resorting to largely complex logarithmic data, which were only tabulated on paper at Princeton.  That approach is unnecessary too.  Since we started this problem in our heads by guessing 183, we'll also finish this problem in our heads.  First, we estimate q as roughly equal to e^(-1%/3.65), because 1/365 = 1%/3.65 and ln(1-x) is close to -x when x is small.  Second, if 200% is e^69% (e.g., continuously compounding 1% interest doubles principal in about 69 periods), then 50% is e^(-69%).

These are the sorts of shortcuts that can help solve problems more logically.  Our result is an n of about 69%/(1%/3.65).  Since 3.65 is ~11/3, and 69 is divisible by 3, we are set.  We get the integer result of 69/(3/11) = 23*11 = 253.  So 253 is the number of random blogs that we would need to sample for a 50% chance that at least one of them shares today as its anniversary date.
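For readers who do want to check the mental arithmetic afterwards, here is a minimal Python sketch.  It assumes a 365-day year with every date equally likely, and the variable names (DAYS, exact, shortcut, rounded) are just illustrative choices, not anything from the original problem.

```python
import math

DAYS = 365                      # assumption: no leap years, all dates equally likely
q = (DAYS - 1) / DAYS           # chance one sampled blog does NOT match today

# Exact solution of q^n = 50%, i.e. n = ln(50%) / ln(q)
exact = math.log(0.5) / math.log(q)

# Mental shortcut from the text: n ~ 69% / (1% / 3.65)
shortcut = 0.69 / (0.01 / 3.65)

# Same shortcut with 3.65 rounded to 11/3, so n ~ 69 * 11 / 3
rounded = 69 * 11 // 3

print(exact)                    # a little under 253
print(shortcut)                 # a little under 252
print(rounded)                  # 253
print(math.ceil(exact))         # 253, the smallest whole number of blogs

# Sanity check: 253 blogs clears 50%, 252 does not
print(1 - q ** 253 >= 0.5, 1 - q ** 252 >= 0.5)
```

The exact value lands between the two shortcut figures, so rounding 3.65 up to 11/3 happens to offset the small errors in the 69% and e^(-x) approximations.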

Because we have sampling with replacement, 253 is more than a third greater than the quick (and incorrect) guess of 183.  This spread of roughly 38%, between the initial guess without replacement and the more accurate answer with replacement, shrinks towards 0 as the probability being sought (e.g., 50%) decreases.  It depends only weakly on the number of events (e.g., 365 dates), since the ratio of the two answers is approximately -ln(1-p)/p for a sought probability p.
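To see how the gap between the two answers behaves as the sought probability falls, here is a small Python sketch.  It keeps the calendar fixed at 365 equally likely dates; the function names (naive_guess, exact_count, relative_spread) are hypothetical helpers written for this illustration only.

```python
import math

def naive_guess(target, days=365):
    """Quick guess: sample a `target` share of the calendar (no replacement)."""
    return math.ceil(target * days)

def exact_count(target, days=365):
    """Smallest n with 1 - q**n >= target, where q = (days - 1) / days."""
    q = (days - 1) / days
    return math.ceil(math.log(1 - target) / math.log(q))

def relative_spread(target, days=365):
    """How much larger the with-replacement answer is, relative to the quick guess."""
    naive = naive_guess(target, days)
    return (exact_count(target, days) - naive) / naive

for target in (0.5, 0.25, 0.1):
    print(target, naive_guess(target), exact_count(target),
          round(relative_spread(target), 3))
```

At a 50% target the two counts are 183 and 253, and the relative spread gets smaller at the 25% and 10% targets.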
