Friday, February 21, 2014

Bernoulli dispersions

The Bernoulli random variable is named after the 17th century Swiss mathematician Jacob Bernoulli, and it describes the basic probability model for binary discrete outcomes.  For example, we can have a unit Bernoulli random variable X that takes on a value of 1 with probability p, and a value of 0 with probability q.  Of course q = 1-p, since the Bernoulli can only take on one of these two outcomes.

We can model events such as the result of a health experiment, where the outcome is either success or failure, or the result of a coin toss, where the outcome is either heads or tails.  This note solves for, and illustrates, the inter-relation between one's outcome estimate and the error on that estimate.

The expected value, or the average, of this Bernoulli experiment is p*1 + q*0, which equals p.  And the respective expected square, or the average of the squares, is p*1^2 + q*0^2, which also equals p.
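
As a quick side check, here is a minimal Python sketch (our own illustration, with a made-up p value) that computes these two averages directly from the outcome weights, and confirms them against simulated draws:

```python
import random

p = 0.3          # an illustrative probability of the outcome 1
q = 1 - p        # probability of the outcome 0

# Direct calculation from the outcome weights.
mean = p * 1 + q * 0               # E(X) = p
mean_square = p * 1**2 + q * 0**2  # E(X^2) = p, since 1^2 = 1 and 0^2 = 0

# Confirmation against a large number of simulated Bernoulli draws.
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
print(mean, sum(draws) / len(draws))                       # both near 0.3
print(mean_square, sum(d**2 for d in draws) / len(draws))  # both near 0.3
```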

To calculate the variance we can go through either of two approaches.  The first-principles approach goes as follows [where the function E() represents the average]:

E[(X - average)^2]
= p*(1-p)^2 + q*(0-p)^2
= p*(1-p)^2 + (1-p)*p^2
= p*(1-p)*[(1-p) + p]
= p*(1-p), or pq

Or we can solve for the variance from the moments, using the averages we solved for in the earlier paragraphs:

E(X^2) - [E(X)]^2
= E(X^2) - average^2
= p - p^2
= p*(1-p), or pq
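
Both routes are easy to verify numerically.  The sketch below (again our own, with hypothetical helper names) evaluates the first-principles form and the moments form side by side for a few illustrative p values, and compares both against pq:

```python
def variance_first_principles(p):
    # E[(X - average)^2], where the average is p
    q = 1 - p
    return p * (1 - p)**2 + q * (0 - p)**2

def variance_from_moments(p):
    # E(X^2) - [E(X)]^2, where both moments equal p
    return p - p**2

# All three columns agree for every eligible p.
for p in (1/12, 0.25, 0.5, 0.9):
    q = 1 - p
    print(p, variance_first_principles(p), variance_from_moments(p), p * q)
```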

Now we can visualize how this dispersion looks for the eligible values of p (which range between 0 and 1).  We also stay with the natural unit of 1, instead of percents, which makes comparing to standard deviation units easier.  For example, we do not need to get tied up in the technicality of taking the square root of a percent.

[Figure: variance (in blue) and standard deviation (in green) of the Bernoulli variable, across p values from 0 to 1]
We notice that the variance (in blue) peaks at a p value of 0.5, where the variance equals 0.5*(1-0.5), or 0.25.  The standard deviation (in green) is the square root of the variance, and it also peaks at a p value of 0.5.  There the standard deviation is higher, at (0.25)^(1/2), or 0.5.

In other words, in the center of the possible p values ranging from 0 to 1, the dispersion for the value is 0.5!  That's quite large, though it drops at a faster rate as p moves closer to the ends of 0 or 1.  For example, at a p of 1/12, the variance is (1/12)*(11/12).  This comes to 11/144, or the 0.08 shown in the illustration above.  Still, even though the theory of sampling distributions holds that the typical dispersion calculation underestimates the actual dispersion, in this case we see that it clearly overstates it.  One way to see this is to imagine how quickly the dispersion falls outside of the possible p or q range of 0 to 1.
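
For readers viewing without the chart, a short sketch can tabulate the two dispersion curves across the eligible p values, reproducing the shape described above (the peak at a p of 0.5, and the roughly 0.08 variance at a p of 1/12):

```python
import math

# Tabulate the variance p*(1-p) and its square root across eligible p values.
for twelfths in range(1, 12):
    p = twelfths / 12
    var = p * (1 - p)
    sd = math.sqrt(var)
    print(f"p = {p:5.3f}   variance = {var:5.3f}   std dev = {sd:5.3f}")
# Both curves peak at p = 0.5 (variance 0.25, standard deviation 0.5), and
# at p = 1/12 the variance is 11/144, or about 0.08.
```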

Now let's focus on the standard deviation shown for p, alongside that for q (or 1-p) as well.

[Figure: circles sized to the standard deviation, drawn about the interlinked p and q estimates]
The purpose of this note is to step beyond the common formulas, and broadly appreciate the sizing of the errors and their relation to the location of the inter-linked p and q estimates.  When running an economic or health experiment with binary outcomes, for example, what does the error size look like for extreme outcome estimates near either 0%, or 100%?

So we note that since q = 1-p, when the error to p is to the upside, the error to q is to the downside by an equal amount.  Hence we scaled the diameter of, and labelled, the circles about the p and q values to the value of the standard deviation at these interlinked estimates.  We'll also note that for extreme p and (1-p) estimates, one of the two is near 0, where the standard error is more narrow.  This reinforces the message that the dispersion, and the estimate of p, are inter-related for probability modeling with portion estimates.  Nonetheless, we see that the symmetrical nature of these standard deviation calculations overstates the actual dispersion (note the circles disappearing when outside of the 0 to 1 range).
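
To make that overstatement concrete, the following sketch (our own, with arbitrary example estimates) checks whether a symmetric one-standard-deviation band about p or q spills outside the possible 0 to 1 range, mirroring the circles that disappear from the illustration:

```python
import math

for p in (0.02, 0.1, 0.5, 0.9, 0.98):
    q = 1 - p
    sd = math.sqrt(p * q)   # the same standard deviation applies to p and to q
    # A symmetric band of p +/- sd (and, by symmetry, q +/- sd) can extend past 0 or 1.
    spills = p - sd < 0 or p + sd > 1 or q - sd < 0 or q + sd > 1
    print(f"p = {p:4.2f}   sd = {sd:5.3f}   band spills outside [0, 1]: {spills}")
```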

Now a more advanced topic, which we may touch on later, is that by combining a number n of these Bernoulli experiments together, say into a binomial distribution, we can shrink the standard deviation circles in proportion to the square root of 1/n.  In probability theory, this shrinkage is tied to the central limit theorem.  So for a p estimate of 0.5 from a sample of 10, the standard deviation estimate drops from 0.5 to 0.5*(1/10)^(1/2), or about 0.16.  And for very large n, this binomial output connects to many other related distributions.  The more popular of these include the Gaussian (normal) in continuous space, or the hypergeometric and Poisson in discrete space.
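
As a closing sketch, and assuming the usual standard error formula of (p*q/n)^(1/2) for a binomial proportion estimate, we can see the square root of 1/n shrinkage directly:

```python
import math

p = 0.5
for n in (1, 10, 25, 100, 1_000):
    se = math.sqrt(p * (1 - p) / n)   # standard deviation of the proportion estimate
    print(f"n = {n:5d}   standard error = {se:6.4f}")
# n = 1 recovers the single-Bernoulli value of 0.5, n = 10 gives about 0.16,
# and each further 100-fold rise in n cuts the error by a factor of 10.
```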

On an aside, over the past week I conducted interviews on different policy topics for The Wall Street Journal, and Pension & Investments.  Additionally, today I had investment research cited by a director at CFA Institute.
