
Tuesday, December 17, 2013

Looking at dispersions


Quick quiz: Which colored distribution, of the six shown below, has the greatest dispersion?  For that selected distribution, what is the approximate dispersion value (e.g., by how much does the data typically differ from its average)?


This question tests a number of concepts concerning basic second-order descriptive statistics, concepts that most people find difficult to grasp correctly.  And it is important to be able to think through these ideas fluidly, since we often work with random distributions in our work; the distributions here could represent anything from differences in a nutrient's intake level to changes in asset returns.  The actual distributions shown are proxies for the distribution named in each label, which will make it very easy at the end to visually measure each dispersion (without a calculator).

Sure, the more mound-shaped distributions, those on the left in blue, have a wider range.  This is very visible, as the quarter of the data furthest from the zero center is as far away as about +1.8.  But one answer to the questions above is that the standard deviation (σ) for all of these distributions is exactly 1!  Did you guess this answer, or did you instead guess values higher than 1?  And the variance, which is equal to σ², is of course also 1.  For the Bernoulli distribution (with values that can take on only +1 or -1), shown in the rightmost column in red, a typical deviation and a σ that both equal 1 make perfect sense.
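
As a quick sanity check (a minimal sketch of my own, not from the post's figure), we can simulate that symmetric ±1 Bernoulli and confirm that its σ and its typical absolute deviation both come out at 1:

```python
# Minimal sketch: a symmetric two-point ("Bernoulli") distribution on {-1, +1}
# has both sigma and mean absolute deviation equal to 1.
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=1_000_000)   # equal weight on -1 and +1

sigma = x.std()                                # standard deviation
mad = np.abs(x - x.mean()).mean()              # mean absolute deviation

print(f"sigma ~ {sigma:.3f}, MAD ~ {mad:.3f}")  # both print ~1.000
```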

But what happens as we start moving to the distributions on the left of this Bernoulli?  There we see distributions, such as the approximately uniform one in green, which also place large probability weight in between -1 and +1.  The mean absolute deviation (MAD) therefore decreases progressively, as we continue to move left, from the 1 that we saw in the case of the Bernoulli.  The MAD looks only at the typical dispersion from the center, without regard to its direction (i.e., positive or negative).  For the approximately uniform distribution, for example, with a range of about ±1.7, the MAD is about 0.9, which is of course less than 1.  And this MAD<1 is another point of reference when thinking about the core dispersion of a distribution, rather than focusing exclusively on the larger range.  The lower MAD is also made possible here because the σ is slightly larger than the MAD, given the greater weight σ places on the data in the widest part of the distribution.  The relevant convexity inequality is:
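
As a rough check (a sketch that uses an exactly uniform distribution on ±1.7 as a stand-in for the approximately uniform green shape), the σ lands just under 1 while the MAD lands near 0.85, below the σ:

```python
# Sketch: an exactly uniform distribution on [-1.7, +1.7] stands in for the
# approximately uniform shape; sigma comes out just under 1 and the MAD
# comes out near 0.85, below sigma.
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(-1.7, 1.7, size=1_000_000)

sigma = u.std()                        # ~ 1.7/sqrt(3) ~ 0.98
mad = np.abs(u - u.mean()).mean()      # ~ 1.7/2 = 0.85

print(f"uniform on +/-1.7:  sigma ~ {sigma:.2f}, MAD ~ {mad:.2f}")
```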

E(X²) > E(X)²

Where E(…) represents the average of whatever is inside the parentheses.  And the difference between the two sides of this inequality, which reflects the convexity, grows through the power function as X deviates further from zero.  Therefore the variance, and as a result σ, gives greater weight to data the further they sit from zero.
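
Here is a quick toy illustration (my own numbers, not from the post) of how the squaring inside E(X²) leans on the data furthest from zero, which is exactly what makes the variance, and hence σ, sensitive to the widest part of a distribution:

```python
# Toy numbers: squaring weights the points far from zero most heavily,
# so E(X^2) exceeds E(X)^2 whenever X actually varies.
import numpy as np

x = np.array([-1.7, -0.5, 0.0, 0.5, 1.7])   # a small symmetric sample

lhs = np.mean(x**2)       # E(X^2)
rhs = np.mean(x)**2       # E(X)^2  (zero here, since the sample mean is 0)

print(f"E(X^2) = {lhs:.3f}  >  E(X)^2 = {rhs:.3f}")
print(f"variance = E(X^2) - E(X)^2 = {lhs - rhs:.3f}")
```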

The lesson of this note can be summarized as follows: mound-shaped distributions, with heavier central weight, tend to have smaller σ's relative to their range, and even smaller MADs; while those with nearly no central weight tend to have larger σ's relative to their range.  For fun, we'll now show a thought process for arriving at the variance of each of the illustrated distributions, using different approaches (a short script replaying the arithmetic follows the list):

Triangular/binomial        E(X²) − E(X)² = ¼[1.4² + (-1.4)²] − 0²                  = 1
Binomial/normal            npq scaled = 8(½)(1-½)(1.8/2.5)²                         = 1
Discrete uniform           (a-b)²/12 = [1.7 − (-1.7)]²/12                           = 1
Beta w/ α,β ~5/6           Beta formula = (5/6)²/[(10/6)²(10/6+1)] × 3²             = 1
Beta w/ α,β ~4/6           weight-scaled Bernoulli = 6/8 × (1) × (2.3/2)²           = 1
Bernoulli                  average squared deviations = ½[1² + (-1)²]               = 1
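
For those who want to re-trace the arithmetic, here is a short script that simply replays each line's calculation with the same constants (these are the post's rounded, eyeballed values, so a few land a bit under or over 1; the point is only that each sits near 1):

```python
# Replay the six variance calculations above with the same constants.
checks = {
    "Triangular/binomial":  0.25 * (1.4**2 + (-1.4)**2) - 0**2,
    "Binomial/normal":      8 * 0.5 * (1 - 0.5) * (1.8 / 2.5)**2,
    "Discrete uniform":     (1.7 - (-1.7))**2 / 12,
    "Beta w/ a,b ~ 5/6":    (5/6)**2 / ((10/6)**2 * (10/6 + 1)) * 3**2,
    "Beta w/ a,b ~ 4/6":    6/8 * 1 * (2.3/2)**2,
    "Bernoulli":            0.5 * (1**2 + (-1)**2),
}

for name, value in checks.items():
    print(f"{name:22s} variance ~ {value:.2f}")
```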

Some other notes worth mentioning: at this link is the online version of a featured year-ending article titled "Statistics for the next hundred years", jointly published by the prestigious American Statistical Association and the Royal Statistical Society.  And if you (or anyone you know) would ever appreciate a general review of probability and statistics topics, you can download this lecture here.
