
Thursday, January 16, 2014

√n law


How do we estimate the confidence range about an estimate?  In the body of probability theory, we take a distribution and directly measure the dispersion from the empirical data or from the theoretical moment generating function.  This dispersion is not influenced by the central location of the distribution, though we see in the link above that the shape, and of course the scale, of the distribution do influence dispersion.  Also, we sometimes think of the typical dispersion as the MAD (mean absolute deviation), while other times we prefer the generally larger σ.  Recall that the standard deviation places greater weight on the distribution components that are furthest from the distribution's center.

Now there are some peculiar cases where we can just apply a quick statistical √n law, in order to estimate the confidence range about an average estimate.  But it is often presented in discussions in a way that gives a false sense that we can routinely utilize this √n law in analyzing physical properties.  As an example, in his genetics and negentropy book “What is Life?”, the 20th century physicist Erwin Schrödinger liberally states that this can apply “in any physical law”, and then also asserts that any initial molecule count estimate could follow this √n law.  Is this true?  We’ll see in this theoretical note that there are, in fact, serious probability limits that apply and cause great uncertainty in the use of the √n law.  To start, let's lay out a few initial, broad examples that will stir thought:

Example A: A total of 480 meteor fragments from a distant star are approaching Earth, uniformly over a period of several days.  As a result of the debris scatter and path, each of the 24 time zones has an equal chance of being impacted by an equal amount of debris over the several days.  The average number of fragments per time zone is therefore 480/24, or 20.  But is the typical deviation about this average estimate therefore √20, or ~4.5?

Example B: One consumes 2100 calories per day, through 3 fairly equal-sized meals.  The average per-meal caloric intake is therefore 700.  Is the typical deviation about this 700 estimate therefore √700, or ~26?

Example C: Each day in October, a gift shop sells one vase.  It has a 50/50 chance of being a small vase for $5, or a large vase for $7, so the average sale is $6.  Is the typical deviation about the average October sale therefore √6, or ~$2.40?
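As a quick worked sketch of the naive arithmetic behind these three questions (nothing more than the square roots quoted above; whether that arithmetic is even legitimate is the subject of the rest of this note):

import math

# Naive "square root of n" dispersions quoted in Examples A, B, and C
print(math.sqrt(20))    # Example A: ~4.5 fragments per time zone
print(math.sqrt(700))   # Example B: ~26 calories per meal
print(math.sqrt(6))     # Example C: ~$2.4 per vase sale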

By the end of this note we’ll be able to articulate which of the above three examples, if any, satisfy the √n law.  We’ll see that the important considerations boil down to these:
- the number of discrete options for the estimate and the size of n, both of which loosely connect to the dimensions of the physical space, provide guidance on whether the confidence range converges in the limit towards the √n law, and
- whether the set-up of the physical space is in standard Bernoulli units (e.g., 0, 1) or something else, such as nominal, ordinal, or interval numbering conventions.

First let’s show the results from a quick example, in a low-dimensional space.  Here we only explore the amount of per-meal caloric intake, from just two daily meals.  We’ll define the dimensions in a short bit.  In one case we assume that for each meal there is a 50% chance of consuming 700 calories, a 25% chance of consuming 1400 calories, and a 25% chance of consuming no calories.  In this binomial case, the √n law would not work for the per-meal estimate.  We’ll describe later the conditions under which a √(n/2) law would instead work for this case.

Now in a second case, let’s say that our 700 per-meal caloric intake results from a 1/3 chance of consuming 700 calories, a 1/3 chance of consuming 1400 calories, and a 1/3 chance of consuming no calories.  In this case, the √n law would also not work for the per-meal estimate.  It could, though, work approximately for a total-meals estimate, provided we do not fix the total in advance and thereby create dependence between the meals.  In that fixed-total case, the dispersion does not follow the √(n/2) law; it is of course simply 0.

So why is there a difference between the two types of cases above?  By spreading the distribution of events among a discrete uniform set of options, instead of a binomial set of options, we in effect work towards the convolution needed to offer the wider dispersion over the options [√n versus √(n/2)].  We find that we need at least several discrete uniform options to make the convolution possible, though not so many that it is diluted.  At the same time, the size of n needs to be large enough for the convolution to work, but not so high that the number of options is again too small in relation to n.  Hence the number of discrete options and n necessarily go hand-in-hand, in connecting to the number of dimensions.
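A minimal Monte Carlo sketch of the two per-meal set-ups just described (a hedged illustration only: the per-meal probabilities come from the text, n = 3 meals is borrowed from Example B, and values are in 700-calorie units):

import random, statistics

MEALS = 3          # n meals per day, as in Example B
TRIALS = 100000    # simulated days

def day_total(options, probs):
    # one simulated day: the sum of MEALS independent per-meal draws, in 700-calorie units
    return sum(random.choices(options, probs, k=MEALS))

binomial_days = [day_total([0, 1, 2], [0.25, 0.50, 0.25]) for _ in range(TRIALS)]
uniform_days  = [day_total([0, 1, 2], [1/3, 1/3, 1/3]) for _ in range(TRIALS)]

print(statistics.stdev(binomial_days))  # ~1.22, i.e. sqrt(n/2) for n = 3
print(statistics.stdev(uniform_days))   # ~1.41, i.e. sqrt(2) for this discrete uniform set-up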

Now we will work through a specific medium-dimensional example, to see the convolution mechanics a little more closely.  This mathematical build-up is somewhat complex, so feel free to skip the formulae and focus instead on the written concepts of this note.  And again, the example is still not of a high enough number of dimensions to work.  But through the example we'll also discuss how to expand the results toward a large-dimensional framework.  So let's dive into the initial Example B, where a person consumes 2100 calories over 3 daily meals.

Case 1. Binomial, m = 2, p = q = 1/2
Meal 1:           700 * {0, 1, 1, 2}
Meal 2:           700 * {0, 1, 1, 2}
Meal 3:           700 * {0, 1, 1, 2}
Sum estimate   = 700 * 3 * average of {0, 1, 1, 2}
               = 2100
Variance       = 700^2 * (n*m) * p * q
               = 700^2 * (2n)/4
               = 700^2 * (n/2)
Dispersion     = 700 * √(n/2)

The 700 working its way through the formulae is simply a scaling factor, and not part of the estimate of n.  So this example would not work on a caloric-estimate basis, but one could continue the analysis with a total-meals basis in mind.  Also, while the sum averages to 3 (in 700-calorie units), which is 3 times the typical per-meal average, the dispersion simply scales by that same sort of factor.  So, just as with the 700, we can solve the problem at the total-meals level only, with equal results to the per-meal level.  The √n law is still violated, though we see the result instead attempts to follow √(n/2).
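A brief enumeration sketch that checks the Case 1 algebra above (assuming, as in the set-up, that each meal takes the four equally likely values 700 * {0, 1, 1, 2}):

from itertools import product
from statistics import mean, pstdev

# Case 1: each meal is 700 * {0, 1, 1, 2}, the four outcomes equally likely
meal = [0, 700, 700, 1400]
daily_totals = [sum(combo) for combo in product(meal, repeat=3)]  # 4^3 equally likely outcomes

print(mean(daily_totals))    # 2100
print(pstdev(daily_totals))  # ~857, i.e. 700 * sqrt(n/2) with n = 3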

Case 2. Convolution
Meal 1:           700 * {0, 1, 2}
Meal 2:           700 * {0, 1, 2}
Meal 3:           700 * {0, 1, 2}
Distribution:  700*0 → 1 way (0,0,0)
               700*1 → 3 ways (0,0,1 or 0,1,0 or 1,0,0)
               700*2 → 6 ways (0,1,1 or 1,0,1 or 1,1,0 or 0,0,2 or 0,2,0 or 2,0,0)
               700*3 → 7 ways remaining, including (1,1,1)
               700*4 → 6 ways (2,1,1 or 1,2,1 or 1,1,2 or 2,2,0 or 2,0,2 or 0,2,2)
               700*5 → 3 ways (2,2,1 or 2,1,2 or 1,2,2)
               700*6 → 1 way (2,2,2)
Sum estimate   = 700 * 3
               = 2100
Square Dist.:  700^2*0  → 1 way
               700^2*1  → 3 ways
               700^2*4  → 6 ways
               700^2*9  → 7 ways
               700^2*16 → 6 ways
               700^2*25 → 3 ways
               700^2*36 → 1 way
Typical sum^2  = 700^2 * (0*1 + 1*3 + 4*6 + 9*7 + 16*6 + 25*3 + 36*1) / (count of all ways)
               = 700^2 * 297 / 27
               = 700^2 * 11
Variance       = 700^2 * 11 – 700^2 * 3^2
               = 700^2 * 2
Dispersion     = 700 * √2
               < 700 * √n
We put aside the 700 for the same scaling rationale as given in Case 1, and thus immediately shift away from the caloric-estimate level.  At a total-meals level, though, we then see the result of √2 ≈ 1.4, which is under √(n=3) ≈ 1.7 yet above the less conservative √(n/2=3/2) ≈ 1.2.  We'll discuss in a short bit that we cannot take these dispersion values as strict, however.  This matters because the actual dispersion can always be greater or less than what is estimated from either √n or √(n/2).  This is why having the correct number of discrete uniform options and a large n is important in narrowing to the correct confidence range.  In this case we can settle that, under the discrete uniform convolution approach, this case better fits the √n law, though still not at the caloric-estimate level.
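The convolution table for Case 2 can be reproduced with a brief enumeration sketch (assuming, as above, that each of the three per-meal options is equally likely):

from itertools import product
from collections import Counter
from statistics import mean, pstdev

# Case 2: each meal is 700 * {0, 1, 2}, each option with probability 1/3
totals = [sum(combo) for combo in product([0, 1, 2], repeat=3)]  # 27 combinations, in 700-calorie units

print(sorted(Counter(totals).items()))  # counts 1, 3, 6, 7, 6, 3, 1 for totals 0 through 6
print(mean(totals) * 700)               # 2100
print(pstdev(totals))                   # ~1.41, so dispersion = 700 * sqrt(2)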

We see in the illustration below that for very small n, both the √n and √(n/2) estimates for the confidence range overly dominate the size of n at the left side of the chart, sometimes resulting in the √n law's impossibly negative lower bound for the average estimate.  But as n increases, the relative difference between √n and √(n/2), which always differ by the constant factor √2 ≈ 1.414, quickly disappears relative to n.
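A quick numerical sketch of this narrowing (the particular n values chosen here are only illustrative):

import math

for n in (3, 30, 300, 3000):
    gap = math.sqrt(n) - math.sqrt(n / 2)       # absolute gap between the two bounds
    print(n, round(gap, 2), round(gap / n, 4))  # the gap shrinks quickly relative to n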


Before we next explore how the convolution approach above works for a broader set of dimensions and sampling number conventions, let’s explore why the dispersion is larger for the convolution idea [√n] versus the binomial approach [√(n/2)].

With the convolution approach only, we are setting the overall physical space expectation upfront, and more importantly we are evenly distributing the data along the discrete partitions.  Therefore not only is there not a larger tendency for each sample to be near the center (e.g., the mound shape of the binomial), but the data presupposes that the samples are not always independent of one another.  And therefore the variance of the final, catch-all partition can be large.

See this for the initial daily caloric sums at the low end of the total-meals caloric distribution: 0, 700, 1400.  Under Case 1, the probability associated with each caloric sum is: 6C0/2^6 = 2%, 6C1/2^6 = 9%, 6C2/2^6 = 23%.  And under Case 2, we borrow from the combinations solved above and show it is: 1/27 = 4%, 3/27 = 11%, 6/27 = 22%.  We see this charted below: Case 2 has the larger outer probability, while Case 1 has the smaller outer probability.
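These low-end probabilities can be verified with a short sketch (Case 1 treats the day as 6 Bernoulli half-meals; Case 2 uses the convolution counts tallied above):

from math import comb

# Case 1: daily totals of 0, 700, 1400 calories correspond to 0, 1, 2 successes out of 6
for k in (0, 1, 2):
    print(comb(6, k) / 2**6)   # ~0.016, 0.094, 0.234

# Case 2: convolution counts for daily totals of 0, 700, 1400 calories
for ways in (1, 3, 6):
    print(ways / 27)           # ~0.037, 0.111, 0.222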


In terms of combinations, there is a greater proportion of ways to get to 1400 or fewer daily calories in Case 2 versus Case 1.  And this difference continues with a larger number of dimensions, as we converge towards the wider √n confidence range.  Why is this?  It is because Case 2 gives equal weight to getting to 1400 calories through eating one meal that day, with that meal being 1400 calories, and through eating two meals that day, with each being 700 calories.  Relative to Case 2, the Case 1 formula diminishes the weight of the option where one consumes only one meal and it is 1400 calories.

One can see these additional three options in blue, where only two 700-calorie meals are consumed for the entire day; these add to the three options in red, where only one 1400-calorie meal is consumed for the entire day.  See the chart below for the convolution of 1400 calories consumed in total, from the three meals; one can envision a somewhat planar contour that unites the 6 possibilities.


This form of convolution is important when answering investment risk questions as well.  For example, say that one has a $100 investment at the start of the year.  We know there is a 10% monthly chance that the investment suffers exactly one loss in that month.  And we know that 25% of all losses are $10 or more.  Does this imply that one would lose at least $10 in a month only 3 times every decade (10 years * 12 months * 10% chance * 25% of losses)?  Leaving aside the small frequency counts in this example, the answer is no.  And it's no because we did not mention what happens in the other 90% of months, or at the other 75% of loss levels.  Say we then learn that there is also a 10% monthly chance that the investment suffers two losses in a month, and that 75% of all losses are at the $5 level.  Then we would lose at least $10 in an additional 12 months every decade (10 years * 12 months * 10% chance * 100% of losses).  So now 15 months every decade would see a loss of $10 or greater, or at least once annually.

This is how the convolution probabilities set-up would look thus far, for the risk example above:

Count of monthly losses = {1 at 10%, 2 at 10%, and unknown at 80%}
Loss severity distribution    = {$5 at 75%, and $10 at 25%} 
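A minimal sketch of the months-per-decade arithmetic, using exactly the frequencies and severities assumed above (the variable names are illustrative only):

months_per_decade = 10 * 12

# One loss in a month (10% of months): the total reaches $10 only when that single loss is $10 (25%)
one_loss_months = months_per_decade * 0.10 * 0.25   # ~3 months per decade

# Two losses in a month (10% of months): each loss is at least $5, so the total is always >= $10
two_loss_months = months_per_decade * 0.10 * 1.00   # ~12 months per decade

print(one_loss_months + two_loss_months)             # ~15 months per decade, i.e. at least once a year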

The count of one loss per month or two losses per month is a risk-loss analog to our illustrated Case 2 above.  In Case 2 we were instead considering the number of meals consumed, and the caloric level of those meals.

Also note that we took the three partitions and showed the resulting distribution in a three-dimensional chart.  This is key in physics and chemistry, where the “physical laws” mentioned by Erwin Schrödinger, for example, generally apply to statistically estimating three-dimensional molecules in three-dimensional space.  Thus we must have at least that many, and generally a much greater, number of discrete partitions.  Nothing should preclude us from thinking about the dispersion of estimates across many more dimensions, however, in this particular area of probability theory, where each partition contributing to n could be thought of as a dimension (as we have shown with the binomial case).  And as we suggested earlier, as the number of discrete options and n both increase in concert, the closer we come to approaching the √n law.

To wrap up the other considerations, we note that with the risk example and Example B (dealing with meal caloric estimates from convolution), the √n law does not work on the actual estimate.  But it can approximately work at the number-of-partitions level.

And note that in Example C, the gift shop retailer could not use the √n law.  In this example, the pricing units are $5 and $7, which average to an n of 6.  That average comes with an inherently narrow scaling, which could be modified, similar to Example B, by simply relocating the distribution options’ lower bound from $5 to $0.

So Example A is the only one of the three that can simply use the √n law to assess the estimate's confidence.  Of course a lesson from this is that even something that scientists propose as a universal physical law, but which actually concerns confidence estimates, is in fact unwittingly layered in ambiguity.  Scientists and risk managers would do best not to labour over critical governing laws, only to then take probability short-cuts as given.  It is better instead to think through, in every application, the restrictions, the alternate formulas, and the concept of the confidence in one's confidence interval.  The latter topic we'll explore in greater detail in a future note.

4 comments:

  1. how does this article apply to the small cap stock risk?

    Replies
    1. Thanks much Anonymous. This note is concerned with probability uncertainty on average estimations. It has some grounding in convolution risk, though if risk is your primary focus, then there is better theory to think about that directly addresses that. For stock market risks, please see http://statisticalideas.blogspot.com/2014/01/market-decline-statistics.html and http://statisticalideas.blogspot.com/2013/09/tail-risk-have-we-met.html

  2. schrodinger was applying the binomial distribution... if you have a bath with N particles and draw a volume of water with an expected amount n, then the standard deviation of the distribution over the amount drawn is sqrt(n). To see this, we are interested in the binomial distribution for N trials and probability p = n/N. The st. dev. is sqrt(Np(1-p)) = sqrt(n(1-p)). If p is considered small, since n << N, then the st. dev. is sqrt(n).

    Replies
    1. this is not a negation of our detailed analysis.
