Author Malcolm Gladwell knits together an entertaining book, "David and Goliath".
In this recent book, Malcolm discusses empirical studies that leave us with the impression that underdogs have greater advantages than we normally assume. Conversely, the assumed powerful have larger weaknesses than we often estimate. An example of the book’s narrative is found in the provocative sub-title of one of his chapters: “You wouldn’t wish dyslexia on your child. Or would you?”
A few statistical themes underlie the theories that Malcolm proposes, and those themes are expounded upon within this blog.
The most notable concepts are non-linear relationships, regression to the mean, and variance of means. One particular study in the book lacks the statistical rigor we generally expect of the cited dataset. The dataset covers the publishing performance of new economists from variously ranked graduate programs.
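As a quick illustration of the second theme, here is a minimal simulation of regression to the mean (made-up numbers, not the paper's data): candidates who land in the top 1% of one noisy measurement look far less exceptional in the next.

import numpy as np

rng = np.random.default_rng(0)

# True skill is stable; each year's observed output adds independent noise.
skill = rng.normal(0.0, 1.0, size=10_000)
year1 = skill + rng.normal(0.0, 1.0, size=10_000)
year2 = skill + rng.normal(0.0, 1.0, size=10_000)

# Flag the candidates who land in the top 1% of year-1 output...
top = year1 >= np.quantile(year1, 0.99)

# ...and watch them regress toward the mean in year 2.
print(f"year-1 mean of the top 1%: {year1[top].mean():.2f}")   # ~3.8
print(f"year-2 mean of that group: {year2[top].mean():.2f}")   # ~1.9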
The underlying paper, by John Conley and Ali Sina Önder, has three significant imperfections. The first is that they do not capture, ex ante, a PhD candidate’s individual rank within his or her program, but rather substitute, ex post, publishing performance for this rank. So the paper seeks to assess the potential of candidates as they enter the job market, but instead looks only at their performance six years after they’ve already graduated. The second is that they take each school's list of candidates, sort them on publication output, and then say that rank matters. This doesn't make sense, since list-wise ranking already forces individual publication rank to be the dependent variable rather than the explanatory variable. The third imperfection, which Malcolm then shares, is in their statistical interpretation of the overall results. Here they conclude that publishing performance diminishes somewhat less with department rank, and more so with a candidate’s individual rank within his or her department.
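To see the circularity in the second imperfection, consider a minimal sketch (hypothetical data, not the Conley and Önder sample): draw publication counts that are pure noise, derive each candidate's "rank" by sorting on those counts, and rank then "explains" publications almost perfectly by construction.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Publication counts drawn i.i.d. -- no true individual-rank effect exists.
pubs = rng.poisson(1.0, size=30)

# Derive "rank" by sorting on the very outcome we want to explain.
rank = pubs.argsort().argsort()  # 0 = fewest publications

# The association is near-perfect by construction, not by causation.
rho, _ = spearmanr(rank, pubs)
print(f"Spearman correlation of derived rank with publications: {rho:.2f}")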
Malcolm takes the underlying paper, which lacks a causal link from an individual’s rank at graduation to that individual’s publishing performance, and then, without the data to support it, doubles down on this missing causality. He does so by suggesting that entrants to a top-tier program could secure a higher graduation rank, and therefore higher publishing performance, simply by matriculating at a lower-tier program instead. Additionally, Malcolm selectively chooses a small subset of all of the ranked data in the paper to illustrate this point in the book, so there is some sampling bias at play as well.
Now let’s analyze the actual descriptive data, using standard statistical tools. In doing so we’ll clarify why some of the conclusions from the study, and in Malcolm’s book, do not hold up to the statistical rigor they are purported to have.
The underlying data comprise the top 30 ranked economics programs, plus a composite of the next 30 (non-top 30) ranked economics programs. For each program the study shows just the distribution of American Economic Review (AER) equivalent publication counts, from the 40th percentile to the 99th percentile. So let’s focus on the top ¼ of the top 30 ranked programs (the elite eight schools), and then, after a brief gap in the chart, show the non-top 30 composite, so we can see the dynamic being discussed.
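Since the study reports each program only as a curve of AER-equivalent counts at fixed percentiles, rather than as raw candidate data, a natural working representation is a percentile-to-count curve with interpolation. A minimal sketch; of the non-top 30 values below, only 0.12 (90th percentile) and 1.05 (99th percentile) are quoted in this post, and the rest are placeholders.

import numpy as np

# Percentile grid reported by the study (40th through 99th).
pcts = np.array([40, 50, 60, 70, 80, 90, 99])

# AER-equivalent publication counts at those percentiles for the
# non-top 30 composite; only the last two values are from the post.
non_top30 = np.array([0.00, 0.02, 0.04, 0.06, 0.09, 0.12, 1.05])

def count_at_percentile(curve, p):
    """Interpolate the publication count at percentile p."""
    return float(np.interp(p, pcts, curve))

print(count_at_percentile(non_top30, 95))  # between 0.12 and 1.05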
The idea in the Conley and Önder paper, and also in Malcolm’s book, is that outside the highest fraction of performers at these elite eight schools, the next best performers are at the high end of the non-top 30 program universe. To demonstrate this, they focus exclusively on the 99th percentile performer(s) of the non-top 30 composite, who have an AER performance of 1.05 publications. Elsewhere on the chart, for the other schools, the sea of blue represents the distribution of performers publishing 1.00 or fewer publications. So all good? Are you convinced that the middling candidates of these elite eight programs will fare worse than the better candidates of the non-top 30 programs?
Malcolm uses the metaphor that it is simply better to be a big fish in a little pond (e.g., David) than a little fish in a big pond (e.g., Goliath).
Regrettably, this sort of analysis is subject to two highly variable risks. The first is that the sample size could be too low for distribution comparisons, particularly when comparing a class of institutions against a single composite. The second is that we are likely to falsely compare the expected results of the extreme outliers of one distribution (e.g., the 99th percentile) against a more reliable middle part of another. In any given year, there can be a blow-out candidate who surprises by a random amount. We should never base large life decisions on the assumption that we will enjoy one of these erratic outlier performances from within a weaker composite. This is the same logic a mid-level government employee from a large state could use to simply move to a smaller state, since then they can be governor, a position with enormous rewards. This sort of logic makes no sense, but it’s the motivating force behind the paper, and in Malcolm’s book.
Now let’s look more carefully at the data shown above, except excluding the top fraction of candidates from those elite eight schools. To do this, in the chart below we re-illustrate the data through only the 90th percentile. This is because being a big fish in a little pond doesn’t mean one is the biggest fish while everyone else is a little fish. Being a big fish instead means being somewhere above the middle rank, while being a little fish means being somewhere below the middle rank. This is particularly true of the PhD process at these top 30 schools, where admissions are a bit random and there are large overlaps in the quality of applicants at the low end of the top 30 and at the upper end of the non-top 30 schools. Therefore a large spread in distribution performance among “tiers” is needed for the comparison to make statistical sense. We’ll discuss tiers further below, as we’ll use a specific three-tier system to analyze the data here.
The 90th percentile performer in a non-top 30 program generates 0.12 publications (far less than the same composite’s 99th percentile performer, with 1.05 publications). This is shown in orange. Still, 0.12 publications would rank in the low-60s percentile at the elite eight schools. So we have a 28% non-parametric distribution difference between candidates at this conditioned elite eight program versus those anywhere in the non-top 30 programs (90% - 62%).
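The 28% figure is just the horizontal gap between where 0.12 publications sits in the two distributions. A sketch of the inverse lookup, with a placeholder elite-eight curve shaped so that 0.12 lands in the low 60s, as stated above:

import numpy as np

pcts = np.array([40, 50, 60, 70, 80, 90, 99])
# Placeholder curve for the elite eight (not the paper's numbers).
elite8 = np.array([0.02, 0.06, 0.10, 0.20, 0.55, 1.40, 4.00])

def percentile_of_count(curve, count):
    """Invert the percentile curve: where does a given count fall?"""
    return float(np.interp(count, curve, pcts))

gap = 90 - percentile_of_count(elite8, 0.12)
print(f"non-parametric gap: {gap:.0f} percentage points")  # 90 - 62 = 28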
And below we show the same chart, except we redistribute the candidates from the elite eight schools after removing those in the top 10% (truncating those ranked above the 90th percentile). Here we see that 0.12 publications would be in the mid-60s percentile of the remaining elite eight candidates, after the top 10% are removed.
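The truncation itself is mechanical: removing everyone above the 90th percentile re-scales the surviving ranks. A minimal sketch of that re-normalization (the exact landing point of 0.12 depends on the chart's actual distribution):

def truncated_percentile(p, cut=90.0):
    """Percentile rank after removing everyone above the cut.

    A candidate at original percentile p (p <= cut) sits at p / cut
    of the remaining, truncated distribution.
    """
    assert p <= cut
    return 100.0 * p / cut

print(truncated_percentile(62))  # a low-60s rank moves up to ~68.9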
Our non-parametric approach must be used since the cohort sizes are uneven. And the sample size risk here is mitigated, since each of the top eight schools has a cohort of at least 10 students over the nearly decade-and-a-half study. For the non-top 30 composite, the cohort size is 17. So the 28% difference is statistically significant (more than a 10% difference would be needed for 5% significance), except that the difference is in the wrong direction!
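The post doesn't name its exact significance test, so as an assumption, here is one non-parametric way to land near that 10% threshold: the two-sample Kolmogorov-Smirnov critical distance at the 5% level, with cohort totals implied by the text (eight schools with cohorts of 10 and a composite cohort of 17, over roughly 15 years).

import numpy as np

# Cohort totals over the ~15-year window (assumptions, see above).
n_elite = 8 * 10 * 15   # 1,200 candidate-level observations
n_other = 17 * 15       # 255

# Asymptotic two-sample Kolmogorov-Smirnov critical distance, 5% level.
c_alpha = 1.358
d_crit = c_alpha * np.sqrt((n_elite + n_other) / (n_elite * n_other))
print(f"critical distribution gap: {d_crit:.1%}")  # ~9.4%, vs the 28% observed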
For statistical significance in our case, we would need an easy improvement in publication quantity from only a slight alteration in individual rank (this is a study found, after all, in a book named after the conclusive battle between David and Goliath!). We also require much stronger proof of an easy improvement in individual rank just to match the publication level achieved at the elite eight schools. Lastly, for our statistical test we want this comparative impact to sit closer to the median of the distribution when contrasting fishes and ponds. Right now the 0.12 publications sits instead near the 3rd quartile (the average of the 62nd and 90th percentiles).
We can simultaneously satisfy both of the requirements above by revisiting the first chart above, where we see that getting 0.01 publications is the 77th percentile of the truncated candidates at the elite eight schools, versus the 43rd percentile for the non-top 30 programs. At a 24% difference we are slightly closer, but still very far from showing statistical significance of the big pond versus little pond outcome differences.
Now, instead of further refuting the results from the original study and from Malcolm’s book, let’s change gears and discuss the interesting take-aways that a statistician would have when looking at this data. The main one is that academic success is monopolized by the top cohorts of the elite quarter of the top 30 departments. These eight programs take essentially the same students and can be thought of as a single proxy tier (Tier 1). The remaining 22 schools in the top 30 can be Tier 2, and the non-top 30 departments can be Tier 3. Performance outside the top fraction of Tier 1 drops quickly, and individual rank within the department at that point accounts for nothing of significance.
We should note that in "David and Goliath", the little-pond proxy was the non-top 30 departments, to which Malcolm happens to append both Toronto (ranked 24) and Boston University (ranked 28). But the 99th percentile for the non-top 30 composite is a low 1.05, while the average for Toronto and Boston University is 2.4. Even the other schools ranked 20 through 30 had a 99th percentile performance of 1.9. So Malcolm happens to have selected, without disclosing this gap in the book, a biased sample for the little pond that conveniently fits his theory, because the 99th percentile performers there are unusually strong.
Now let’s look at the distribution plot below for the 99th percentile publication performance across the three-tier system we described above. It shows that there is a strong statistical difference at the top end of the Tier 1 schools. These are the biggest of all fishes in each of the ponds (big, medium, and small, respectively).
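A minimal plotting sketch of this kind of tier comparison, with invented per-department 99th-percentile values shaped to mimic the pattern described (not the paper's data):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Invented 99th-percentile publication counts per department.
tier1 = rng.normal(4.0, 0.8, size=8)     # elite eight
tier2 = rng.normal(1.9, 0.5, size=22)    # rest of the top 30
tier3 = rng.normal(1.05, 0.3, size=30)   # non-top 30 departments

plt.boxplot([tier1, tier2, tier3])
plt.xticks([1, 2, 3], ["Tier 1", "Tier 2", "Tier 3"])
plt.ylabel("AER-equivalent publications (99th percentile)")
plt.title("Top-end performance by tier (illustrative data)")
plt.show()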
Now let’s drop down to the 70th percentile, where we can see the difference between the ideas in the paper and in Malcolm’s book, and the hypothesis put forward here from a statistics perspective. We see that Tier 2 and Tier 3 performance is not statistically different. Strange, isn’t it, that the big fish in the medium pond is not statistically outperforming the big fish in the little pond? After all, we just saw that the big fish in the big pond is still outperforming the big fish in the medium pond.
For this analysis we use an analysis of variance test to assess the differences among tiers, while also accounting for the cohort sizes of each department’s contribution to its own tier proxy. The results become less significant at lower individual ranks, undermining any relationship between individual rank (beyond being a top performer at an elite school) and publishing success. One can see the F-statistic drop as we cascade down from the 99th to the 70th to the 40th percentile.
Without regard to sample size, the F-statistics are 8.4, 1.1, and 0.4, respectively. Taking frequency counts into consideration, the F-statistics are 192, 46, and 2, respectively (the distribution plots for these would look similar to those shown above). We end by showing the 40th percentile central tendency test, below, as an example.
. oneway p40 Tier

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      .001788365      2   .000894183      2.28     0.1038
 Within groups      .192116317    489   .000392876
------------------------------------------------------------------------
    Total           .193904683    491   .000394918

Bartlett's test for equal variances:  chi2(1) = 299.9994  Prob>chi2 = 0.000

note: Bartlett's test performed on cells with positive variance:
      1 multiple-observation cells not used
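For anyone who wants to replicate this style of test outside Stata, here is a minimal Python analogue of the oneway run above, on invented 40th-percentile values drawn from a common distribution (the group sizes are chosen to match the 492 observations in the Stata output; the real run used the study's data and frequency weights).

import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)

# Invented 40th-percentile publication values by tier, all drawn
# from the same distribution, so we expect an unremarkable F.
tier1 = rng.normal(0.015, 0.02, size=120).clip(min=0.0)
tier2 = rng.normal(0.015, 0.02, size=220).clip(min=0.0)
tier3 = rng.normal(0.015, 0.02, size=152).clip(min=0.0)

f_stat, p_value = f_oneway(tier1, tier2, tier3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")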




