
Friday, January 3, 2014

David and Goliath in PhD programs


Author Malcolm Gladwell knits together an entertaining book, "David and Goliath".  In this recent book, Malcolm discusses empirical studies that leave us with the impression that underdogs have greater advantages than we normally assume.  Conversely, those assumed to be powerful have larger weaknesses than we often estimate.  An example of the book’s narrative is found in the provocative sub-title of one of his chapters: “You wouldn’t wish dyslexia on your child.  Or would you?”

A few statistical themes underlie the theories that Malcolm proposes, and the ideas behind those themes are expounded upon within this blog.  The most notable concepts are non-linear relationships, regression to the mean, and the variance of means.  One particular study in the book, however, doesn’t have the statistical rigor we generally expect from the cited dataset.  The dataset covers the publishing performance of new economists from various ranked graduate programs.

The underlying paper, by John Conley and Ali Sina Önder, has three significant imperfections.  The first is that they do not capture, ex ante, a PhD candidate’s individual rank within his or her program, but rather substitute, ex post, publishing performance for this rank.  So the paper seeks to assess the potential of candidates as they enter the job market, but instead looks only at their performance six years after they’ve already graduated.  The second imperfection is that they take each individual school's list of candidates, sort them by publication output, and then say that rank matters.  This doesn't make sense, since the process of list-wise ranking already forces individual publication rank to be the dependent variable instead of the explanatory variable.  And the third imperfection, which Malcolm then shares, is in their statistical interpretation of the overall results.  Here they conclude that publishing performance diminishes somewhat less with department rank, and more so with a candidate’s individual rank within his or her department.

Malcolm takes the underlying paper, which lacks the relevant causal relationship between an individual’s rank at graduation and that individual’s publishing performance.  Then, without the data to support this, he doubles down on this lack of causality.  He does this by suggesting that entrants to a top-tier program could achieve a higher graduation rank, and therefore higher publishing performance, by simply matriculating at a lower-tier program instead.  Additionally, Malcolm selectively chooses a small subset of all of the ranked data in the paper to illustrate this point in the book.  As a result, there is some sampling bias at play as well.

Now let’s analyze the actual descriptive data, using advanced statistical tools.  In doing so, we’ll clarify why some of the conclusions from the study, and in Malcolm’s book, do not hold up to the statistical rigor they are purported to have.

The underlying data comprise the top 30 ranked economics programs, in addition to a composite of the next 30 (non-top 30) ranked economics programs.  For each program, the study shows just the distribution of American Economic Review (AER) equivalent publication counts, from the 40th percentile to the 99th percentile.  So let’s focus on the top quarter of the top 30 ranked programs, and then, after a brief gap in the chart, show the non-top 30 composite, so we can see the dynamic being discussed.
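To fix ideas, the data can be pictured as a table of AER-equivalent counts per percentile, per program.  Here is a minimal sketch of that shape; all values are invented for illustration, except the 0.12 and 1.05 figures for the non-top 30 composite’s 90th and 99th percentiles, which are cited later in this post.

import pandas as pd

# Hypothetical shape of the study's data: AER-equivalent publication counts
# at selected percentiles, per program group.  All values are invented,
# except the non-top 30 composite's 0.12 (90th pct) and 1.05 (99th pct)
# figures cited in the text.
percentiles = [40, 50, 60, 70, 80, 90, 99]
aer_counts = pd.DataFrame(
    {
        "Elite eight (illustrative)": [0.02, 0.05, 0.11, 0.24, 0.47, 1.00, 4.00],
        "Non-top 30 composite":       [0.00, 0.00, 0.01, 0.02, 0.05, 0.12, 1.05],
    },
    index=pd.Index(percentiles, name="percentile"),
)
print(aer_counts)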
  

The idea in the Conley and Önder paper, and also in Malcolm’s book, is that outside the highest fraction of performers in these elite eight schools, the next best performers are at the high end of the non-top 30 program universe.  To demonstrate this, they focus exclusively on the 99th percentile performer(s) at the non-top 30 programs, who have an AER performance of 1.05 publications.  And we can see elsewhere on the chart, for the other schools, a sea of blue representing the distribution of performers publishing 1.00 or fewer publications.  So all good?  Are you convinced that the middling candidates of these elite eight programs will fare worse than the better candidates of the non-top 30 programs?  Malcolm uses the metaphor that it is simply better to be a big fish in a little pond (David) than a little fish in a big pond (Goliath).

Regrettably, the dangerous aspect of this sort of analysis is that it is subject to two highly variable risks.  The first risk is that the sample size could be too low for distribution comparisons, particularly when comparing a class of institutions against a single composite.  The second risk is that we are likely falsely comparing the expected results from the extreme outliers of the distribution (e.g., the 99th percentile) against a more reliable middle part of the distribution.  In any given year, there can be a blow-out candidate who surprises by a random amount.  We should never base large life decisions on the assumption that we will enjoy one of these erratic outlier performances from within a weaker composite.  This is the same logic a mid-level government employee from a large state could use to justify simply moving to a smaller state, since then they can be governor, a position with enormous rewards.  This sort of logic makes no sense, but it’s the motivating force behind the paper, and in Malcolm’s book.
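To make the outlier risk concrete, here is a minimal simulation sketch (not the study’s data): assuming, purely for illustration, that individual publication counts follow a skewed, exponential-like distribution, we can see how much more violently a small cohort’s 99th percentile swings from “year” to “year” than its median does.

import numpy as np

rng = np.random.default_rng(0)

# Illustration only: assume publication counts are exponential-like, with
# most candidates near zero and a few publishing a lot.
def p99_and_p50(n, trials=10_000):
    tails, medians = np.empty(trials), np.empty(trials)
    for i in range(trials):
        cohort = rng.exponential(scale=0.15, size=n)
        tails[i] = np.percentile(cohort, 99)
        medians[i] = np.percentile(cohort, 50)
    return tails, medians

p99, p50 = p99_and_p50(n=100)  # a hypothetical small program cohort
print(f"99th pct across years: mean {p99.mean():.3f}, std {p99.std():.3f}")
print(f"50th pct across years: mean {p50.mean():.3f}, std {p50.std():.3f}")

Under these assumptions, the tail estimate is roughly an order of magnitude noisier than the median, which is exactly why anchoring a decision on a composite’s 99th percentile performer is hazardous.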

Now let’s look more carefully at the data shown above, except excluding the top fraction of candidates from those elite eight schools.  To do this, in the chart below we re-illustrate the data only through the 90th percentile.  This is because being a big fish in a little pond doesn’t mean one is the biggest fish while everyone else is a little fish.  Being a big fish instead means being somewhere above the middle rank, while being a little fish means being somewhere below the middle rank.  This is particularly true of the PhD process for these top 30 schools, which is a bit random and features large overlap in the quality of applicants at the low end of the top 30 and at the upper end of the non-top 30 schools.  Therefore, a large spread in distribution performance among “tiers” is needed to make significant statistical sense.  We’ll discuss tiers further below, as we’ll use a specific three-tier system to analyze the data here.

The 90th percentile performer in a non-top 30 program generates 0.12 publications (far less than the same composite’s 99th percentile performer, with 1.05 publications).  This is shown in orange.  Still, 0.12 publications would rank in the low-60s percentile at the elite eight schools.  So we have a 28% non-parametric distribution difference between candidates at these elite eight programs and those anywhere in the non-top 30 programs (90% − 62%).
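A sketch of this non-parametric comparison, with invented stand-in data (exponential parameters chosen only so the numbers loosely echo those in the text):

import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(1)

# Invented stand-in data; this is not the Conley-Önder dataset.
elite_eight = rng.exponential(scale=0.10, size=800)
non_top_30  = rng.exponential(scale=0.04, size=400)

# Publication count of the 90th percentile performer in the non-top 30...
threshold = np.percentile(non_top_30, 90)

# ...and where that same output would rank within the elite eight cohorts.
rank_in_elite = percentileofscore(elite_eight, threshold)

print(f"90th pct of non-top 30: {threshold:.2f} publications")
print(f"that output ranks at the {rank_in_elite:.0f}th percentile of the elite eight")
print(f"non-parametric gap: {90 - rank_in_elite:.0f} percentage points")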


And below, we show the same chart, except we redistribute the candidates from the elite eight schools after removing those in the top 10% (truncating those ranked above the 90th percentile).  Here we see that 0.12 publications would be a mid-60s percentile among the remaining elite eight candidates, after we remove the top 10%.
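The truncation step is just a filter before re-ranking.  Continuing with the same invented stand-in data as in the sketch above:

import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(1)
elite_eight = rng.exponential(scale=0.10, size=800)  # same invented data as above
non_top_30  = rng.exponential(scale=0.04, size=400)
threshold = np.percentile(non_top_30, 90)

# Truncate: drop the elite eight's top decile, then re-rank the same
# publication threshold against the remaining candidates.
cutoff = np.percentile(elite_eight, 90)
truncated = elite_eight[elite_eight <= cutoff]
rank = percentileofscore(truncated, threshold)
print(f"rank among the truncated elite eight: {rank:.0f}th percentile")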


A non-parametric approach must be used since the cohort sizes are uneven.  And the sample size risk here is mitigated, since each of the top eight schools has a cohort group of at least 10 students over the nearly decade-and-a-half study period.  For the non-top 30 composite, the cohort size is 17.  So the 28% difference is statistically significant (a difference of more than 10% would be needed for significance at the 5% level), except that the difference is in the wrong direction!
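One way a statistician might sanity-check whether a gap of this size is distinguishable from noise at these cohort sizes is a bootstrap; a sketch with invented data follows, where the cohort sizes take the text’s per-school figures and (an assumption on my part) multiply them by a roughly 14-year study window:

import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(2)

# Invented data again; sizes are an assumed interpretation of the text.
elite_eight = rng.exponential(scale=0.10, size=8 * 10 * 14)
non_top_30  = rng.exponential(scale=0.04, size=17 * 14)

def gap(a, b):
    # 90th percentile of b, re-ranked within a; gap in percentage points.
    return 90 - percentileofscore(a, np.percentile(b, 90))

observed = gap(elite_eight, non_top_30)
boots = [gap(rng.choice(elite_eight, elite_eight.size),
             rng.choice(non_top_30, non_top_30.size))
         for _ in range(2000)]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"observed gap {observed:.0f} points; 95% bootstrap CI [{lo:.0f}, {hi:.0f}]")

A confidence interval that excludes the 10-point threshold would support calling the gap significant, though again only under these invented distributions.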

For statistical significance in our case, we would need to see an easy improvement in publication quantity from only a slight alteration in individual rank (this is, after all, a study found in a book named after the conclusive battle between David and Goliath!).  We also require much more significant proof of easy improvement in individual rank just to match the same publication level achieved at the elite eight schools.  Lastly, for our statistical test, we want this comparative impact to be closer to the median of the distribution when contrasting fishes and ponds.  Right now, the 0.12 publications level sits instead near the 3rd quartile (the average of the 62nd and 90th percentiles, i.e., the 76th).

We can satisfy both of the requirements above simultaneously by revisiting the first chart, where we see that 0.01 publications is the 77th percentile of the truncated candidates in the elite eight schools, versus the 43rd percentile for the non-top 30 programs.  At a 24% difference, we are slightly closer, but still very far from showing statistical significance in the big pond versus little pond outcome differences.

Now, instead of further refuting the results from the original study and from Malcolm’s book, let’s change gears and discuss the interesting take-aways a statistician would have when looking at this data.  The main one is that academic success is monopolized by the top cohorts of the elite quarter of the top 30 departments.  These eight programs take essentially the same students and can be thought of as a single proxy tier (Tier 1).  The remaining 22 schools in the top 30 can be Tier 2, and the non-top 30 departments can be Tier 3.  Performance outside of the top fraction of Tier 1 drops quickly, and individual rank within the department at that point accounts for nothing of significance.
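A minimal sketch of this three-tier assignment (the ranks for Toronto and Boston University come from the discussion below; the other entries are hypothetical placeholders):

# Three-tier proxy for department rank.
def tier(rank: int) -> int:
    if rank <= 8:    # elite quarter of the top 30 -> Tier 1
        return 1
    if rank <= 30:   # remaining 22 top 30 schools -> Tier 2
        return 2
    return 3         # non-top 30 departments -> Tier 3

departments = {"School A": 3, "Toronto": 24, "Boston University": 28, "School B": 45}
for name, rank in departments.items():
    print(f"{name} (rank {rank}): Tier {tier(rank)}")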

We should note that in "David and Goliath", the little pond proxy was the non-top 30 departments, and then Malcolm happens to append to that both Toronto (rank 24) and Boston University (rank 28).  But the 99th percentile for the non-top 30 composite is a low 1.05 publications, while the average for Toronto and Boston University is 2.4.  Even the other schools ranked 20 through 30 had a 99th percentile performance of 1.9.  So Malcolm happens to have selected, without disclosing this gap in the book, a biased sample for the little pond that fits his theory well, because the 99th percentile performers there are very strong.

Now let’s look at the distribution plot below for the top 99th percentile publication performance across the three-tier system we described above.  It shows that there is a strong statistical difference at the top end of the Tier 1 schools.  These are the biggest fish in each of the ponds (big, medium, and small, respectively).


Now let’s drop down to the 70th percentile, where we can see the difference between the ideas in the paper and in Malcolm’s book, and the hypothesis put forward here from a statistical perspective.  We see that Tier 2 and Tier 3 performance is not statistically different.  Strange, isn’t it, that the big fish in the medium pond is not statistically outperforming the big fish in the little pond?  After all, we saw that the big fish in the big pond still outperforms the big fish in the medium pond.
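As a sketch of this sort of two-sample check (placeholder values, invented so the Tier 2 and Tier 3 means sit close together, as the chart suggests):

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)

# Placeholder 70th percentile publication values by department (invented).
tier2 = rng.normal(loc=0.030, scale=0.025, size=308)
tier3 = rng.normal(loc=0.028, scale=0.025, size=72)

# Welch's t-test (no equal-variance assumption); a large p-value would
# indicate the two tiers are not statistically different.
t, p = ttest_ind(tier2, tier3, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")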


For this analysis we must look at an analysis of variance test to assess the difference among Tiers, while also accounting for the cohort sizes of each department’s contribution to its own Tier proxy.  The results become less significant at lower individual ranks, disproving a relationship between individual rank (beyond being a top performer at an elite school) and publishing success.  One can see the F-statistic drop as we cascade down from the 99th percentile, to the 70th percentile, to the 40th percentile.  Without regard to sample size, the F-statistics are 8.4, 1.1, and 0.4, respectively.  And taking frequency counts into consideration, the F-statistics are 192, 46, and 2, respectively (the distribution plots for these would look similar to those shown above).  We end by showing the 40th percentile difference in central tendency test, below, as an example.

. oneway p40 Tier
                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      .001788365      2   .000894183      2.28     0.1038
 Within groups      .192116317    489   .000392876
------------------------------------------------------------------------
    Total           .193904683    491   .000394918

Bartlett's test for equal variances:  chi2(1) = 299.9994  Prob>chi2 = 0.000

note: Bartlett's test performed on cells with positive variance:
      1 multiple-observation cells not used
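For readers who want to reproduce this kind of comparison outside Stata, here is a rough Python equivalent of the `oneway` test above, with placeholder data (the group means are invented; the group sizes are chosen only so the degrees of freedom match the Stata table, i.e., 492 observations across 3 tiers):

import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)

# Placeholder 40th percentile values by Tier (invented); sizes sum to 492
# observations so the within-groups df matches the 489 shown above.
tier1 = rng.normal(loc=0.004, scale=0.02, size=112)
tier2 = rng.normal(loc=0.002, scale=0.02, size=308)
tier3 = rng.normal(loc=0.001, scale=0.02, size=72)

# One-way ANOVA across the three tier proxies, as in Stata's `oneway`.
F, p = f_oneway(tier1, tier2, tier3)
print(f"F(2, 489) = {F:.2f}, Prob > F = {p:.4f}")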
