Statistical Ideas: 2016 polls: forgotten lessons

Everyone recognizes that the 2016 election polls were chaotic. You can't put a positive spin on a fake 90% “probability” with a straight face. But to save face, the forecasters behind those polls have certainly tried! Meanwhile, in the months leading up to the 2016 election, we had correctly reasoned that Hillary Clinton's probability was closer to only 50%. The gulf between what the mainstream news pushed out and our reality was indeed that wide.

Interestingly, all of the pollsters from back then are still in business, doing the same work this go-around. While their methods should have changed, they are still not much different. Polls currently indicate that the Democrats have a super-high 80%-85% “probability” of taking the House. To some, this seems nearly as (insanely) high as the fake 90%+ “probability” that, as we just noted, they were being told in 2016. Is it really that sure a thing this time around? We will briefly discuss below why we instead see the Democrats with only a 55%-60% probability. Indeed a gulf still exists, but obviously it is not as extreme as was the case in 2016.

Now for ground rules, let's reiterate that our website has never been politically biased. We care about people with differing views, and our sole intention is to focus on the probability theory concepts. With that, let’s discuss three major math themes that are worth reinforcing this go-around for the 2018 midterm forecasts.

The first theme is that the major pollsters are using a highly limited sample of longitudinal polls to formulate estimates of their errors. They also assume a nice, clean normal distribution even though their limited data is noisy. The result is that pollsters (such as competing statistician Nate Silver) continuously give absurdly high "probabilities" for outcomes that then consistently fail to materialize, sometimes while wildly hedging in the final minutes of election night. Here is a reminder of some of his many high-profile failed forecasts, each with a very high stated "probability" of occurring:

· 2015, 75% probability on United Kingdom election

· 2016, Donald Trump at 98% to lose GOP primaries

· 2016, Donald Trump at >90% in Alaska primary

· 2016, Hillary Clinton at >90% in Michigan primary

· 2016, Hillary Clinton at >90% in Indiana primary

· 2016, Hillary Clinton at 90% in Wisconsin primary

· 2016, Hillary Clinton at >70% in general election

· 2017, 75% probability on Alabama senate election
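To make the sample-size point of this first theme concrete, here is a minimal sketch. It is not any pollster's actual model. The list `past_errors` and the `lead` are hypothetical numbers we made up for illustration. The sketch compares a plug-in normal distribution, which treats an error estimate from five past cycles as if it were known exactly, with the standard Student-t predictive distribution, which accounts for having only five observations.

```python
import math
from statistics import NormalDist, mean, stdev


def student_t_cdf(x, df, steps=2000):
    """Student-t CDF via trapezoidal integration of the density over [0, |x|]."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(t):
        return coef * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    area = 0.5 * (pdf(0.0) + pdf(a))
    for i in range(1, steps):
        area += pdf(i * h)
    area *= h
    # The density is symmetric around zero, so mirror the area for negative x.
    return 0.5 + area if x >= 0 else 0.5 - area


# HYPOTHETICAL inputs, for illustration only (not real polling data).
past_errors = [-3.0, 1.5, 2.5, -2.0, 1.0]  # actual minus polled margin, in points
lead = 3.0                                 # current polled lead, in points

n = len(past_errors)
bias = mean(past_errors)   # average past error
s = stdev(past_errors)     # sample standard deviation (n - 1 denominator)

# Plug-in normal: pretends s is the true error size.
p_normal = NormalDist().cdf((lead + bias) / s)

# Student-t predictive for one new error, given only n past observations.
scale = s * math.sqrt(1 + 1 / n)
p_t = student_t_cdf((lead + bias) / scale, df=n - 1)

print(f"plug-in normal: {p_normal:.3f}")
print(f"Student-t predictive: {p_t:.3f}")
```

With these made-up inputs, the normal shortcut gives roughly 90% while the small-sample version gives roughly 84%. The fewer the past cycles, the larger that gap becomes.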

Don't worry: for every high “probability” call that you can point to that he has gotten right, we can provide at least one high "probability" call that he instead got wrong. More importantly, future predictions need a Bayesian adjustment that tamps them down far more than usual from the 80+% "probability" currently still being showcased for this week (for Democrats taking control of the House, or Republicans remaining in control of the Senate).
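As a rough sketch of what such a track-record check could look like, suppose (hypothetically) a forecaster made 16 calls at a stated 85%, and only 8 came true. Those counts are our illustrative assumption, matching the "one miss for every hit" claim, and are not a tally of real forecasts. The snippet asks how likely so few hits would be if the 85% were well calibrated, then does a simple Beta-Binomial update of the hit rate from a flat prior.

```python
from math import comb


def prob_at_most(hits, calls, stated_prob):
    """Binomial probability of seeing `hits` or fewer successes in `calls` tries."""
    return sum(
        comb(calls, k) * stated_prob**k * (1 - stated_prob) ** (calls - k)
        for k in range(hits + 1)
    )


# HYPOTHETICAL counts, for illustration only.
calls = 16     # number of high-"probability" calls
hits = 8       # how many of them came true
stated = 0.85  # the probability attached to each call

# If the 85% were calibrated, how often would we see 8 or fewer hits?
tail = prob_at_most(hits, calls, stated)

# Beta(1, 1) prior on the true hit rate, updated with the observed record.
posterior_mean = (1 + hits) / (2 + calls)

print(f"P(at most {hits} hits | calibrated {stated:.0%}) = {tail:.5f}")
print(f"posterior mean hit rate = {posterior_mean:.2f}")
```

Under these assumed counts, the stated 85% is very hard to reconcile with the record, and the updated hit rate sits near a coin flip.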

And this brings us to our second theme, which is that pollsters are still stating too low a self-assessed “margin of error”. See this chart below.

Nassim Taleb in 2016 pointed this out as well:

What is alarming isn't that they were wrong but that these idiots underestimated their error rate.
Similar to p-val https://t.co/NqbKZhqhoG https://t.co/UqjmzraaeV
— Nassim Nicholas Taleb (@nntaleb) November 10, 2016

Even Nate Silver suffers from wild swings in these polling “probabilities”. For example, in a tiny time span, we have had to shake our heads as his “probability” (in one of his three polling flavors, the one that's allegedly smoother) confusingly made round trips from 77% down to 70%, up to 84%, back down below 70%, and back up to over 84%! Every "probability" gets a shot to be mined ex-post as the winning focus, including the erratic gyrations during election night itself.

That's how insanity works, not probability. And we noted ahead of this 2018 polling season that this would be among the fingerprints remaining on his flawed polling logic.

So the first two themes described above combine to show that the >80% “probability” (popularly cited for Democrats to take over the House in 2018) is way too high. Directionally it's correct, but our estimate of <60% is far more realistic.
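To see how sensitive a headline "probability" is to the assumed margin of error, here is a small sketch under a plain normal model. That model is our simplifying assumption here, not the pollsters' published method and not the calculation behind our own estimate. The function name `inflated_probability` and the inflation factors are hypothetical. It takes a stated probability, backs out the implied lead measured in standard errors, and recomputes the probability if the true error is some multiple of the stated one.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inflated_probability(stated_prob, inflation):
    """Win probability if the true error is `inflation` times the stated error."""
    z_stated = STD_NORMAL.inv_cdf(stated_prob)  # lead in stated standard errors
    return STD_NORMAL.cdf(z_stated / inflation)


stated = 0.85  # the headline "probability" discussed above
for factor in (1.0, 1.5, 2.0, 3.0):  # HYPOTHETICAL inflation factors
    adjusted = inflated_probability(stated, factor)
    print(f"error x{factor:.1f}: probability {adjusted:.3f}")
```

Under this assumption, merely doubling the error pulls a stated 85% down to roughly 70%, before any allowance for fatter tails.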

Our third theme is the problematic "transmission" of the polling data to the overall election outcomes, which is inherently more difficult in this midterm than in the 2016 and 2014 elections. Why wouldn't current pollsters recognize that? Midterm polls are smaller and noisier, must work their way through a small number of competitive seats, etc. Those relationships are not as tight as in general elections, which themselves proved extremely difficult in the current Trump era for these modern pollsters to get ahead of.
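One way to picture this transmission problem is a toy simulation. Everything in it is a hypothetical assumption for illustration: 30 competitive seats, each polled at a 1-point lead, 15 seats needed, and made-up error sizes. It compares seat-level polling errors that are fully independent with errors that share a common national component, holding the total error per seat the same.

```python
import random


def chamber_probability(margins, seats_needed, shared_sd, local_sd,
                        trials=20000, seed=0):
    """Share of simulated elections in which enough seats are won."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)  # one national error per election
        seats = sum(
            1 for m in margins
            if m + shared + rng.gauss(0.0, local_sd) > 0  # plus a local error
        )
        if seats >= seats_needed:
            wins += 1
    return wins / trials


# HYPOTHETICAL setup, for illustration only.
margins = [1.0] * 30  # polled lead, in points, in each competitive seat
seats_needed = 15

# Same total error per seat (5 points) in both cases: 3^2 + 4^2 = 5^2.
p_independent = chamber_probability(margins, seats_needed, shared_sd=0.0, local_sd=5.0)
p_correlated = chamber_probability(margins, seats_needed, shared_sd=3.0, local_sd=4.0)

print(f"independent seat errors: {p_independent:.3f}")
print(f"shared national error:   {p_correlated:.3f}")
```

In this toy setup, the shared error leaves each individual seat just as uncertain as before, yet the chance of winning the chamber lands noticeably closer to a coin flip, because the seats tend to miss in the same direction.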

We leave you with this thought-enriching poll below! Last, please be kind to each other. All of our dreams are interconnected. Looking forward to catching up, after Election Day!

Which model would best predict the midterm elections?
Model A: A coin toss
Model B: The opposite of whatever Nate Silver says
— Statistical Ideas (@salilstatistics) July 30, 2018

Friday, November 2, 2018