Statistical Ideas: Antagonism isn't perpetual

Short-term update: This was the top article (literally) on reddit and on Zero Hedge. Our two articles this week will clear half a million reads and have been shared by thousands (exceptional for any news website). Thanks for enjoying!

If you recently glanced at the polls and the election markets, you would be forgiven for believing that a landslide election is looming. It likely is not (that is the opinion of even Nate Silver's 538), and the spreads have the potential to revert in surprising ways between now and Election Day. The drumbeat of negative news against Donald Trump may not cause further damage. We have argued repeatedly, starting on October 11 and October 12, that Hillary Clinton's runaway spread would revert (here, here, here, here). Of course that is a stand taken against a popular headwind, but it is also an opportunity to make money on a mispriced election bet. For example, when we wrote the reversion article, the betfair ask on Mr. Trump's popular vote remaining in the 40s (percent) was priced at only 1:6 odds. The 538 site also reflected this, as shown below (and still ascending through October 16 to now a >40% reversion!). But we -and other academic statisticians- knew that this was a faux election probability, and advised hundreds of thousands to remain vigilant against planned mainstream misinformation. Incidentally, today's betfair bid is 25% higher; not many investments have risen 25% in just the past few days. And the wager could explode to a 600% profit, exposing how steeply deluded the polls have been. This article isn't merely about gambling; it goes to the heart of what makes polls differ from one another, and across time. What should we be cautious of when interpreting the information, while almost never reading (and sometimes not having access to) all the underlying probability details of the poll generation? In particular, we'll delve into the inconspicuous L.A. Times poll here, which for much of the past month showed Donald leading Hillary. How did they come to that, and what value is there in paying attention to alleged outliers?
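As a rough check on the betting figures quoted above, the sketch below treats the wager as a contract that costs its price and pays 1 if it wins. It assumes that "1:6 odds" means 6-to-1 against (a price of about 1/7); the function names are illustrative, and nothing here reflects betfair's actual interface or fees.

```python
def implied_price(odds_against: float) -> float:
    """Price of a contract paying 1, for a bet quoted at odds_against-to-1 against."""
    return 1.0 / (odds_against + 1.0)


def profit_pct_if_win(price: float) -> float:
    """Profit on a winning contract, as a percentage of the price paid."""
    return 100.0 * (1.0 - price) / price


ODDS_AGAINST = 6.0  # assumption: "1:6" read as 6-to-1 against

entry_price = implied_price(ODDS_AGAINST)
price_after_rise = entry_price * 1.25  # the bid is described as 25% higher

print(f"entry price:           {entry_price:.3f}")
print(f"price after 25% rise:  {price_after_rise:.3f}")
print(f"profit if the bet wins: {profit_pct_if_win(entry_price):.0f}%")
```

Under that reading, the 600% figure is simply the payout on a winning 6-to-1 bet, while the 25% gain is the change in the contract's market price before the outcome is known.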

Recently the New York Times (NYT) wrote a piece arguing that the USC/L.A. Times (LAT) poll was biased against Hillary Clinton by at least 4 percentage points, through the exaggerated sampling of one Black Chicago youth. The NYT thesis about sampling issues was not based on general theory at all, but only on the fact that the survey respondent was a fervent Donald Trump supporter. Apparently the LAT has always been a good pollster, until this one Black man became a Trump supporter. Now the LAT poll is suddenly comprehensively terrible. Right... The NYT was both smart and correct in pointing out the seeming anomaly, but it misdiagnosed the root cause of the puzzle.

The LAT should retain their entire sample, and not simply alter responses because the pollster doesn't like what he or she hears. Removing select responses has that same effect, and this is partly why mainstream pollsters have systematically disfavored Republicans in nearly 2/3 of the elections in the past several decades where there has been a meaningful surprise in the general election outcome. And in every case where such a reversal of fate has led to an actual victory for the October polling-laggard, it was always a Republican who won. This should give everyone pause to consider the strength of these "scientific" polls. We often see something misrepresented, yet masquerading as disciplined science.

Now the LAT pollster allows for some interesting statistical features that are not in other polls (many of which follow our blog). For example, it allows the survey participant to partially self-weight their own response, and it factors in his or her own prior voting record. These are worthy developments in most cases, including the case here of the 19-year-old Black Trump supporter. Polling has to fill in a lot of gaps, particularly this year, when there is a greater than normal number of undecideds and non-responders. This increases the error rather than lessening it (per our viral article here, read by >150 thousand including senior advisers of both parties). And the fact that most other polls do not scale their survey responders accordingly equally leads to a higher than expected favorability (based only on momentum) for those who for now agree with Ms. Clinton more so than Mr. Trump. Of course we know that across all polls this year there is a perception that Hillary has a polling edge when it comes to "perceived" favorability or social desirability (it's been noted that 10-15% of people have lost a friend due to the 2016 election); though this is conflated with the overall bias going back many decades, and so it's unclear how much additional bias comes from that. But the NYT overestimates the overall edge that the LAT has if this one Black youth is completely off in his responses; it is only about 1-2 percentage points. That is not enough to close the nearly 5-10 percentage point difference the LAT has with the rest of the mainstream polls. The NYT is correct, however, that the overweighting by the LAT may exist, in that this one individual is weighted a little more relative to the typical person. But this does not negate the data point altogether. Does anyone credibly believe that not a single Black person is going to vote for Donald Trump?
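To see how much a single heavily weighted respondent can move a weighted poll, here is a minimal sketch. Every number in it is a hypothetical assumption chosen for illustration (a panel of about 3,000 people, one respondent carrying 30 times the typical weight, everyone else evenly split); none of it is the LAT's actual data or weighting scheme.

```python
def weighted_margin(responses):
    """Trump-minus-Clinton margin in percentage points.

    responses: list of (weight, vote), with vote = +1 for Trump,
    -1 for Clinton, and 0 for other or undecided.
    """
    total_weight = sum(weight for weight, _ in responses)
    net_support = sum(weight * vote for weight, vote in responses)
    return 100.0 * net_support / total_weight


# Hypothetical panel: 2,999 typical respondents with weight 1, evenly split.
rest_of_panel = [(1.0, +1)] * 1450 + [(1.0, -1)] * 1450 + [(1.0, 0)] * 99
HEAVY_WEIGHT = 30.0  # hypothetical weight of the one overweighted respondent

without_him = weighted_margin(rest_of_panel)
with_him = weighted_margin(rest_of_panel + [(HEAVY_WEIGHT, +1)])
flipped = weighted_margin(rest_of_panel + [(HEAVY_WEIGHT, -1)])

print(f"margin without him:       {without_him:+.2f} points")
print(f"margin with him (Trump):  {with_him:+.2f} points")
print(f"margin if he had flipped: {flipped:+.2f} points")
```

Under these assumptions, dropping the respondent moves the margin by about 1 point and flipping his answer moves it by about 2 points, which is the order of magnitude argued above and well short of a 5-10 point gap.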

The bottom line is that polls on the fringes (e.g., the LAT, and to a lesser degree only the trends in the conservative-advocating Rasmussen, both showing Mr. Trump leading for much of the past month) should be taken a little more seriously, due to the informative value they provide on how the many undecideds and non-responders will ultimately vote. In historical polling data -and more so this year- people tend to make up their minds about candidates as they approach the booth, and rarely does that lead to further subtractions from current polling levels. It is doubtful therefore that any new negative information about Donald would somehow compel someone, at long last in these final weeks, to ultimately switch allegiances. And while the theory of a poll of polls works great to reduce the variance of errors, it does nothing to counter any systematic errors we may see hurtling through in the current election cycle. This is a significant lesson that remains lost among political hacks keen to simply analyze the data.
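The point about a poll of polls can be illustrated with a small simulation. It is only a sketch under stated assumptions: each poll's error is modeled as a bias shared by all polls plus independent noise, and the true margin, bias, and noise level are hypothetical numbers rather than estimates from any real election.

```python
import random
import statistics

TRUE_MARGIN = 2.0   # hypothetical true margin, in percentage points
SHARED_BIAS = 3.0   # hypothetical systematic error common to every poll
NOISE_SD = 3.0      # hypothetical independent sampling noise per poll

rng = random.Random(0)  # fixed seed so the run is repeatable


def poll_average(n_polls):
    """Average of n_polls simulated polls that all share the same bias."""
    polls = [TRUE_MARGIN + SHARED_BIAS + rng.gauss(0.0, NOISE_SD)
             for _ in range(n_polls)]
    return statistics.fmean(polls)


def error_summary(n_polls, n_trials=5000):
    """Mean and standard deviation of the averaging error across trials."""
    errors = [poll_average(n_polls) - TRUE_MARGIN for _ in range(n_trials)]
    return statistics.fmean(errors), statistics.stdev(errors)


single_mean, single_sd = error_summary(1)
avg_mean, avg_sd = error_summary(20)

print(f"1 poll:   mean error {single_mean:+.2f}, spread {single_sd:.2f}")
print(f"20 polls: mean error {avg_mean:+.2f}, spread {avg_sd:.2f}")
```

In this model the spread of the error shrinks as more polls are averaged, but the mean error stays near the shared bias no matter how many polls go into the average.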

Another note is that you should be wary of taking too seriously the political advice of people who so recently erred badly in the Primary elections! This is not to cast a spotlight on any one individual, since the entire field of data journalism just saw a catastrophic result over the past year. But it's clear from the polling and the prediction betting market levels that the grave lessons from the past have not yet been learned. This summer's Brexit vote was just another example of election-eve overconfidence by pollsters and bookies. But stateside we do see the promotion of false confidence in preposterous polling statistics. The media's pursuit of ratings must bear some of the blame, since news demands easily digestible insight that crookedly beguiles its patrons. And if we exposed the overshadowing uncertainty surrounding these election predictions, then no one would venture to pay further attention. Even more reason for you to pay some attention to the outlier polls, especially this year!

October 16 update: data continues to move in the direction of our statistical analysis.

7 comments:

Anonymous, October 16, 2016 at 12:11 AM
I tend to wonder whether the assumption of good faith practice in all this polling isn't itself a reflection of bias. It was reported that the reason why the major bookies in the Brexit vote (Paddypower, etc.) had the odds the way they did was that several big-money bets were made against Brexit which affected the odds. One could legitimately ask whether such risky bets were made based on sheer confidence or whether they were placed to affect the odds and influence public perception.

The same with published polls. Obviously there's a feedback loop that exists between poll results and public support. If a member of the public thinks that 'nobody' is supporting Candidate A, that person may be influenced enough to withdraw support or simply not vote. I tend to think that media outlets responsible for financing these polls and then reporting the results are likely to bear this in mind, and this affects both poll methodology and subsequent reporting of the results. ITM.
Anonymous, October 16, 2016 at 7:44 AM
We shall see what happens. Romney supporters in 2012 were "unskewing" the polls which allegedly oversampled Democrats. But we all know how that turned out. I hope Trump wins, but don't encourage people to bet money on an outcome that may very well be quite unlikely.
LWood, October 16, 2016 at 1:19 PM
Pretty sure they didn't see the Reagan landslide coming either.
Unknown, October 18, 2016 at 6:55 AM
Thanks L Garou. You are right. Let's see what happens by the end of this week. Undecideds seem to be coming to a decision quickly, if you have checked out the polls in recent days.

Friday, October 14, 2016
