
Saturday, June 24, 2017

Blind leading the blind

In his latest debacle, Nate Silver was among the many pollsters/pundits who in 2016 forecast a super-high probability of a Hillary Clinton win (after giving Donald Trump only a 2% probability of making it through his primaries).  This high "probability" was of course mistaken, and it set an extraordinarily high bar for actual delivery of outcomes.  He was in the high-80% range for a couple of months prior to the election.  These high probabilities are nothing new for Nate Silver, but they also provide an opportunity to examine how poor polling works and how we might be better served ignoring the polls and instead listening to one another.  No one wins all of the time.  And one would have to conclude, based on Nate Silver's stated probabilities, that he should be correct ~85% of the time, instead of really being incorrect 85% of the time!  Looking under the hood of his national forecast, we can see his even more disastrous state-by-state analysis, which people enjoyed but which fed into his poor overall forecast.  Now keep in mind that one is evaluated, as a forecaster, on actually forecasting better than the average bloke.  For example, everyone knows that a top-seeded basketball team will beat a bottom-seeded team in the NCAA March Madness.  So if everyone guesses that, and everyone is then right, that doesn't mean everyone is an above-average mastermind!

What would be a good baseline for presidential elections?  The "dumb forecaster" concept in the journal literature of the American Statistical Association, where I have served on an editorial panel, is to simply guess the same election results as in the previous election.  No more, no less.  Put differently, one doesn't need to think any further than to state that every state's 2016 electoral result will be the same as in 2012!


And yet such a dummy would have gotten 44 of the 50 states correct.  The 6 states the dummy got wrong are obviously the 6 states that flipped (notably, they all flipped to Donald Trump).  Most pundits, including our parents and children, if asked to guess at the electoral map using only past election outcomes and no polling "insight" or punditry, would also have averaged about 44 states correct.  That's 88%: we might think we're virtuosi, but we're all just doing the same as a dummy who got 44 states correct without a single thought about this election.
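The dumb-forecaster tally above can be sketched in a few lines.  This is a minimal illustration only, assuming we encode just the six flipped states listed later in this post and treat every other state as unchanged from 2012.

```python
# A minimal sketch of the "dumb forecaster" baseline described above.
# Assumption: only the six states listed later in this post flipped party
# between 2012 and 2016; everything else is treated as unchanged.

flipped_2016 = {"Florida", "Pennsylvania", "Michigan", "Wisconsin", "Ohio", "Iowa"}
all_states = 50  # ignoring DC and split electoral votes for simplicity

# The dumb forecast: every state votes the same way it did in 2012.
correct = all_states - len(flipped_2016)
print(f"Dumb-forecaster accuracy: {correct}/{all_states} = {correct/all_states:.0%}")
# -> 44/50 = 88%
```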

What's worse is that all of the infrastructure backing Nate Silver led him to get fewer than 40 states forecast correctly!  Statistically a lot worse than a dummy, and it completely exposes multiple issues with his models.  Of the 6 states that flipped, he was correct on only 2, which almost completely accounts for his catastrophic breakdown.  He also incorrectly flipped a handful of other states for no reason, and each of those misses further weakened his perceived forecasting shrewdness.

Let’s peer into the 6 states that flipped, and Nate Silver’s analysis running into the election:
State        | Electoral votes (Nate Silver wrong or correct) | Nate Silver's probability of Clinton winning the state | Nate Silver's probability the state could tip the election
Florida      | 29 (wrong)                                     | 51%                                                    | 17%
Pennsylvania | 20 (wrong)                                     | 76%                                                    | 11%
Michigan     | 16 (wrong)                                     | 79%                                                    | 11%
Wisconsin    | 10 (wrong)                                     | 82%                                                    | 4%
Ohio         | 18 (correct)                                   | 65%                                                    | 7%
Iowa         |  6 (correct)                                   | 73%                                                    | 1%


Nate Silver generally had about a 72% probability of being correct in each of the first four states above (totaling a monstrous 75 electoral votes, and he got every one of them wrong).  And on the handful of other states that he incorrectly flipped, he gave each roughly a 75% probability of being correct.  Yet his overall election probability was lower, at 71%.  This is mathematically inconsistent.  Through the central limit theorem applied to summing a series of variables (see page 18), the overall forecast must carry lower uncertainty and therefore a more robust probability than the individual state forecasts.  Professor Nassim Taleb exposed yet another flaw when fitting a stochastic model to election probabilities, suggesting that with much higher forecasting uncertainty than recognized (by Nate Silver and many others), the election probabilities (assuming the surveys were valid and unbiased to begin with) should have been essentially tamped down to coin flips in the many months before the election (something we stated as well).
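To see the aggregation point concretely, here is a minimal enumeration sketch.  It assumes, purely for illustration, a hypothetical 232 "safe" electoral votes for Clinton, treats only the six table states as uncertain, and treats them as independent with Silver's stated state probabilities; under those assumptions the overall win probability comes out far above the states' individual probabilities, which is the inconsistency being described.

```python
from itertools import product

# Assumptions (illustration only): Clinton holds a hypothetical 232 "safe"
# electoral votes, and only the six states from the table above are uncertain,
# using Nate Silver's stated probabilities of a Clinton win in each.
safe_ev = 232          # assumption, not from Silver's model
needed  = 270
states = {             # state: (electoral votes, P(Clinton wins state))
    "Florida":      (29, 0.51),
    "Pennsylvania": (20, 0.76),
    "Michigan":     (16, 0.79),
    "Wisconsin":    (10, 0.82),
    "Ohio":         (18, 0.65),
    "Iowa":         ( 6, 0.73),
}

p_win = 0.0
for outcome in product([0, 1], repeat=len(states)):      # enumerate all 2^6 outcomes
    p, ev = 1.0, safe_ev
    for won, (votes, prob) in zip(outcome, states.values()):
        p  *= prob if won else (1 - prob)
        ev += votes if won else 0
    if ev >= needed:
        p_win += p

print(f"Overall win probability under independence: {p_win:.0%}")
# Comes out well above the ~71% overall figure, illustrating the inconsistency,
# unless very strong inter-state correlation is being assumed instead.
```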

Now that we know these probabilities were totally off the map, what does this mean for someone utilizing these forecasts?  We have also presented, in the rightmost column, the chance that a co-variant polling shift wouldn't catch all of these states at the same time.  Being a tipping state implies being the most marginal surprise of the election.  Nate Silver got four tipping states wrong, and they all had large electoral vote counts that mattered.  He was also incorrect when he confidently swaggered that the probability of each of those same four states mattering was just 4% to 17%.  So we supposedly had a rare 4% probability event, combined with a rare 17% probability event, and so on.  This reflects an extreme degree of confidence that each state's forecast was independent of one another, when, as we saw with the complete missing of the Donald Trump movement, Nate Silver's models were overly reliant on Hillary Clinton winning to begin with, and assumed that any Donald Trump surprise in one location would be countered by Hillary Clinton in another (it wasn't).  So from these low 4%-17% probabilities we get the preposterous assertion that Mr. Silver's election outcome was this extreme a fluke, with nothing more for anyone to learn.  After all, this pollster has a live track record of one decade, tops.  The correct analysis instead is that Nate Silver's models are wholly inconsistent between his national polls and his state polls, and are poor replicas of what's happening on the landscape, and dummies simply taking guesses are correct more often.  This is also not a popular-vote issue: in the many states Nate Silver got wrong, Hillary Clinton lost the state popular vote in addition to the electoral votes (there is no mathematical defense for over-sampling and over-campaigning in California).
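A small Monte Carlo sketch shows why those small tipping probabilities cannot be multiplied as if independent.  All of the margins and error sizes below are made-up illustration values, not Silver's numbers: one shared polling error hits every state at once, so several "unlikely" states flip together far more often than an independence assumption suggests.

```python
import random

# Illustration only: each state's true margin equals its polled margin plus a
# shared national polling error and a small state-specific error (both normal).
random.seed(0)
polled_margins = {"Florida": 0.5, "Pennsylvania": 2.0,      # assumed Clinton leads,
                  "Michigan": 2.5, "Wisconsin": 3.0}         # in percentage points
shared_sd, state_sd = 3.0, 1.5                               # assumed error std devs

trials, all_flip = 100_000, 0
for _ in range(trials):
    shared = random.gauss(0, shared_sd)      # one error that hits every state at once
    # a positive shared error against Clinton subtracts from every polled margin
    flips = [polled_margins[s] - shared + random.gauss(0, state_sd) < 0
             for s in polled_margins]
    all_flip += all(flips)

print(f"P(all four states flip together): {all_flip/trials:.1%}")
# Far larger than multiplying four 'independent' 4%-17% tipping probabilities.
```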

We got to this point because of poorly constructed samples.  This has been happening for far too long, and usually no one notices because someone may be close to the right answer.  Including a dummy.  Rasmussen Polling is a great example in this case, with a poor forecast for the 2016 election (albeit correct by chance alone), and suddenly their roughly 50% current job approval rating seems to have outsized credibility.  The second gap is the mapping from the surveys to what matters in the voting booth.  People consistently say one thing, but think/do another.  There is not enough sampling to overcome some things, including those who don't like to be sampled.  We can also have, for example, a liberal pollster paid for by a liberal media organization (or showcased in recent contender Jon Ossoff's campaign e-mails!), and then it's very easy to see how the cast of important assumptions about how they survey people's responses will suddenly be completely skewed in one direction.  Without that, the liberal pollster would get no attention.  Take care.  This whole game leaves a public that is later bewildered, having assumed one version of the story was far more appealing and righteous than it actually was.  Such as this partisan, fat-tail devotee, below.


Tuesday, June 13, 2017

Initial job approvals... inconsequential

We've all seen the job approval polls, and at first blush they appear like a catastrophe.  How did that happen, so quickly after President Trump was sworn in?  Recall though that these are the same pollsters/clowns who wrongly gave Hillary Clinton (on Election Eve) a ~90% chance of winning.  On the other hand, this site was among the rare prominent voices to put its reputation on the line and repeatedly forewarn that ALL the mainstream pollsters were wrong about what they claimed was a sure thing.  Of course they are still wrong here, in deceiving the American (and global) public into thinking the initial job approval ratings mean anything relevant.  Don't fall for it, as you and Hillary Clinton fell for it last autumn.  Pollsters peddle appalling work, while pretending to be scientifically accurate.

Independent voters haven't suddenly, a day after the election, decided to disapprove of President Trump.  They are far more mature.  But we know exactly which political quadrant of the country is lashing out here.  And even if they weren't, the polls right now simply provide no insight, and they carry an actual margin of error that is an embarrassing 2x or 3x what they presume.  In our case, the approval-poll estimates should be shaded significantly in favor of President Trump (closer to a 50% actual job approval), based solely on a properly sized, and far wider, confidence interval.
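A small sketch of how a stated margin of error understates the true uncertainty once non-sampling problems are included.  The sample size, approval share, and the "design effect" inflation factor below are assumptions chosen only to echo the 2x-3x claim above.

```python
import math

# Illustration only: a poll of n = 1,000 adults reporting 40% approval, and an
# assumed design effect that inflates variance for non-sampling problems
# (non-response, weighting, people saying one thing and doing another).
n, p = 1000, 0.40
stated_moe = 1.96 * math.sqrt(p * (1 - p) / n)         # the usual +/- ~3 points

design_effect = 6.0                                     # assumed variance inflation
effective_moe = stated_moe * math.sqrt(design_effect)   # roughly 2.4x the stated MOE

print(f"Stated MOE:    +/- {stated_moe:.1%}")
print(f"Effective MOE: +/- {effective_moe:.1%}")
```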

The other issue is that the main attention for these polls is how they might eventually map to one's 2020 re-election odds.  Is President Trump a two-termer or not?  And here the initial job approvals (unless they sink well below 30%) have little to do with re-election probabilities!  Clearly the state of the economy in a couple of years, and the qualifications of whoever the emerging Democratic contender is, will jointly bear on the actual re-election probabilities.  But they will also cause Americans to then reassess President Trump's job performance accordingly.

So the main variable that matters, in the context of this job approvals article, is what the job approval value is in a couple of years.  Don't waste your time dreaming.  But know that Presidents who start with low job approvals, as we have here, will generally see the benefits of mean-reversion.  Surely there could be something appalling that President Trump does on his own to self-inflict a mortal wound to his chances, but barring such ludicrousness, count him in as a strong contender for 2020.

It is imperative to this analysis to appreciate that continuously from President Roosevelt in the late 1930s to the mid-1970s, presidents almost always became re-elected.  It didn't matter much what their actual performance was!  But progressively, since the 1970s, American voters have changed and are now much choosier.  Surely they expect results, but they also give presidents a fair chance.  Disregard the lunacy you might see at a weekend anti-Trump rally (or from Kathy Griffin); most Americans are giving the President a real chance, and he will have some positive surprises as well (and some avoidable boo-boos).

But a probability model for predicting re-election, based on the various information discussed so far, should incorporate both the job approval a couple of years down the road (and not today's in any way), as well as the relationship, across the several most recent presidents, between those future job approvals and the re-election results.

Thankfully there have been some recent losses by incumbents (most recently President Bush in 1992) with which to gauge the model's strength.  See the chart below for the prediction of re-elections, using just the 3rd-year job approvals for the past dozen applicable presidents, stretching back nearly 80 years.  Note that for visual simplicity, data are randomly scattered above and below the blue-colored model curve.

With President Trump ricocheting around the high 40s%, we might imply a just-greater-than-50% or so chance of re-election (coincidentally, again just better than the odds that established gambling bookies have for him).  But such a model isn't the best fit anyway.  It has an R² (the coefficient of determination, a generalized goodness-of-fit measure) of only the 0.2s (on a fit scale of 0 to 1).

The strongest version is to incorporate the political changes across the decades into what these job approvals might suggest.  There we get a much better fit (by a magnitude of 2), with an R² that is now in the high 0.8s!  Additionally, the confusion matrix shown below provides an extraordinary degree of predictive accuracy (minimizing all errors), with only one error in the past 12 elections!  That's a much better record than Nate Silver and the other fellows, who typically have had terrible election forecasting results.

                        | Predicted re-election win | Predicted re-election loss | Type-2 accuracy
Actual re-election win  | 9                         | 0                          | 100%
Actual re-election loss | 1                         | 2                          | 67%
Signal accuracy         | 90%                       | 100%                       | 92%
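For clarity, here is a short sketch recomputing the accuracy figures in the confusion matrix above directly from its raw counts (12 past re-election bids).

```python
# Counts from the confusion matrix above: of the 9 actual wins, all 9 were
# called "win"; of the 3 actual losses, 1 was called a win and 2 were called losses.
tp, fn = 9, 0   # actual wins:   predicted win, predicted loss
fp, tn = 1, 2   # actual losses: predicted win, predicted loss

print(f"Type-2 accuracy, actual wins   (row recall): {tp / (tp + fn):.0%}")       # 100%
print(f"Type-2 accuracy, actual losses (row recall): {tn / (fp + tn):.0%}")       # 67%
print(f"Signal accuracy, predicted win  (column precision): {tp / (tp + fp):.0%}")  # 90%
print(f"Signal accuracy, predicted loss (column precision): {tn / (tn + fn):.0%}")  # 100%
print(f"Overall accuracy: {(tp + tn) / (tp + fn + fp + tn):.0%}")                   # 92%
```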



July & October 2017 addendum:

Friday, June 9, 2017

constant negative press covfefe

Our President essentially inhabits twitter.  In his 8 years toying on the site, he has averaged nearly 20 tweets daily (a pace that has only grown since the election)!  By contrast, our Statistical Ideas site has been on twitter only a year, and tweets only twice daily.  But if one chooses to tweet hourly, for weeks and months and years on end, then there are bound to be some mistakes (and perhaps egregious blunders) on the social media platform.  Being under the magnifying optics of the Presidency doesn't help, as it leads both the minor and the sometimes foolish errors to quickly become exposed and exploited.  There is no justification, that I am aware of, for anyone being on twitter so much, other than to pick childish fights with other citizens.  However, it is worth putting President Trump's small number of spelling or grammar errors in the context of how much more often he tweets than most other people.  But then, at the end of last month, his now-deleted and mysterious tweet took the social media universe by storm:
Despite the constant negative press covfefe

It wasn’t so obvious!
Where does one go from there?  The President's illuminati were of no aid, and instead playfully doubled down by suggesting we all discover what this word (that only President Trump knows) is all about.  And so ensued a wild chase that quickly led to three types of guesses.  One guess was "coffee", since phonetically the word covfefe seems similar to coffee, as absurd a word guess as that would be in this context.  The second guess was simply any sort of disturbingly vulgar word that comes to mind.  That's what we've come to, in a meme.  Proudly and creatively constructed derivatives of "ask your doctor if covfefe is right for you", or even more crudely "go covfefe yourself!"  And a third popular guess was "coverage".  After all, this seems to conveniently flow with the narrative that some have always had of the President's "agenda".  Of course, looking at various keyboards, mistyping "coverage" into "covfefe" would require a series of poorly placed fingers on the keyboard.  And the President rarely uses this word, and never in connection with "press".  This is genuinely terrible speculation.  Additionally, we don't know the context of the entire tweet (what did he type before, or want to continue typing after, since it was terminated mid-thought?).  So, enter Big Data statistical models, developed to thoroughly investigate the background of this word "covfefe" and what different things were likely meant in this now infamously coded tweet.

Auto-complete
The autocomplete feature of modern operating systems looks at a small amount of information and derives "predictions" as to what one likely meant.  Like a sorted market-basket analysis, we know for example that someone who starts to type "cof" is likely headed toward typing "coffee" or "coffers".  Covfefe was never a word before, so auto-complete would never have automatically guessed it, and instead would have proposed alternates such as "coffee" or "could".  Now that the President has used the word anyway, a subsequent attempt to type "cov" or "cof" would result in "covfefe" being suggested as a possibility.  These auto-complete models also incorporate ideas about what might make sense for the clause being constructed, along with an expected small amount of fat-fingering (so if covfefe really meant coffee, and one then auto-completes to coffee, that correction adds to the fat-fingering dataset a given operating system's model has to work with).
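The core of that behavior, a frequency-ranked prefix lookup, can be sketched minimally.  The tiny vocabulary and usage counts below are made up purely for illustration, not taken from any real operating system.

```python
# A minimal sketch of frequency-ranked prefix completion.  The vocabulary and
# usage counts are made up purely for illustration.
vocab_counts = {"coffee": 120, "coffers": 8, "coverage": 30, "could": 400, "cover": 60}

def complete(prefix: str, counts: dict, k: int = 3) -> list:
    """Return up to k vocabulary words starting with prefix, most frequent first."""
    matches = [w for w in counts if w.startswith(prefix)]
    return sorted(matches, key=lambda w: counts[w], reverse=True)[:k]

print(complete("cof", vocab_counts))  # ['coffee', 'coffers']
print(complete("cov", vocab_counts))  # ['cover', 'coverage'] - no 'covfefe' until it is typed once
```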

What flows
The use of the word "coverage" works with the idea that it flows from a bias about what some merely think President Trump meant, while simultaneously anchoring on the first few letters of "covfefe".  This falsely assumes the mistype sits entirely at the end of the word.  That is a common problem with human guesses, since most do not consider that the mistype could have been anywhere, including at the start [the "c" could easily have been a "d" or a (space)].  It should be noted that the President, when mistyping, generally does so in a predictable way.  Sadly, it also reveals outstandingly poor spelling skills that make Dan Quayle look like a prodigy.  Of course spelling is not everything, and it is clearly not reflective of his outstanding marketing and real estate skills.

Word choice history
Another approach to this problem is to consider the words that President Trump uses and hasn't used.  Analyzing his tweets (for the first time ever) reveals that while the President commonly uses the words "FAKE" and "media", he rarely uses the words "press" and "coverage" (and essentially never in combination).  Why should he have abruptly started here?  This is part of the mystery.
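The kind of word-usage tally described here amounts to a simple counter over the tweet archive.  The tweet texts below are placeholders, not the President's actual archive.

```python
from collections import Counter
import re

# Illustration only: the tweets below are placeholders, not the real archive.
tweets = [
    "The FAKE media is working overtime",
    "FAKE news media never reports the truth",
    "Despite the constant negative press covfefe",
]

# Tally lowercase word frequencies across all tweets.
counts = Counter(w.lower() for t in tweets for w in re.findall(r"[A-Za-z]+", t))
for word in ("fake", "media", "press", "coverage"):
    print(f"{word:>8}: {counts[word]}")
```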
A couple of weeks prior to this incident, Harvard University released a study exposing negatively biased media coverage of Donald Trump during the campaign.  So the expression "negative media coverage" was used quite a bit in the political and social landscape in the couple of weeks before the "covfefe" tweet, except never by the President himself.  And a day before, comedian Kathy Griffin displayed a decapitated image of the President, and the word coverage doesn't seem to fit with anything related to that.
Last, it should be noted that in the Russian and Arabic languages, covfefe is an actual word.  While charming for Trump supporters, it would be a strong anomaly for a president barely versed in the full range of the English language to suddenly be flaunting foreign language skills.

Statistical matching
A more comprehensive mathematical approach is, instead of starting with the dictionary and imagining what President Trump meant, to start with the keystrokes and see how much re-arranging is necessary to create any other word.  There are hundreds of billions of possible scrambles of the code!
Limited to 140 characters on twitter, one often diverges from normal grammar in order to jam a tweet together, and hence the operating system's auto-complete model becomes very difficult to match in an environment where short-hand is randomly interlaced.  Instead we can look for a more robust class of working options, which takes a lot longer to solve on the fly if we were doing this while typing a tweet (but we solved it all here as a one-time mathematical exercise).  There are a few different ways one can mistype a word, and they each take different degrees of likely exertion.  One can (in general order of commonality, with a scoring sketch after this list):
 * mistype a letter, either for another near it or a similar-looking one (for example "v" instead of "u")
 * or swap letters (e.g., calling a country "Denmakr")
 * or accidentally not type a letter
 * or mistype a letter with one far from it on the keyboard
 * or add one letter
 * or skip a preceding word to make the tweet fit into twitter
 * or finally double-space (or similar) to force punctuation.
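A simplified sketch of this keystroke-first matching: score candidate words with a weighted edit distance, where the "easy" mistakes in the list above (adjacent-key substitutions, letter swaps) cost less than rarer ones.  The keyboard-adjacency pairs, the cost weights, and the candidate list are assumptions for illustration, not the full model described in the text.

```python
# Adjacent-key pairs (a tiny assumed subset of a real keyboard map).
ADJACENT = {("c", "v"), ("v", "b"), ("f", "g"), ("f", "d"), ("e", "r"), ("e", "w"), ("u", "v")}

def sub_cost(a: str, b: str) -> float:
    """Cheaper cost for adjacent-key or identical letters (assumed weights)."""
    if a == b:
        return 0.0
    return 0.5 if (a, b) in ADJACENT or (b, a) in ADJACENT else 1.0

def weighted_distance(s: str, t: str) -> float:
    """Damerau-Levenshtein distance with cheaper adjacent-key substitutions and swaps."""
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = float(i)
    for j in range(len(t) + 1):
        d[0][j] = float(j)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                                   # missed letter
                          d[i][j - 1] + 1,                                   # added letter
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))    # mistyped letter
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 0.75)               # swapped letters
    return d[len(s)][len(t)]

candidates = ["coffee", "coverage", "confide", "covered", "confer"]  # assumed shortlist
for word in sorted(candidates, key=lambda w: weighted_distance("covfefe", w)):
    print(f"{word:>9}: {weighted_distance('covfefe', word):.2f}")
```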

Further below we show the top combinations, from the billions of possibilities, of all of these faulty behaviors.  We have essentially reverse-engineered what "covfefe" could have been!

A final model
Combining the probabilistic rankings of all possible mistypes above with the words that the President commonly uses, and with what he likely could have constructed in this tweet, we see a rich portrait of potential word choices.  They are shown below, in rough order of likelihood.

Bear in mind that it is also highly plausible that we could have a corrected replacement word, but then need to change another word in the sentence/tweet altogether that suddenly becomes grammatically wrong.  Here are our top choices of what the President likely meant instead of the word "covfefe".
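A toy sketch of the kind of combination described above: blend a mistype score with how often the President actually uses each candidate word and a crude context-fit flag.  All weights and input numbers here are made up for illustration, not our final model's values.

```python
# Illustration only: blend three assumed signals into one ranking score.
candidates = {
    # word: (mistype_score 0-1 (higher = easier slip), usage_rate 0-1, fits_context 0/1)
    "coffee":   (0.70, 0.10, 0),
    "coverage": (0.35, 0.02, 1),
    "confide":  (0.30, 0.01, 0),
}

W_MISTYPE, W_USAGE, W_CONTEXT = 0.5, 0.3, 0.2   # assumed weights

def score(features):
    mistype, usage, context = features
    return W_MISTYPE * mistype + W_USAGE * usage + W_CONTEXT * context

for word, feats in sorted(candidates.items(), key=lambda kv: score(kv[1]), reverse=True):
    print(f"{word:>9}: {score(feats):.2f}")
```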


Special update June 9: thanks to the many people who looked at various versions of this research (particularly Seth Hannah & Danny Frangiadakis), and thanks to Michael Shedlock for suggested edits to the graphic above (which have now been made).

Friday, June 2, 2017

Disenchantment in jobs growth

The May 2017 labor report showed 138,000 jobs gained (on the establishment survey, and a loss of 48,000 jobs on the household survey!)  And while this may appear to some to be just a step down from normally great growth numbers, that line of thinking is only recent-data bias kicking in.  The overall numbers have been coming down smartly for a few years now, and this puts 3% or higher GDP growth at extraordinary risk.  See the monthly raw data for the establishment survey, with months of >225,000 jobs highlighted.  One can clearly see that while this high jobs-growth level used to be the norm around 2014, such levels of monthly job gains are now fewer and farther between.  Additionally, the kicks down to ever-lower disappointment months have recently been treated as transient anomalies, but are in fact becoming the poor new normal that we can expect.
Simply taking the monthly jobs data and showing the average per year, we can see the disconcerting, steady slide lower (and it's the identical pattern for the household survey, shown in gold).  This slippery slide in the labor data is too robust for President Trump's pro-growth policies to completely overcome.  The labor data evidences an economy that is rapidly decelerating to a much slower growth pace.  One would also expect market shocks accordingly, as these labor numbers continue to disappoint.
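The yearly averaging and the >225,000 highlighting described above can be sketched as follows.  The monthly figures below (in thousands) are placeholders for illustration, not the actual BLS series, apart from the May 2017 figure of 138 cited in the text.

```python
from statistics import mean

# Illustration only: placeholder monthly job gains, in thousands of jobs.
monthly_gains = {
    2014: [260, 240, 230, 270, 250, 235, 245, 230, 255, 240, 260, 250],
    2015: [230, 220, 210, 240, 225, 215, 230, 205, 200, 235, 225, 220],
    2016: [200, 210, 190, 170, 160, 220, 215, 180, 195, 165, 175, 170],
    2017: [190, 200, 120, 170, 138],   # partial year through May (May figure from the text)
}

for year, gains in monthly_gains.items():
    strong = sum(g > 225 for g in gains)          # months above the 225k threshold
    print(f"{year}: average {mean(gains):.0f}k/month, {strong} months above 225k")
```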