Friday, June 9, 2017

constant negative press covfefe

Our President essentially inhabits twitter.  In his 8 years on the site, he has averaged nearly 20 tweets daily (a pace that has remained elevated since the election)!  By contrast, our Statistical Ideas site has been on twitter for only a year, and tweets only twice daily.  But if one chooses to tweet hourly, for weeks and months and years on end, then there are bound to be some mistakes (and perhaps egregious blunders) on the social media platform.  Being under the magnifying glass of the Presidency doesn't help, as it leads both the minor and the sometimes foolish errors to be quickly exposed and exploited.  There is no justification that I am aware of for anyone being on twitter so much, other than to pick childish fights with other citizens.  However, it is worth putting President Trump’s small number of spelling or grammar errors in the context of how much more often he tweets than most other people.  But then, at the end of last month, his now-deleted and mysterious tweet took the social media universe by storm:
Despite the constant negative press covfefe

It wasn’t so obvious!
Where does one go from there?  The President was of no help, and instead playfully doubled down by suggesting we all discover what this word (that only President Trump knows) is all about.  And so ensued a wild chase that quickly led to three types of guesses.  One guess was “coffee”, since phonetically the word covfefe seems similar to coffee, as absurd a guess as that would be in this context.  The second guess(es) was simply any sort of disturbingly vulgar word that comes to mind.  That's what we've come to, in a meme: proudly and creatively constructed derivatives of “ask your doctor if covfefe is right for you”, or even more crudely “go covfefe yourself!”  And a third popular guess was “coverage”.  After all, this seems to conveniently flow with the narrative that some have always had of the President's "agenda".  Of course, looking at various keyboards, mistyping “coverage” as “covfefe” requires a series of poorly placed digits.  Moreover, the President rarely uses this word, and never in connection with “press”.  This is genuinely terrible speculation.  Additionally, we don’t know the context of the entire tweet (what did he type before, or want to continue typing after, since it was terminated mid-thought).  So, enter Big Data statistical models that were developed to thoroughly investigate the background of this word “covfefe” and the different things that were likely meant in this now infamously coded tweet.

Auto-complete
The autocomplete feature of modern operating systems looks at a small amount of information and derives "predictions" as to what one likely meant.  Like a sorted market basket analysis, we know for example that someone who starts to type in “cof” is likely headed to typing in “coffee” or “coffers”.  Covfefe was never a word before, so auto-complete would never have automatically guessed at it, and instead would have proposed alternates such as “coffee” or “could”.  Now that the President has used that word anyway, a subsequent attempt to type in “cov” or “cof” could result in “covfefe” being suggested as a possibility.  These auto-complete models also incorporate ideas about what might make sense for the constructed clause, and about the expected small amount of fat-fingering (if covfefe really meant coffee, and one then auto-completes to coffee, that correction would add to the fat-fingering dataset the model has to work with for a given operating system).
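As a rough illustration of the prefix idea described above, here is a minimal sketch in Python.  It is not how any particular operating system works; the class name `PrefixSuggester`, its methods, and the word counts are all hypothetical and chosen only to show the mechanics of a word becoming a suggestion once it has been seen.

```python
from collections import Counter


class PrefixSuggester:
    """Toy autocomplete: suggests previously seen words that start with a prefix."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, word, times=1):
        # Record that a word was typed `times` times.
        self.counts[word.lower()] += times

    def suggest(self, prefix, k=3):
        # Return up to k known words beginning with `prefix`, most frequent first
        # (ties broken alphabetically).
        prefix = prefix.lower()
        matches = [(w, c) for w, c in self.counts.items() if w.startswith(prefix)]
        matches.sort(key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in matches[:k]]


suggester = PrefixSuggester()
# Made-up usage counts, for illustration only.
for word, times in [("coffee", 50), ("coffers", 5), ("could", 80), ("coverage", 20)]:
    suggester.observe(word, times)

before = suggester.suggest("cov")   # "covfefe" has never been seen
suggester.observe("covfefe", 30)    # the word now enters the vocabulary
after = suggester.suggest("cov")
```

This sketch only matches exact prefixes, so it captures the "new word becomes a suggestion" effect but not the fat-fingering tolerance or the clause-level context mentioned above.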

What flows
The guess of “coverage” flows from a bias about what some merely think President Trump meant, while simultaneously anchoring on the first few letters of “covfefe”.  It falsely assumes the mistype is entirely at the end of the word.  This is a common problem with human guesses, since most do not consider that the mistype could have been anywhere, including at the start [the “c” could have easily been a “d” or a (space)].  It should be noted that the President, when mistyping, generally does so in a predictable way.  Sadly, his mistypes also reveal outstandingly poor spelling skills that make Dan Quayle look like a prodigy.  Of course spelling is not everything, and it is clearly not reflective of his outstanding marketing and real estate skills.

Word choice history
Another approach to this problem is to consider the words that President Trump uses and hasn't used.  Analyzing his tweets (for the first time ever) reveals that while the President commonly uses the words “FAKE” and “media”, he rarely uses the words “press” and “coverage” (and essentially never in combination).  Why should he have abruptly started here?  This is part of the mystery.
A couple of weeks prior to this incident, Harvard University released a study exposing negatively-biased media coverage of Donald Trump during the campaign.  So, the expression “negative media coverage” was used quite a bit in the political and social landscape for the couple of weeks before the “covfefe” tweet, except never by the President himself.  And a day before, comedian Kathy Griffin displayed an image of the President decapitated, and the word coverage doesn't seem to fit with anything related to that.
Last, it should be noted that in the Russian and Arabic languages, covfefe is an actual word.  While charming for Trump supporters, it would be a strong anomaly for a president barely versed in the full range of the English language to be suddenly flaunting foreign language skills.

Statistical matching
A more comprehensive mathematical approach is, instead of starting with the dictionary and imagining what President Trump meant, to start with the keystrokes and see how much re-arranging is necessary to create any other word.  Hundreds of billions of possible scrambles of the code!
With twitter limited to 140 characters, one often diverges from normal grammar in order to jam a tweet together, and hence the operating system auto-complete model becomes very difficult to match in an environment where short-hand is often randomly interlaced.  Instead we can look for a more robust class of working options, which takes a lot longer to solve for on the fly if we were doing this while typing a tweet (but we solved it all here as a one-time mathematical exercise).  There are a few different ways one can mistype a word, and each takes a different degree of exertion.  One can (in general order of commonality):
 * mistype a letter either for another near it or similar looking (for example "v" instead of "u")
 * or swap letters (one calling a country "Denmakr")
 * or accidentally not type a letter
 * or mistype a letter with one far from it on the keyboard
 * or add one letter
 * or skip a preceding word to make it fit into twitter
 * or finally double space (or other) to force a punctuation.
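
To make the ranking of mistake types above concrete, here is a minimal sketch in Python of a weighted edit distance (the optimal-string-alignment variant of Damerau-Levenshtein).  This is a stand-in technique, not the model actually used for this post.  The cost values, the small look-alike set, and the simplified QWERTY geometry are all assumptions chosen only so that cheaper costs correspond to the more common mistakes in the list; skipped words and double spaces are not modeled.

```python
# Hypothetical costs: lower means a more common (more plausible) mistake.
NEAR_OR_LOOKALIKE = 1.0  # neighbouring key or similar-looking letter
SWAP = 1.5               # two adjacent letters swapped
OMIT = 2.0               # a letter of the intended word left out
FAR = 3.0                # wrong letter from a distant key
ADD = 3.5                # an extra letter typed

# Simplified QWERTY layout: each row is shifted half a key to the right.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
KEY_POS = {ch: (r, c + 0.5 * r) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}
LOOKALIKES = {frozenset("uv"), frozenset("il")}  # assumed look-alike pairs


def substitution_cost(intended, typed):
    """Cost of typing `typed` when `intended` was meant (single characters)."""
    if intended == typed:
        return 0.0
    if frozenset((intended, typed)) in LOOKALIKES:
        return NEAR_OR_LOOKALIKE
    a, b = KEY_POS.get(intended), KEY_POS.get(typed)
    if a is not None and b is not None:
        distance = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        if distance <= 1.2:  # touching keys in this simplified geometry
            return NEAR_OR_LOOKALIKE
    return FAR


def mistype_cost(intended, typed):
    """Cheapest total cost of mistakes that turn `intended` into `typed`."""
    n, m = len(intended), len(typed)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * OMIT
    for j in range(1, m + 1):
        d[0][j] = j * ADD
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + OMIT,
                d[i][j - 1] + ADD,
                d[i - 1][j - 1] + substitution_cost(intended[i - 1], typed[j - 1]),
            )
            if (i > 1 and j > 1
                    and intended[i - 1] == typed[j - 2]
                    and intended[i - 2] == typed[j - 1]):
                best = min(best, d[i - 2][j - 2] + SWAP)
            d[i][j] = best
    return d[n][m]


# Rank a (hypothetical) shortlist of intended words against what was typed.
shortlist = ["coverage", "coffee"]
ranked = sorted(shortlist, key=lambda word: mistype_cost(word, "covfefe"))
```

With these assumed costs, `mistype_cost("denmark", "denmakr")` is a single swap, while a word that needs several far-key substitutions scores much worse.  Running every dictionary word, and every split of the tweet, through a function like this is one way the search over "scrambles" could be organized.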

Further below we show the top combinations from the billions of possibilities of all of these faulty behaviors above.  We have essentially reverse-engineered what “covfefe” could have been!

A final model
Combining the probabilistic rankings of all possible mistypes above, with words that the President commonly uses, and with what likely could have been constructed by him in this tweet, we see a rich portrait of potential word choices.  They are shown below, in rough order of likelihood.
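
One way to picture the combination step just described is a single score per candidate word that blends the three ingredients.  The sketch below, in Python, is only an assumption about how such a blend could look, not the formula behind our results: the function names, the log-based weighting, and the placeholder candidates ("alpha", "beta", "gamma") with their numbers are all hypothetical.

```python
import math


def combined_score(typo_cost, usage_count, context_fit, typo_weight=1.0):
    """Higher is better.

    typo_cost    -- how much mistyping is needed to reach the tweeted word
    usage_count  -- how often the author has used the candidate word before
    context_fit  -- 0..1 judgment of how well the word fits the sentence
    """
    return (
        -typo_weight * typo_cost
        + math.log1p(usage_count)
        + math.log(max(context_fit, 1e-9))  # guard against log(0)
    )


def rank_candidates(candidates, typo_weight=1.0):
    """Sort candidate words from most to least plausible."""
    scored = [
        (word, combined_score(*features, typo_weight=typo_weight))
        for word, features in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Placeholder candidates: (typo_cost, usage_count, context_fit), all made up.
candidates = {
    "alpha": (1.0, 200, 0.8),  # easy mistype, frequently used word
    "beta": (1.0, 5, 0.8),     # easy mistype, rarely used word
    "gamma": (6.0, 200, 0.8),  # frequently used, but a very unlikely mistype
}
ranking = [word for word, _ in rank_candidates(candidates)]
```

In this toy setup a frequently used word that is one small slip away outranks both a rare word with the same slip and a common word that would need many slips, which is the intuition behind combining the three rankings.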

Bear in mind that it is also highly plausible that we could have a correct replacement word, but then need to change another word in the sentence/tweet that is suddenly grammatically wrong.  Here are our top choices for what the President likely meant instead of the word "covfefe".


Special update June 9: thanks to the many people who looked at various versions of this research (particularly Seth Hannah & Danny Frangiadakis), and thanks to Michael Shedlock for suggested edits to the graphic above (which have now been made).
