Our President essentially inhabits Twitter. In his 8 years toying with this site, he has averaged nearly 20 tweets daily (a pace that has only increased since the election)! By contrast, our Statistical Ideas site has been on Twitter only a year, and tweets only twice daily. But if one chooses to tweet hourly, for weeks and months and years on end, then there are bound to be some mistakes (and perhaps egregious blunders) on the social media platform. Being under the magnifying glass of the Presidency doesn't help, as it leads both the minor and the sometimes foolish errors to quickly become exposed and exploited. There is no justification that I am aware of for anyone being on Twitter so much, other than to pick childish fights with other citizens. However, it is worth putting President Trump’s small number of spelling or grammar errors within the context of how much more often he tweets than most other people. But then, at the end of last month, his now-deleted and mysterious tweet took the social media universe by storm:
Despite the constant
negative press covfefe
It wasn’t so obvious!
Where does one go from there? The President's illuminati were of no aid, and instead playfully doubled down by suggesting we all discover what this word (that only President Trump knows) is all about. And so ensued a wild chase that quickly led to three types of guesses. One guess was “coffee”, since phonetically the word covfefe seems similar to coffee, as absurd a guess as that would be in this context. The second type of guess was simply any sort of disturbingly vulgar word that comes to mind. That's what we've come to, in a meme: proudly and creatively constructed derivatives of “ask your doctor if covfefe is right for you”, or even more crudely “go covfefe yourself!” And a third popular guess was “coverage”. After all, this seems to conveniently flow with the narrative that some have always had of the President's "agenda".
Of course, looking at various keyboards shows that mistyping “coverage” into “covfefe” requires a series of poorly placed digits on the keyboard. Moreover, the President rarely uses this word, and never in connection with “press”. This is genuinely terrible speculation. Additionally, we don’t know the context of the entire tweet (what did he type before, or want to continue typing after, since it was terminated mid-thought?). So, enter Big Data statistical models that were developed to thoroughly investigate the background of this word “covfefe” and the different things that were likely meant in this now infamously coded tweet.
Auto-complete
The auto-complete feature of modern operating systems looks at a small amount of information and derives "predictions" as to what one likely meant. Like a sorted market basket analysis, we know for example that someone who starts to type in “cof” is likely headed to typing in “coffee” or “coffers”. Covfefe was never a word before, so auto-complete would never have automatically guessed at it, and instead would have proposed alternates such as “coffee” or “could”. Now that the President has used that word anyway, a subsequent attempt to type in “cov” or “cof” would result in “covfefe” being suggested as a possibility. These auto-complete models also incorporate ideas about what might make sense for the constructed clause and the expected small amount of fat-fingering (so if covfefe really meant coffee, and one then accepts the auto-complete to coffee, the model for that operating system gains a more robust fat-fingering dataset to work with).
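To make the prefix idea concrete, here is a minimal sketch of a frequency-ranked completer. It is not how any particular operating system does it: the class name `PrefixSuggester`, its methods, and all of the word counts are invented for illustration, and it ignores clause context and fat-fingering entirely.

```python
from collections import Counter


class PrefixSuggester:
    """Toy frequency-ranked prefix completer (hypothetical, for illustration)."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, word, times=1):
        # Record that the user typed this word; more sightings rank it higher.
        self.counts[word.lower()] += times

    def suggest(self, prefix, k=3):
        # Return up to k known words starting with the prefix,
        # most frequent first, ties broken alphabetically.
        prefix = prefix.lower()
        matches = [(w, c) for w, c in self.counts.items() if w.startswith(prefix)]
        matches.sort(key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in matches[:k]]


suggester = PrefixSuggester()
# Invented counts, standing in for a user's typing history.
for word, times in [("coffee", 50), ("coffers", 5), ("could", 80), ("coverage", 20)]:
    suggester.observe(word, times)

before = suggester.suggest("cov")  # "covfefe" has never been seen yet
suggester.observe("covfefe")       # the word gets typed once anyway
after = suggester.suggest("cov")   # now it can be offered as a suggestion
```

Once a never-before-seen word has been observed even once, it becomes a candidate for every prefix it matches, which is the behavior described above.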
What flows
The use of the word “coverage” flows from a bias about what some merely think President Trump meant, while simultaneously anchoring on the first few letters of “covfefe”. This falsely assumes that the mistype sits entirely at the end of the word. This is a common problem with human guesses, since most do not consider that the mistype could have been anywhere, including at the start [the “c” could have easily been a “d” or a (space)]. It should be noted that the President, when mistyping, generally does so in a predictable way. Sadly, this also reveals outstandingly poor spelling skills that make Dan Quayle look like a prodigy. Of course spelling is not everything, and it clearly does not reflect his outstanding marketing and real estate skills.
Word choice history
Another approach to this problem is to consider the words that President Trump uses and hasn't used. Analyzing his tweets (for the first time ever) reveals that while the President commonly uses the words “FAKE” and “media”, he rarely uses the words “press” and “coverage” (and essentially never in combination). Why should he have abruptly started here? This is part of the mystery.
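A word-usage tally of this kind can be sketched in a few lines. The helper names `word_counts` and `cooccurrence` are hypothetical, and the strings in `placeholder_texts` are invented stand-ins rather than real tweets; an actual analysis would need the full tweet archive.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def word_counts(texts):
    # Case-insensitive count of every word across all texts.
    counts = Counter()
    for text in texts:
        counts.update(WORD.findall(text.lower()))
    return counts


def cooccurrence(texts, a, b):
    # Number of texts that contain both words a and b.
    total = 0
    for text in texts:
        words = set(WORD.findall(text.lower()))
        if a in words and b in words:
            total += 1
    return total


# Invented placeholder strings, NOT actual tweets.
placeholder_texts = [
    "FAKE media again",
    "the media is FAKE",
    "great coverage tonight",
    "press conference today",
]

counts = word_counts(placeholder_texts)
fake_and_media = cooccurrence(placeholder_texts, "fake", "media")
press_and_coverage = cooccurrence(placeholder_texts, "press", "coverage")
```

Comparing per-word counts against the co-occurrence counts is what separates "rarely uses the word" from "never uses the two together".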
A couple of weeks prior to this incident, Harvard University released a study exposing negatively-biased media coverage of Donald Trump during the campaign. So, the expression “negative media coverage” was used quite a bit in the political and social landscape for the couple of weeks before the “covfefe” tweet, except never by the President himself. And a day before, comedian Kathy Griffin displayed a decapitated image of the President, and the word coverage doesn't seem to fit with anything related to that.
Last, it should be noted that in the Russian and Arabic languages, covfefe is an actual word. While charming for Trump supporters, it would be a strong anomaly for a president barely versed in the full range of the English language to be suddenly flaunting foreign language skills.
Statistical matching
A more comprehensive mathematical approach is, instead of starting with the dictionary and imagining what President Trump meant, to start with the keystrokes and see how much re-arranging is necessary to create any other word. There are hundreds of billions of possible scrambles of the code! Limited to 140 characters on Twitter, one often diverges from normal grammar in order to jam a tweet together, and hence the operating system auto-complete model becomes very difficult to match in an environment where short-hand is often randomly interlaced. Instead we can look for a more robust class of working options, which would take a lot longer to solve on the fly if we were doing this while typing a tweet (but we solved it all here as a one-time mathematical exercise). There are a few different ways one can mistype a word, and each takes a different degree of exertion. One can (in general order of commonality):
* mistype a letter for another that is near it or similar looking (for example "v" instead of "u")
* swap letters (for example, calling a country "Denmakr")
* accidentally not type a letter
* mistype a letter with one far from it on the keyboard
* add one letter
* skip a preceding word to make it fit into Twitter
* or finally double space (or other) to force a punctuation.
Further below we show the top combinations from the billions of possibilities of all of these faulty behaviors. We have essentially reverse-engineered what “covfefe” could have been!
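The letter-level slips in the list above can be approximated with a weighted edit distance. The sketch below is an assumption-laden illustration, not the model used for this article: it uses the standard "optimal string alignment" variant of edit distance, the cost values are invented so that they merely follow the order of commonality in the list, the QWERTY geometry is rough, and the word-skipping and double-space behaviors are left out.

```python
# Illustrative costs only (assumptions): cheaper means a more common slip.
NEAR_SUB, SWAP, OMIT, FAR_SUB, EXTRA = 1.0, 1.5, 2.0, 2.5, 3.0

ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
OFFSETS = (0.0, 0.25, 0.75)  # rough horizontal stagger of each QWERTY row
POS = {ch: (col + OFFSETS[r], float(r))
       for r, row in enumerate(ROWS) for col, ch in enumerate(row)}
LOOKALIKE = {frozenset("uv")}  # the similar-looking pair from the list above


def is_near(a, b):
    # True if two keys look alike or sit within about one key of each other.
    if frozenset((a, b)) in LOOKALIKE:
        return True
    if a not in POS or b not in POS:
        return False
    (x1, y1), (x2, y2) = POS[a], POS[b]
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 < 1.5 ** 2


def sub_cost(a, b):
    if a == b:
        return 0.0
    return NEAR_SUB if is_near(a, b) else FAR_SUB


def mistype_cost(intended, typed):
    # Cheapest sequence of slips that turns the intended word into the typed one.
    n, m = len(intended), len(typed)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + OMIT    # intended letters never typed
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + EXTRA   # typed letters that were not intended
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = intended[i - 1], typed[j - 1]
            best = min(d[i - 1][j - 1] + sub_cost(a, b),
                       d[i - 1][j] + OMIT,
                       d[i][j - 1] + EXTRA)
            # Two adjacent letters typed in the wrong order.
            if i > 1 and j > 1 and a != b and a == typed[j - 2] and intended[i - 2] == b:
                best = min(best, d[i - 2][j - 2] + SWAP)
            d[i][j] = best
    return d[n][m]


def rank_candidates(typed, candidates):
    # Candidate intended words, cheapest explanation first.
    return sorted(((mistype_cost(w, typed), w) for w in candidates))


ranking = rank_candidates("covfefe", ["coverage", "coffee", "conference"])
```

Scoring every dictionary word this way, rather than only the words a human happens to think of, is what avoids anchoring on the first few letters.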
A final model
Combining the probabilistic rankings of all possible mistypes above with the words that the President commonly uses, and with what he could plausibly have constructed in this tweet, we see a rich portrait of potential word choices. They are shown below, in rough order of likelihood.
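As a rough picture of how such a combination might be computed, the sketch below turns a mistype cost into a weight with `exp(-cost)`, multiplies it by a smoothed usage count, and normalizes. The function name `combine`, the weighting formula, and every number in the example are assumptions made for illustration; they are not the figures or the model behind the results here, and the fit with the rest of the tweet is not modeled at all.

```python
import math


def combine(mistype_costs, usage_counts, smoothing=1.0):
    # mistype_costs: candidate word -> cost of mistyping it into the observed text
    # usage_counts:  candidate word -> how often the author has used it before
    raw = {
        word: math.exp(-cost) * (usage_counts.get(word, 0) + smoothing)
        for word, cost in mistype_costs.items()
    }
    total = sum(raw.values())
    # Normalize to shares that sum to 1 and list the most likely word first.
    return sorted(((w, s / total) for w, s in raw.items()), key=lambda ws: -ws[1])


# Invented placeholder inputs, purely to show the mechanics.
example = combine(
    mistype_costs={"word_a": 2.0, "word_b": 2.0, "word_c": 6.0},
    usage_counts={"word_a": 40, "word_b": 3},
)
```

With equal mistype costs the more frequently used word wins, while a word that is very costly to mistype falls down the list even if it is common.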
Bear in mind that it is also highly plausible that we could find a corrected replacement word, but then need to change another word in the sentence/tweet that has suddenly become grammatically wrong. Here are our top choices for what the President likely meant instead of the word "covfefe".
Special update June 9: thanks to the many people who looked at various versions of this research (particularly Seth Hannah & Danny Frangiadakis), and thanks to Michael Shedlock for suggested edits to the graphic above (which have now been made).