Statistical Ideas: Defrauding, invincible facebook

Social-media goliath facebook has been caught cheating, yet again. Over the years we've shown in mathematical detail how they have habitually exaggerated their operating statistics (when they are not busy censoring topics, treating us like a psychological laboratory, skirting privacy settings, tricking the IRS, and other experimental data shenanigans). Their previous overestimates have covered everything from their actual number of users to the size of their users' networks, both inflated by about 10%. This translates into a dishonest competitive advantage and an unethical business model, not a one-off accidental error.

Today, close observers of their less-seen advertising help center saw an implied disclosure that they have been massively inflating their video ad engagement, by 70%. This has been going on for dozens of months now. And this particular sleight of hand on their most faithful origin, as we show below, can be performed by silently discarding the worst half of their ad viewing data, and then presenting a much rosier prospectus to court ad dollars away from rival media. It is like claiming a 1.000 batting average after first excluding all of the strike-outs from the data, or telling a spouse you earn $700k a year, instead of $400k, by looking only at your highest commission days and then annualizing that. Facebook's actual average ad viewing time is about 4 seconds, and they have sponged up tens of billions of dollars by instead deceptively claiming their typical ads were being viewed for all of 7 seconds. Today investors punished their stock price by 2%, and we see that their gross math tricks are absurd, even among their competitors.

Ad viewing times can follow an exponential probability distribution, and it gets a little more exotic when we remove the portion of the ad viewing distribution that falls under a 3-second cap. Let's see the conditional formulae build up below (those less interested in theory can skip over the proof and continue further below):

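Let X be the ad viewing time, with

$$X \sim \text{Exponential}(\lambda)$$

Split X at the cut-off d (in this case d = 3 seconds):

$$X = (X \wedge d) + (X - d)_+$$

Risk variable = Loss limit + Insurance with deductible

Taking expectations:

$$E(X) = E(X \wedge d) + E[(X - d)_+] = \frac{1}{\lambda}$$

Instead of showing E(X), facebook showed d + E[(X - d)_+]/S(d), which was 70% larger. Writing e(d) = E[(X - d)_+]/S(d) for the mean excess beyond d, and S(d) = 1 - F(d) for the survival function, we have:

$$
\begin{aligned}
1.7\,E(X) &= d + e(d) \\
E(X) &= E(X \wedge d) + S(d)\,e(d) \\
E(X) &= E(X \wedge d) + [1 - F(d)]\,[1.7\,E(X) - d] \\
\frac{1}{\lambda} &= \frac{1 - e^{-3\lambda}}{\lambda} + e^{-3\lambda}\left(\frac{1.7}{\lambda} - 3\right)
\end{aligned}
$$

Moving the first term on the right to the left leaves $e^{-3\lambda}/\lambda$ on that side, so the $e^{-3\lambda}$ factors cancel and $0.7/\lambda = 3$. That gives $\lambda = 0.7/3 \approx 0.23$, or roughly $\lambda \approx 1/4$.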
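As a numerical cross-check of the last equation, here is a minimal Python sketch that solves it for λ by bisection. The function names (`gap`, `solve_lambda`) and the search bracket are illustrative assumptions, not part of the original derivation; the only inputs taken from the text are d = 3 seconds and the 70% inflation.

```python
import math

D = 3.0          # cut-off in seconds, from the text
INFLATION = 1.7  # the shown mean is 70% larger than the true mean, from the text


def gap(lam: float) -> float:
    """Right-hand side minus left-hand side of the final equation for lambda."""
    limited_mean = (1.0 - math.exp(-D * lam)) / lam   # E(min(X, d))
    survival = math.exp(-D * lam)                     # S(d) = 1 - F(d)
    return limited_mean + survival * (INFLATION / lam - D) - 1.0 / lam


def solve_lambda(lo: float = 0.01, hi: float = 2.0, tol: float = 1e-12) -> float:
    """Bisection on gap(); assumes exactly one sign change inside [lo, hi]."""
    f_lo = gap(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = gap(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


lam = solve_lambda()
print(f"lambda = {lam:.4f}, mean viewing time = {1 / lam:.2f} seconds")
```

The root agrees with the closed form 0.7/3, which is close to the rounded value of 1/4 used in the rest of the post.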

What's left after this is similar to an insurance risk model for ad viewing, with a 3 second deductible. We solved for the parameter above, and now [with E(X) obtained through integration by parts] we can see what the expected value of facebook's ad viewing distribution (X) really is, versus what they were showing (X'=d+X). This gives us some context for the problem, and we will also use probability calculus to solve for something else that is critical to know. We should know, for ethical purposes, what portion of facebook ads Mark Zuckerberg's firm deceptively concealed from their business partners (i.e., see metric F(d) below):

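$$E(X) \approx \frac{1}{\lambda} \approx 4 \text{ seconds}$$

$$E(X') = E(X \mid X > d) = d + E(X) \approx 3 + \frac{1}{\lambda} \approx 7 \text{ seconds}$$

$$
\begin{aligned}
F(d) &= \int_0^d \lambda e^{-\lambda x}\,dx \\
&\approx \int_0^3 0.25\,e^{-0.25x}\,dx \\
&= \left[-e^{-0.25x}\right]_0^3 \\
&= -e^{-0.25 \cdot 3} - \left(-e^{-0.25 \cdot 0}\right) \\
&= -e^{-0.25 \cdot 3} + 1 \\
&\approx 52\%
\end{aligned}
$$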
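The three quantities above can be evaluated directly. The following is a small Python sketch under the post's exponential assumption, using the rounded rate λ = 1/4; the function names are hypothetical and chosen only for illustration.

```python
import math

LAM = 0.25  # rounded rate from the text (mean of about 4 seconds)
D = 3.0     # cut-off in seconds, from the text


def true_mean(lam: float) -> float:
    """E(X) for an exponential viewing time."""
    return 1.0 / lam


def shown_mean(lam: float, d: float) -> float:
    """E(X | X > d) = d + 1/lam, by the memoryless property."""
    return d + 1.0 / lam


def hidden_share(lam: float, d: float) -> float:
    """F(d): the share of views that last d seconds or less."""
    return 1.0 - math.exp(-lam * d)


print(f"true mean   : {true_mean(LAM):.2f} seconds")
print(f"shown mean  : {shown_mean(LAM, D):.2f} seconds")
print(f"hidden share: {hidden_share(LAM, D):.1%}")
```

Because λ is rounded, the hidden share printed here is itself approximate, in line with the figure of about 52% in the text.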

The numbers above are not insignificant in the slightest, and we'll show the sensitivity of the results to changes in the settings of this mischievous calibration. But first, let's enjoy a graphical representation of hundreds of simulated results from the original ad distribution (X), which is in blue on the chart below. Overlaid on it is the segment that facebook chose to reveal, from 3 seconds and higher: the insidiously selected portion (X') shown to marketers, in yellow (which appears green where the two colors overlap).
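The simulation described here can be approximated with a short Python sketch. It is not the code behind the original chart: the sample size, the seed, and the `simulate` function are assumptions made for illustration, and a larger sample than "hundreds" is used so the averages settle down.

```python
import random


def simulate(n: int = 100_000, lam: float = 0.25, d: float = 3.0, seed: int = 0) -> dict:
    """Draw n exponential viewing times, then keep only those longer than d."""
    rng = random.Random(seed)
    views = [rng.expovariate(lam) for _ in range(n)]   # full distribution X
    kept = [v for v in views if v > d]                 # revealed portion X'
    return {
        "mean_all": sum(views) / n,            # honest average viewing time
        "mean_kept": sum(kept) / len(kept),    # average after discarding short views
        "share_hidden": 1.0 - len(kept) / n,   # fraction of views discarded
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these settings the honest average should land near 4 seconds, the truncated average near 7 seconds, and the discarded share near one half, matching the closed-form results.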

In the table below, we explore how facebook's results would change if they had picked other cut-off times as the bogus deductible in their ad viewing. We also see the proportional over-estimation they could fictitiously claim, and how much of the data would have to be hidden to meet that self-selected deductible number of seconds. They clearly decided on the highlighted row below (d = 3 seconds), while the first data row is the fair truth.

Initial d seconds excluded	Expected ad viewing time (seconds)	Artificial facebook over-estimation	Portion of ads truncated (worst ones concealed)
none	4	none	none
1	4.5	10%	10%
3	7	70%	52%
5	8	94%	70%
10	13	220%	93%
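For readers who want to experiment with other cut-offs, here is a minimal Python sketch of the same sensitivity calculation in closed form. It assumes a strictly exponential viewing time with a mean of exactly 4 seconds, which is a simplification; the table above may have been produced with a different calibration or by simulation, so this output should not be expected to match it row for row. The `sensitivity_row` helper is hypothetical.

```python
import math


def sensitivity_row(d: float, mean: float = 4.0) -> tuple:
    """Return (shown mean, over-estimation, hidden share) for a cut-off of d seconds."""
    lam = 1.0 / mean
    shown = d + mean                       # E(X | X > d) under memorylessness
    over = shown / mean - 1.0              # proportional over-estimation
    hidden = 1.0 - math.exp(-lam * d)      # F(d): share of views discarded
    return shown, over, hidden


print("d (s)  shown mean (s)  over-estimation  hidden share")
for d in (0, 1, 3, 5, 10):
    shown, over, hidden = sensitivity_row(d)
    print(f"{d:>5}  {shown:>14.1f}  {over:>15.0%}  {hidden:>12.0%}")
```

Changing the `mean` argument shows how sensitive each column is to the assumed true viewing time.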

Friday, September 23, 2016
