Kent Brewster has tabulated police ticketing statistics for the affluent city of Atherton, in northern California. The public records show that during a nearly five-month period this year, 175 of 182 drivers ticketed had a Hispanic surname. Neighboring Redwood City, a charter city, is nearly 39% Hispanic. Business Insider and others have asked to what degree racial profiling accounts for the high number of tickets to Hispanic-named drivers, versus other explanations such as demographic differences, unfamiliarity with local traffic laws, or simply worse driving by some racial groups.
This is a natural Bayesian probability problem where we can show the chance of this 175-ticket statistic occurring through luck alone. See the set-up below from the secondary data source.
By separating the tickets given to Redwood City residents from those given to everyone else, and accounting for the Hispanic population make-up of both cities, we can see that a race-blind distribution of 182 tickets would send only about 71 to Hispanic drivers. To make the situation more frustrating, Redwood City drivers cannot account for the bulk of the 175 tickets, as only 53% of all tickets went to those residents. Therefore a large portion of the tickets to Hispanic-named drivers went to other residents, who are presumably more familiar with local traffic laws.
A binomial variance model treats the tickets to residents of each city as independent, so the two variances, each of the form np(1-p), simply add. This comes to a total variance of about 43 across the two cities, or a standard deviation of about 7 tickets.
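The arithmetic behind the variance figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the split of roughly 96 Redwood City tickets is 53% of 182 rounded, and the 39% Hispanic share used for the non-Redwood City group is a hypothetical value chosen because it roughly reproduces the totals quoted above; it is not a figure from the source data. The function name pooled_binomial is likewise made up for this sketch.

```python
import math

def pooled_binomial(groups):
    """Mean, variance and standard deviation for a sum of independent
    binomial counts, where each group is an (n, p) pair."""
    mean = sum(n * p for n, p in groups)
    variance = sum(n * p * (1 - p) for n, p in groups)
    return mean, variance, math.sqrt(variance)

# Redwood City residents: about 53% of the 182 tickets, 39% Hispanic share.
redwood_city = (96, 0.39)
# Everyone else: the remaining tickets. The 0.39 share here is HYPOTHETICAL,
# picked only because it roughly matches the totals quoted in the post.
elsewhere = (86, 0.39)

mean, variance, sd = pooled_binomial([redwood_city, elsewhere])
print(f"expected Hispanic-surname tickets: {mean:.1f}")
print(f"variance: {variance:.1f}, standard deviation: {sd:.1f}")
```

With these inputs the expected count lands near 71, the variance near 43, and the standard deviation near 7. Different population shares for the second group would shift all three numbers.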
The probability of seeing 175 or more tickets when only about 71 are expected is very low: the observed count sits roughly 15 standard deviations above the expected value. The corollary is that we can have high confidence in a bias toward a higher ticketing rate for drivers who have Hispanic surnames versus those who do not.
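To put a rough number on "very low", one option is a normal approximation to the pooled binomial count, using the expected value of 71 and the variance of 43 from above. This is a sketch rather than the exact calculation: the normal approximation is crude this far out in the tail, no continuity correction is applied, and the helper name upper_tail_normal is invented for this example.

```python
import math

def upper_tail_normal(observed, mean, variance):
    """Z-score and approximate upper-tail probability P(X >= observed)
    under a normal approximation with the given mean and variance."""
    sd = math.sqrt(variance)
    z = (observed - mean) / sd
    # Upper tail of the standard normal, written with the complementary
    # error function so that very small probabilities do not round to zero.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Figures quoted in the post: 175 observed, 71 expected, variance of 43.
z, p = upper_tail_normal(175, 71, 43)
print(f"z = {z:.1f} standard deviations above the expected count")
print(f"approximate upper-tail probability = {p:.3g}")
```

Whatever the exact tail value, a gap of this many standard deviations is far beyond what chance alone would plausibly produce under the race-blind model.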
It seems to me that Hispanic drivers need to slow down, and quit breaking the traffic laws of the state.
Thanks much, Anonymous. From a statistical perspective, additional analysis could only be performed if we had additional data. These analyses would include causality, as well as checks for multicollinearity with another variable yet to be discovered. For more on multicollinearity, see the "Correlated trivariate distributions, and outliers" note, at http://statisticalideas.blogspot.com/2013/06/correlated-trivariate-normal.html.