In a world with 7.1 billion inhabitants, the World Health Organization (WHO) recently estimated that 1.0 billion people (14%) defecate outdoors (OD). This portion has come down since the early 1990s, when it was nearly 24%. Health experts generally want it to come down further as time rolls on. And some medical researchers have associated OD with childhood health problems, including higher mortality rates.
Recent articles in the New York Times and the Economist cite the economics research of Geruso and Spears, who suggest that religion fully explains both (A) differences in OD in India and, consequently, (B) differences in childhood mortality. India is of particular interest for sanitation issues, given its very large population combined with the fact that 60% of OD occurs in India. These aggregate characteristics place India squarely in the worst part of the distribution among all countries. For example, when looking only at the “population portion who OD” (ppod) among nearly 215 countries, India ranks near the top decile, where a higher percentile means a worse sanitation measure. And given India’s 18% share of the global population, India takes up much of the top quartile of the global population’s cumulative distribution function of ppod.
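As a quick consistency check, India’s share of global OD can be recomputed from the figures quoted in this article (7.1 billion people, 1.0 billion practising OD, India’s 18% population share, and the 48% Indian ppod cited later). The second function sketches what a population-weighted view of the ppod distribution means: countries are ranked from worst to best ppod, and the worst quartile of the world’s people (not of countries) is split among them. The function names and the three-country input are hypothetical, for illustration only.

```python
# Figures quoted in this article; everything else is arithmetic.
WORLD_POP = 7.1e9        # world inhabitants
GLOBAL_OD = 1.0e9        # people who practise OD worldwide
INDIA_POP_SHARE = 0.18   # India's share of world population
INDIA_PPOD = 0.48        # portion of India's population who OD


def implied_od_share(world_pop, global_od, pop_share, ppod):
    """Fraction of all OD that falls in one country: (population * ppod) / global OD."""
    return world_pop * pop_share * ppod / global_od


def top_slice_shares(countries, top=0.25):
    """Split the worst `top` fraction of the world's people among countries.

    countries: list of (name, population, ppod) tuples.
    Countries are ranked from highest (worst) to lowest ppod; people are then
    counted off in that order until `top` of the total population is reached.
    Returns {name: fraction of that top slice held by the country}.
    """
    total = sum(pop for _, pop, _ in countries)
    slice_size = top * total
    remaining = slice_size
    shares = {}
    for name, pop, _ in sorted(countries, key=lambda c: c[2], reverse=True):
        if remaining <= 0:
            break
        taken = min(pop, remaining)
        shares[name] = taken / slice_size
        remaining -= taken
    return shares


india_share = implied_od_share(WORLD_POP, GLOBAL_OD, INDIA_POP_SHARE, INDIA_PPOD)
print(f"India's implied share of global OD: {india_share:.0%}")

# Hypothetical three-country world (populations in arbitrary units).
toy = [("Country A", 18, 0.48), ("Country B", 5, 0.60), ("Country C", 77, 0.05)]
print(top_slice_shares(toy))
```

The first line prints 61%, in line with the article’s 60% figure. In the toy world, Country A holds 18% of the people but fills 72% of the worst quartile, which is the sense in which one very populous country can take up much of the top of a population-weighted distribution.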
In this web log article we explore ways to think about probability and statistics in different cases where the fundamental dynamics are similar to this one. We conclude that we should not try to force a conclusion, or state a mathematical relationship, beyond what the underlying data and common sense can provide.
The head of the Hindu American Foundation recently approached me to explore the economics research noted above. This gave me a chance to discuss and think about the subject, particularly from a mathematical perspective. Some research is on hot-topic issues that stir many people’s emotions; the research here is completely independent.
Geruso and Spears’ paper links differences in religion to differences in mortality, via differences in the sanitation “choices” that people make. Their work boldly states that religion causes these differences in choices.
Before analyzing their paper further, let’s look at another example to provide some context. We move from the Indian subcontinent to the African continent.
In particular, there is a large overlap between the many Christian-majority countries and the many countries where a larger portion of the population is vegetarian. At the same time, there is a large overlap between the Islamic-majority countries and the many countries where a larger portion of the population is non-vegetarian.
So in Africa, does religion cause (or fully explain) people’s choices over whether or not to eat meat? It would be wrong to simply leap to that conclusion. An important theme that recurs within probability and statistics is to always ask: what was “left out of the story”?
The equator cuts east-west across Africa, searing across countries that happen to be Christian-majority. The equatorial climate allowed for a cross-continent agricultural belt of vegetation that doesn’t exist elsewhere in Africa. Also “left out of the story” is that these religious faiths, elsewhere in the world, don’t necessarily share the same uniform preference for meat-eating that they show in Africa. Left out as well is the idea that eating decisions might not be fully explained by ancient religion, but rather by something more complicated that takes multiple factors into account. Finally, what’s left out of the story is the number of countries in Africa, and their differing populations and land masses.
The lesson here is that whilst it was initially easy to leap to the conclusion that religion causes African eating preferences, a deeper analysis suggests a different reading of the statistical variables.
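The “left out of the story” point can be made concrete with a small contingency-table sketch. The counts below are hypothetical, not real African data: a hidden equatorial factor drives both which religion is in the majority and how vegetarian-leaning a country is, while within each climate stratum religion makes no difference at all.

```python
# Hypothetical counts of countries, NOT real data. Each cell is
# (number of countries, number of those with a larger vegetarian share).
TABLE = {
    "equatorial":     {"Christian": (16, 12), "Muslim": (4, 3)},
    "non-equatorial": {"Christian": (4, 1),   "Muslim": (16, 4)},
}


def vegetarian_rate(cells):
    """Pool (countries, vegetarian-leaning countries) pairs into one rate."""
    n_countries = sum(count for count, _ in cells)
    n_veg = sum(veg for _, veg in cells)
    return n_veg / n_countries


def rate_gap(table, strata):
    """Christian minus Muslim vegetarian-leaning rate, pooled over the given strata."""
    christian = vegetarian_rate([table[s]["Christian"] for s in strata])
    muslim = vegetarian_rate([table[s]["Muslim"] for s in strata])
    return christian - muslim


print("pooled gap:", round(rate_gap(TABLE, list(TABLE)), 2))
for stratum in TABLE:
    print(stratum, "gap:", round(rate_gap(TABLE, [stratum]), 2))
```

Pooled together, the Christian-majority countries look 30 percentage points more vegetarian-leaning (65% versus 35%), yet within each stratum the gap is zero: in this toy table the apparent religion effect is entirely the omitted climate factor.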
Now let’s return to the paper from Geruso and Spears. Even if nothing were “left out of the story” here, we would be remiss not to note that their Figure 4, which looks neat at first blush, raises questions under a more careful statistical eye. Those illustrations leave out how the marker sizes correspond to the significance of the minority religion, and why the difference in OD activity among religions should matter as much as they state. More importantly, the figure leaves a significant number of people sitting in the fourth quadrant of their statistical relationship (i.e., between religion and mortality). This quadrant represents a departure in the opposite direction from their research conclusion.
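The fourth-quadrant point can be quantified with a small, hypothetical helper: center a weighted scatter at its weighted means and sum the population weight falling in each quadrant. For a claimed positive relationship, weight in quadrants 2 and 4 runs against it. This is an illustration on made-up points, not a reanalysis of Geruso and Spears’ Figure 4.

```python
def quadrant_weights(points):
    """Share of total weight in each quadrant around the weighted means.

    points: list of (x, y, weight), e.g. x = a religion measure, y = a
    mortality measure, weight = population. Quadrants follow the usual
    convention: 1 = upper right, 2 = upper left, 3 = lower left,
    4 = lower right. Points exactly on a mean line count as upper/right.
    """
    total = sum(w for _, _, w in points)
    mx = sum(x * w for x, _, w in points) / total
    my = sum(y * w for _, y, w in points) / total
    shares = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
    for x, y, w in points:
        right, upper = x >= mx, y >= my
        if right and upper:
            q = 1
        elif upper:
            q = 2
        elif right:
            q = 4
        else:
            q = 3
        shares[q] += w / total
    return shares


# Hypothetical (x, y, population) points, for illustration only.
example = [(0.9, 0.8, 10), (0.1, 0.2, 10), (0.8, 0.1, 6), (0.2, 0.9, 1)]
s = quadrant_weights(example)
print(f"weight running against a positive relationship: {s[2] + s[4]:.0%}")
```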
In looking at the global OD and religion data, it would be easy for a savvy statistician to solve the matrix algebra, for math’s sake. Doing so would simply show the obvious conclusion that the bulk of OD is associated with Hinduism. This trivial result follows from the fact that India, with a high 18% of the world’s population, has both a Hindu majority (80% of Indians are Hindu) and a large ppod (48%). We’ll now see, though, that this is not a thoughtful analysis, because of the statistical dynamics this narrow hypothesis leaves out.
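The “matrix algebra” can be sketched in its simplest form: model each country’s ppod as a mix of two per-religion OD rates, ppod = b_h * h + b_o * (1 - h), where h is the Hindu share, and solve the population-weighted least-squares normal equations. This two-group model is a hypothetical simplification for illustration, not Geruso and Spears’ model, and the rows are made-up numbers constructed to lie exactly on b_h = 0.5, b_o = 0.1.

```python
def wls_two_groups(rows):
    """Weighted least squares for ppod = b_h * h + b_o * (1 - h).

    rows: list of (h, ppod, weight), where h is a country's Hindu share and
    weight is its population. Builds the 2x2 normal equations
    (X^T W X) b = X^T W y and solves them with Cramer's rule.
    """
    a11 = a12 = a22 = r1 = r2 = 0.0
    for h, y, w in rows:
        x1, x2 = h, 1.0 - h          # design row: [Hindu share, everyone else]
        a11 += w * x1 * x1
        a12 += w * x1 * x2
        a22 += w * x2 * x2
        r1 += w * x1 * y
        r2 += w * x2 * y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("religion shares do not vary enough to separate b_h and b_o")
    b_h = (r1 * a22 - a12 * r2) / det
    b_o = (a11 * r2 - a12 * r1) / det
    return b_h, b_o


# Hypothetical rows built to lie exactly on b_h = 0.5, b_o = 0.1.
# The first, India-like row carries 18 of the 25 weight units.
rows = [(0.8, 0.42, 18), (0.0, 0.10, 4), (0.1, 0.14, 3)]
print(wls_two_groups(rows))

# Raise only the India-like row's ppod (say, because it is mostly rural):
# the fitted "Hindu" rate absorbs almost all of the change.
rows_rural = [(0.8, 0.48, 18), (0.0, 0.10, 4), (0.1, 0.14, 3)]
print(wls_two_groups(rows_rural))
```

The solver is purely mechanical. Because the India-like row carries most of the weight and most of the Hindu share, anything that raises its ppod, religious or not, is attributed almost entirely to b_h; that is the trivial result described above, not evidence of causation.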
Look at the ppod chart below. We show three of the six most populous countries worldwide. These three countries carry the most weight, by population sample size, in the rest of our research.
As we noted in the African meat-eating example, we must take a deeper look instead of just leaping to the conclusion that religion causes (or is the explanatory factor for) ppod. Certainly, someone converting to Hinduism would not be instructed to grab a pail of water and run off-grid, into isolation, to practice OD.
So again, let’s delve deeper into our data. We redraw the ppod chart for the countries above, except now looking at their urban populations only. The population sample sizes are again quite large: 127m for Indonesia, 85m for Nigeria, and 391m for India. And these urban populations represent a large portion of each country’s population: 51%, 50%, and 32%, respectively. See the new ppod chart below.
Suddenly the original, quick hypothesis that religion fully explains OD is turned upside down. Now, among the three countries, OD rates are higher in the two urban populations charted on the left (i.e., Indonesia and Nigeria). What was missing from the original hypothesis was the idea that context plays an important role in determining sanitation, and that context is more than simply one’s religion, just as context played a large role in Africa in shaping who eats more or less meat.
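The flip is a composition effect. A national ppod is the population-weighted average of urban and rural ppod, ppod_national = u * ppod_urban + (1 - u) * ppod_rural, where u is the urban share, so rearranging lets us back out the rural rate. Using India’s figures quoted in this article (48% national ppod, 32% urban share, 12% urban ppod) gives a rural ppod of roughly 65%; treat this as approximate, since the quoted figures may not share a year or source. Country X is hypothetical.

```python
def national_ppod(urban_share, urban_ppod, rural_ppod):
    """National ppod as the population-weighted mix of urban and rural ppod."""
    return urban_share * urban_ppod + (1 - urban_share) * rural_ppod


def implied_rural_ppod(national, urban_share, urban_ppod):
    """Back out rural ppod from the national rate, urban share, and urban ppod."""
    return (national - urban_share * urban_ppod) / (1 - urban_share)


def implied_total_population(urban_pop, urban_share):
    """Total population implied by an urban population and its share of the total."""
    return urban_pop / urban_share


# India, using the figures quoted in this article.
india_rural = implied_rural_ppod(national=0.48, urban_share=0.32, urban_ppod=0.12)
india_total_m = implied_total_population(391, 0.32)   # millions
print(f"India implied rural ppod: {india_rural:.0%}")
print(f"India implied total: {india_total_m:.0f}m, of which rural: {india_total_m - 391:.0f}m")

# Hypothetical Country X: WORSE urban ppod than India, but far more urbanized
# and with a much lower rural ppod.
x_national = national_ppod(urban_share=0.50, urban_ppod=0.20, rural_ppod=0.30)
print(f"Country X national ppod: {x_national:.0%} (vs. India's 48%)")
```

Country X ranks worse than India on urban ppod (20% versus 12%) yet far better nationally (25% versus 48%), which is the same kind of reversal the urban-only chart reveals.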
Geographic settings around the globe are highly complicated, and they are not easy to reduce to a handful of ancient religions that explain just about any current difference in human trends (even in critical analyses far afield from health). This is a simple lesson that probability and statistics can teach us.
Let’s look again at OD, now considering multiple factors at once. We’ll leave religion out of this, since we have already shown a couple of times in this web log article that multicollinearity can confound information. It can produce what biostatistics sometimes calls a Type III error: being right, but for the wrong reasons. Going forward, we should treat with suspicion any research headline that rests on a one-factor explanation.
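One standard way to measure the multicollinearity mentioned above is the variance inflation factor (VIF). With exactly two predictors it reduces to 1 / (1 - r^2), where r is their Pearson correlation; values far above 1 mean a regression cannot cleanly separate the two factors’ effects. The predictor names and values below are hypothetical.

```python
import math


def pearson_r(xs, zs):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    if sxx == 0 or szz == 0:
        raise ValueError("a predictor with no variation has no defined correlation")
    return sxz / math.sqrt(sxx * szz)


def vif_two_predictors(xs, zs):
    """Variance inflation factor when a regression has exactly two predictors."""
    r = pearson_r(xs, zs)
    if abs(r) >= 1 - 1e-12:
        return math.inf  # perfectly collinear: the effects cannot be separated at all
    return 1 / (1 - r * r)


# Hypothetical country-level predictors, for illustration only.
hindu_share = [0.80, 0.02, 0.01, 0.75, 0.05, 0.10]
rural_share = [0.68, 0.49, 0.50, 0.80, 0.30, 0.45]
print(f"VIF: {vif_two_predictors(hindu_share, rural_share):.2f}")
```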
We take a look again at ppod by country, now plotting the rural population’s ppod on the horizontal axis and the urban population’s ppod on the vertical axis. Of the 225 countries in the WHO dataset, we spotted some, such as Eritrea, with miscoded data (nearly full ppod in their rural population, while claiming zero ppod in their urban population). Ultimately 216 nations were examined. And the chart data below are harmonically scaled to represent the sample size of the urban and rural populations.
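The screening and bubble sizing can be sketched as follows. The exact rule used to drop miscoded rows is not stated, so the thresholds here (a rural ppod of at least 0.9 alongside an urban ppod of exactly 0) are assumptions modeled on the Eritrea pattern. “Harmonically scaled” is read here as sizing each bubble by the harmonic mean of the urban and rural populations, which stays small whenever either population is small; that reading is also an assumption, and the rows are hypothetical.

```python
def looks_miscoded(rural_ppod, urban_ppod, rural_min=0.9):
    """Flag the Eritrea-like pattern: near-total rural OD but zero urban OD.

    The 0.9 threshold is an illustrative assumption, not the article's rule.
    """
    return rural_ppod >= rural_min and urban_ppod == 0.0


def bubble_size(urban_pop, rural_pop):
    """Harmonic mean of the urban and rural populations (assumed sizing rule)."""
    if urban_pop <= 0 or rural_pop <= 0:
        return 0.0
    return 2 * urban_pop * rural_pop / (urban_pop + rural_pop)


# Hypothetical rows: (country, rural_ppod, urban_ppod, rural_pop_m, urban_pop_m).
rows = [
    ("Country P", 0.95, 0.00, 3.0, 1.5),    # flagged as miscoded
    ("Country Q", 0.60, 0.15, 40.0, 20.0),
]
kept = [r for r in rows if not looks_miscoded(r[1], r[2])]
for name, rural_ppod, urban_ppod, rural_pop, urban_pop in kept:
    print(name, rural_ppod, urban_ppod, round(bubble_size(urban_pop, rural_pop), 1))
```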
Despite the probability difficulty we previously noted of reducing a large and complex country (e.g., India) to just one stable statistical value, we show in green a one-standard-deviation band around a linear regression line (covering roughly 68% of the observations, if the residuals are approximately normal). This provides a single, rough gauge of dispersion about the relationship. It also reveals an easily missed silver lining from a policy perspective: since India’s urban ppod sits far below its rural ppod, as India’s population becomes much more urbanized it will likely see a quicker ppod drop than most countries. We can see India’s large bubble near the right side of the bubble chart above, with an urban population ppod of 0.12 (12%).
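The green band can be sketched as a population-weighted least-squares line with one residual standard deviation on either side. If the residuals are roughly normal, such a band covers about 68% of the weight (math.erf(1/sqrt(2)) is about 0.683); the function also reports the actual weighted coverage, which need not match. Using population-based bubble sizes as the regression weights is an assumption, not a documented detail of the chart, and the example points are hypothetical.

```python
import math


def weighted_fit_with_band(points):
    """Population-weighted least-squares line y = a + b*x, plus a +/-1 SD band.

    points: list of (x, y, weight), e.g. x = rural ppod, y = urban ppod,
    weight = a population-based bubble size. Returns (a, b, sd, coverage),
    where sd is the weighted standard deviation of the residuals and coverage
    is the share of total weight whose residual lies within +/-sd.
    """
    total = sum(w for _, _, w in points)
    mx = sum(w * x for x, _, w in points) / total
    my = sum(w * y for _, y, w in points) / total
    sxx = sum(w * (x - mx) ** 2 for x, _, w in points)
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in points)
    b = sxy / sxx
    a = my - b * mx
    resid = [(y - (a + b * x), w) for x, y, w in points]
    sd = math.sqrt(sum(w * r * r for r, w in resid) / total)
    coverage = sum(w for r, w in resid if abs(r) <= sd) / total
    return a, b, sd, coverage


# Under normally distributed residuals, +/-1 SD covers about 68.3%.
normal_one_sd = math.erf(1 / math.sqrt(2))

# Hypothetical (rural ppod, urban ppod, weight) points, for illustration only.
example = [(0.9, 0.30, 5), (0.7, 0.12, 40), (0.4, 0.10, 10), (0.2, 0.02, 8), (0.1, 0.05, 3)]
a, b, sd, coverage = weighted_fit_with_band(example)
print(f"urban ppod ~ {a:.2f} + {b:.2f} * rural ppod, band +/-{sd:.2f}, covers {coverage:.0%}")
print(f"normal-theory coverage of +/-1 SD: {normal_one_sd:.1%}")
```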