Statistical Ideas: Religion clashed with health

In a world with 7.1 billion inhabitants, the World Health Organization (WHO) recently estimated that 1.0 billion people (14%) defecate outdoors (OD). This portion has come down since the early 1990s, when it was nearly 24%. Health experts generally want this portion to come down further over time. And some medical researchers have associated OD with childhood health issues, including higher mortality rates.

Recent articles in the New York Times and Economist cite the economics research of Geruso and Spears, who suggest that religion fully explains both (A) differences in OD in India, and (B) consequently, differences in childhood mortality. India is of particular interest for sanitation issues, given its very large population, combined with the fact that 60% of OD occurs in India. These aggregate characteristics leave India firmly in the worst part of the distribution, among all countries. For example, when just looking at the “population portion who OD” (ppod) among nearly 215 countries, India ranks near the top decile, where the higher the percentile, the worse the sanitation measure. And given India’s 18% share of the global population, when looking at the global population’s cumulative distribution function by ppod, India takes up much of the top quartile.

In this web log article we explore ways to think about probability and statistics in different cases where the fundamental dynamics are similar to this one. We leave with the conclusion that we should not aggressively force a conclusion, or state a mathematical relationship, that goes beyond what the underlying data and common sense are able to provide.

The head of the Hindu American Foundation approached me recently to explore the above-noted economics research. This allowed me to discuss and think about this subject, particularly from a mathematical perspective. Some research is on hot-topic issues, which stir many people’s emotions. The research here is completely independent.

The Geruso and Spears paper links differences in religion to differences in mortality, via differences in the sanitation “choices” that people make. Their work boldly states that religion causes these differences in choices.

Before analyzing their paper further, let’s look at another example to provide some context. We move from the Indian sub-continent, to the African continent. In particular, we can see that there is a large overlap between the many Christian-majority countries, and the many countries where there is a larger vegetarian portion of the population. At the same time, we see that there is a large overlap between the Islamic-majority countries, and the many countries where there is a larger non-vegetarian portion of the population.

So in Africa, does religion cause (or fully explain) people’s choices over whether or not to eat meat? It would be wrong to simply leap to that conclusion. An important theme, which recurs within probability and statistics, is to always ask what was “left out of the story.”

The equator cuts east-west across Africa, searing across countries that happen to be Christian-majority. The equatorial climate then allowed for a cross-continent agricultural path of vegetation that doesn’t exist elsewhere in Africa. Also “left out of the story” is that these religious faiths elsewhere in the world don’t necessarily share the same uniform preferences over meat-eating that they have in Africa. Also left out of the story is the idea that eating decisions might not be fully explained by ancient religion, but rather by something possibly more complicated (taking into account multiple different factors). Finally, what’s left out of the story is the number, population, and land mass of the different countries in Africa.

The lesson here is that whilst it was initially easy to leap to a conclusion about religion being the cause of African eating preferences, a deeper analysis suggests that a different reading of the statistical variables may exist.

Now let’s return to the paper from Geruso and Spears. Even if nothing were “left out of the story” here, we would be remiss not to note that their Figure 4 looks neat only at first blush. With a more careful statistical eye, we note that those illustrations leave out any sense of how the marker sizes correspond to the significance of the minority religion, and of why the difference in OD activity among religions should matter as much as they state. More importantly, the figure leaves a significant number of people sitting in the fourth quadrant of their statistical relationship (i.e., between religion and mortality). This quadrant represents a departure in the opposite direction from their research conclusion.

In looking at the global OD and religion data, it would be easy for a savvy statistician to solve the matrix algebra, for the sake of the math. And doing so would simply show the obvious conclusion that the bulk of OD is associated with Hinduism. This trivial result is due to the fact that India, with a high 18% of the world’s population, has both a majority share of this religion (80% of Indians are Hindu) and a high country-level ppod (48%). We’ll now see, though, that this is not a thoughtful analysis, because of the statistical dynamics this narrow hypothesis leaves out.
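
As a quick arithmetic check, the rounded figures quoted so far can be combined to see whether they hang together. The short Python sketch below uses only numbers stated in this article; the variable names are ours, and because the inputs are rounded the result is only approximate.

```python
# Rounded figures quoted in the article.
world_pop_bn = 7.1         # world population, in billions
world_od_bn = 1.0          # people practising OD worldwide, in billions
india_share_of_od = 0.60   # share of global OD that occurs in India
india_share_of_pop = 0.18  # India's share of the world population

# Global ppod: OD population divided by total population.
global_ppod = world_od_bn / world_pop_bn

# India's ppod implied by the two shares above.
india_od_bn = india_share_of_od * world_od_bn
india_pop_bn = india_share_of_pop * world_pop_bn
india_ppod = india_od_bn / india_pop_bn

print(f"global ppod ~ {global_ppod:.0%}")
print(f"India ppod  ~ {india_ppod:.0%}")
```

The implied India figure lands within a point or two of the 48% quoted above, which is about as close as inputs rounded to two significant figures allow.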

Look at the ppod chart below. We show half of the six most populated countries worldwide. These three countries provide the largest population sample sizes for the rest of our research.

As we noted in the African meat-eating example, we must take a deeper look instead of just leaping to the conclusion that religion causes (or is an explanatory factor for) ppod. Certainly if one were to convert to Hinduism, they would not be instructed to grab a pail of water and run into isolation, off-grid, to practice OD.

So again, let’s delve deeper into our data. We drill into the ppod chart for the countries above, except now looking at their urban population only. The population sample sizes are again quite large: 127m for Indonesia, 85m for Nigeria, and 391m for India. And these populations represent a large portion of each country’s population: 51%, 50%, and 32%, respectively. See the new ppod chart below.
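
To get a feel for how large these urban slices are, one can back out the total and rural populations they imply. This is a minimal sketch using only the urban counts and shares quoted above; the function name is ours, and the results inherit the rounding of the inputs.

```python
# Urban population (millions) and urban share of the total population,
# as quoted in the article.
urban = {
    "Indonesia": (127, 0.51),
    "Nigeria": (85, 0.50),
    "India": (391, 0.32),
}

def implied_total(urban_millions, urban_share):
    """Total population (millions) implied by an urban count and its share."""
    return urban_millions / urban_share

totals = {c: implied_total(u, s) for c, (u, s) in urban.items()}
rural = {c: totals[c] - urban[c][0] for c in urban}

for c in urban:
    print(f"{c}: total ~ {totals[c]:.0f}m, rural ~ {rural[c]:.0f}m")
```

Even India's urban population alone, the smallest of the three shares, is larger than the entire population of either of the other two countries.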

Suddenly the original, quick hypothesis that religion fully explains OD is flipped upside down. Now among the three countries, OD rates are higher in the two populations charted on the left (i.e., Indonesia and Nigeria). What was missing from the original hypothesis was the idea that context plays an important role in determining sanitation, and context is more than simply one’s religion. Context likewise played a large role in Africa, in suggesting who eats more or less meat.

Geographic settings around the globe are highly complicated, and they are not easy to reduce to a handful of ancient religions that explain just about any current difference in human trends (including in critical analysis far afield from health). This is a simple lesson that probability and statistics can teach us.

Let’s look again at OD, now considering multiple factors at once. We’ll leave religion out of this, since we already showed a couple of times in this web log article that multicollinearity can confound information. It can produce what is referred to in biostatistics as a Type III error: being right, but for the wrong reasons. Going forward, we should treat with suspicion any research headline that runs off a one-factor explanation.
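
The confounding idea can be illustrated with a small simulation. Everything in the sketch below is synthetic and hypothetical: the probabilities are made up for illustration and are not estimates from the WHO data or from the Geruso and Spears paper. A hidden context variable (rural versus urban) drives the outcome, while a group label merely correlates with that context.

```python
import random

def simulate(n=40000, seed=0):
    """Synthetic rows of (rural, label, outcome); all probabilities are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rural = rng.random() < 0.5
        # The group label is correlated with context...
        label = rng.random() < (0.8 if rural else 0.2)
        # ...but the outcome depends on context ONLY, never on the label.
        outcome = rng.random() < (0.6 if rural else 0.1)
        rows.append((rural, label, outcome))
    return rows

def rate(rows, label, rural=None):
    """Outcome rate for one label, optionally restricted to one context."""
    picked = [o for r, l, o in rows if l == label and (rural is None or r == rural)]
    return sum(picked) / len(picked)

rows = simulate()
overall_gap = rate(rows, True) - rate(rows, False)
rural_gap = rate(rows, True, rural=True) - rate(rows, False, rural=True)
urban_gap = rate(rows, True, rural=False) - rate(rows, False, rural=False)

print(f"gap ignoring context: {overall_gap:+.3f}")
print(f"gap within rural:     {rural_gap:+.3f}")
print(f"gap within urban:     {urban_gap:+.3f}")
```

Ignoring context, the label appears to "explain" a large gap in the outcome. Within each context the gap is close to zero, because the label never entered the outcome in the first place.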

We take another look at ppod by country, with the rural population’s ppod on the horizontal axis and the urban population’s ppod on the vertical axis. Of the 225 countries in the WHO dataset, we spotted some, such as Eritrea, with miscoded data (nearly full ppod in their rural population, while they claim zero ppod in their urban population). Ultimately 216 nations were examined. And the markers in the chart below are harmonically scaled to represent the sample size of the urban and rural populations.
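
The two preparation steps described here, screening out implausible rows and scaling markers, might be sketched as follows. This is only one reading and not necessarily what was done for the chart: we assume "harmonically scaled" means a marker weight equal to the harmonic mean of the urban and rural populations, and the screening thresholds and the example ppod values are hypothetical placeholders.

```python
from statistics import harmonic_mean

def looks_miscoded(rural_ppod, urban_ppod, rural_hi=0.95, urban_lo=0.0):
    """Flag rows with near-total rural ppod but zero urban ppod.
    The thresholds are assumptions, not values taken from the article."""
    return rural_ppod >= rural_hi and urban_ppod <= urban_lo

def marker_weight(urban_pop_m, rural_pop_m):
    """Assumed marker weight: harmonic mean of the two populations (millions)."""
    return harmonic_mean([urban_pop_m, rural_pop_m])

# Hypothetical placeholder row shaped like the miscoding described above.
print(looks_miscoded(rural_ppod=0.99, urban_ppod=0.0))   # flagged
print(looks_miscoded(rural_ppod=0.65, urban_ppod=0.12))  # kept

# India's urban count (391m) and urban share (32%) are quoted in the article;
# the rural count is what those two figures imply.
india_urban_m = 391
india_rural_m = 391 / 0.32 - 391
india_weight = marker_weight(india_urban_m, india_rural_m)
print(f"India marker weight ~ {india_weight:.0f}")
```

A harmonic mean is pulled toward the smaller of the two populations, so under this reading a country gets a large marker only when both its urban and rural samples are large.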

Despite the difficulty we previously noted of reducing a large and complex country (e.g., India) to just one stable statistical value, we show in green a linear regression confidence interval of one standard deviation (covering roughly 68% of the information). This provides a single, rough gauge of dispersion about the relationship. It shows one missed silver lining from a policy perspective, which is the high probability that as India’s population becomes much more urbanized, it will likely see a quicker ppod drop than most countries. We can see India’s large marker, near the right side of the bubble chart above, with an urban population ppod of 0.12 (12%).
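
The urbanization point can be made concrete with a simple mixture calculation. The sketch below uses India's figures quoted in this article (48% overall ppod, 32% urban share, 12% urban ppod) to back out the implied rural ppod, then recomputes the overall ppod for higher urban shares. Holding the urban and rural rates fixed while the mix shifts is our simplifying assumption, not the regression shown in the chart, and the 40% and 50% urban shares are hypothetical scenarios.

```python
def implied_rural_ppod(overall, urban_share, urban_ppod):
    """Solve overall = s * urban + (1 - s) * rural for the rural rate."""
    return (overall - urban_share * urban_ppod) / (1 - urban_share)

def overall_ppod(urban_share, urban_ppod, rural_ppod):
    """Population-weighted mixture of the urban and rural rates."""
    return urban_share * urban_ppod + (1 - urban_share) * rural_ppod

india_rural = implied_rural_ppod(overall=0.48, urban_share=0.32, urban_ppod=0.12)
print(f"implied rural ppod ~ {india_rural:.0%}")

# Hypothetical urban shares, with both rates held fixed (an assumption).
for share in (0.32, 0.40, 0.50):
    mixed = overall_ppod(share, 0.12, india_rural)
    print(f"urban share {share:.0%} -> overall ppod ~ {mixed:.0%}")
```

Because the implied rural rate is several times the urban rate, every point of population that shifts from rural to urban lowers the overall figure, even before any improvement within either group.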

Tuesday, September 16, 2014
