Statistical Ideas: Attack on eastern Damascus

So far we believe that 12 different locations in the suburbs of Damascus were attacked on August 21. Of these 12 locations, 2 (in red below) were far southwest of the Presidential palace (in green below), and 10 (in blue below, along with labels for those statistically significant) were east of this residence. With this type of outcome, one thing we can safely say is that President Bashar al-Assad’s forces weren’t targeting the Presidential palace.

Given the large difference between the 2 southwest locations and the 10 eastern locations, it is also clear that these are statistically separate directions. Both of the southwest locations were in contested regions, meaning that there is current fighting there between the regime and the rebels. But who were the targets in the 10 eastern locations? And are there any statistical techniques we can learn to better understand these 10 impacted locations? Based on current public information, these are the questions we answer in the analysis below, and we see that, as is typical with statistical analysis, the results are not as cut and dried as we might prefer.

This analysis comes with a large emotional backdrop, as the global community weighs whether an international military response is warranted. It also comes on the heels of a rising death toll from an initially inspiring Arab Spring, which has slowly morphed into something more concerning. One lesson that we have learned from the major wars of the past decade is that we should always take these moments to learn as much as we can about the intelligence shared with us. It is prudent for the global public now to explore even seemingly basic data, in different ways.

It should be noted that the actual number and coordinates of these 10 eastern locations may change over time. The initial intelligence came from a chaotic collection of disparate evidence: witness and response data cobbled together on the night of the attack, mixed with the approximate physical damage locations provided by various sources.

The geographic center of the 10 eastern locations is in one of the many unattributed areas amalgamated through Damascus’ suburbs. "Unattributed" means that the location is neither controlled, nor contested. There is some probability that the actual target was this unattributed center, and the actual attack locations by some human flaw or random error strayed into the geographic pattern shown on the chart above. Remember that there are multiple error sources at play as well, between the execution of the attack, and the location description from the witnesses.

The smallest circle that could be drawn around all 10 eastern locations would have a diameter of 7.5 miles, and so an area of at most pi*(7.5/2)² miles², or about 44 miles². (Completely as an aside, this circle can also be used to approximate the maximum likelihood estimator of certain bivariate beta distributions.) The issue again is that much of this circle would not be rebel-controlled, since the rebels' locations are still heterogeneously dispersed.
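For readers who want to reproduce the circle arithmetic, here is a minimal sketch in Python. The coordinates in `POINTS` are invented for illustration (they are not the attack locations, which we do not tabulate here), and the brute-force search over pairs and triples is only one simple way to find a smallest enclosing circle for a handful of points.

```python
import math
from itertools import combinations

# Hypothetical coordinates in miles, invented for illustration only.
POINTS = [(0.0, 0.0), (7.5, 0.0), (3.0, 2.0), (4.0, -1.0), (2.0, 1.5)]

def circle_from_two(p, q):
    """Circle having the segment p-q as its diameter."""
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return (cx, cy, math.hypot(p[0] - q[0], p[1] - q[1]) / 2)

def circle_from_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    ax, ay = p
    bx, by = q
    cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.hypot(ux - ax, uy - ay))

def smallest_enclosing_circle(points, eps=1e-9):
    """Brute force: try every pair and triple, keep the smallest circle
    that contains all points. Fine for about 10 points."""
    candidates = [circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        inside = all(math.hypot(x - cx, y - cy) <= r + eps for x, y in points)
        if inside and (best is None or r < best[2]):
            best = (cx, cy, r)
    return best

cx, cy, r = smallest_enclosing_circle(POINTS)
print("diameter (miles):", 2 * r)
print("area (square miles):", math.pi * r**2)
```

With a diameter of 7.5 miles the area works out to pi*(3.75)², which is where the figure of about 44 miles² comes from.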

Instead of support vector machines, we use the k-means technique to partition these 10 eastern locations into smaller clusters, and then analyze the attributes of those cohorts.

A single partition would section off the 10 eastern cities into northern and southern clusters of locations. See the illustration below for a basic idea. Still, these two cohort clusters have centers that are in unattributed regions. Our statistical fit ratio is the dispersion “between the clusters' centers” to the dispersion “among locations within each cluster”. The maximum ratio for a single partition is eight. Any division of the 10 locations into two groups was eligible to be sectioned off by the partition, but the general northern and southern cohorts gave the highest statistical ratio. For those curious, just sectioning off Duma would have given a ratio of six.
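The single-partition search described above can be sketched as follows. This is an illustration under stated assumptions, not the exact calculation behind the ratios quoted here: the coordinates in `POINTS` are invented, and we assume the fit ratio is the between-cluster sum of squares (each cluster's size times the squared distance of its center from the overall center) divided by the within-cluster sum of squares.

```python
import math
from itertools import combinations

# Hypothetical coordinates in miles, invented for illustration only:
# four "northern" points and six "southern" points.
POINTS = [
    (0.0, 5.0), (1.0, 5.5), (0.5, 6.0), (1.5, 5.0),
    (0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (0.5, 1.0), (1.5, 1.0), (2.5, 0.5),
]

def centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def fit_ratio(clusters):
    """Assumed definition: between-cluster dispersion / within-cluster dispersion."""
    grand = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within

def best_single_partition(points):
    """Try every split of the points into two non-empty groups and
    return (ratio, group_a, group_b) for the split with the highest ratio."""
    n = len(points)
    best = (-1.0, None, None)
    for size in range(1, n // 2 + 1):
        for chosen in combinations(range(n), size):
            chosen_set = set(chosen)
            a = [points[i] for i in chosen]
            b = [points[i] for i in range(n) if i not in chosen_set]
            ratio = fit_ratio([a, b])
            if ratio > best[0]:
                best = (ratio, a, b)
    return best

ratio, group_a, group_b = best_single_partition(POINTS)
print("best ratio:", round(ratio, 2))
print("group A:", group_a)
print("group B:", group_b)
```

With only 10 locations there are 511 possible two-group splits, so an exhaustive search is cheap and avoids the dependence on starting centers that an iterative k-means run would have.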

Two partitions would section off Duma inside of a northern cohort, Mulayha inside of a southeastern cohort, and finally Jawbar and Ayn Tarma inside a western cohort. The statistical ratio here would rise even further, above the eight we just saw in the single-partition case. In addition, we would have somewhat of a tilt towards the contested attribution for the attack’s target. We are still not seeing clusters that are universally recognized to be centered about a currently rebel-controlled location.

With three partitions we have a result similar to the two-partition case, but with the partitions broken further apart, opening up an eastern-cohort cluster that includes Jisrayn (together with Siqba, less than a mile north of it). Here the statistical fit ratio would continue to increase, this time to much greater than 12. And the target attribution in this case would be about 70% unattributed, 20% contested, and 10% rebel-controlled.

The statistical ratio in the k-means analysis rises from a fit of 0 for no partitions, trending towards infinity for the unlikely event that each of the 10 locations was homed in on as its own cluster. And along the way we (slowly) gain a better understanding of the range of target attribution associated with Syria’s chemical attack in these eastern locations. With the large number of smaller clusters, we can make part of a case that the 10 eastern locations were intended to be 70% rebel-controlled areas, with the remaining 30% split between contested and unattributed regions. But this path is well-riddled with unattributed locations as attack targets, as opposed to rebel-controlled locations, when seen through the various lenses of smaller statistical partitions of the mapping data against the spotty location attributes for which areas are rebel-controlled. The statistical analysis would only be enhanced with: (a) greater supplementary quantitative data, which the public so far lacks, or (b) if hypothetically the rebel-held territory was generally continuous and more clustered away from the regime’s control.
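The two endpoints of that range (a ratio of 0 with no partitions, and an unbounded ratio when every location is its own cluster) can be checked with a small sketch. As before, the coordinates are invented, the ratio is assumed to be between-cluster over within-cluster sum of squares, and the plain Lloyd's-style k-means below, seeded with the first k points, is a simplification that may settle on a local optimum for intermediate k.

```python
import math

# Hypothetical coordinates in miles, invented for illustration only.
POINTS = [
    (0.0, 5.0), (1.0, 5.5), (0.5, 6.0), (1.5, 5.0),
    (0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (0.5, 1.0), (1.5, 1.0), (2.5, 0.5),
]

def centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def fit_ratio(clusters):
    """Assumed definition: between-cluster dispersion / within-cluster dispersion."""
    grand = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm, seeded deterministically with the first k points."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]

# k clusters corresponds to k - 1 partitions in the language of the text.
ratios = {k: fit_ratio(kmeans(POINTS, k)) for k in range(1, len(POINTS) + 1)}
for k, ratio in ratios.items():
    print(k, "clusters -> ratio", ratio)
```

With one cluster the between-cluster dispersion is zero, so the ratio is 0. With 10 clusters the within-cluster dispersion is zero, so the ratio is unbounded. This is why a rising ratio alone cannot tell us how many clusters to prefer.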

Ultimately we are forced to appreciate in this analysis that any discussion of these maps is as difficult as the probability theory of trying to identify an underlying distribution when it is discontinuous and very dispersed (e.g., just look at a few of the various rebel-controlled maps in the past week). And as if just trying to understand the attribution of a region were not difficult enough, we add the complication of trying to understand the actual target’s attribution versus these non-regime-controlled pockets. We showed here that while there is a high likelihood that the chemical attack targets were rebel-controlled, the spatial mapping results distributed through the popular press do not prove this case very cleanly.

1 comment:

Unknown, September 29, 2013 at 11:42 AM
Many thanks to a reader who has written to me offline concerning the recent UN report here: http://s3.documentcloud.org/documents/787427/u-n-syria-chemical-report.pdf. It should be noted that since my blog note, Secretary Kerry's comments have started to hedge in my direction of probable events, as opposed to the original certain conclusions. Therein lies some of the statistical difficulty of proving a case from initial data, prior to the results of a final investigation.

In any event, my book draft develops the ideas of the probabilistic modeling challenges we face when the underlying population is as amorphous and as rapidly changing as some of the Arab Spring's war-torn nations.


Sunday, September 8, 2013
