"Mutant Statistics," "Stats Wars," and Other Abuses of Descriptive Statistics

October 13, 2001

Book Review
James Case

Damned Lies and Statistics: Untangling Numbers From the Media, Politicians, and Activists. By Joel Best, University of California Press, Berkeley, 2001, 196 pages, $19.95.

The latest book by Joel Best, a professor of sociology and criminal justice at the University of Delaware, concerns the nature, origin, and durability of bad social statistics. Some statistics, he says, are born bad; based on nothing more than guesses or dubious data, they aren't much good from the start. Others mutate; they go bad when mangled by fools or knaves. Either way, bad social statistics are potentially important: They can be used to stir up public outrage or fear; they can distort our understanding; and they can lead us to make poor policy decisions.

The title of the book is borrowed from the famous aphorism---variously attributed to Mark Twain, Benjamin Disraeli, and other 19th-century wits---whereby "There are lies, damned lies, and statistics." It reflects the popular suspicion that statistics can't be trusted, and that the people who quote them are manipulative. Everyone knows that "you can prove anything with statistics," and most have been misled at one time or another by statistics that don't mean what they appear to mean. Best goes out of his way to explain that his book is more than an update of How to Lie With Statistics, by Darrell Huff, which he describes as "a useful little book, still in print after more than forty years."

Best uses statistics on a daily basis, and is skilled in the art of inference. Yet his book does not concern the finer points of that mysterious art. On the contrary, it has to do with the "descriptive statistics" typically discussed in the opening chapter or two of business statistics texts as a sort of preliminary to the study of sample spaces, distributions, parameter estimates, and measures of significance. Bad social statistics are typically descriptive statistics.

A statistic that particularly fascinates Best is the oft-repeated assertion that 150,000 American women die each year from anorexia. Given the (readily available) facts that anorexia mainly afflicts young women and that only about 55,500 American females between the ages of 15 and 44 die each year from all causes, the assertion could hardly be correct. In truth, death certificates attribute only about 70 deaths each year to anorexia!
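The plausibility screen described above amounts to a single comparison, sketched here in Python. The three figures are the ones quoted in this review; the variable names are illustrative, and treating all-cause deaths among females aged 15 to 44 as a rough ceiling assumes, as the review notes, that anorexia mainly afflicts young women.

```python
# Figures as quoted in the review (annual, United States).
claimed_anorexia_deaths = 150_000    # the oft-repeated claim
all_cause_deaths_15_44 = 55_500      # females aged 15-44, every cause of death
recorded_anorexia_deaths = 70        # approximate count from death certificates

# Assumption: deaths from one cause in a group cannot exceed that group's
# deaths from all causes, so the all-cause total serves as a rough ceiling.
claim_exceeds_ceiling = claimed_anorexia_deaths > all_cause_deaths_15_44

ratio_to_ceiling = claimed_anorexia_deaths / all_cause_deaths_15_44
ratio_to_recorded = claimed_anorexia_deaths / recorded_anorexia_deaths

print(f"Claim exceeds all-cause ceiling: {claim_exceeds_ceiling}")
print(f"Claim is about {ratio_to_ceiling:.1f} times the all-cause total")
print(f"Claim is about {ratio_to_recorded:.0f} times the recorded count")
```

On these numbers the claim is roughly 2.7 times the all-cause total for the age group, and more than 2,000 times the count taken from death certificates.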

Neither the feminists who quote the inflated figure, nor the reporters who repeat it, nor the editors who fail to challenge it are versed in the art of screening unfamiliar facts for plausibility. Feminists are anxious to draw attention to an illness that---in their opinion---merits more of society's resources and attention than it receives. Because the statistic in question seems likely to further their cause, they tend to accept it uncritically. Reporters assigned to cover feminists, along with their editors, are inclined to do so too.

To explain such lax behavior, Best quotes Scott Adams, the cartoonist who draws Dilbert: "Reporters are faced with the daily choice of painstakingly researching stories or writing whatever people tell them. Both approaches pay the same."

Another frequently encountered statistic asserts that about 10% of the population is gay or lesbian. Best reports that the most accurate surveys known to him indicate that only 3-6% of males, and a smaller fraction of females, have had significant homosexual experience---usually in adolescence or early adulthood. No more than 3% of adult Americans appear to be practicing homosexuals. It turns out that the 10% figure can be traced to the first Kinsey report (1948), which states that "10 per cent of the males are more or less exclusively homosexual . . . for at least three years between the ages of 16 and 55." But, as Kinsey himself observed, the statement doesn't mean 10% of all males. It means 10% of the males included in his sample, which is not (and was never intended to be) representative of the population as a whole. Kinsey felt a particular need to investigate homosexuality---about which little was known at the time---and went out of his way to interview gays and ex-convicts. Accordingly, his results do not extrapolate in a straightforward manner.

Not surprisingly, gay and lesbian activists often challenge the lower (and apparently more accurate) estimates. They would naturally prefer to believe the 10% figure, which makes them seem a larger and more important minority group than they really are---almost as numerous and important as African Americans. Thus, the 10% figure lives on, and is often used in the calculation of other statistics about gays and lesbians.

It has been reported, for instance, that a third of all teen suicides (or roughly 1500 deaths a year) involve gay or lesbian adolescents. This supports the case that society places unendurable pressure on homosexual adolescents, and stands in need of reform. Best explains how the inflammatory estimate is derived from (1) Kinsey's exaggerated 10% figure, (2) another dubious fraction, and (3) a simple arithmetic error, before pursuing more credible assumptions to the conclusion that only about 6% of teen suicides involve gay or lesbian youths.

Best's candidate for the title of "worst ever" social statistic crossed his desk in a thesis proposal a few years ago. The author asserted that "Every year since 1950, the number of American children gunned down has doubled." The assertion seemed preposterous to Best, and he suspected at first that the author had simply miscopied the sentence. But the cited journal article turned out to contain exactly the same sentence. When contacted, the author of the article cited the Yearbook of the Children's Defense Fund for 1994, in which Best found the statement "The number of children killed each year by guns has doubled since 1950." This is an example of what Best calls a "mutant statistic." It started out reasonably enough, but became misleading when carelessly (if not fraudulently) misstated. The claim that 10% of the population is homosexual is another example, having grown out of Kinsey's finding that 10% of the people in his sample were homosexual.
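A short calculation shows why the mutated sentence is preposterous while the original is not. The sketch below assumes, purely for illustration, a starting count of one child in 1950 (the smallest nonzero value); the review does not give the true 1950 figure, and the function name is hypothetical.

```python
def doubled_every_year(start_count, start_year, year):
    """Count implied by the mutant reading: the number doubles each year."""
    return start_count * 2 ** (year - start_year)

# Hypothetical starting value; the actual 1950 count is not given in the review.
START_COUNT = 1

for year in (1960, 1980, 1994):
    count = doubled_every_year(START_COUNT, 1950, year)
    print(f"{year}: {count:,}")

# The original statement implies a single doubling over the whole period.
doubled_once = START_COUNT * 2
print(f"Doubled once since 1950: {doubled_once}")
```

Even from a single death in 1950, annual doubling passes one billion by 1980 and reaches about 17.6 trillion by 1994, whereas the Children's Defense Fund sentence implies only a factor of two over the entire period.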

To this reviewer, the most interesting of the book's seven chapters were those titled "Mutant Statistics"---several examples from which have already been discussed---and "Stats Wars." The latter examines conflicts over social statistics, such as the spat that arose in October 1995 concerning the number of participants in the Million Man March. Congress had long ago directed the U.S. Park Police to produce official crowd size estimates whenever significant numbers of people gathered on the Mall in Washington. The Park Police have developed a commendably straightforward methodology for constructing such estimates: They determine the area covered by the crowd from overhead photographs, and apply to it a "multiplier" of one person per 3.6 square feet. In this case, they concluded that about 400,000 people were actually present for the occasion.

Louis Farrakhan, leader of the Nation of Islam, reacted furiously, claiming that "racism, white supremacy, and hatred for Louis Farrakhan" had prevented the white establishment from giving credit where credit is due. A team from Boston University then re-examined the Park Police photographs, and applied its own multiplier to arrive at the conclusion that 870,000 people---plus or minus 25%---had in fact joined the march, leaving open at least the possibility that a million men had been present. The Park Police then provided additional photographs and data, which caused the BU team to revise their figures downward to 837,000, plus or minus the same 25%, still leaving intact the million-man estimate.
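The ranges implied by "plus or minus 25%" are easy to spell out. The following sketch uses only the estimates quoted in this review; the helper name is illustrative, and the 25% is read here as a simple symmetric relative error, which is an assumption about how the Boston University team meant it.

```python
def error_interval(estimate, relative_error):
    """Return (low, high) for an estimate with a symmetric relative error."""
    return (estimate * (1 - relative_error), estimate * (1 + relative_error))

MILLION = 1_000_000
PARK_POLICE_ESTIMATE = 400_000

for label, estimate in (("first BU estimate", 870_000),
                        ("revised BU estimate", 837_000)):
    low, high = error_interval(estimate, 0.25)
    print(f"{label}: {low:,.0f} to {high:,.0f}")
    print(f"  one million inside range: {low <= MILLION <= high}")
    print(f"  Park Police figure inside range: {low <= PARK_POLICE_ESTIMATE <= high}")
```

Read this way, both ranges reach past one million at the top (about 1,087,500 and 1,046,250), while the Park Police figure of 400,000 falls well below the bottom of either range.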

The crux of the issue was ridiculously simple. What was the average density of people on the Mall that day? The Park Police assumption of 3.6 square feet per person already seems generous, given that (according to studies cited by Best) crowds listening to speakers tend to spread out until they occupy 5.7-8.5 square feet per person. Yet, if either of the Boston University figures is to be believed, there must have been at least one person for every 1.8 square feet---about equal to the crowding in a densely packed elevator. Best himself seems to think that the Park Police figure was a slight overestimate, while the others announced were gross exaggerations.
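The density argument can be checked with a few lines of arithmetic. The sketch below backs out the crowd area from the Park Police figures (400,000 people at 3.6 square feet each) and assumes, as a simplification not stated in the review, that every estimate refers to that same area; the names are illustrative.

```python
PARK_POLICE_COUNT = 400_000
PARK_POLICE_SQFT_PER_PERSON = 3.6

# Assumption: all estimates describe the same crowd area, taken here to be
# the area implied by the Park Police count and multiplier.
crowd_area_sqft = PARK_POLICE_COUNT * PARK_POLICE_SQFT_PER_PERSON

def sqft_per_person(count):
    """Space each person would have if `count` people filled the area."""
    return crowd_area_sqft / count

def count_at_density(sqft_each):
    """Crowd size if each person occupied `sqft_each` square feet."""
    return crowd_area_sqft / sqft_each

print(f"Implied crowd area: {crowd_area_sqft:,.0f} sq ft")
for count in (837_000, 870_000, 1_000_000):
    print(f"{count:,} people -> {sqft_per_person(count):.2f} sq ft each")
for sqft_each in (5.7, 8.5):
    print(f"{sqft_each} sq ft each -> about {count_at_density(sqft_each):,.0f} people")
```

Under that assumption the Boston University figures imply roughly 1.7 square feet per person, close to the 1.8 cited above, and a full million would require about 1.4. At the 5.7 to 8.5 square feet typical of crowds listening to speakers, the same area would hold only about 170,000 to 250,000 people.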

But all that is beside the main point. What Best finds inexcusable is the failure of the mainstream media, even several days after the event, to make any discernible effort to provide the sort of benchmark data on crowd densities that readers would need to arrive at informed opinions of their own concerning the actual number of participants in the Million Man March.

Best's final chapter suggests that readers take "a critical approach" to social statistics, and identifies a few questions worth asking about any of them. It is always worth knowing how the underlying data were obtained, he writes, and whether sampling was involved. If so, was the sample representative of the population as a whole? What questions were asked of, or concerning, the members of the sample, and how were responses interpreted? It isn't always easy to decide who is or is not homeless, a homosexual, or a suicide, and who has or has not been victimized by fraud, assault and battery, breaking and entering, discrimination, or harassment. Yet, because no such checklist can ever be complete, Best finishes by advising a rather flexible form of skepticism, one that is informed by past mistakes. Thus, the real value of his book would seem to reside in the concrete examples he provides from his own considerable experience.

James Case writes from Baltimore, Maryland.

