Wednesday, May 28, 2025

"Lies, Damned Lies and Statistics"

What is a “fact,” and how do we know? 


Consider any number of statistical correlations we might care to investigate: crime (what counts as an offense changes over time); mental health (changing diagnostic criteria alter prevalence rates); poverty (different poverty line calculations yield dramatically different numbers); education (a focus on standardized tests narrows what is measured as "learning," though some relatively objective measure has to be used); public health (disease surveillance systems prioritize certain conditions over others). 


The statistics we collect about crime and human behavior are powerfully shaped by decisions about what to count, how to count it, and what to prioritize. Whether one believes that reflects societal power or something simpler, our choices about what to count influence both the “numbers” and our sense of their significance. 


To use an obvious example, to the extent we decriminalize or legalize marijuana use, the crime associated with its “illegal” use disappears from the statistics. Then there is the question of which offenses we choose to prioritize among everything that is legally a crime. Law enforcement agencies, for example, have finite resources. They might choose to ignore some infractions to focus on others. That directly shapes crime statistics (enforcement increases the volume of reported instances; ignoring an offense decreases it). 
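The effect of enforcement priorities can be sketched with a little arithmetic. This is only an illustration: the function name, the incident total, and the recording percentages are all invented for the example and are not real crime data.

```python
def recorded_offenses(actual_incidents, percent_recorded):
    """Return how many incidents end up in the official count.

    Both arguments are hypothetical illustration values, not real data.
    """
    return actual_incidents * percent_recorded // 100

# Assume the same underlying behavior in both years.
actual = 10_000

# Year 1: the offense is a low priority, so few incidents are recorded.
low_priority = recorded_offenses(actual, 5)

# Year 2: the agency cracks down, so more of the same incidents are recorded.
high_priority = recorded_offenses(actual, 20)

print(low_priority, high_priority)
```

In this made-up case the recorded figure quadruples even though the underlying behavior has not changed at all, which is the sense in which enforcement choices "create" the statistic.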


There also is a difference between reported and unreported crimes; between cases that are prosecuted and those that are not; and between acquittals and convictions. 


Also, changes in recording practices can create statistical variances. Redefining deviance upwards or downwards (what is a crime; what is not) will affect the statistics. 


During the COVID-19 pandemic, there were complexities in how deaths were classified when COVID-19 was detected alongside other health conditions. 


In most jurisdictions, including the United States, the standard practice followed CDC guidance: deaths were counted as COVID-19 deaths if COVID-19 was listed as a cause of death on the death certificate, either as the underlying cause or as a contributing factor. 


This approach meant that someone who died with multiple conditions could be counted in COVID-19 mortality statistics if COVID-19 played a role in the death. But there were at least three distinct categories:

  • Deaths directly caused by COVID-19 (e.g., respiratory failure due to COVID-19 pneumonia)

  • Deaths where COVID-19 was a contributing factor that exacerbated existing conditions

  • Deaths where someone tested positive for COVID-19 but died primarily from unrelated causes


The controversy centered on the inclusion of deaths in the second and, sometimes, the third of those categories. 
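A small sketch can show how much the headline figure depends on which of the three categories is included. The records, the labels ("direct", "contributing", "incidental"), and the function name below are hypothetical, chosen only to illustrate the counting rule; they are not actual mortality data or any agency's method.

```python
from collections import Counter

# Hypothetical death records, each labelled with one of the three categories.
deaths = ["direct"] * 60 + ["contributing"] * 30 + ["incidental"] * 10

def covid_death_count(records, included_categories):
    """Count the records whose category falls under the chosen inclusion rule."""
    counts = Counter(records)
    return sum(counts[category] for category in included_categories)

# Three possible counting rules applied to the same set of records.
narrow = covid_death_count(deaths, {"direct"})
middle = covid_death_count(deaths, {"direct", "contributing"})
broad = covid_death_count(deaths, {"direct", "contributing", "incidental"})

print(narrow, middle, broad)
```

The same 100 invented records yield three different totals depending on the rule, so a reported "COVID-19 death count" is only meaningful alongside the definition used to produce it.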


The CDC eventually distinguished between deaths "from" COVID-19 and deaths "with" COVID-19, though public reporting didn't always clearly separate these categories.


These classification decisions had significant implications for our understanding of the pandemic's impact and highlighted how methodological choices in mortality statistics can shape our perception of public health crises. 


The point is that there are “statistics” and there are “lies, damned lies, and statistics.” In other words, seemingly objective statistics are only partly objective.

