Don’t be seduced by the numbers and charts measuring the coronavirus — they are all wrong (including mine)
By Kent R. Kroeger (April 12, 2020)
“The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases.”
— Josiah Stamp (1880–1941), an English industrialist, economist, civil servant, statistician, writer, and banker
Barely a month into the coronavirus pandemic, the congressional Democrats and Republicans are already viewing this human tragedy through their narrow, partisan lenses.
And it was only two weeks ago they were working so well together, when the Congress passed and President Trump signed the $2 trillion bailout bill — U.S. history’s greatest wealth transfer from the U.S. Treasury to wealthy Americans.
That is what Democrats and Republicans can agree on when the American people are preoccupied with whether to go to the grocery store to re-stock their water and food supply, thereby risking coming down with the scariest virus since the fictional Andromeda Strain.
Now that both political parties have guaranteed that wealth inequality is baked into the American economic system for at least another generation, they can get back to pretending they can’t agree on anything.
And what is the current partisan disagreement du jour? Whether Americans should go back to business-as-usual or continue to shelter in place until every last known RNA segment of the coronavirus has been wiped from the face of the earth, or at least from the blue states.
While the bailout bill guarantees the Top 1% will survive the pandemic with their bank accounts intact (plus a hefty Christmas bonus), a significant percentage of Americans are genuinely at risk of going bankrupt if this pandemic lockdown extends beyond summer. Even Trump understands those people represent voters who could potentially tip the balance against him in November. The Democrats, likewise, led by Alexandria Ocasio-Cortez and the Cuomo brothers, Governor I’d-Better-Cover-My-Backside and Fredo, see a tremendous opportunity to extend the Blue Wave started in the 2018 midterm elections.
The Republicans, for good reason, want the U.S. economy back to normal as quickly as possible. Every day this near-national lockdown continues, more Republicans, independents, and disaffected Democrats are adversely affected and potentially more likely to vote against Trump.
On the other side, the Democrats are saying that if we end this lockdown too soon, we risk a second wave of the coronavirus. And what is too soon, according to the Democrats? Anything that ends this economic catastrophe before November is my guess.
Most disturbing about this morbid ballet now going on in Washington, D.C. and in the national media is how both sides cherry-pick the coronavirus numbers that best support their policy positions. There is nothing new about that. Politicians do that on every issue.
But it is particularly damaging in the case of the coronavirus pandemic because it puts lives and livelihoods at additional risk. Furthermore, it perpetuates the myth that the statistics governments and independent agencies are producing for decision-making about the coronavirus are accurate…when, in fact, they are not.
That is not an assertion on my part. It is a well-documented characteristic of the statistics gathered during previous epidemics and pandemics. The numbers initially collected and tabulated are invariably under-counts of the true spread and mortality rate of a virus.
The 2009 swine flu pandemic is a perfect example. In August 2010, over a year after the pandemic started, the World Health Organization (WHO) had confirmed 1,632,710 cases of the swine flu worldwide and 18,449 resulting deaths. The WHO knew it was an under-count and reported the numbers explicitly with that caveat.
It would not be until August 2011 and June 2012 that the WHO and the U.S. Centers for Disease Control and Prevention (CDC) would release their modeled estimates of the 2009 swine flu’s actual reach worldwide: 700 million to 1.4 billion people infected and between 151,700 and 575,400 dead.
And those modeled estimates too are subject to error and dispute.
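To put a rough number on that under-count, the short Python sketch below divides the modeled ranges quoted above by the WHO’s confirmed figures from August 2010. It is only a crude ratio: the confirmed counts and the modeled estimates do not cover exactly the same period, and the modeled ranges are themselves uncertain.

```python
# Figures quoted above for the 2009 swine flu (H1N1) pandemic.
confirmed_cases = 1_632_710                      # WHO confirmed cases, August 2010
confirmed_deaths = 18_449                        # WHO confirmed deaths, August 2010
modeled_infected = (700_000_000, 1_400_000_000)  # modeled range of infections
modeled_deaths = (151_700, 575_400)              # modeled range of deaths


def undercount_factor(modeled: float, confirmed: float) -> float:
    """How many times larger the modeled estimate is than the confirmed count."""
    return modeled / confirmed


death_low, death_high = (undercount_factor(m, confirmed_deaths) for m in modeled_deaths)
case_low, case_high = (undercount_factor(m, confirmed_cases) for m in modeled_infected)

print(f"Deaths: modeled estimates are {death_low:.1f}x to {death_high:.1f}x the confirmed count")
print(f"Infections: modeled estimates are {case_low:.0f}x to {case_high:.0f}x the confirmed count")
```

On these figures, the modeled death toll runs roughly 8 to 31 times the confirmed count, and modeled infections roughly 430 to 860 times the confirmed cases.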
So why is the news media telling us the U.S. has had six times more confirmed cases of COVID-19 than China (530,200 to 83,134 at last count), when they know (and we know, and even China knows!) that isn’t close to true?
Some estimates, including my own, put China’s COVID-19 cases and deaths at up to 10 times the government-sourced numbers. And it is not just China’s numbers being questioned. The Italian government’s own civil protection authority has estimated that Italy’s true case numbers are 10 times higher than officially reported, for many of the same reasons used to dispute China’s numbers: (1) a lack of testing kits, (2) unmeasured asymptomatic cases, (3) symptomatic cases who never sought medical help, (4) misclassification by health officials (e.g., failing to properly categorize COVID-19-related pneumonia cases), (5) governments changing the definitions used for COVID-19 classification, and (6) simple human error.
Yet, there is no reason to pick on China and Italy. The U.S. numbers most likely are equally wrong — and for the same reasons.
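One way to see why a head-to-head comparison of national counts is so fragile: the U.S.-to-China ratio depends entirely on how large each country’s under-count is relative to the other’s. The Python sketch below uses only the two confirmed counts quoted above; the multipliers (true cases divided by confirmed cases) are hypothetical, with the 10x figure borrowed from the estimates cited for China and Italy.

```python
# Confirmed counts quoted above (U.S. vs. China, at last count).
us_confirmed = 530_200
china_confirmed = 83_134


def adjusted_ratio(us_multiplier: float, china_multiplier: float) -> float:
    """U.S.-to-China case ratio after scaling each confirmed count by a
    hypothetical under-count multiplier (true cases / confirmed cases)."""
    return (us_confirmed * us_multiplier) / (china_confirmed * china_multiplier)


# Multipliers are illustrative guesses, not measured values.
for us_m, cn_m in [(1, 1), (10, 10), (1, 10), (3, 10)]:
    print(f"U.S. x{us_m}, China x{cn_m}: ratio = {adjusted_ratio(us_m, cn_m):.2f}")
```

With no adjustment, or with the same multiplier for both countries, the ratio stays at about 6.4. Apply a 10x multiplier to China alone and it falls to about 0.64. Since no one yet knows either country’s true multiplier, the ratio reported in the news tells us little.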
Of course, Fox News and the conservative blogosphere have found reasons to believe the U.S. is over-counting coronavirus cases and deaths, noting an apparent drop this year in CDC-reported pneumonia-related deaths compared to previous years.
“For the last few weeks, that [pneumonia] number has come in far lower than at the same moment in previous years. How could that be?” Fox News host Tucker Carlson asked on his show recently.
The Washington Post’s Aaron Blake rightly noted that the CDC pneumonia-related death numbers were not down by much and that, historically, those CDC-reported numbers tend to be under-counts (that word again!) when they are first reported. The CDC will produce more accurate numbers later in the year (and take their word for it, those numbers will be right!).
While Trump’s own director of the National Institute of Allergy and Infectious Diseases, Dr. Anthony Fauci, also knocked down that Fox News theory, it is a reasonable assumption that some pneumonia-related deaths have been classified as coronavirus-related despite no lab-test confirmation (recall Reason #1 above for why national coronavirus counts are likely wrong).
The question shouldn’t be whether this has happened, but how frequently it has happened. And asking that question is not tacit support for Fox News’ conspiracy theory; it is merely an acknowledgement that these coronavirus statistics contain error.
In my introductory statistics classes I always shared with students what data scientists call The Confusion Matrix:
It shows the potential for error in any measurement process. In this example, the matrix is updated to apply to the COVID-19 pandemic.
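For readers who haven’t seen one, the Python sketch below tallies the four cells of the standard textbook confusion matrix (actually infected or not, tested positive or not). It does not reproduce the six-cell COVID-19 version used in this article, and the records are made up for illustration.

```python
from collections import Counter


def confusion_cell(infected: bool, tested_positive: bool) -> str:
    """Place one tested person into a cell of the standard 2x2 confusion matrix."""
    if infected and tested_positive:
        return "true positive"
    if infected and not tested_positive:
        return "false negative"   # infected, but the test missed it
    if not infected and tested_positive:
        return "false positive"   # not infected, but the test flagged it
    return "true negative"


# Hypothetical records: (actually infected?, test came back positive?)
records = [(True, True), (True, False), (False, False), (False, True), (False, False)]

matrix = Counter(confusion_cell(inf, pos) for inf, pos in records)
for cell, count in matrix.items():
    print(f"{cell}: {count}")
```

Only the true positive and true negative cells reflect a correct result; every person in the other two cells makes the official count wrong in one direction or the other.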
If COVID-19 existed in a perfect world, every government would accurately test every citizen for the coronavirus on a regular basis. Everyone would fall into either the True Positive category or the True Negative category.
That is not happening…not in the U.S….not in China…not in Peru…not in Germany…not even in South Korea (the apparent gold standard on how to measure and contain the coronavirus).
The vast majority of world citizens have not been tested for the coronavirus and probably will never be tested. According to OurWorldInData.org, 8.3 million tests (as of 8 April) have been reported by government authorities worldwide. But we already know that number is low because some countries — particularly outside Europe, North America and East Asia — have not reported their testing numbers (including China).
But even if we double the testing number, only about 0.2 percent of world citizens have been tested for the coronavirus — and that number is squishy because countries don’t always distinguish between the number of tests they’ve administered (some people require more than one test) versus the number of people tested.
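The arithmetic behind that 0.2 percent figure is simple enough to check in a few lines of Python. The world population of roughly 7.8 billion used below is an assumption (a common 2020 estimate), not a number from the OurWorldInData.org report.

```python
reported_tests = 8_300_000          # OurWorldInData.org total, as of 8 April
world_population = 7_800_000_000    # assumption: approximate 2020 world population

doubled_tests = 2 * reported_tests  # generous allowance for countries not reporting
share_tested = doubled_tests / world_population

print(f"Share of world population tested: {share_tested:.2%}")
```

This works out to about 0.21 percent. And because some people are tested more than once, the number of people tested is smaller than the number of tests.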
Keep The Confusion Matrix in mind when considering this: Chinese scientists report preliminary evidence that the most common type of COVID-19 test, the reverse transcriptase polymerase chain reaction (RT-PCR) test, may give false-negative results around 30 percent of the time.
Dr. Harlan M. Krumholz, a professor of medicine at Yale University and director of the Yale New Haven Hospital Center for Outcomes Research and Evaluation, thinks the U.S. false-negative rate is even higher.
“Unfortunately, we have very little public data on the false-negative rate for these tests in clinical practice,” says Dr. Krumholz. And there are false-positive test results as well, though our better-safe-than-sorry bias makes us less concerned about those cases.
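To make those error rates concrete, the Python sketch below computes expected confusion-matrix counts for a hypothetical group of 10,000 tested people, 1,000 of whom are actually infected. The sensitivity of 0.70 corresponds to the roughly 30 percent false-negative rate reported by the Chinese scientists; the specificity of 0.99 (a 1 percent false-positive rate) is purely an assumption.

```python
def expected_cells(infected: int, not_infected: int,
                   sensitivity: float, specificity: float) -> dict:
    """Expected confusion-matrix counts for a group of people who were tested."""
    return {
        "true positive": infected * sensitivity,
        "false negative": infected * (1 - sensitivity),
        "false positive": not_infected * (1 - specificity),
        "true negative": not_infected * specificity,
    }


# Hypothetical group: 10,000 people tested, 1,000 of whom are actually infected.
# Sensitivity 0.70 reflects the ~30% false-negative rate reported above;
# specificity 0.99 is an assumption for illustration only.
cells = expected_cells(infected=1_000, not_infected=9_000,
                       sensitivity=0.70, specificity=0.99)

reported_positives = cells["true positive"] + cells["false positive"]
for name, count in cells.items():
    print(f"{name:>15}: {round(count):,}")
print(f"Reported positives: {round(reported_positives):,} (true infections: 1,000)")
```

In this made-up example, 300 infected people get a negative result and 90 uninfected people get a positive one, so the 790 reported positives understate the 1,000 true infections. Different assumptions about specificity or infection levels shift that balance.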
In sum, of the six disposition cells in The Confusion Matrix, four work against the accuracy of any number being generated about the coronavirus. We can minimize how many people fall into these cells, but we can’t empty them entirely.
This is why modeled estimates are so critical to understanding the coronavirus and its associated disease, COVID-19. They explicitly acknowledge classification error and try to compensate for it. The models aren’t perfect (God knows I can attest to that), but they are typically better than the raw data when it is initially collected.
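As a generic illustration (not the author’s own model) of what compensating for classification error can look like, the Python sketch below implements the Rogan-Gladen estimator, a standard textbook correction that converts the share of tests coming back positive into an estimate of true prevalence, given the test’s sensitivity and specificity. It assumes both error rates are known, which, as Dr. Krumholz notes, is not yet the case for COVID-19 tests.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Correct an observed test-positive rate for known test error
    (Rogan-Gladen estimator), clipped to the valid range [0, 1]."""
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("test must beat chance: sensitivity + specificity > 1")
    estimate = (apparent_prevalence + specificity - 1) / denominator
    return min(max(estimate, 0.0), 1.0)


# Hypothetical survey: 7.9% of those tested came back positive.
# Sensitivity 0.70 matches the ~30% false-negative rate reported above;
# specificity 0.99 is an assumption.
print(f"{rogan_gladen(0.079, sensitivity=0.70, specificity=0.99):.1%}")
```

With an observed positive rate of 7.9 percent, a sensitivity of 0.70, and an assumed specificity of 0.99, the corrected estimate is 10 percent. The correction is only as good as the sensitivity and specificity fed into it, and it says nothing about people who were never tested.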
That is why the current political debate about when the U.S. can return to normal (which I don’t believe will happen until there is a vaccine) is so unproductive and, indeed, harmful to the general public’s understanding of the coronavirus.
The Trump administration, congressional Republicans, and their loyal media outlets are starting to minimize the severity of the coronavirus and risk re-igniting its spread if they follow through on plans to return the U.S. to near business-as-usual too soon (in May, if Trump gets his way).
Likewise, congressional Democrats have cherry-picked worst-case scenario forecasts (which are fast being proven wrong, particularly predictions about the U.S. death toll) that do more to create panic in the general public than to improve its understanding of the coronavirus.
Underwriting both the fact-denial and the fear-mongering are coronavirus numbers that offer evidence for both sides. But the reality is that we still don’t know how far the coronavirus has spread within the population (morbidity rate), its contagiousness (R0, or “R naught”), or its mortality rate [with estimates ranging from “common flu” levels (0.1%) to SARS-like numbers (10%) in some locations].
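Part of that 0.1 to 10 percent spread comes from the denominator: deaths divided by confirmed cases is a very different number from deaths divided by all infections. The Python sketch below uses a hypothetical location with 500 deaths and 10,000 confirmed cases; the multipliers (true infections per confirmed case) are illustrative assumptions, and the sketch ignores under-counted deaths, which would push the rates back up.

```python
def fatality_rate(deaths: int, cases: int) -> float:
    """Deaths divided by cases; the result depends entirely on which cases are counted."""
    return deaths / cases


# Hypothetical location: 500 deaths among 10,000 confirmed cases.
deaths, confirmed = 500, 10_000

for multiplier in (1, 10, 50):
    # multiplier = assumed ratio of true infections to confirmed cases
    rate = fatality_rate(deaths, confirmed * multiplier)
    print(f"true infections = {multiplier}x confirmed -> fatality rate {rate:.1%}")
```

The same 500 deaths yield a fatality rate of 5.0%, 0.5%, or 0.1% depending on how many unconfirmed infections one assumes.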
As long as we don’t have a better handle on these numbers, our political debates are going to be disproportionately driven by the hourly updates on Johns Hopkins University’s COVID-19 Dashboard website. That is a bad thing because these numbers are not accurate.
[According to the COVID-19 Dashboard, the number of U.S. coronavirus cases rose from 530,000 to 542,000 while I was writing this article.]
Economic research blogger Christopher Balding puts it best: “The panic and debate clearly suffers from profound misunderstandings.”
That is probably another under-estimate.
Datasets used for this article are available upon request to: email@example.com