The crucial data we don’t have to fight Covid-19.
Six months into America’s battle with Covid-19, we still can’t really see the enemy.
There isn’t good real-time data on where the virus is and who it is infecting. Our diagnostic testing is at an all-time high, but it’s still missing the vast majority of infections.
We don’t have systematic surveillance programs like we do for the flu to fill in the gaps, and we don’t have good metrics that tell us how well the virus is being contained. We’re particularly in the dark about what’s happening in many minority communities, which have lower testing rates than white communities.
We don’t have good foresight into the future either: As the response to the pandemic grows more fractured, and the policies less consistent and more politicized, it’s getting harder to model.
“It’s like we’re flying blind,” says Sarah Cobey, an infectious disease modeler at the University of Chicago. To extend the metaphor: When airplane pilots can’t see out their windows, they can rely on their instruments to guide them through a storm. But with the pandemic “we don’t even have that,” Cobey says. “We don’t even have good numbers to be staring at to guide our flying.”
This blindness is particularly excruciating because institutions — like schools and universities — have to make hugely consequential decisions about reopening without clear data on what’s happening on the ground. The best data we have on community spread of Covid-19 is weeks out of date when it arrives. And schools won’t necessarily be able to monitor the consequences of their decisions in real time. With a virus capable of exponential growth, these lags in data can result in catastrophe.
What we do know is that we’re entering a newly dangerous period. As temperatures fall and people are forced back indoors in the coming months, it’s possible transmission rates of Covid-19 will increase even more.
If we’re going to fly this country out of the pandemic, we’re going to need more visibility on what’s going on.
Here’s what’s missing, and what we so desperately need.
1) We don’t have good real-time data. And data is particularly insufficient for minority communities.
Time is everything in the pandemic.
The quicker a Covid-19 case is identified, the quicker it can be isolated, the quicker contacts can be quarantined, the fewer people get infected, and so on. On a big-picture level: The faster a state or local government can identify a growing outbreak, the faster it can act to stem the outbreak.
What we need is a real-time view of Covid-19 transmission. And it simply doesn’t exist.
Ideally, we could get real-time transmission data from rapid Covid-19 diagnostic testing. But testing is currently backlogged in many places, with people waiting a week or more for results. So it can’t provide real-time data. Testing also does not provide a complete picture of who is getting infected.
“Four out of five infections are not … counted as cases,” Cobey says. They’re not being tested (these include milder or asymptomatic cases). “Right now, cases are so underreported, and they’re not just underreported in a consistent way — they’re underreported in a biased way.”
Minority communities, for instance, are not being tested at the same rate as white communities (despite bearing a disproportionate brunt of the pandemic’s toll). According to an investigation by FiveThirtyEight, Black and Hispanic communities face longer wait times for tests, and there are “fewer testing sites in areas primarily inhabited by racial minorities.” So testing data gives us a skewed picture of what’s going on.
Since we can’t use testing data for a view of overall community transmission, we have to extrapolate from hospitalizations and deaths. Researchers know, roughly, the ratio of hospitalizations and deaths to the amount of community spread. So they can work backward.
Yet hospitalizations and deaths are lagging indicators. They’re indicative of transmission that occurred three weeks ago or more.
“Under exponential growth, three weeks can really mean a huge increase in cases or in infections,” Jaline Gerardin, a Northwestern University computational epidemiologist, says. If it takes a week for cases to double, she explains, then a three-week lag of data means cases can increase eightfold.
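Gerardin's arithmetic can be checked in a few lines. This is a minimal sketch that assumes steady exponential growth with a fixed doubling time; the function name `growth_factor` is illustrative and does not come from any modeling group's code.

```python
def growth_factor(lag_days: float, doubling_time_days: float) -> float:
    """Multiplicative increase in cases over `lag_days`.

    Assumes steady exponential growth with a constant doubling time,
    which is a simplification of how real outbreaks behave.
    """
    return 2 ** (lag_days / doubling_time_days)


# Gerardin's example: cases double weekly and the data lags by three weeks.
print(growth_factor(lag_days=21, doubling_time_days=7))  # 8.0
```

Three doublings give 2 x 2 x 2 = 8, which is the eightfold increase she describes.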
Hospitalization data may also now be less reliable than before, due to the Trump administration. A recent change in policy to reroute Covid-19 hospitalization data from the Centers for Disease Control and Prevention (CDC) to the Department of Health and Human Services has created “temporary information blackouts,” ProPublica reports. The Covid Tracking Project, a watchdog journalism group collecting Covid-19 data, writes that “these problems mean that our hospitalization data — a crucial metric of the COVID-19 pandemic — is, for now, unreliable, and likely an undercount.”
But even pristine hospital data wouldn’t tell the whole story of the pandemic. Older people are more likely to be hospitalized, which makes it harder to deduce outbreaks among younger people from this data source. Gerardin says in Illinois, “the Hispanic/Latino population tends to skew younger than white or Black populations.” So relying on hospitalization data makes it harder to observe trends in this community.
In the absence of current, comprehensive data, scientists are developing new tools to make the best use of the data available to help guide difficult reopening decisions.
In advising Texas schools on reopenings, Lauren Ancel Meyers, director of the University of Texas Covid-19 Modeling Consortium, and her colleagues have created a calculator to estimate how many students and faculty may come to campus infected based on levels of community transmission. If Covid-19 prevalence in the community is 1 in 100, for example, then a school with 1,000 students and teachers could expect 10 people to arrive infected during a reopening, Meyers and her colleagues report. (Just one infected person is enough to start a large outbreak.)
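The arithmetic behind that example is simple enough to sketch. The code below is not the consortium's calculator. It is a rough illustration with hypothetical function names, and the second function adds an assumption that is not in the source: that each person arriving on campus is infected independently, with probability equal to community prevalence.

```python
def expected_infected_arrivals(prevalence: float, population: int) -> float:
    """Expected number of people who arrive already infected."""
    return prevalence * population


def prob_at_least_one_infected(prevalence: float, population: int) -> float:
    """Chance that at least one person arrives infected.

    Assumption (not from the article): infections are independent
    across people, each with probability `prevalence`.
    """
    return 1 - (1 - prevalence) ** population


# The article's example: prevalence of 1 in 100, a school of 1,000 people.
expected = expected_infected_arrivals(prevalence=0.01, population=1000)
risk = prob_at_least_one_infected(prevalence=0.01, population=1000)
print(f"Expected infected arrivals: {expected:.0f}")
print(f"Chance of at least one infected arrival: {risk:.4f}")
```

Under these assumptions, a school of that size is all but certain to have at least one infected person on day one. That matters because a single case can be enough to start an outbreak.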
But the key thing about this risk calculation is that it’s dependent on knowing the prevalence of Covid-19 in a community. Using hospitalization data, “we are able to kind of estimate how fast the virus was spreading about 10 days prior,” Meyers says. Ten days, while helpful, isn’t ideal: An outbreak can start to spark up in that time.
That’s the best they can do. But they can’t even do it for everywhere in the country: “Unfortunately, hospitalization data are not kind of widely available for all communities in the country,” she says.
So not only are some areas blind to their current conditions, they don’t even have a clear view of the recent past.
What we need: Surveillance testing
One of the most commonly reported metrics during the pandemic is the percentage of tests that come back positive. In May, the World Health Organization advised governments that before reopening, the rate of Covid-19 positive tests should remain at 5 percent or lower for at least 14 days. If the proportion of positive tests rises above 5 percent consistently, it’s a sign there’s an outbreak growing in the area (and not just a sign that more mild cases are being discovered due to increased testing).
The problem is that this metric, while helpful, is still crude. And at times, it can deliver ambiguous conclusions. “You could imagine a situation where the epidemic is growing but the test positivity rate is actually decreasing,” Gerardin says. For example, if a university decides to test all its incoming students, or if a company decides to test all its employees before they come back to work, it can inflate the denominator of the equation with people who are mostly not infected, pulling the positivity rate down even as infections rise.
“Changes in the denominator of who’s being tested is really important,” Cobey says. “And we don’t [currently] understand them.”
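A toy calculation shows how the denominator can mislead. Every number below is hypothetical and chosen only for illustration; none is real testing data.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Share of tests that come back positive."""
    return positive_tests / total_tests


# Week 1 (hypothetical): only people with symptoms are tested.
week1_positive, week1_tests = 100, 1000

# Week 2 (hypothetical): symptomatic testing finds more cases (150 of 1,200),
# and a university also screens 5,000 incoming students, finding 25 cases.
week2_positive = 150 + 25
week2_tests = 1200 + 5000

week1_rate = positivity_rate(week1_positive, week1_tests)
week2_rate = positivity_rate(week2_positive, week2_tests)

print(f"Week 1: {week1_positive} positives, positivity {week1_rate:.1%}")
print(f"Week 2: {week2_positive} positives, positivity {week2_rate:.1%}")
```

In this made-up scenario, detected infections rise from 100 to 175 while the positivity rate falls from 10 percent to under 3 percent. That is the situation Gerardin describes: the epidemic grows while the headline metric improves.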
Instead of relying on this flawed metric, we need systematic surveillance.
A good surveillance system doesn’t need to include everyone tested, but just a predefined segment of the population, tracked carefully, and with good data.
Recently, Cobey and Gerardin consulted with the state of Illinois about setting up a surveillance system. Their idea is really simple: systematically record all the patients who arrive at outpatient clinics with symptoms. “You can estimate the effective reproductive number from that,” Cobey says. “You can get more up-to-date estimates and more precise estimates of what transmission rates are at different times.”
Ideally, “this would have been up and running before we started coming out of lockdown,” Gerardin says.
But currently, they say, there’s only one outpatient site in Illinois participating as a pilot, and it’s not enough to provide useful data. They’re not sure when the full program will be up and running.
“I don’t know of any US state that is doing good surveillance,” Cobey says, though she admits it’s hard to know what’s going on everywhere across the country. And that’s part of the problem, too: There’s no national standard for what Covid-19 surveillance ought to look like, or one place to go look up programs that exist. “It really confuses me why we’re not investing in this,” Cobey says of surveillance programs overall.
If testing programs across states were more careful and systematic in their current data collection and reporting — labeling why people received the tests, and if they have symptoms, noting when they started — it would help produce better real-time estimates of transmission.
“Even where there’s more testing, we’re almost never seeing the numbers broken down in a reasonable way,” Cobey says. “For instance, we don’t know which tests are from asymptomatics or symptomatics, or if tests come from outpatient sites or people showing up quite ill in the hospital.” Just ensuring every recorded case included a “date of symptom onset” would be helpful for better surveillance, she says. “And that’s not collected the vast majority of the time.”
Other scientists are trying to find ways to fill in the gaps. Mauricio Santillana, a computational epidemiologist at Harvard, has been building a machine-learning program to use as a form of disease surveillance.
“What we seek to identify is, ‘Can we help these traditional data sources identify outbreaks with more confidence?’” Santillana says.
Santillana and his colleagues combine data from UpToDate (a search engine for clinicians to look up disease symptoms), Google searches for fevers or Covid-19 symptoms, data from digital thermometers that pair with smartphones, and other information streams to predict outbreaks weeks before they show up in case-count data.
“We’re hoping to provide with these kinds of tools, confirmatory information to say, yes, cases are under control, or no,” Santillana says. This approach can’t replace traditional surveillance outright. (It also has some drawbacks: If people’s behavior starts changing in terms of Google searches or UpToDate searches, it could potentially alter the predictive ability of the program.) And while they have piloted the program with some success in China, he says, they’re also facing a frustrating roadblock in the US.
“Agencies such as the CDC see our work to some extent as novel and experimental, even though we have worked with the CDC for more than five years, using this data for influenza,” Santillana says. “So that means that they’re not paying as much attention, they’re not funding our work.”
The state of surveillance testing is frustrating for scientists. Disease surveillance is no novel concept: It is regularly in use for the flu. “I had thought good surveillance would be the most obvious thing for politicians to want to invest in and improve, so they can adjust policy faster and with more data-driven authority, but it turns out it’s not,” Cobey says.
2) We don’t have good metrics on containment
There’s another view we don’t have: how well we have this virus contained.
“If you equate infectious disease transmission to fires, I could tell you where the major [Covid-19] fires are in the US — Florida, Texas, Arizona — I know where the fires are, especially if they are big,” says Cyrus Shahpar, the former lead of the Global Rapid Response Team at the CDC. With wildfires, authorities will report what percentage of the fire is contained, which is often a reflection of how well firefighting authorities are responding.
With Covid-19, we have few comparable metrics. “I have no idea what percent contained these [Covid-19] fires are,” Shahpar says. “You could have a small fire that isn’t contained and that’s the problem. Or you could have a medium fire that is contained and that’s better contained. That matters, and that’s where we have the biggest information gap.”
What would you want to know to assess Covid containment? Shahpar, Tom Frieden (who served as CDC director under President Obama), and their group Resolve to Save Lives have 15 essential indicators they want all states to report, to better understand how good (or bad) a job each is doing in responding to the pandemic.
They include metrics like: What proportion of cases are isolated within 48 hours (to get a sense if new sparks in the fire are being quickly contained)? What percentage of cases are linked to previously known cases? (The less we know about chains of transmission, the less we probably know about the scope of the entire outbreak.) How long, on average, does it take to isolate a case?
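To make those three indicators concrete, here is a rough sketch of how they might be computed from case records. The `CaseRecord` fields and the function name are hypothetical. The sketch also assumes the 48-hour clock starts when a case is identified, which the indicators as described here do not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CaseRecord:
    """Hypothetical record for one confirmed case."""
    identified_at: datetime
    isolated_at: Optional[datetime]  # None if the person was never isolated
    linked_to_known_case: bool


def containment_metrics(cases: List[CaseRecord]) -> dict:
    """Compute three containment indicators over a list of cases."""
    if not cases:
        raise ValueError("need at least one case")
    # Delay between identification and isolation, for cases that were isolated.
    delays = [c.isolated_at - c.identified_at
              for c in cases if c.isolated_at is not None]
    within_48h = sum(d <= timedelta(hours=48) for d in delays)
    mean_hours = (sum(d.total_seconds() for d in delays) / 3600 / len(delays)
                  if delays else None)
    return {
        # Never-isolated cases count against this share.
        "share_isolated_within_48h": within_48h / len(cases),
        "share_linked_to_known_case":
            sum(c.linked_to_known_case for c in cases) / len(cases),
        "mean_hours_to_isolate": mean_hours,
    }


# Four made-up cases, all identified at the same moment for simplicity.
t0 = datetime(2020, 8, 1)
cases = [
    CaseRecord(t0, t0 + timedelta(hours=24), True),
    CaseRecord(t0, t0 + timedelta(hours=72), False),
    CaseRecord(t0, None, False),
    CaseRecord(t0, t0 + timedelta(hours=48), True),
]
metrics = containment_metrics(cases)
print(metrics)
```

With records like these reported in a common format, the same three numbers could be compared across states.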
“I have no idea how these things are going in Florida and Texas or in any other state, really, because they don’t report it,” Shahpar says. States are all reporting their own hodgepodge of metrics, which makes it hard to get a clear picture of pandemic containment at a national level. Federal guidance from the CDC and White House to states has been slow, and so states have had to come up with their own plans. But that makes the scope of the pandemic hard to track.
“Even if the metrics aren’t great right now, and I think that some places they aren’t, you need to know what they are. So that when we improve, we know that we’ve improved,” Shahpar says.
Resolve to Save Lives is keeping track of which states report this type of containment data, and so far, the vast majority don’t. The states that do track it aren’t doing so in a standardized way, so it’s hard to make cross-state comparisons. Shahpar blames a lack of federal guidance.
“If you look at the 50 reopening plans of each of the states, they’re all different,” he says. “They all look different; what they look at is different. And so we’ve kind of already proceeded into a place where everything’s different. So it’s much harder to get it all aligned.”
3) The future of the pandemic is difficult to model right now
Everyone wants to know what’s going to happen next. But here’s the truth: It’s really hard to know what shape the pandemic will take this coming fall. There’s an element of chaos in all of this. And it’s getting harder to model outcomes.
“We have understood from the very beginning that the way this virus spreads fundamentally depends on behavior and politics,” Meyers says. “And our behavior changed in unprecedented, dramatic ways.” (Who could have predicted people protesting mask-wearing, for instance?) “We really cannot predict what people are going to be doing, you know, next week,” she says. For that reason, among others, her team’s models never try to project more than three weeks into the future, she says.
Cobey agrees: “I expect that, you know, our predictive abilities are going to improve with time,” she says. “But at the moment, I think this is a particularly dark spot.”
There are still so many uncertainties that will determine our course: how much of a role kids play in transmission (a mystery that’s still being sorted out, though it’s increasingly clear kids can get infected and transmit the virus with frequency), and how well people will continue to adhere to mask-wearing guidance.
While we might not be able to intuit the future, and we might not have great vision on what’s currently happening, it doesn’t mean we’re powerless. We know the conditions under which the pandemic grows worse. We can continue to social distance, continue to wear masks, continue to try to test, trace, and isolate.
“What we see over and over again in our modeling,” Jeffrey Shaman, an infectious disease modeler at Columbia University, explains, is that the ability to open up places like schools safely depends on “how much virus there is out there right now, how many cases you’re seeing in the last four days, and how much it’s growing at that time.” Right now we have to be really cautious, and act knowing that this data arrives already out of date.
Without clear vision in this storm, “we’re going to be living with coronavirus much longer, with much more preventable death and disability than any other part of the world,” Shahpar says.
In the coming months, the storm might grow worse. And as temperatures begin to drop, we should prepare for Covid-19 transmission to increase. “It’s what we see in every other acute respiratory viral infection,” Cobey says.