As COVID cases and deaths continue to rise in the United States, we are repeatedly reminded that unless we take the appropriate precautions—by wearing masks when around other people, working from home whenever possible, and avoiding travel and crowded places—there will be a high rate of COVID deaths. Vaccines are on their way, but until they arrive, we must continue social distancing.
But after all the precautions we’ve taken and the sacrifices we have made in our lives during 2020, we have seen many people ask: Why haven’t these precautions prevented the current COVID spike? Some have even argued that this is proof that masks and social distancing do not work. Can this claim be justified?
To answer this question, we must first ask: Is everyone wearing masks? And are they also socially distancing in other ways? In this post, we’ll show how Delphi’s COVID symptom surveys can help answer these questions and how they can enable important research about the pandemic and its spread.
Since April, in partnership with Facebook Data for Good and the University of Maryland, Delphi has conducted daily surveys of Facebook users throughout the United States. These surveys ask respondents whether they’re experiencing symptoms, whether they are isolating or following precautions, and how the pandemic has affected them. The survey has been completed over 14 million times since April, giving us an unprecedented level of insight into COVID and people’s experiences, down to individual counties around the country.
Since early September we have asked all respondents a question about mask use:
In the past 5 days, how often did you wear a mask when in public?
- All the time
- Most of the time
- Some of the time
- A little of the time
- None of the time
- I have not been in public during the past 5 days
Early analysis suggests that mask usage is high in most states, and in those where it was lower, it has been gradually increasing. For example, the graph below shows the percentage of respondents who answer “Most of the time” or “All the time” in five states and the District of Columbia. In three of these (the District of Columbia, Massachusetts, and New York), mask usage is among the highest in the country; in the other three (South Dakota, Idaho, and Wyoming), it’s among the lowest.
library(covidcast)
library(dplyr)
library(ggplot2)
library(directlabels)
options(covidcast.auth = Sys.getenv("API_KEY")) # for more on API keys, see: https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html
states_of_interest <- c("dc", "ma", "ny", "wy", "sd", "id")
state_masks <- covidcast_signal(
"fb-survey", "smoothed_wwearing_mask",
start_day = "2020-09-15", end_day = "2020-12-05",
geo_type = "state", geo_values = states_of_interest
)
ggplot(state_masks, aes(x = time_value, y = value, color = geo_value)) +
geom_line() +
geom_dl(aes(label = toupper(geo_value)), method = "last.bumpup") +
scale_x_date(date_breaks = "1 month",
date_labels = "%b %Y",
date_minor_breaks = "1 week") +
labs(x = "Date", y = "% wearing masks in public most/all the time",
title = "Self-reported public mask usage",
subtitle = "From Delphi's surveys, conducted through Facebook",
caption = "Data from Delphi COVIDcast, delphi.cmu.edu") +
theme_bw() +
guides(color = "none") # hide the legend; lines are labeled directly by geom_dl()
Despite the gap between states, it also seems clear that self-reported mask use has been on the rise in states whose mask use started low. This is quite encouraging.
(Note that, as with most of Delphi’s blog posts, all the code and data used to make these plots is available—simply click the Code button to see how our API clients make it easy to access aggregated data.)
Seeing these numbers—particularly the very high reported mask usage in many states—might make you wonder: If mask usage is over 80% in most states and over 90% in many, why does the rate of new COVID cases in the United States currently look like this?
cases <- covidcast_signal(
"indicator-combination", "confirmed_7dav_incidence_num",
start_day = "2020-09-15", end_day = "2020-12-05",
geo_type = "state")
cases %>%
group_by(time_value) %>%
summarize(cases = sum(value)) %>%
ggplot(aes(x = time_value, y = cases)) +
geom_line() +
scale_x_date(date_breaks = "1 month",
date_labels = "%b %Y",
date_minor_breaks = "1 week") +
labs(x = "Date", y = "Reported new cases",
title = "Number of reported new cases per day",
subtitle = "7-day rolling average",
caption = "Data from Delphi COVIDcast, delphi.cmu.edu") +
theme_bw()
While we can’t explain all of COVID in one blog post—nor do we have the data and science needed to do so—we can address two key questions:
To answer the first question, we must consider what kinds of biases could affect the surveys:
These biases are hard to measure directly. There’s another problem too: If 90% of people say they wear masks when they’re in public, that doesn’t mean that 90% of people in public places are wearing masks! For example, if people who wear masks also tend to limit the amount of time they spend in public places, while people who do not wear masks spend more time in public, then the people you encounter in public at any given moment will include a larger share of non-mask-wearers than the population as a whole.
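To make this time-in-public effect concrete, here is a minimal sketch with made-up numbers. The 90% figure echoes the example above, but the hours spent in public are purely hypothetical assumptions, not survey estimates.

```r
# Hypothetical inputs, chosen only for illustration (not survey estimates).
share_masked   <- 0.90  # fraction of residents who wear masks in public
hours_masked   <- 1     # assumed weekly hours in public for a mask wearer
hours_unmasked <- 3     # assumed weekly hours in public for a non-wearer

# Person-hours spent in public by each group, per resident.
masked_hours   <- share_masked * hours_masked
unmasked_hours <- (1 - share_masked) * hours_unmasked

# Share of all person-hours in public that belong to mask wearers.
masked_share_in_public <- masked_hours / (masked_hours + unmasked_hours)
round(100 * masked_share_in_public)  # 75
```

Under these assumptions, 90% of residents wear masks, yet only about 75% of the people present in public at a typical moment are masked.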
We don’t have a way to tell if a respondent is truthful, and since there’s no objective national data on the exact percentage of people wearing masks in public, we have no way to check exactly how biased results could be. We have seen hints, however: when researchers in Philadelphia studied surveillance camera footage to count how many people in public places wear masks correctly, only 43% did, despite over 90% of Philadelphia respondents to our survey saying they wear masks most or all of the time. Unfortunately this is only one study in one city, and doesn’t give us a picture of mask usage nationally.
After consulting with survey experts, we chose a different way to check our mask use estimates. In late November, we added a new mask question to the survey:
In the past 7 days, when out in public places where social distancing is not possible, about how many people would you estimate wore masks?
- All of the people were wearing masks
- Most of the people were wearing masks
- Some of the people were wearing masks
- A few of the people were wearing masks
- None of the people were wearing masks
- I have not been out in public places in the past 7 days
Because this question asks about other people, it bypasses social desirability bias. And even if the survey sample is not representative, as long as the respondents visit representative public places (like grocery stores or public transit) and see other people there, we may get a better estimate of public mask use.
Let’s see how the data looks. We’ll plot the percentage of people answering “all” or “most” to each mask question in each state:
masked <- covidcast_signal(
"fb-survey", "smoothed_wwearing_mask",
start_day = "2020-12-01", end_day = "2020-12-01",
geo_type = "state")
other_mask <- covidcast_signal(
"fb-survey", "smoothed_wothers_masked",
start_day = "2020-12-01", end_day = "2020-12-01",
geo_type = "state")
joined <- masked %>%
inner_join(other_mask, by = "geo_value",
suffix = c(".self", ".other"))
ggplot(joined, aes(x = value.self, y = value.other,
label = toupper(geo_value))) +
geom_text() +
geom_abline(slope = 1, intercept = 0, linetype = "dashed",
colour = "#888888") +
annotate("text", x = 83, y = 85, angle = 15, label = "equal percentages",
colour = "#888888") + # annotate() draws the label once, not once per state
labs(x = "% who report wearing masks most/all the time",
y = "% who report most/all others wear masks",
title = "Mask use reported in symptom survey",
subtitle = "December 1st, 2020",
caption = "Data from Delphi COVIDcast, delphi.cmu.edu") +
theme_bw()
There are a few things worthy of note. There is clearly a strong relationship here—but notice that in South Dakota and Wyoming, where nearly 80% report they wear masks most or all of the time, fewer than half of respondents agree that most or all other people wear masks in public places. Only in a few states do over 90% of respondents agree that most other people wear masks:
joined %>%
filter(value.other >= 90) %>%
arrange(desc(value.other)) %>%
select(geo_value, value.other) %>%
mutate(geo_value = abbr_to_name(geo_value, ignore.case = TRUE)) %>%
knitr::kable(col.names = c("State", "% agree that most/all others are masked"),
digits = 1)
State | % agree that most/all others are masked |
---|---|
Connecticut | 93.3 |
Massachusetts | 93.3 |
District of Columbia | 92.8 |
Vermont | 92.4 |
Maryland | 91.6 |
Rhode Island | 91.3 |
Hawaii | 91.1 |
Delaware | 90.5 |
New Jersey | 90.3 |
New York | 90.2 |
But we should also be careful in interpreting this. In South Dakota and Wyoming, fewer than half of respondents agree that most or all other people wear masks. We can conclude that it’s clearly untrue that “everyone wears masks”, but this is not the same thing as saying that fewer than half of people wear masks. For example, if people interpret “most people wear masks” to mean “I only see one or two people not wearing masks when I’m in public”, “most” could correspond to 80 or 90% mask use, rather than just 50%. It’s not obvious how to translate the survey question into an estimate of the percent of people wearing masks. So what can we learn from this?
Permit me to discuss some probability theory for a moment. It’s fine if you don’t follow the math—the key point will be the probability it allows me to calculate two paragraphs from now.
Suppose that the true proportion of people who wear masks in public in a certain state is \(p\). Suppose those people are uniformly distributed throughout the state. If you go out in a public place and see 20 people, the number of those people who are wearing masks is a random variable \(X\) with distribution \(\text{Binomial}(20, p)\).
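For reference, this model treats each of the 20 people as independently masked with probability \(p\) (a simplifying assumption, which is what “uniformly distributed” stands in for here). The chance that exactly \(k\) of them wear masks is then

\[ \Pr(X = k) = \binom{20}{k} p^{k} (1 - p)^{20 - k}, \qquad k = 0, 1, \dots, 20, \]

and a cumulative probability such as \(\Pr(X \leq 16)\) is the sum of these terms for \(k = 0, \dots, 16\).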
Suppose \(p = 0.8\), meaning 80% of people wear masks. With a bit of math, we find that on a typical trip into public, the chance that at least 4 of the 20 people you see are not wearing masks is \(\Pr(X \leq 16)\), or 59%. To reduce that chance to 10%, we’d need 91% mask compliance, showing how small changes in mask usage can cause big changes in risk.
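These probabilities can be checked with R’s built-in binomial distribution function `pbinom()`. The sketch below uses the 20 people and the threshold of 4 unmasked people from the text; the helper name `prob_at_least_unmasked` is ours, introduced only for illustration.

```r
n_seen <- 20  # people seen on one trip into public, as in the text

# X ~ Binomial(n, p) counts the masked people among the n we see.
# "At least k people unmasked" is the same event as X <= n - k,
# so its probability is the binomial CDF evaluated at n - k.
prob_at_least_unmasked <- function(p, k = 4, n = n_seen) {
  pbinom(n - k, size = n, prob = p)
}

prob_at_least_unmasked(0.80)  # about 0.59
prob_at_least_unmasked(0.91)  # about 0.10
```

Changing `k` or `n_seen` shows how sensitive the result is to what a respondent might count as “most” and to how many people they encounter.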
This means that the apparent contradiction we saw in the graph above—with respondents claiming that they wear masks but disputing that most others do—may not be a contradiction. Depending on how respondents interpret “most”, it might be expected. After all, if you wear a mask yourself but then run into four different people who do not while getting groceries, you could be quite justified in denying that “most” people wear masks.
But more importantly, this shows why a high rate of mask use is so important. If 80% of people wear masks, you are still likely to run into several unmasked people on a trip to any public place, placing you at higher risk. The problem only gets worse if our assumption that mask use is uniformly distributed is false, and if people without masks tend to visit the same locations or events, or gather together for social events, or go out in public more often than people who do wear masks.
In short, there are numerous opportunities for COVID transmission even when over three-quarters of people wear masks. Only a very high rate of mask usage can dramatically reduce these opportunities.
Above, we’ve seen how our survey data can shed light on mask use in the United States. Mask use is not as universal as it might seem, and there’s substantial variation between states on mask use and other social distancing behaviors. Insights like these are important: Using this information about mask use and activity, authorities can target their emergency orders and provide more specific advice to the public.
While a short blog post can’t hope to offer a thorough analysis proving which behaviors reduce spread the most, we make all our data publicly available so that the public health researchers building detailed models of COVID’s spread can combine it with other data to study how the pandemic spreads and what interventions are most effective to stop it.
If you’re one of those researchers, or you think this data could be useful for your reporting or decision-making, see our COVIDcast Epidata API for details on how to get it. Numerous aggregates from the survey data, including more measures of public activity, are available through the API, and can be easily downloaded using our R and Python packages. All aggregates are de-identified to protect the privacy of survey respondents.
We also make the raw response data available to academic and nonprofit researchers, and we look forward to seeing more detailed analyses of this data. For example, it would be important to know if people who do not wear masks are more likely to do activities near other people, such as attending public events or spending time with friends who do not live with them, than those who wear masks regularly. Our survey asks respondents about these types of activities, allowing us to understand how behavior varies. If you’re an academic or nonprofit researcher and want to do this kind of research, check out our survey documentation to get started. And if you’re interested in similar data in numerous other countries around the world, check out the University of Maryland’s survey, which runs in over 50 languages worldwide.
Acknowledgements: Wichada La Motte-Kerr drafted numerous survey revisions, collected input from experts, and managed the process of deploying the new survey questions. Sarah LaRocca and Katherine Morris at Facebook gave important input and helped deploy the survey. Kathryn Mazaitis assisted in producing the new aggregate survey data.