In previous posts, we discussed our massive ongoing symptom surveys, run in partnership with Facebook and Google, which have reached over 12 million people in the U.S. since April 2020. Another of our major data initiatives is based on partnerships with healthcare systems, which give us access to various aggregate statistics from hospital records and insurance claims covering 10-15% of the United States population. From these data, we can extract signals that can serve as early indicators of COVID activity. This post focuses on one indicator in particular, based on outpatient visits, and demonstrates both the challenges and the promise of medical records data.
Beginning on September 8, 2020, we deployed a new version of our symptom survey.
Facebook helps us recruit tens of thousands of respondents daily, and the new survey gives us unprecedented insights into the effects of COVID-19 across the United States.
Today we release new public datasets and share maps revealing access to COVID testing, test results, and public use of masks.
One of our primary initiatives at the Delphi COVIDcast project
has been to curate a diverse set of COVID-related data streams,
and to make them freely available through our
COVIDcast Epidata API.
These include both novel signals that we have collected and analyzed ourselves,
such as our symptom survey distributed by Facebook
to its users, Google’s symptom survey whose results are delivered to us,
the percentage of doctor’s visits due to COVID-like illness,
and results from Quidel’s antigen tests;
and also existing signals, such as the confirmed case counts
and deaths reported by USA Facts and Johns Hopkins University.
The COVIDcast API gives researchers and decision-makers
the data they need to conduct their work, and
is accessible through Python
and R packages.
Building on our previous two posts (on our COVID-19 symptom surveys through
Facebook and Google),
this post offers a deeper dive into empirical analysis, examining whether the
% CLI-in-community indicators from our two surveys can be used to improve
the accuracy of short-term forecasts of county-level COVID-19 case rates.
Since April 2020, in addition to our massive daily survey advertised on
Facebook, we’ve been running (even more massive) surveys through Google to
track the spread of COVID-19 in the United States.
At its peak, our Google survey was taken by over 1.2 million people in a single
day, and over its first month in operation it averaged more than 600,000 daily
respondents. In mid-May, we paused daily dissemination of this survey in order
to focus on our (longer, more complex) survey through Facebook,
but we plan to bring back the Google survey this fall.
This short post covers some key differences between our Google and Facebook
surveys, explains the backstory behind the “CLI-in-community” question
as it arose through our collaboration with Google,
and shares some of our thinking about next steps for the Google survey.
Since April 2020, in collaboration with Facebook,
partner universities, and public health officials,
we’ve been conducting a massive daily survey to monitor
the spread and impact of the COVID-19 pandemic in the United States.
Our survey, advertised by Facebook, is taken by about 74,000 people each day.
Respondents provide information about COVID-related
symptoms, contacts, risk factors, and demographics,
allowing us to examine county-level trends across the US.
We believe that this combination of detail and scale
has never before been available in a public health emergency.
In this post, we’ll share some of our initial survey findings,
show you how to access the data, and highlight
some of the exciting new directions that we’re pursuing.