epidatr is designed to streamline downloading and using data from the Delphi Epidata API. It provides a simple R interface to the API, including functions for downloading data, parsing the results, and converting the data into a tidy format.
In August 2022, the Delphi team discovered a fault in the data that we were serving through the COVIDcast API. The fault caused the API to return past versions of data as if they were the latest version of the requested data. It affected all of the signals from Johns Hopkins University Center for Systems Science and Engineering (JHU-CSSE) for February 2020 through October 2021. About 12 million data points were found to be faulty, roughly 20% of the data available from JHU-CSSE at the time of the fix. The fault was patched on September 28, 2022, and the API now returns the correct version of the data.
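To make the idea of "latest version" concrete, here is a minimal sketch in Python. It is not Delphi's implementation and does not call the COVIDcast API; the record layout (a location, a reference date, an issue date, and a value) and the function name `latest_versions` are assumptions made purely for illustration. It shows what a correct "latest version" lookup should do: when a data point has been revised, only the most recently issued value is returned.

```python
from datetime import date

# Hypothetical records: (geo, reference_date, issue_date, value).
# The same (geo, reference_date) can appear several times, once per revision.
records = [
    ("pa", date(2021, 3, 1), date(2021, 3, 2), 100.0),
    ("pa", date(2021, 3, 1), date(2021, 3, 9), 112.0),  # later revision
    ("ny", date(2021, 3, 1), date(2021, 3, 2), 250.0),
]


def latest_versions(rows):
    """For each (geo, reference_date), keep the value with the newest issue date."""
    latest = {}
    for geo, ref, issue, value in rows:
        key = (geo, ref)
        # Replace the stored entry only if this row was issued more recently.
        if key not in latest or issue > latest[key][0]:
            latest[key] = (issue, value)
    return {key: value for key, (_issue, value) in latest.items()}


current = latest_versions(records)
```

In this toy example, a lookup that returned the March 2 value for Pennsylvania instead of the March 9 revision would be exhibiting the kind of fault described above.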
In previous posts, we discussed our massive ongoing symptom surveys that have reached over 12 million people in the U.S. since April 2020, in partnership with Facebook and Google. Another of our major data initiatives is based on partnerships with healthcare systems, granting us access to various aggregate statistics from hospital records and insurance claims covering 10-15% of the United States population. From these data, we can extract informative signals that may serve as early indicators of COVID activity. This post focuses on one indicator in particular, based on outpatient visits, and demonstrates both the challenges and the promise associated with medical records data.
Beginning on September 8, 2020, we deployed a new version of our symptom survey.
Facebook helps us recruit tens of thousands of respondents daily, and the new survey gives us unprecedented insights into the effects of COVID-19 across the United States.
Today we release new public datasets and share maps revealing access to COVID testing, test results, and public use of masks.
One of our primary initiatives at the Delphi COVIDcast project
has been to curate a diverse set of COVID-related data streams,
and to make them freely available through our
COVIDcast Epidata API.
These include both novel signals that we have collected and analyzed ourselves,
such as our symptom survey distributed by Facebook
to its users, Google’s symptom survey whose results are delivered to us,
the percentage of doctor’s visits due to COVID-like illness,
and results from Quidel’s antigen tests;
and also existing signals, such as the confirmed case counts
and deaths reported by USA Facts and Johns Hopkins University.
The COVIDcast API freely provides researchers and decision-makers
with the data they need to conduct their work, and
is conveniently accessible via easy-to-use Python
and R packages.
Building on our previous two posts (on our COVID-19 symptom surveys through
Facebook and Google),
this post offers a deeper dive into empirical analysis, examining whether the
% CLI-in-community indicators from our two surveys can be used to improve
the accuracy of short-term forecasts of county-level COVID-19 case rates.
Since April 2020, in addition to our massive daily survey advertised on
Facebook, we’ve been running (even-more-massive) surveys through Google to
track the spread of COVID-19 in the United States.
At its peak, our Google survey was taken by over 1.2 million people in a single
day, and over its first month in operation, averaged over 600,000 daily
respondents. In mid-May, we paused daily dissemination of this survey in order
to focus on our (longer, more complex) survey through Facebook,
but we plan to bring back the Google survey this fall.
This short post covers some key differences between our Google and Facebook
surveys, explains the backstory behind the “CLI-in-community” question
as it arose through our collaboration with Google,
and shares some of our thinking about next steps for the Google survey.
Since April 2020, in collaboration with Facebook,
partner universities, and public health officials,
we’ve been conducting a massive daily survey to monitor
the spread and impact of the COVID-19 pandemic in the United States.
Our survey, advertised by Facebook, is taken by about 74,000 people each day.
Respondents provide information about COVID-related
symptoms, contacts, risk factors, and demographics,
allowing us to examine county-level trends across the US.
We believe that this combination of detail and scale
has never before been available in a public health emergency.
In this post, we’ll share some of our initial survey findings,
show you how to access the data, and highlight
some of the exciting new directions that we’re pursuing.
Hello from the Delphi research group at Carnegie Mellon University!
We’re a group of faculty, students, and staff, based primarily out of CMU,
together with strong collaborators from other universities and industry.
Our group was founded in 2012 to advance the theory and practice of epidemic
forecasting. Since March 2020, we have refocused efforts towards helping combat
the COVID-19 pandemic, by supporting informed decision-making at federal, state,
and local levels of government and in the healthcare sector. Until now, we’ve
been pretty “heads down” with our work, and slow to communicate what we’ve been
up to. But at last … Delphi finally has a blog! This first post serves as an
introduction of sorts. Future posts will dive deeper into our various projects.