Illustration of social distancing

Analysis of Google Community Mobility Data and the Number of COVID-19 Cases

TL;DR

  • Community mobility is weakly correlated with the number of daily additions of COVID-19 cases. This conclusion is based on an analysis that links Google Community Mobility data with global COVID-19 cases data. Other factors may correlate more strongly with the number of COVID-19 cases, e.g. strict social distancing measures outside of homes or potential infection inside homes.
  • Two kinds of mobility data were analyzed: number of visits to workplace (workplace mobility) and duration of stay at home (residential mobility). Both types of data were measured as daily percentage change compared to mobility levels during pre-pandemic baseline days.
  • On average, at the global level, the daily addition of COVID-19 cases shows a weak positive correlation with workplace mobility and a weak negative correlation with residential mobility. This means that a rise in COVID-19 case numbers is linked to a rise in workplace visits and a decrease in stay-at-home duration.

  • The correlation coefficients used in this analysis were the Pearson, Spearman, and Kendall coefficients.
  • To minimize distortions, the correlation analysis was performed using only the data from weekdays. This practice slightly increased the correlations’ strengths.
  • Based on the preliminary results, the Kendall correlation and the weekdays-only data seemed to be the most suitable tool and treatment for an analysis of this kind.
  • The average correlation of residential data with daily case additions was stronger (larger in absolute value) than the average correlation of workplace data with daily case additions. This suggests that residential mobility may be more reliable for making inferences about confirmed cases.
  • Country populations were plotted against the correlation data. Based on visual inspection, the population data didn’t seem to form any strong pattern indicating that a country’s population affects the country’s mobility – confirmed cases correlation.
  • Weak correlations of mobility data with daily case additions may be influenced by the differences in coverage of the mobility and cases data, COVID-19 spread inside homes, as well as strict social distancing measures.
  • Further studies linking mobility and COVID-19 cases should be conducted for more specific geographic areas and be mindful of the pandemic-related conditions of the community under study.
  • Interactive charts covering the country-level data were produced from the analysis and are available for viewers to explore.

Featured photo illustration by Evgeni Tcherkasski on Unsplash

BACKGROUND

Tech giants such as Google, Facebook, and Apple released anonymous community mobility data to help decision makers respond to the COVID-19 pandemic. These data are extensive and interesting, as they can cover detailed areas and document specific behaviour, such as staying at home, walking, visiting workplaces, visiting parks, etc. Although the data are meant to help with pandemic handling, there are not many studies that link the mobility data with the confirmed cases of COVID-19. I found a study of this kind to be interesting, relevant, and potentially impactful. Therefore I decided to analyse mobility and COVID-19 cases data as my first data analysis project.

THE DATA

The data used in this analysis were the Google COVID-19 Community Mobility Reports, the global COVID-19 cases datasets maintained by the Center for Systems Science and Engineering at Johns Hopkins University, and the world population dataset from The World Bank. The Google mobility data were chosen over the Apple and Facebook data because the Google data have wider coverage. The Google data also have suitable details for this analysis and are easier to analyze. All data in this analysis were obtained on 12th October 2020. The mobility – COVID-19 cases weekdays only chart above will be updated weekly, but the analysis and the other charts here use only the data from 12th October 2020.

The Google COVID-19 Community Mobility Reports describe daily changes in community mobility compared with baseline days just before the pandemic. The baseline period, 3rd January 2020 to 6th February 2020, serves as a benchmark for how community mobility changed in response to the pandemic. The data are reported on a per-country basis, with further division into one or more levels of subregions for some countries. The reports cover six categories of mobility: retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. The residential data measure the change in how long users stay at home. The other categories measure changes in the total number of visitors to the respective places. Location accuracy and the definition of the place categories vary by region. Mobility data are included in the reports only if they meet the quality and privacy thresholds. If there are not enough good mobility data for certain days or regions, those days or regions are left out of the reports, which creates gaps. For this analysis, only the residential and workplace data at the country level were used.

The global COVID-19 cases data were obtained from the Johns Hopkins Coronavirus Resource Center and are maintained by the Center for Systems Science and Engineering at Johns Hopkins University (JHU). The data are aggregated from countries worldwide. For a few countries, the data are divided further into subregions. This analysis used country data from four datasets, namely the global confirmed cases, the U.S. confirmed cases, global deaths, and U.S. deaths. The datasets contain the cumulative daily totals of COVID-19 cases and deaths. The world population data were obtained from The World Bank.
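As a rough illustration of how subregion rows can be rolled up to country level, the sketch below assumes the wide layout of the JHU global time-series files (one row per province or country, one column per date). The table contents, the function name to_country_long, and the output column names are made up for this example, and the U.S. files use a different column layout that is not covered here.

```python
import pandas as pd

# Tiny made-up table in the wide layout assumed above: one row per province
# or country, one column per date holding the cumulative total.
wide = pd.DataFrame({
    "Province/State": ["Alpha", "Beta", None],
    "Country/Region": ["Exampleland", "Exampleland", "Samplestan"],
    "Lat": [0.0, 1.0, 2.0],
    "Long": [0.0, 1.0, 2.0],
    "1/22/20": [1, 0, 5],
    "1/23/20": [3, 2, 8],
})

META_COLS = ("Province/State", "Country/Region", "Lat", "Long")


def to_country_long(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Sum subregions into countries, then reshape to one row per country and date."""
    date_cols = [c for c in wide.columns if c not in META_COLS]
    by_country = wide.groupby("Country/Region")[date_cols].sum()
    long = by_country.reset_index().melt(
        id_vars="Country/Region", var_name="date", value_name=value_name
    )
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    return long.sort_values(["Country/Region", "date"]).reset_index(drop=True)


cases_long = to_country_long(wide, "cumulative_cases")
print(cases_long)
```

The long format makes it straightforward to join the cases onto the mobility data by country and date.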

METHODOLOGY

To obtain insights about how community mobility relates to COVID-19 cases, the mobility data were linked to COVID-19 confirmed cases and COVID-19 deaths for every country. The analysis used daily additions, whereas the JHU datasets contain only cumulative daily totals. To get the number of daily case and death additions for a specific day, the cumulative numbers of confirmed cases and deaths on the previous day were subtracted from the cumulative numbers on that day. To fit on the same charts as the mobility data, the daily case and death additions were divided by 50 for convenient viewing.
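The subtraction and scaling described above can be sketched with pandas as follows. The numbers are invented for illustration and the variable names are assumptions, not the ones used in the original analysis.

```python
import pandas as pd

# Made-up cumulative totals for one country (not real figures).
cumulative = pd.Series(
    [100, 150, 225, 225, 400],
    index=pd.date_range("2020-06-01", periods=5, freq="D"),
    name="cumulative_cases",
)

# Daily additions: today's cumulative total minus yesterday's.
# The first day has no previous day, so diff() leaves it as NaN.
daily_new = cumulative.diff()

# Divide by 50 so the series fits on the same chart as mobility percentages.
daily_new_scaled = daily_new / 50

print(pd.DataFrame({"daily_new": daily_new, "scaled": daily_new_scaled}))
```

Dividing by a constant changes only the scale used for plotting; it does not change any of the correlation coefficients.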

In the JHU confirmed cases data, there were instances where the calculated daily additions turned out to be negative. This happens for a number of reasons, mostly related to the data reported by the source countries. For example, at some point in June Spain changed its methodology for counting confirmed cases, which resulted in negative daily additions for several days. To handle this, the negative daily additions of a country were replaced with the median of all daily case additions of that country.
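A minimal sketch of this replacement, using invented numbers. It assumes the median is taken over the whole series including the negative values, which is how the description above reads, though the original code may differ.

```python
import pandas as pd

# Made-up daily additions for one country; -20 stands in for a reporting correction.
daily_new = pd.Series([50, 75, -20, 175, 60])

# Median of all daily additions of the country (negatives included, by assumption).
median_all = daily_new.median()

# Replace only the negative entries; mask() leaves missing values untouched.
cleaned = daily_new.mask(daily_new < 0, median_all)

print(cleaned.tolist())
```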

The degree of correlation was measured using three commonly used correlation coefficients: the Pearson coefficient, which measures linear association, and the Spearman and Kendall coefficients, which are rank-based.
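To make the three coefficients concrete, here is a small sketch on invented data. Pearson comes from pandas, Spearman is computed as Pearson on the ranks, and Kendall (tau-b) is written out by pair counting so the example needs nothing beyond pandas. The helper kendall_tau_b and the sample values are assumptions for illustration; pandas also offers method="kendall" through its corr functions, which relies on SciPy.

```python
import pandas as pd


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (O(n^2), fine for short series)."""
    x, y = list(x), list(y)
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither term
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = (
        (concordant + discordant + tied_x_only)
        * (concordant + discordant + tied_y_only)
    ) ** 0.5
    return (concordant - discordant) / denom


# Made-up weekday values: residential mobility (% change) and daily new cases.
residential = pd.Series([10, 12, 8, 15, 9])
new_cases = pd.Series([300, 250, 310, 200, 320])

pearson = residential.corr(new_cases, method="pearson")
spearman = residential.rank().corr(new_cases.rank())  # Pearson on ranks
kendall = kendall_tau_b(residential, new_cases)

print(round(pearson, 3), round(spearman, 3), round(kendall, 3))
```

Kendall counts how many pairs of days move in the same direction versus opposite directions, so it depends only on ordering and is less sensitive to a few extreme daily counts than Pearson.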

Population and weekend factors were later included in the analysis to gain additional insights from the resulting calculations. Countries’ population data were plotted against the coefficients to see whether any patterns formed. The weekend data caused a weekly jagged pattern in the mobility charts. This is because on weekends community mobility during the pandemic is more or less the same as it was before the pandemic. The jagged pattern may distort the correlation calculations, as the cases data lack such a pattern. To address this potential distortion, the weekend data for both mobility and COVID-19 cases were removed from the charts and calculations.
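Dropping weekends can be sketched as below. The data are placeholders, and the sketch assumes Saturday and Sunday are the weekend everywhere, which does not hold in every country; how the original analysis defined weekends is not stated.

```python
import pandas as pd

# Two weeks of placeholder data; 2020-10-05 is a Monday.
dates = pd.date_range("2020-10-05", periods=14, freq="D")
df = pd.DataFrame({"workplace": range(14), "daily_new": range(14)}, index=dates)

# dayofweek runs from 0 (Monday) to 6 (Sunday); keep Monday through Friday.
weekdays_only = df[df.index.dayofweek < 5]

print(len(df), len(weekdays_only))
```

Filtering the joined table, rather than each source separately, keeps the mobility and cases rows aligned on the same dates.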

All of the calculations and data processing were done using the Python programming language and the Pandas library. The visualizations were created using Tableau Public.

RESULTS

Data processing and visualization resulted in time series charts of Google community mobility data with the COVID-19 cases that are provided below. The data were provided on a per-country basis.

The measured Pearson, Spearman, and Kendall correlation coefficients are as follows:

COVID-19 Cases – Google Mobility Correlation Coefficient Stats

The three coefficients show little difference. They all indicate similar aspects of the data:

  • weak negative correlation between the residential mobility and COVID-19 confirmed cases
  • weak positive correlation between the workplace mobility and COVID-19 confirmed cases
  • strong negative correlation between the residential mobility and workplace mobility
  • strong positive correlation between COVID-19 confirmed cases and deaths

Based on the preliminary judgement of the measurement results, the Kendall coefficient seemed to be the most suitable measurement for this analysis. Also, the weekdays data seemed to produce better measurements. Therefore, for brevity’s sake, further discussion and analysis of the data use the Kendall coefficient and the weekdays data.

Total count of countries analyzed: 131

Kendall correlation coefficient for global workplace mobility and COVID-19 positive cases:
Mean / average => 0.13
Standard deviation => 0.183
Minimum => -0.493
Median => 0.164
Maximum => 0.494

Kendall correlation coefficient for global residential mobility and COVID-19 positive cases:
Mean / average => -0.268
Standard deviation => 0.157
Minimum => -0.67
Median => -0.279
Maximum => 0.219
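Summary statistics like the ones above can be produced from a per-country series of coefficients, as in this sketch. The coefficients are hypothetical and are not the study's values. Note that pandas computes the sample standard deviation (ddof=1) by default; which variant the study used is not stated.

```python
import pandas as pd

# Hypothetical per-country Kendall coefficients (not the study's values).
coefs = pd.Series(
    {"A": -0.40, "B": -0.25, "C": -0.10, "D": 0.15},
    name="kendall_residential",
)

# Count, mean, standard deviation, minimum, median, and maximum across countries.
summary = coefs.agg(["count", "mean", "std", "min", "median", "max"]).round(3)

print(summary)
```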

The plot of countries’ populations against the Kendall coefficient of residential mobility and positive cases is as follows:

The distribution of data points on the plot suggests that countries with higher populations converge toward the mean of the correlation coefficients. However, this pattern is still inconclusive.

The distribution of the Kendall coefficients of residential mobility and positive cases is more or less normal, as follows:

The countries’ correlation coefficients of COVID-19 daily cases additions with mobility can be found here (weekdays only data) and here (whole week data). These data were last updated on 12th October 2020.

Tables of the countries’ correlation coefficients for the whole-week data and other histograms of the correlation coefficient distributions are provided in the Appendix below.

The complete processed datasets can be found at my GitHub repository.

DISCUSSION

Based on common knowledge about how the COVID-19 virus spreads, we may expect community mobility to be strongly correlated with the number of confirmed COVID-19 cases, i.e. the upward trend of confirmed cases closely follows the rise of community mobility outside of home. However, this was not the case, as the analysis produced weak correlations. There are several potential factors behind those weak correlations that I would like to discuss. For the record, I have no expertise in epidemiology, so please take this discussion with a grain of salt.

The weak correlation between the cases and the mobility may be caused by differences in coverage between the mobility data and the confirmed cases data. The confirmed cases data include cases collected from the whole of a country. The mobility data, meanwhile, come only from people who use Android devices and have activated their location history feature. Also, the mobility data only include a group of data points once it reaches a certain size threshold, which further restricts the coverage. For example, a village with few Android device users and many confirmed cases would add to the country’s case count without being reflected in the country’s mobility level, which skews the measured correlation.

The coronavirus may also spread among residents inside homes. Examples include the infection clusters that emerged in migrant dormitories in Singapore and in nursing facilities in the U.S.A. In such cases the mobility report would have recorded high levels of residential mobility, so the resulting correlation coefficient would shift toward a positive correlation between confirmed cases and residential mobility.

Successful pandemic response actions may also affect the correlation between confirmed cases and mobility. With effective safety actions such as lockdowns or strict social distancing measures for out-of-home activities, we can expect a low number of confirmed cases even as workplace mobility increases and residential mobility decreases.

To account for the influence of the aforementioned factors (data coverage differences, infection clusters at homes, effective safety actions) and to test the results of the current analysis, further study of the link between mobility data and COVID-19 should be directed at the regional or subregional level. A future study may include only areas with a high number of Android users, e.g. cities. Specific conditions, as well as the history of the pandemic’s impacts and responses in the area, should also be accounted for in further analysis, for example whether a strict lockdown or physical distancing in public spaces is in place, or whether past infection clusters involved activities inside homes. Further studies may also be directed at identifying other community factors that, together with mobility, correlate with the fluctuations of COVID-19 cases.

Factors such as strict social distancing measures outside of homes or potential infection inside homes may have stronger correlations with the number of daily additions of COVID-19 cases. There are most probably causal relationships between mobility and COVID-19 cases. However, they should be examined in more rigorous, detailed studies and are beyond the scope of this analysis.

The correlation coefficient between workplace mobility and confirmed cases was weaker than the coefficient between residential mobility and confirmed cases. This may be because workplace mobility is only a fraction of the mobility outside of home. As mentioned previously, the Google COVID-19 community mobility data for out-of-home activities also cover retail and recreation, groceries and pharmacies, parks, and transit stations, none of which were analyzed here. Including these out-of-home categories in the analysis may provide more robust insights about the correlation between COVID-19 cases and community mobility.

To improve the analysis, data from weekends were omitted. This treatment increased most of the correlation coefficients, albeit slightly. It also made the charts easier to read. However, this practice eliminates potentially important data points, such as major fluctuations of cases or mobility that occur during weekends. Therefore, further studies should first examine whether major incidents or important dates occurred during weekends, e.g. long and important holidays, or spikes in cases or mobility.

CONCLUSION

Based on the analysis of Google Community Mobility data and the number of COVID-19 cases, there is a weak negative correlation between the number of COVID-19 cases and the duration of stay at home. Meanwhile, there is a weak positive correlation between the number of COVID-19 cases and the number of visits to workplaces. This means that an increase in COVID-19 cases correlates weakly with an increase in workplace visits and a decrease in stay-at-home duration. The correlation may have been influenced by factors such as the difference in data coverage, the community’s social distancing measures, and other conditions that are region-specific. Factors other than community mobility may have stronger correlations with the number of daily additions of COVID-19 cases. To account for such factors and to test the results of this analysis, further studies linking mobility and COVID-19 cases should be conducted for more specific geographic areas and be mindful of the pandemic-related conditions of the community under study.

APPENDIX