Saturday, November 7, 2020

Data-Driven Mobility Models for COVID-19 Simulation

John Pesavento, Andy Chen, Rayan Yu, Joon-Seok Kim, Hamdi Kavak, Taylor Anderson, Andreas Züfle


Agent-based models (ABM) play a prominent role in guiding critical decision-making and supporting the development of effective policies for better urban resilience and response to the COVID-19 pandemic. However, many ABMs lack realistic representations of human mobility, a key process that leads to physical interaction and subsequent spread of disease. Therefore, we propose the application of Latent Dirichlet Allocation (LDA), a topic modeling technique, to foot-traffic data to develop a realistic model of human mobility in an ABM that simulates the spread of COVID-19. In our novel approach, LDA treats POIs as “words” and agent home census block groups (CBGs) as “documents” to extract “topics” of POIs that frequently appear together in CBG visits. These topics allow us to simulate agent mobility based on the LDA topic distribution of their home CBG. We compare the LDA-based mobility model with competitor approaches including a naive mobility model that assumes visits to POIs are random. We find that the naive mobility model is unable to facilitate the spread of COVID-19 at all. Using the LDA-informed mobility model, we simulate the spread of COVID-19 and test the effect of changes to the number of topics, various parameters, and public health interventions. By examining the simulated number of cases over time, we find that the number of topics does indeed impact disease spread dynamics, but only in terms of the outbreak's timing. Further analysis of simulation results is needed to better understand the impact of topics on simulated COVID-19 spread. This study contributes to strengthening human mobility representations in ABMs of disease spread.
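
To make the sampling idea concrete, here is a minimal sketch of how an agent's visit could be drawn once topics are available: first pick a topic from the home CBG's topic mixture, then pick a POI from that topic. The CBG names, POI names, and probabilities are invented for illustration and are not taken from the paper, and the sketch assumes the LDA model has already been fitted elsewhere.

```python
import random

# Hypothetical LDA output (all names and numbers are made up):
# each home CBG has a mixture over topics, each topic a distribution over POIs.
CBG_TOPICS = {
    "cbg_A": [0.8, 0.2],
    "cbg_B": [0.1, 0.9],
}
TOPIC_POIS = [
    {"grocery": 0.6, "pharmacy": 0.3, "gym": 0.1},  # topic 0
    {"cafe": 0.5, "gym": 0.4, "grocery": 0.1},      # topic 1
]


def sample_visit(home_cbg, rng):
    """Draw one POI visit: a topic from the home CBG, then a POI from that topic."""
    topic_weights = CBG_TOPICS[home_cbg]
    topic = rng.choices(range(len(topic_weights)), weights=topic_weights, k=1)[0]
    pois = TOPIC_POIS[topic]
    return rng.choices(list(pois), weights=list(pois.values()), k=1)[0]


def visit_counts(home_cbg, n_visits, seed=0):
    """Simulate n_visits for one agent and count how often each POI is visited."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_visits):
        poi = sample_visit(home_cbg, rng)
        counts[poi] = counts.get(poi, 0) + 1
    return counts


if __name__ == "__main__":
    print(visit_counts("cbg_A", 1000))
```

Under these made-up numbers, an agent living in cbg_A mostly visits topic-0 POIs, whereas a naive model would pick any POI uniformly at random.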

Three promising high school students, John Pesavento, Andy Chen, and Rayan Yu, presented our recent research "Data-Driven Mobility Models for COVID-19 Simulation" at the 3rd ACM SIGSPATIAL Workshop on Advances in Resilient and Intelligent Cities (ARIC 2020)! I witnessed their hard work during the Aspiring Scientists Summer Internship Program (ASSIP). These diligent students were mentored by Dr. Andreas Züfle and Dr. Hamdi Kavak and co-mentored by Dr. Taylor Anderson and myself. Their presentation at ARIC 2020 was very clear. Well done!

You can check out interactive heatmaps and more information on John Pesavento's GitHub page. This research is supported by the National Science Foundation and by the Aspiring Scientists Summer Internship Program (ASSIP) at George Mason University.

J. Pesavento, A. Chen, R. Yu, J.-S. Kim, H. Kavak, T. Anderson, and A. Züfle, “Data-Driven Mobility Models for COVID-19 Simulation,” In Proceedings of the 3rd ACM SIGSPATIAL Workshop on Advances in Resilient and Intelligent Cities (ARIC 2020), November 2020, pp. 29-38

Tuesday, November 3, 2020

COVID-19 Ensemble Models Using Representative Clustering

Joon-Seok Kim, Hamdi Kavak, Andreas Züfle, Taylor Anderson


In response to the COVID-19 pandemic, there have been various attempts to develop realistic models to both predict the spread of the disease and evaluate policy measures aimed at mitigation. Different models that operate under different parameters and assumptions produce radically different predictions, creating confusion among policy-makers and the general population and limiting the usefulness of the models. This newsletter article proposes a novel ensemble modeling approach that uses representative clustering to identify where existing model predictions of COVID-19 spread agree and unify these predictions into a smaller set of predictions. The proposed ensemble prediction approach is composed of the following stages: (1) the selection of the ensemble components, (2) the imputation of missing predictions for each component, and (3) representative clustering in application to time-series data to determine the degree of agreement between simulation predictions. The results of the proposed approach will produce a set of ensemble model predictions that identify where simulation results converge so that policy-makers and the general public are informed with more comprehensive predictions and the uncertainty among them. 
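
The imputation and clustering stages can be sketched roughly as follows. This is only an illustration under stated assumptions: gaps are filled by linear interpolation, and a plain k-medoids loop stands in for the representative clustering described in the article. The model names and prediction values are invented.

```python
def impute(series):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def distance(a, b):
    """Euclidean distance between two equally long prediction curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign(curves, medoids):
    """Label each curve with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: distance(curve, curves[medoids[c]]))
            for curve in curves]


def k_medoids(curves, k, iterations=20):
    """Tiny k-medoids: returns (indices of representative curves, label per curve)."""
    medoids = list(range(k))  # deterministic start: the first k curves
    for _ in range(iterations):
        labels = assign(curves, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members,
                key=lambda i: sum(distance(curves[i], curves[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(curves, medoids)


if __name__ == "__main__":
    # Hypothetical weekly case predictions from four models, with missing weeks.
    predictions = {
        "model_a": [10, 20, None, 40],
        "model_b": [12, None, 33, 41],
        "model_c": [100, 150, 200, None],
        "model_d": [95, 160, None, 260],
    }
    names = list(predictions)
    curves = [impute(predictions[n]) for n in names]
    medoids, labels = k_medoids(curves, k=2)
    print("representatives:", [names[m] for m in medoids])
    print("clusters:", dict(zip(names, labels)))
```

Each medoid is an actual model prediction, so the reduced set stays interpretable, and the size of each cluster hints at how many models agree with that representative.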

Courtesy of NSF, we shared our vision at the 1st ACM SIGSPATIAL International Workshop on Modeling and Understanding the Spread of COVID-19, in a talk presented by Dr. Hamdi Kavak.

J.-S. Kim, H. Kavak, A. Züfle, and T. Anderson, “COVID-19 Ensemble Models Using Representative Clustering,” SIGSPATIAL Special, July 2020, Volume 12, Issue 2, pp. 33-41

Saturday, October 3, 2020

Vehicle Relocation for Ride-Hailing

Joon-Seok Kim, Dieter Pfoser, Andreas Züfle


Ever-increasing traffic and the resulting congestion waste fuel and are a significant contributor to greenhouse gas (GHG) emissions. Contributors here include ride-sharing services such as Uber, Lyft, and Didi, with their drivers not only transporting passengers, but also spending considerable time in traffic searching for new ones. To mitigate their impact, this work proposes a novel algorithm to improve the efficiency of the drivers' search for passengers. Our algorithm directs unassigned drivers to locations where new passengers are expected to emerge. We use a non-negative matrix factorization approach to model the time and location of passengers given historical training data. A probabilistic search strategy then guides drivers to nearby locations for which we predict new passengers. To ensure that drivers do not oversubscribe to such areas, we randomize destinations and provide each driver with a home location destination when unassigned. An experimental evaluation using real-world data from Manhattan shows that our approach reduces the search time of drivers and the wait time of passengers compared to baseline solutions.
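
As a rough illustration of the randomized search step, the sketch below picks a destination for an unassigned driver among nearby grid cells, with probability proportional to predicted demand. The grid cells, demand values, radius, and the fallback to the home cell when nothing nearby is promising are all assumptions made for this example. In the paper the demand predictions come from the matrix factorization model, which is not reproduced here.

```python
import random


def manhattan(a, b):
    """Grid distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_destination(driver_cell, predicted_demand, home_cell, radius=2, rng=random):
    """Pick a nearby cell at random, weighted by predicted passenger demand.

    Randomizing (rather than always taking the best cell) keeps all idle
    drivers from converging on the same hotspot. If no nearby cell has
    predicted demand, the driver heads to its home cell (an assumption).
    """
    candidates = {
        cell: demand
        for cell, demand in predicted_demand.items()
        if demand > 0 and manhattan(cell, driver_cell) <= radius
    }
    if not candidates:
        return home_cell
    cells = list(candidates)
    return rng.choices(cells, weights=[candidates[c] for c in cells], k=1)[0]


if __name__ == "__main__":
    # Hypothetical predicted pickups per grid cell for the next time slot.
    demand = {(0, 0): 5.0, (1, 0): 1.0, (5, 5): 50.0}
    rng = random.Random(42)
    print(choose_destination((0, 1), demand, home_cell=(3, 3), rng=rng))
```

The far-away hotspot at (5, 5) is ignored for a driver at (0, 1) because it lies outside the search radius, even though its predicted demand is highest.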

Please check out the video that I presented at the 7th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2020)!

Source code:
Additional materials:

J.-S. Kim, D. Pfoser, and A. Züfle, “Vehicle Relocation for Ride-Hailing,” In Proceedings of 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 2020, doi: 10.1109/DSAA49011.2020.00074

Wednesday, September 2, 2020

Won the Challenge on Mobility Intervention for Epidemics

The challenge asked participants to compete against each other with their own mobility intervention strategies, aiming to minimize a score defined by the organizers. Four mobility interventions are allowed:
  • To confine individuals in their neighborhood.
  • To quarantine individuals in their home.
  • To isolate individuals from others.
  • To hospitalize infected individuals.

It is important to understand that each intervention has a different cost and efficiency. The score is the weighted sum of two exponential functions, one of the number of infections and one of the number of interventions. It can be seen as an optimization problem with two opposing objectives. If the number of infections increases, the score increases exponentially. Likewise, if we intervene more and more, the score increases exponentially as well. That is, we need to find a balanced strategy. The following video is my presentation introducing our solution at the workshop.
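
The shape of the scoring trade-off described above can be illustrated with a small sketch. The weights and scale constants below are placeholders chosen for the example. The organizers' actual constants and any per-intervention weighting are not given in this post.

```python
import math


def score(num_infections, num_interventions,
          w_infection=1.0, w_intervention=1.0,
          scale_infection=100.0, scale_intervention=100.0):
    """Weighted sum of two exponentials (all constants are hypothetical)."""
    infection_term = w_infection * math.exp(num_infections / scale_infection)
    intervention_term = w_intervention * math.exp(num_interventions / scale_intervention)
    return infection_term + intervention_term


if __name__ == "__main__":
    print(score(200, 0))    # no interventions, many infections
    print(score(0, 200))    # heavy intervention, no infections
    print(score(100, 100))  # a balanced strategy scores lower than both extremes
```

Because both terms grow exponentially, pushing either quantity to an extreme is penalized more than keeping both moderate, which is why a balanced strategy is needed.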

In the challenge, our solution ranked second among all participants compliant with the challenge documents. The first two teams used the deprecated API that provides presymptomatic information and does not require contact tracing.

Due to the complexity of social phenomena, it is a big challenge to predict the curves of epidemics that spread via social contacts and to control such epidemics. Misguided policies to mitigate epidemics may result in catastrophic consequences such as financial crisis, massive unemployment, and a surge in the number of critically ill patients exceeding the capacity of hospitals. In particular, under- or overestimating the efficacy of interventions can distort policymakers' perception of evolving situations. To avoid such pitfalls, we propose Expert-in-the-Loop (EITL) prescriptive analytics using mobility intervention for epidemics. Rather than employing a purely data-driven approach, the key advantage of our approach is to leverage experts' best knowledge in estimating disease spreading and the efficacy of interventions, which allows us to efficiently narrow down factors and the scope of combinatorial possible worlds. We describe our experience developing Expert-in-the-Loop simulations during the Challenge on Mobility Intervention for Epidemics. We demonstrate that misconceptions about causality can be corrected through iterations of consulting with experts, developing simulations, and experimentation.

J.-S. Kim, H. Jin, and A. Züfle, “Expert-in-the-Loop Prescriptive Analytics using Mobility Intervention for Epidemics,” 1st ACM SIGKDD International Workshop on Prescriptive Analytics for the Physical World (PAPW 2020), August 2020

Thursday, July 9, 2020

Semantically Diverse Path Search

The Best Paper Award Runner-Up at IEEE MDM 2020 was awarded to the authors of "Semantically Diverse Path Search"!

Location-Based Services are often used to find proximal Points of Interest (PoI) – e.g., nearby restaurants and museums, police stations, hospitals, etc. – in a plethora of applications. An important recently addressed variant of the problem not only considers the distance/proximity aspect, but also desires semantically diverse locations in the answer-set. For instance, rather than picking several close-by attractions with similar features – e.g., restaurants with similar menus; museums with similar art exhibitions – a tourist may be more interested in a result set that could potentially provide more diverse types of experiences, for as long as they are within an acceptable distance from a given (current) location. Towards that goal, in this work we propose a novel approach to efficiently retrieve a path that will maximize the semantic diversity of the visited PoIs that are within distance limits along a given road network. We introduce a novel indexing structure – the Diversity Aggregated R-tree, based on which we devise efficient algorithms to generate the answer-set – i.e., the recommended locations among a set of given PoIs – relying on a greedy search strategy. Our experimental evaluations conducted on real datasets demonstrate the benefits of the proposed methodology over the baseline alternative approaches.
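
To give a feel for the greedy idea, here is a simplified sketch that repeatedly moves to the nearest PoI of a not-yet-visited category while staying within a distance budget. It is not the paper's algorithm: it uses straight-line distance instead of a road network, a flat list instead of the Diversity Aggregated R-tree, and category labels as a crude stand-in for semantic diversity. The PoI names and coordinates are invented.

```python
import math


def greedy_diverse_path(start, pois, max_distance):
    """Greedy path over PoIs with distinct categories within a distance budget.

    pois is a list of (name, category, (x, y)) tuples.
    Returns (list of visited PoI names, total distance travelled).
    """
    position, travelled = start, 0.0
    seen_categories, path = set(), []
    remaining = list(pois)
    while True:
        # Candidates add a new category and still fit in the remaining budget.
        options = [
            (math.dist(position, loc), name, category, loc)
            for name, category, loc in remaining
            if category not in seen_categories
            and travelled + math.dist(position, loc) <= max_distance
        ]
        if not options:
            return path, travelled
        step, name, category, loc = min(options)  # nearest qualifying PoI
        path.append(name)
        seen_categories.add(category)
        travelled += step
        position = loc
        remaining = [p for p in remaining if p[0] != name]


if __name__ == "__main__":
    pois = [
        ("Museum A", "museum", (1.0, 0.0)),
        ("Museum B", "museum", (1.5, 0.0)),
        ("Bistro", "restaurant", (1.0, 2.0)),
        ("Park", "park", (4.0, 2.0)),
    ]
    print(greedy_diverse_path((0.0, 0.0), pois, max_distance=4.0))
```

With a budget of 4.0 the path skips the second museum in favour of the restaurant, and stops before the park because reaching it would exceed the budget.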

Xu Teng of Iowa State University gave a nice presentation at IEEE MDM 2020.

X. Teng, G. Trajcevski, J.-S. Kim, and A. Züfle, “Semantically Diverse Path Search,” In Proceedings of IEEE International Conference on Mobile Data Management (MDM 2020), July 2020, pp. 69-78

Managing Uncertainty in Evolving Geo-Spatial Data

Andreas Züfle, Goce Trajcevski, Dieter Pfoser, Joon-Seok Kim

Our ability to extract knowledge from evolving spatial phenomena and make it actionable is often impaired by unreliable, erroneous, obsolete, imprecise, sparse, and noisy data. Integrating the impact of this uncertainty is paramount when estimating the reliability/confidence of any time-varying query result from the underlying input data. The goal of this advanced seminar is to survey solutions for managing, querying and mining uncertain spatial and spatio-temporal data. We survey different models and show examples of how to efficiently enrich query results with reliability information. We discuss both analytical solutions as well as approximate solutions based on geosimulation.
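
One simple way to attach reliability information to a query result is Monte Carlo sampling over possible worlds. The sketch below estimates the probability that an uncertain object lies within range of a query point. The Gaussian error model, the parameter names, and the numbers are assumptions for illustration, not material from the seminar.

```python
import math
import random


def prob_within_range(reported, sigma, query, radius, samples=10000, seed=0):
    """Estimate P(object within radius of query) by sampling possible positions.

    The true position is assumed to be Gaussian-distributed around the
    reported (x, y) with standard deviation sigma on each axis.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.gauss(reported[0], sigma)
        y = rng.gauss(reported[1], sigma)
        if math.hypot(x - query[0], y - query[1]) <= radius:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    # An object reported exactly on the edge of the query range.
    print(prob_within_range(reported=(1.0, 0.0), sigma=0.5, query=(0.0, 0.0), radius=1.0))
```

Instead of a yes/no answer, the range query now returns a confidence value, and drawing more samples tightens the estimate at the cost of extra computation.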

The Advanced Seminar at IEEE MDM 2020 consisted of four parts:

  • Part I: Introduction and Motivation
  • Part II: Uncertainty in Spatial Data
    1. Uncertainty Models and Possible World Semantics
    2. Representative Query Processing using Monte-Carlo Sampling
  • Part III: Uncertainty in Evolving Spatial Data
    1. Sources, Models and Contexts
    2. Non-point Evolving Entities
  • Part IV: Geospatial Simulation 

Of these, I am sharing the video of "Part IV: Geospatial Simulation", which I presented at the conference.

The whole video for the advanced seminar can be found here.

A. Züfle, G. Trajcevski, D. Pfoser, and J.-S. Kim, “Managing Uncertainty in Evolving Geo-Spatial Data,” In Proceedings of IEEE International Conference on Mobile Data Management (MDM 2020), July 2020, pp. 5-8

Friday, July 3, 2020

Location-Based Social Network Data Generation Based on Patterns of Life

Joon-Seok Kim, Hyunjee Jin, Hamdi Kavak, Ovi Chris Rouly, Andrew Crooks, Dieter Pfoser, Carola Wenk, Andreas Züfle


Location-based social networks (LBSNs) have been studied extensively in recent years. However, real-world LBSN data sets have several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground-truth. To overcome these weaknesses, we leverage a large-scale LBSN simulation to create a framework to simulate human behavior and to create synthetic but realistic LBSN data based on human patterns of life. Such data not only captures the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of "needs" that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different synthetic towns and real-world urban environments obtained from OpenStreetMap. The simulation software and data sets, which comprise gigabytes of spatio-temporal and temporal social network data, are made available to the research community.
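
The needs-driven behavior can be illustrated with a toy agent. This is only a sketch of the general idea and not the actual simulation framework: the need names, decay rates, and the rule of always serving the most depleted need are assumptions made for the example.

```python
# Hypothetical mapping from a need to the kind of place that satisfies it.
NEED_TO_PLACE = {
    "sleep": "home",
    "food": "restaurant",
    "money": "work",
    "social": "recreation",
}


class Agent:
    """Toy agent whose needs decay over time and drive where it goes next."""

    def __init__(self, decay):
        # 1.0 means fully satisfied, 0.0 means fully depleted.
        self.levels = {need: 1.0 for need in NEED_TO_PLACE}
        self.decay = decay

    def step(self):
        """Advance one tick: needs decay, then the most urgent one is served."""
        for need, rate in self.decay.items():
            self.levels[need] = max(0.0, self.levels[need] - rate)
        urgent = min(self.levels, key=self.levels.get)
        self.levels[urgent] = 1.0  # visiting the place fully satisfies the need
        return NEED_TO_PLACE[urgent]


if __name__ == "__main__":
    agent = Agent(decay={"sleep": 0.1, "food": 0.3, "money": 0.2, "social": 0.05})
    print([agent.step() for _ in range(3)])
```

Logging each returned place together with a timestamp would yield a check-in trace, which is the kind of record an LBSN data set contains.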

Please check out the video that I presented at the 21st IEEE International Conference on Mobile Data Management (MDM 2020)!

Source code:
LBSN Data:
Additional materials:

J.-S. Kim, H. Jin, H. Kavak, O. Rouly, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Network Data Generation Based on Patterns of Life,” In Proceedings of IEEE International Conference on Mobile Data Management (MDM 2020), July 2020, pp. 158-167