The pandemic of SARS-CoV-2, the virus that causes COVID-19, has had a significant global impact, especially on human life and economic activity. Because resources are limited, current policies have difficulty identifying and quarantining asymptomatic virus carriers, which makes it much harder to control the spread of the virus. Immediate action is needed to prevent further spread of COVID-19. Contact tracing is a method that helps patients recall whom they have been with and where they have been. Identifying contacts and ensuring that they do not have a chance to interact with others is critical to slowing down the pandemic NCIRD (2019).
This paper is the first to use an approach with continuous learning capabilities to analyze the probability that a person is an asymptomatic carrier of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To this end, we compute a ranking model with city GPS spatial dynamics data Tang et al. (2018). The approach is a framework for finding and ranking the sources of infection in a moving crowd, and it can be easily applied to dynamic modelling of the spread of the SARS-CoV-2 virus. It efficiently calculates rich interaction features from continuous data to approximate each individual’s probability of being infected, because the Monte Carlo tree search (MCTS) on the IDG reduces the time needed to find the important center-surround features. The CLIIP quickly yields the infection probability of each person exposed in a crowd over time. Moreover, even a superspreader (active in motion, high viral titer, asymptomatic) can be found when backward and forward tracking are used at the same time. Backward tracking searches back for the day on which a person was possibly infected, and forward tracking runs inference detection over the whole of each possible day.
2 Related Work
Contact tracing is currently the most common way for public health institutions to track infected people and the sources of the virus Keeling (2003); Scutchfield and Keck (2003). This method can locate infected individuals and minimize the spread of the virus by isolating them, and their contacts at risk of infection, from the public. In past decades, it has been used not only for controlling diseases but also as a critical tool for investigating new diseases or unusual outbreaks; for example, SARS and H1N1, two previous pandemics, were suppressed with the help of contact tracing. Governments and health institutes have adopted or proposed adopting contact tracing Zastrow (2020); Kiss et al. (2005); Lalvani et al. (2001) to follow the daily routes of residents and decrease the likelihood of contact between infected and healthy people.
Recently, in order to determine the contact paths of infected people more quickly, the method has advanced from manual recording to tracking people’s mobile phones via Bluetooth or GPS techniques Cho et al. (2020); Apple.com (2020); Ian Sherr (2020); Chan et al. (2020); Wuhan (2020). Moreover, Hellewell et al. (2020) used a model to quantify the potential effectiveness of contact tracing and isolation of confirmed cases in controlling the outbreak of a severe acute respiratory syndrome coronavirus like SARS-CoV-2. Peng et al. (2020) developed a method that uses a trinary split into red, yellow, or green states to track infectors. Recent contact tracing methods, such as Zhou et al. (2020), use mobile data with regional infection numbers to predict an individual’s probability of getting infected. However, contact tracing cannot identify the probability of asymptomatic carriers and is not always the most efficient method of addressing infectious diseases. Under the current limitation of medical resources, governments can only isolate the people in direct contact with confirmed cases as the primary way to control the spread of the SARS-CoV-2 virus.
As the current speed and capacity of virus testing still cannot meet the demand, the COVID-19 outbreak is difficult to bring under control. So far, the most feasible way for countries and cities to lessen the spread of infection is to enforce a lockdown or stay-at-home order that stops unnecessary social interactions among residents. However, the longer a lockdown or quarantine stays in place, the greater its impact on a country’s economy, on people’s mental health, and on many other aspects of their lives. The non-ranking, exhaustive inspection method of contact tracing, which uses only the confirmed cases, is not efficient enough to suppress the outbreak of COVID-19 and its recurrence, especially after the re-opening of a city or country. The detection of asymptomatic infected people, along with appropriate social distancing, effective medical treatments, and the development of vaccination, will greatly determine the extent to which a current or new disease outbreak can be controlled.
As a result, we propose a machine-learning algorithm to predict the spread of the SARS-CoV-2 virus and reduce the time needed to locate infected people. After the individual state is updated through an IDG, we use a gradient-boosted ensemble tree model to calculate the probability, and continuous learning keeps improving the model of the LightGBM Ke et al. (2017) algorithm. It can obtain a better result without parameter adjustment. The CLIIP is an innovative approach that combines temporal difference learning, which learns by bootstrapping, with value function approximation to predict the probability of getting infected in real circumstances. To continuously measure real-world physical activity with machine intelligence, both the approximation of the value and the professional inference need careful handling, and our approach bridges the gap between theory and reality.
We develop a framework with the inference model, which is a more efficient and precise method to narrow down the search for potential asymptomatic infected people. It can potentially deduce the source of infection based on the spreading pathway of the virus and the contact tracing process. People’s infection paths and their probabilities of infection depend on several critical factors, such as the duration, frequency, and distance of their contact with any infected individual. These factors determine the infection state of the population over time. The continuous learning model based on these phenomena can be used to simulate and analyze an individual’s probability of infection.
Definition of input 1:
There are people, . Then, assume we have m key interaction features of each person at time to describe people’s connection, .
In this paper, we use their location and timestamp as their key interaction features.
Definition of input 2: At each time unit, every person has a label that indicates their state, drawn from the set of possible infection states at that time. We use seven states: susceptible S, susceptible and quarantined Sq, exposed E, exposed and quarantined Eq, infected I, hospitalized H, and recovered R. These states depend on one another as in a SEIR model Younsi et al. (2015).
The system aims to output a ranking in order of infection priority, as described in Fig. 2. We start from people’s interaction features over time as the input to the framework. The interaction data is refined into more accurate standard spatial data through the map-matching method of Newson and Krumm (2009), or by combining it with other data such as credit card transaction data or check-in data, as in Limited (2020). By reconnecting the paths of all people, we obtain the social interaction network in the form of an IDG, which we use for further research. To build the interaction data into an IDG, we extract the key interaction features describing the dynamic behavior of each person (Fig. 2 step (1)) from continuous spatial data, from which we can also extract the frequency and distance of people’s contacts. Another input comes from the SEIR model, which describes each person’s state, such as "infected" or "recovered", updated at each time step.
3.2 Combination of SEIR model and interaction data
To demonstrate the effectiveness of the model, we use dynamic spatial GPS data of a crowd in the city, taken from the City GPS spatial data Tang et al. (2018) (Table 1), and convert it to approximate the interaction data for 30 consecutive days as input 1. To prepare the infected environment, we calculate the spread of the virus in the city using an agent-based simulation of the improved SEIR model for SARS-CoV-2 as input 2.
The SEIR model (Definition of Input 2, Fig. 2) Younsi et al. (2015) refers to the flows of people between four states: S holds susceptible people, E holds exposed people who are incubating the disease (some of whom may already be infectious but have not yet been confirmed as infected), I holds confirmed infected people, and R holds recovered people. Three further states, susceptible quarantined Sq, exposed quarantined Eq, and hospitalized H, are also taken into consideration, as shown in Fig. 3.
With the key interaction features from input 1, we generate an IDG at each updated time (Fig. 2 step (2)); the IDG is a directed acyclic graph used as a model of people’s connections. We treat each node in the IDG as a person, and each directed edge as a spreading relation between two people who stayed at the same location for a certain period of time. The direction of the edge indicates the source and destination of infection: the arrow points to the person who left a place later, since he or she is more likely to have been infected by the person who left earlier. With input 2, we label people’s states in the IDG and, at the same time, update the previous IDG within the incubation period Makar et al. (2018) (Fig. 2 step (3)). Once we have an updated IDG for the period, we compute the probability and ranking of each person, including S and E. Using the IDG (Fig. 2 step (4)) and the SEIR states, we generate each individual’s status and calculate the features that feed the model. The learning process enhances the capability to find asymptomatic carriers. Finally, we update the probability and ranking of each person in the period. We then introduce an algorithm that uses a very simple yet highly efficient search strategy for training a LightGBM model with data derived from running the SEIR model and updating the relation graph.
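The edge rule described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `Visit` record, the `min_overlap` threshold, and the decision to skip pairs that leave at exactly the same time are hypothetical choices that do not come from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Visit:
    """Hypothetical record of one stay: who, where, and when (in hours)."""
    person: str
    place: str
    arrive: float
    leave: float


def build_idg_edges(visits, min_overlap=0.25):
    """Return directed edges (source, target, overlap_hours).

    Two people are linked when they share a place for at least
    `min_overlap` hours (an assumed threshold). The edge points to the
    person who left later, following the rule stated in the text.
    """
    by_place = {}
    for visit in visits:
        by_place.setdefault(visit.place, []).append(visit)

    edges = []
    for place_visits in by_place.values():
        for a, b in combinations(place_visits, 2):
            if a.person == b.person:
                continue
            overlap = min(a.leave, b.leave) - max(a.arrive, b.arrive)
            if overlap < min_overlap:
                continue
            if a.leave == b.leave:
                # The text does not define a direction for ties;
                # skipping them is an assumption of this sketch.
                continue
            src, dst = (a, b) if a.leave < b.leave else (b, a)
            edges.append((src.person, dst.person, overlap))
    return edges


visits = [
    Visit("p1", "cafe", 9.0, 10.0),
    Visit("p2", "cafe", 9.5, 11.0),
    Visit("p3", "gym", 9.0, 10.0),
]
edges = build_idg_edges(visits)
```

In this toy input, p1 and p2 overlap for half an hour and p1 leaves first, so the only edge runs from p1 to p2; p3 has no co-located contact and stays isolated.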
3.3 Updating states in the IDG
In the IDG, we label infected people as red nodes, susceptible people S who may be healthy as green nodes, and exposed people, who may be infected or virus carriers but are not confirmed, as yellow nodes. When newly infected people are confirmed from input 2, we use the incubation period distribution Bays et al. (2020) to assign the time at which they actually became infected, using a discrete probability for each day. This gives us a way to update states in the IDG over a window of length n, with n being the duration of the incubation period. The SEIR model is updated every 2 hours, following the steps in Section 4.1 below. We therefore end up with 530 infected environments in 30 days in the city.
After that, we use the continuous learning algorithm from Algorithm 22 to build the CLIIP model and the LightGBM model, using the set of IDGs before the current time as the training data set and the SEIR states as labels. At each time step in the algorithm, the relation graph is updated with the updated IDG to form or change the relationships between nodes. Simultaneously, the SEIR model is advanced to the next time point. The CLIIP approach then starts calculating the important individual surrounding features, such as the contact time with infected people. When we assume that a given person lies on the path between two infectors, importance is measured by a defined ordering. Counting all surrounding features is a simple way to collect the training data set, but it is too costly. Because the nodes in the graph carry weights and probabilities of being asymptomatic carriers, we speed up the process by performing the search with the Monte Carlo tree search (MCTS) Moore (1959); Dijkstra (1959) method, which gathers the surrounding information of the nodes without visiting any ID twice.
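The back-dating step can be illustrated with a short sketch. The probabilities in `INCUBATION_PMF` are placeholders chosen only for illustration (they are not values from the paper or from Bays et al.), and labeling the days between infection and confirmation as E is an assumption of this sketch.

```python
import random

# Hypothetical discrete incubation distribution: lag in days -> probability.
# Placeholder numbers only; not taken from the paper.
INCUBATION_PMF = {5: 0.5, 6: 0.3, 7: 0.2}


def sample_infection_day(confirm_day, pmf, rng):
    """Draw the day of infection by sampling a lag from the discrete pmf."""
    lags = sorted(pmf)
    weights = [pmf[lag] for lag in lags]
    lag = rng.choices(lags, weights=weights, k=1)[0]
    return confirm_day - lag


def backfill_states(confirm_day, infection_day):
    """Label each day from infection to confirmation.

    Assumption of this sketch: the person is E (exposed, not confirmed)
    until the confirmation day, and I on the confirmation day.
    """
    labels = {day: "E" for day in range(infection_day, confirm_day)}
    labels[confirm_day] = "I"
    return labels


rng = random.Random(0)
infection_day = sample_infection_day(20, INCUBATION_PMF, rng)
labels = backfill_states(20, infection_day)
```

Each confirmed case thus produces a short run of relabeled days that can be written back into the earlier IDG snapshots.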
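To make the idea of collecting surrounding information without revisiting an ID concrete, here is a minimal sketch. It uses a plain layer-by-layer breadth-first expansion rather than MCTS, and the function name, the three-layer limit, and the toy graph are assumptions made for illustration.

```python
def surrounding_counts(neighbors, states, start, max_layers=3):
    """Count infected people per contact layer and exposed people overall.

    `neighbors` maps a person ID to the IDs they had contact with.
    Each ID is visited at most once (the "no-repeat" rule), so a person
    reachable through several paths is counted only in the nearest layer.
    """
    seen = {start}
    frontier = [start]
    infected_per_layer = []
    exposed_total = 0
    for _ in range(max_layers):
        next_frontier = []
        for person in frontier:
            for other in neighbors.get(person, ()):
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        infected_per_layer.append(
            sum(1 for p in next_frontier if states.get(p) == "I")
        )
        exposed_total += sum(1 for p in next_frontier if states.get(p) == "E")
        frontier = next_frontier
    return infected_per_layer, exposed_total


neighbors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
states = {"a": "S", "b": "I", "c": "E", "d": "I", "e": "E"}
counts = surrounding_counts(neighbors, states, "a")
```

Here person d is reachable from a through both b and c but is counted once, in the second layer. A guided search such as MCTS would replace the exhaustive expansion of each layer with a selective one.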
3.4 Ranking process
If we know which people were newly infected from input 2, we backtrack the route of transmission, using the incubation period time distribution to begin searching within a range of preceding days. Then we do the forward tracking as shown in Fig. 4. If we find the source of the virus in the first layer, the search stops and we rebuild all relations of the IDG. We then start to predict each person’s probability of infection, with E first and then S. Otherwise, if we find the source in another layer of the search, we put the people on the path into the first group, add the remaining E to the second group, and collect S into the third group. The ordinal numbers of the groups give the order in which the LightGBM model calculates the probabilities. The input interaction features of the model are the four values annotated between (3) and (4) in Fig. 2: the duration of contact and the closest distance between two IDs in the data, and the numbers of infected people and exposed people around them. The label Y is the state generated from the SEIR model simulation. The use of these interaction features is motivated by the inference logic that the virus must come from the people around an infected person. Fig. 4 shows a computed example of the result.
Finally, for each person at a specific time, we have not only the infection states, which indicate who is infected or exposed, but also the probability of being an asymptomatic carrier.
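The three-group ordering can be sketched as below. The function names are hypothetical, the `prob` dictionary merely stands in for the probabilities that the trained model would output, and sorting by descending probability inside each group is an assumption of this sketch.

```python
def rank_groups(states, on_path):
    """Split candidates into the three ordered groups described in the text.

    Group 1: E or S people on the path to the source.
    Group 2: the remaining E people.
    Group 3: the remaining S people.
    """
    group1 = [p for p in states if p in on_path and states[p] in ("E", "S")]
    group2 = [p for p in states if states[p] == "E" and p not in on_path]
    group3 = [p for p in states if states[p] == "S" and p not in on_path]
    return [group1, group2, group3]


def rank_people(states, on_path, prob):
    """Concatenate the groups in order, highest probability first in each."""
    ranking = []
    for group in rank_groups(states, on_path):
        ranking.extend(sorted(group, key=lambda p: -prob[p]))
    return ranking


states = {"a": "S", "b": "E", "c": "E", "d": "S", "e": "I"}
on_path = {"a"}
# Placeholder probabilities standing in for model output.
prob = {"a": 0.2, "b": 0.4, "c": 0.9, "d": 0.1}
ranking = rank_people(states, on_path, prob)
```

In this toy case, a is ranked first because it lies on the path even though its probability is lower than that of b or c; already confirmed people such as e are not ranked.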
4 Implementation of individualized SEIR model
4.1 SEIR model updating steps
The epidemic data used in this paper comes from Shi et al. (2020). Given the limited data, we assume that there are 100 infected people in the group. The other states then follow the same ratios as in Shi et al. (2020), except for S, which consists of the remaining IDs recorded in the data set after the other states have been assigned. We then initialize the model parameters. For the E people, we use the list of people who may have been in contact with the initially infected people I. To extend the distribution of the SEIR model to the individual scale, we follow the steps below. The update process should be based on the real interaction data in the form of the IDG.
Load new model in next time step
If (member of > member of ) get from to
If (member of > member of ) get from to
If (member of > member of ) get from generated by with relation built by certain and .
If (member of > member of ) get from
If (member of > member of ) get from and the probability of choice depends on individual incubation day of .
If (member of > member of ) get from and at random.
If (member of > member of ) get from and at random. (This should be improved by depending on a curved day.)
The IDG from Section 3 is the foundation on which the state updates are built.
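The general pattern of these steps, in which a state that must grow draws individuals from a source state, can be sketched as follows. The exact source and target states of each step are not reproduced here; the E to I example, the function name `grow_state`, and the optional `weight` callback are illustrative assumptions.

```python
import random


def grow_state(states, source, target, target_count, rng, weight=None):
    """Move individuals from `source` to `target` until the target state
    holds `target_count` members or the source pool is exhausted.

    `weight`, if given, maps a person ID to a positive sampling weight
    (for example, one derived from the individual incubation day);
    otherwise the choice is uniformly random.
    """
    current = sum(1 for s in states.values() if s == target)
    pool = [p for p, s in states.items() if s == source]
    need = min(max(target_count - current, 0), len(pool))
    moved = []
    for _ in range(need):
        weights = [weight(p) if weight else 1.0 for p in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        moved.append(pick)
    for person in moved:
        states[person] = target
    return moved


people = {"a": "E", "b": "E", "c": "S", "d": "I"}
# Suppose the aggregate model reports 2 people in I at the next time step.
moved = grow_state(people, "E", "I", 2, random.Random(1))
```

In the framework described above, the candidate pool would additionally be restricted to people connected through the IDG rather than to everyone in the source state.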
4.2 Simulation and model building
To simulate the situation more realistically, we made some arrangements regarding the initial individuals. First, we randomly sorted people into the Eq and Sq groups. For state I, we split the 100 initial individuals into two groups; one group was chosen randomly, and the other was selected from the first and second layers of the first group of people. This process yields the primary connections among the first group of infected people. The state E people were then picked from the group of people connected to state I, and the rest of the people became S. We then applied the update rule from the previous section. From here, we could obtain people’s states as input, for which we currently consider only the interaction time, the interaction distance, the first, second, and third layers of infected people, and the number of exposed people. Moreover, we attached the label of the specific state to the IDG. This will be updated with future data.
With this infected environment model, we create a perfect fit to the individual SEIR model. First, we used the incubation period time distribution to begin searching in the range of the previous 5-7 days. Then we continued the search until an infected person was found, and started ranking the people on the path. In the real world, things become more complex because the virus can spread through routes other than contact between people; the disease spread map will therefore have more missing nodes, and we will need to consider more layers in this condition. However, we can claim that the method locates asymptomatic people in a group with an average AUC of about 96% (Table 2) if we have all their surrounding label records and transfer data. One of the crowd ranking visualizations, in Fig. 5, shows the ranking distributed from susceptible people to infected people and indicates the probability of being an asymptomatic virus carrier (darker means higher probability).
We estimate the average precision of the model and, based on the interaction feature importance in Table 2, we claim that the model has found the rules inside the SEIR model, such as the transmission rule.
As resources are limited in the real world, there should be some priority in ranking the crowd. As in Fig. 1, the first level of people exposed to the infector has higher priority than the second level, and so on. Fig. 6 compares the CLIIP model with the baseline of contact tracing. We use different groups of samples to demonstrate the results. The base unit of the ratio is 500 people, so the blue line shows the 1000-person group, in which 500 people are already recognized as infected. In contrast, the dashed blue line shows the performance of basic contact tracing, against which the method is compared. In general, contact tracing needs to search through to the end to make sure that no one is missed.
Moreover, we use this model to test a larger group of people with more healthy people in the test group. The CLIIP model can cover most infected people when checking the same number of people because the ranking order speeds up the testing. We plot thousands of points to present the result. Furthermore, the baseline we compare against is the average performance of contact tracing. Thus the CLIIP model can find infected people more precisely and decrease the required social and medical resources.
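The kind of comparison described here can be illustrated by counting how many infected people have been found after checking the first k people in a given order. The function name and the toy data below are assumptions; they do not reproduce the experiments behind Fig. 6.

```python
def recall_curve(order, infected):
    """Fraction of infected people found after checking each person in `order`."""
    if not infected:
        raise ValueError("infected set must not be empty")
    found = 0
    curve = []
    for person in order:
        if person in infected:
            found += 1
        curve.append(found / len(infected))
    return curve


infected = {"b", "c"}
ranked_order = ["c", "b", "a", "d"]      # checked in ranking order
unranked_order = ["a", "d", "c", "b"]    # one possible unranked order
ranked = recall_curve(ranked_order, infected)
unranked = recall_curve(unranked_order, infected)
```

With a good ranking, the curve reaches full coverage after only a few checks, whereas an unranked search may need to go through the whole group.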
We propose a novel interaction-based inference learning approach whose major advantage lies in calculating the individual probability of getting infected from interactions along a timeline. In addition, the learning algorithm allows us to employ multi-modal datasets and interactive features such as weather, subjective feelings of individuals, mask wearing Kai et al. (2020), hand washing, and other health-related factors. This could further increase the accuracy of calculating and ranking the infection probability. Our approach can be further applied to more real-world scenarios:
Precisely identifying and predicting the most likely virus carriers
Ranking the probability of potential asymptomatic carriers in the crowd with our approach helps to control the spread of the SARS-CoV-2 virus precisely. The approach performs well in simulation when sufficient spatial mobile data is available during citywide outbreaks. Healthcare officials can develop a more precise control or quarantine strategy for the affected regions, areas, or individuals than a citywide lockdown. Furthermore, an adaptive and flexible "exit" strategy can also facilitate re-opening and maintain normal economic activities with a limited quarantine.
Searching for superspreaders
The disease spreading map in our IDG makes the ranking of superspreaders possible. Following the states of the people in contact, superspreaders are most likely to lie on the path between two infectors. When our approach is used to analyze the individuals in the layer surrounding a spreader, the possibility of being a superspreader can be described by the equation below:
which guides the search for superspreaders and creates more learning samples for further searches. This enhances the learning precision and accelerates the inference process significantly.
Decision support for saving resources
The approach can simulate the situation after a policy is executed. The individual model of the virus spread could offer suggestions on matters such as:
Disinfection, sterilization, and preservation.
Based on the distribution of the spread probability in outdoor areas, indoor simulation is also possible. Combining surveillance camera data with the CLIIP gives an infection index for each contact region, which makes precise disinfection possible: elevator buttons in an area, for instance, can be disinfected when the cumulative probability of having been touched by high-risk people reaches a threshold.
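A threshold rule of this kind might look like the sketch below. Combining the individual probabilities as one minus the product of their complements assumes that touches are independent, and both this formula and the default threshold of 0.5 are assumptions made for illustration.

```python
def needs_disinfection(touch_probs, threshold=0.5):
    """Decide whether a surface should be disinfected.

    `touch_probs` holds, for each person who touched the surface, their
    estimated infection probability. Assuming independence, the chance
    that at least one of them was infected is 1 - prod(1 - p).
    """
    p_all_clean = 1.0
    for p in touch_probs:
        p_all_clean *= 1.0 - p
    risk = 1.0 - p_all_clean
    return risk >= threshold, risk


flag_two, risk_two = needs_disinfection([0.2, 0.3])
flag_three, risk_three = needs_disinfection([0.2, 0.3, 0.2])
```

With two touches the accumulated risk stays below the threshold, and the third touch pushes it above, which would trigger disinfection in this sketch.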
Optimal testing times.
With new testing methods such as nucleic acid tests, PCR-based tests, antigen tests, and serology tests, we could add the probability of test failure as a feature and recalculate the individual infection probability in our approach. Considering all individuals in society based on our approach, it is possible to calculate R0, a mathematical term that indicates how contagious an infectious disease is. However, we would need to rebuild the CLIIP model and create labels such as R0 to train the new model. Decision makers can refer to R0 to obtain the degree of infection in an area and thus decide on the testing times and methods.
To counter reinfection with SARS-CoV-2, the CLIIP can reuse the data from the first infection model to predict the probability of reinfection for the rest of the people. Although the source of exogenous people is unclear in the transmission route, especially given the latency of SARS-CoV-2, our approach is able to observe each person in society and calculate the individual probability of reinfection.
Beyond virus spreading, our approach can be applied to modelling, learning, and inference at the individual level in general latent influence networks, such as P2P e-commerce, searching for terrorists, predicting digital security risks, and so on. In social networks, for example, people send diverse comments to each other and influence others through their mood and intent, thus generating an individualized relation graph. By measuring the mood and intent of the comments surrounding each person and by continuous learning, people’s decision-making policies on issues such as purchasing behaviour, terrorism detection, and the prevention of digital virus spreading can be gradually modelled, and their future actions can be precisely predicted.
- Contact tracing – bluetooth specification. apple.com. Cited by: §2, §6.
- Applying probability-weighted incubation period distributions to traditional wind rose methodology to improve public health investigations of legionnaires’ disease outbreaks. Epidemiology and Infection 148, pp. e33. Cited by: §3.3, §6.
- PACT: privacy sensitive protocols and mechanisms for mobile contact tracing. Cited by: §2, §6.
- Contact tracing mobile apps for covid-19: privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511. Cited by: §2, §6.
- A note on two problems in connexion with graphs. Numerische Mathematik. Cited by: §3.3, §6.
- Quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing. Science. Cited by: §6.
- Feasibility of controlling covid-19 outbreaks by isolation of cases and contacts. The Lancet Global Health. Cited by: §2, §6.
- Apple and google are building coronavirus tracking tech into ios and android – the two companies are working together, representing most of the phones used around the world. CNET. Cited by: §2, §6.
- Universal masking is urgent in the covid-19 pandemic: seir and agent based models, empirical validation, policy recommendations. Cited by: §6.
- Lightgbm: a highly efficient gradient boosting decision tree. In Advances in neural information processing systems, pp. 3146–3154. Cited by: §2, §6.
- Contact tracing and disease control. Proceedings. Biological sciences 270 (1533). Cited by: §2, §6.
- Disease contact tracing in random and clustered networks. Proceedings of the Royal Society B: Biological Sciences 272 (1570), pp. 1407–1414. Cited by: §2, §6.
- Enhanced contact tracing and spatial tracking of mycobacterium tuberculosis infection by enumeration of antigen-specific t cells. The Lancet 357 (9273), pp. 2017–2021. Cited by: §2, §6.
- CHECKIN-19 touchless guest register. Center of Disease Control and Prevention. Cited by: §3.1.
- Learning the probability of activation in the presence of latent spreaders. Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §3.2, §6.
- The shortest path through a maze. Bell Telephone System. Technical publications. monograph, Bell Telephone System. Cited by: §3.3, §6.
- Coronavirus disease 2019 (covid-19). Center of Disease Control and Prevention. Cited by: §1, §6.
- Hidden markov map matching through noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, pp. 336–343. Cited by: §3.1, Table 1, §6.
- Lessons to europe from china for cancer treatment during the covid-19 pandemic. British journal of cancer, pp. 1–2. Cited by: §2, §6.
- Estimating infectious disease transmission distances using the overall distribution of cases. Epidemics 17, pp. 10–18. Cited by: Figure 2, §6.
- Principles of public health practice. Delmar Learning. Cited by: §2, §6.
- SEIR transmission dynamics model of 2019 ncov coronavirus with considering the weak infectious ability and changes in latency duration. medRxiv. Cited by: Figure 3, §4.1, §6.
- Visual analysis of traffic data based on topic modeling (chinavis 2017). Journal of Visualization 21 (4), pp. 661–680. Cited by: §1, §3.2, §6.
- Aerosol and surface stability of sars-cov-2 as compared with sars-cov-1. New England Journal of Medicine 382 (16), pp. 1564–1567. Cited by: Figure 1, §6.
- Inside china’s smartphone ’health code’ system ruling post-coronavirus life. TIME Magazine. Cited by: §2, §6.
- SEIR-sw, simulation model of influenza spread based on the small world network. Tsinghua Science and Technology 20 (5), pp. 460–473. Cited by: 2nd item, §3.2, §6.
- South korea is reporting intimate details of covid-19 cases: has it helped?. Nature. Cited by: §2, §6.
- Detecting suspected epidemic cases using trajectory big data. arXiv preprint arXiv:2004.00908. Cited by: §2, §6.