Diachronic Text Mining Investigation of Therapeutic Candidates for COVID-19
Diachronic text mining has frequently been applied to long-term linguistic surveys of word meaning and usage shifts over time. In this paper we apply short-term diachronic text mining to a rapidly growing corpus of scientific publications on COVID-19 captured in the CORD-19 dataset in order to identify co-occurrences and analyze the behavior of potential candidate treatments. We used a data set associated with a COVID-19 drug re-purposing study from Oak Ridge National Laboratory. This study identified existing candidate coronavirus treatments, including drugs and approved compounds, which had been analyzed and ranked according to their potential for blocking the ability of the SARS-COV-2 virus to invade human cells. We investigated the occurrence of these candidates in temporal instances of the CORD-19 corpus. We found that at least 25 identified terms occurred in temporal instances of the corpus to the extent that their frequency and contextual dynamics could be evaluated. We identified three classes of behaviors: those where frequency and contextual shifts were small and positively correlated; those where there was no correlation between frequency and contextual changes; and those where there was a negative correlation between frequency and contextual shift. We speculate that the latter two patterns are indicative that a target candidate therapeutics is undergoing active evaluation. The patterns we detected demonstrate the potential benefits of using diachronic text mining techniques with a large dynamic text corpus to track drug-repurposing activities across international clinical and laboratory settings.
READ FULL TEXT