The coronavirus disease 2019 (COVID-19) epidemic is unfolding in a globalized world with more communication tools than ever. These tools allow instantaneous communication, in many cases without verification of the information source, which may have adverse consequences for society.
On the other hand, infoveillance through Twitter (www.twitter.com) can be useful for longitudinal text mining and analysis, allowing some epidemiological conditions to be analyzed in real time, as previously described by Chew for the 2009 H1N1 pandemic.
Meanwhile, public health professionals have an increasing need to establish a feedback loop and monitor the real-time online public response and insights during emergencies, in order to examine the effectiveness of knowledge translation strategies and adapt future communications and educational campaigns to help the population face this pandemic.
The dissemination of information can strongly influence people’s behavior and alter the effectiveness of countermeasures implemented by governments. In this regard, models to predict the spread of the virus are beginning to monitor the behavioral response of the population with respect to public health interventions and the communication dynamics behind content consumption .
In recent weeks, great interest in the coronavirus arose following an outbreak of infection in the city of Wuhan, China. The epidemic scale of the recently emerged novel coronavirus has increased rapidly, with cases arising across China and in other countries and regions. Using a transmission model, 81008 cases were estimated, and Wuhan was estimated to have 21022 (11090-33490) total infections from 1 to 22 January.
In Colombia, the first case was registered in Bogota: a 19-year-old woman who returned to Bogota on 26 February from Milan, Italy. The young woman was placed in quarantine at her place of residence, with constant medical supervision, and after approximately 10 days it was confirmed that she had overcome the virus and was no longer infected with COVID-19. The mayor pointed out that “contagion to her relatives was also avoided”. Regarding the incidence of COVID-19, as of March 18, 2020 there were an estimated 93 cases and 2 deaths in Colombia, according to the record of the Colombian Secretary of Health.
The objective of this article is to describe the epidemiological impact of COVID-19 on press publications over 7 days before the first case of COVID-19 was described in Bogota, Colombia. To this end, it describes the publications on Twitter associated with the signs of the coronavirus as the pandemic advanced, and the persistence of the people of Bogota in this regard. The paper is organized as follows: Section 2 explains the methodology of the experiments, Section 3 presents the results and analysis, Section 4 states the conclusions, and Section 5 gives recommendations for related studies.
The present work performs experiments on Twitter source data using Natural Language Processing and Data Mining (Text Mining), following these steps:
Gather the relevant terms to search on Twitter
Build the query for Twitter and collect data
Pre-process the data to eliminate words with no relevance (stopwords)
II-A Gather Relevant Terms
’produccion_esputo’, ’hipoxemia’, ’fatiga’
II-B Build the Query and Collect Data
Tweets are extracted through the Twitter API with the following parameters:
date: from 29-12-2019 to 14-03-2020
terms: the words about symptoms in the previous subsection
geolocation: Bogota, the capital of Colombia (4.6, -74.083333)
radius: around 50 km
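The parameters above can be assembled into a search request along the following lines. This is a minimal sketch rather than the authors' script: the helper names `build_query` and `build_geocode` are hypothetical, the use of `OR` to combine terms is an assumption, and so is the reading of underscores in the term list as spaces. The `latitude,longitude,radius` string follows the geocode format of the standard Twitter search API.

```python
from typing import List


def build_query(terms: List[str]) -> str:
    """Join symptom terms into a single OR query.

    Assumption: underscores in the term list stand for spaces, so
    multi-word terms are quoted to be matched as exact phrases.
    """
    phrases = []
    for term in terms:
        phrase = term.replace("_", " ")
        phrases.append(f'"{phrase}"' if " " in phrase else phrase)
    return " OR ".join(phrases)


def build_geocode(lat: float, lon: float, radius_km: int) -> str:
    """Format a 'latitude,longitude,radius' geocode string in kilometres."""
    return f"{lat},{lon},{radius_km}km"


# Values taken from the parameter list in the text.
params = {
    "q": build_query(["produccion_esputo", "hipoxemia", "fatiga"]),
    "geocode": build_geocode(4.6, -74.083333, 50),
    "since": "2019-12-29",
    "until": "2020-03-14",
}
print(params)
```

The resulting dictionary would then be passed to whichever client library performs the actual request.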
II-C Preprocessing Data
Change the datetime format to year-month-day
Eliminate non-alphanumeric symbols
Convert uppercase to lowercase
Eliminate words of length less than or equal to 3
Add some exceptions
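The preprocessing steps listed above could be sketched as follows. Several details are assumptions made for illustration: the input date format is taken to be the `created_at` style of the standard Twitter API, the `STOPWORDS` set is a tiny placeholder rather than the list used in the study, and `EXCEPTIONS` is a hypothetical set of short words kept despite the length rule.

```python
import re
from datetime import datetime
from typing import List

# Placeholder sets for illustration only (not the lists used in the study).
STOPWORDS = {"para", "pero", "como", "tengo"}
EXCEPTIONS = {"tos"}  # hypothetical short word kept despite the length rule


def normalize_date(raw: str, fmt: str = "%a %b %d %H:%M:%S %z %Y") -> str:
    """Convert a timestamp to year-month-day (input format is an assumption)."""
    return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")


def preprocess(text: str) -> List[str]:
    """Lowercase, strip symbols, drop short words and stopwords."""
    text = text.lower()
    # Replace every character that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    return [
        tok for tok in tokens
        if (len(tok) > 3 or tok in EXCEPTIONS) and tok not in STOPWORDS
    ]


print(normalize_date("Sun Mar 08 14:30:00 +0000 2020"))
print(preprocess("Tengo fiebre, tos y dolor de cabeza!"))
```

Lowercasing before filtering keeps the stopword and exception lookups case-insensitive.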
The date of user account creation
Tweets per day to analyze the increasing number of posts
Cloud of words to analyze the most frequent terms involved per day
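The per-day counts and most frequent terms behind such plots could be computed as in the sketch below. It assumes each tweet has already been reduced to a `(day, tokens)` pair; the function names and the sample records are illustrative placeholders, not data from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Tweet = Tuple[str, List[str]]  # (year-month-day, preprocessed tokens)


def tweets_per_day(tweets: List[Tweet]) -> Counter:
    """Count how many tweets were posted on each day."""
    return Counter(day for day, _ in tweets)


def top_terms_per_day(tweets: List[Tweet], n: int = 3) -> Dict[str, List[str]]:
    """Return the n most frequent tokens for each day."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for day, tokens in tweets:
        counts[day].update(tokens)
    return {day: [w for w, _ in c.most_common(n)] for day, c in counts.items()}


# Made-up sample records, for illustration only.
sample = [
    ("2020-03-08", ["dolor", "cabeza"]),
    ("2020-03-08", ["fiebre", "dolor"]),
    ("2020-03-09", ["gripe"]),
]
print(tweets_per_day(sample))
print(top_terms_per_day(sample, n=1))
```

The term counts per day are the same quantities a word cloud encodes as font size.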
The following figures present the results of the experiments and address several questions about the phenomenon in the population.
III-A What about the veracity of the posts?
Nowadays, many users post their ideas on social networks, and there is no control over the veracity of the information. One field relevant to this question is the account creation date, which is presented in Fig. 1.
The creation dates are concentrated around 2010 and 2011, so most of these accounts are more than 6 years old. By contrast, accounts created by fake users to post false information would usually be less than 1 year old.
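The account-age heuristic described here can be expressed as a small check. This is a sketch under stated assumptions: the one-year threshold comes from the text, the reference date is taken to be the end of the collection window (14-03-2020), and the function names are hypothetical. Account age alone is only a rough signal.

```python
from datetime import date


def account_age_years(created: date, reference: date) -> float:
    """Approximate account age in years at the reference date."""
    return (reference - created).days / 365.25


def is_recent_account(created: date, reference: date,
                      threshold_years: float = 1.0) -> bool:
    """Flag accounts younger than the threshold (1 year, as in the text)."""
    return account_age_years(created, reference) < threshold_years


# Reference date assumed to be the end of the collection window.
end_of_window = date(2020, 3, 14)
print(is_recent_account(date(2010, 6, 1), end_of_window))
print(is_recent_account(date(2019, 12, 1), end_of_window))
```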
III-B How often do people post and when did they start?
Although the analysis window ran from 29-12-2019 to 14-03-2020 and posts were expected for every day, people were not posting about the topic before 08-03-2020. Fig. 2 shows an increasing number of posts during the last days.
III-C What are people posting about COVID-19 symptoms?
After preprocessing the tweets and removing stopwords, the predominant words from 2020-03-08 to 2020-03-14, taken from the word cloud in Fig. 3, are: dolor, cabeza, ivanduque, coronavirus, uribi, fiebre, contagio, manos, gripe, evitar, estornudar.
The most frequent words thus relate to symptoms (health), while the figure also shows an interest in politics.
III-D How is COVID-19 progressing in Colombia?
Finally, Fig. 4 shows the actual increase in infections in Colombia from the start of March; there is a natural correlation between the increasing number of posts per day and the number of infections.
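One way to quantify such a relationship is the Pearson correlation coefficient between the two daily series. The sketch below is not a computation reported in the paper: the `pearson` helper is written out by hand for illustration, and the two short series are made-up placeholders, not the study's data.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder series (posts per day, confirmed cases per day).
posts = [1, 2, 3, 4]
cases = [2, 4, 6, 8]
print(pearson(posts, cases))
```

Two series that both rise over the same week will tend to correlate strongly, so a coefficient of this kind describes co-movement rather than cause.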
This preliminary analysis helps to understand what is happening in the population of Bogota, and the data can be useful for analyzing other aspects of the phenomenon from different approaches: economics, sociology, etc.
A Text Mining approach helps to visualize what is being said about COVID-19 symptoms in Bogota: the relevance of the topic for the people, the increasing number of posts, the most relevant terms per day, and how these are naturally correlated with the number of infected people in Colombia.
The Twitter API has a limitation of seven days, so the time range must be set when collecting data. The preprocessing step is necessary because people post without following any rules.
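A collection plan that respects the seven-day limitation could split the study period into consecutive windows, as in the sketch below. It assumes the limitation means that each collection run can cover at most seven days; the function name `split_into_windows` is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_into_windows(start: date, end: date,
                       days: int = 7) -> List[Tuple[date, date]]:
    """Split [start, end] into consecutive inclusive windows of at most `days` days."""
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


# Study period taken from the text: 29-12-2019 to 14-03-2020.
windows = split_into_windows(date(2019, 12, 29), date(2020, 3, 14))
for first_day, last_day in windows:
    print(first_day.isoformat(), last_day.isoformat())
```

Each window would have to be collected while it is still within reach of the API, for example with one scheduled run per week.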
-  K. Sun, J. Chen, and C. Viboud, “Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study,” The Lancet Digital Health, 2020.
-  C. Chew and G. Eysenbach, “Pandemics in the age of Twitter: content analysis of tweets during the 2009 H1N1 outbreak,” PLoS ONE, vol. 5, no. 11, 2010.
-  V. K. Jain and S. Kumar, “An effective approach to track levels of influenza-A (H1N1) pandemic in India using Twitter,” Procedia Computer Science, vol. 70, pp. 801–807, 2015.
-  J. Shaman, A. Karspeck, W. Yang, J. Tamerius, and M. Lipsitch, “Real-time influenza forecasts during the 2012–2013 season,” Nature Communications, vol. 4, no. 1, pp. 1–10, 2013.
-  J. M. Read, J. R. Bridgen, D. A. Cummings, A. Ho, and C. P. Jewell, “Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions,” medRxiv, 2020.
-  Semana, ““Ya el primer caso de coronavirus en Bogota fue superado”: Claudia Lopez,” www.semana.com. [Online]. Available: https://www.semana.com/nacion/articulo/coronavirus-primercaso-de-covid-19-en-bogota-fue-superado/657012
-  E. Dong, H. Du, and L. Gardner, “An interactive web-based dashboard to track COVID-19 in real time,” The Lancet Infectious Diseases, 2020.
-  Y. Dong, X. Mo, Y. Hu, X. Qi, F. Jiang, Z. Jiang, and S. Tong, “Epidemiological characteristics of 2143 pediatric patients with 2019 coronavirus disease in China,” Pediatrics, 2020. [Online]. Available: https://pediatrics.aappublications.org/content/early/2020/03/16/peds.2020-0702
-  V. Jain and J.-M. Yuan, “Systematic review and meta-analysis of predictive symptoms and comorbidities for severe COVID-19 infection,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/03/16/2020.03.15.20035360