PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data

11/05/2020
by Alberto Calderone, et al.

The number of scientific papers published every day is daunting and constantly increasing, and keeping up with the literature is a challenge. For anyone starting to explore a new topic, it is hard to get the big picture without reading many articles. Furthermore, as one reads through the literature, making mental connections is crucial for asking new questions that might lead to discoveries. In this work, I present a web tool that uses a text-mining strategy to transform large collections of unstructured biomedical articles into structured data. The generated results give a quick overview of complex topics and may suggest information that is not explicitly reported. In particular, I show two data science analyses. First, I present a literature-based rare disease network built using this tool, in the hope that it will help clarify some aspects of these less popular pathologies. Second, I show how a literature-based analysis conducted with PubSqueezer results makes it possible to describe known facts about SARS-CoV-2. In one sentence, data generated with PubSqueezer make it easy to use scientific literature in any computational analysis, such as machine learning or natural language processing. Availability: http://www.pubsqueezer.com


