Inflo: News Categorization and Keyphrase Extraction for Implementation in an Aggregation System

12/10/2018 ∙ by Pranav A, et al.

This work describes a system for automatic news category and keyphrase labeling, motivated by our aim to improve the speed at which a user can find relevant and interesting content within an aggregation platform. A set of 12 discrete categories was applied to over 500,000 news articles to train a neural network, whose category labels are then used to facilitate the more in-depth task of extracting the most significant keyphrases. Keyphrase extraction was done using three methods (statistical, graphical and numerical), with the pre-identified category label used to improve the relevance of the extracted phrases. The results are presented in a demo in which articles are pre-populated via News API; when an article is selected, its category and keyphrase labels are computed via the methods explained herein.




Discussions and Further Steps

The Inflo labeling system can be used to improve the efficiency of finding relevant content within an aggregation platform. Because classification is instantaneous and automatic, news articles shared from any source can be analyzed and labeled, making them easily accessible through content navigation tools such as topic and sub-topic filters. Such filters could allow for a more personalized feed of relevant content, rather than a single stream of potentially irrelevant or uninteresting content. With keyphrase extraction, news articles that are directly related (i.e. articles from different sources on a single event or incident) could be clustered and presented together, making it possible for readers to broaden their perspective and take in different viewpoints. Overall, the Inflo news labeling system, with its highly accurate output of insightful terms, could expedite the process of finding relevant and interesting content on the web when implemented in an aggregation platform.
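
The clustering idea mentioned above can be illustrated with a minimal sketch. This is not the authors' method: it assumes each article already carries a set of extracted keyphrases, and it groups articles whose keyphrase sets overlap strongly. The Jaccard measure, the 0.5 threshold, the function names and the sample articles are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyphrase sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_articles(
    keyphrases: Dict[str, Set[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Greedy single-pass grouping (hypothetical helper).

    An article joins the first cluster whose seed (first) article shares
    enough keyphrases with it; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for article_id, phrases in keyphrases.items():
        for cluster in clusters:
            seed_phrases = keyphrases[cluster[0]]
            if jaccard(phrases, seed_phrases) >= threshold:
                cluster.append(article_id)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([article_id])
    return clusters


# Made-up articles and keyphrases, for illustration only.
articles = {
    "source_a/storm": {"hurricane", "florida", "evacuation"},
    "source_b/storm": {"hurricane", "florida", "landfall"},
    "source_c/markets": {"stocks", "interest rates"},
}
clusters = cluster_articles(articles)
print(clusters)
```

In this sketch the two storm articles share two of four distinct keyphrases, which meets the assumed threshold, so they are grouped together while the markets article stands alone. A production system would likely need a more robust similarity measure and clustering strategy than this greedy pass.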