Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications

04/29/2019
by Mark-Christoph Müller, et al.

We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate the method on the Concept-Project matching task, a binary classification task over pairs of documents drawn from such collections. Although our method employs only standard resources, without any domain- or task-specific modifications, it clearly outperforms the more complex system of the task's original authors. In addition, our method is transparent, because it provides explicit information about how each similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
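
The idea of aggregating pre-computable word-level similarities can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the function names (`word_similarities`, `match_score`), the top-n averaging rule, and the `top_n` parameter are all assumptions made for illustration. The sketch shows only how a document-pair score can be built from word-pair similarities while keeping the contributing word pairs available as evidence.

```python
import math
from itertools import product

# Toy 3-dimensional word vectors, invented for illustration only.
# A real setup would load pre-trained embeddings instead.
TOY_VECTORS = {
    "energy": (0.9, 0.1, 0.0),
    "power": (0.8, 0.2, 0.1),
    "solar": (0.7, 0.0, 0.3),
    "plant": (0.1, 0.9, 0.2),
    "leaf": (0.0, 0.8, 0.3),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarities(vectors):
    """Pre-compute the similarity of every ordered word pair once."""
    return {
        (w1, w2): cosine(vectors[w1], vectors[w2])
        for w1, w2 in product(vectors, repeat=2)
    }


def match_score(doc_a, doc_b, sims, top_n=2):
    """Average the top_n word-pair similarities (an assumed aggregation rule).

    Returns the score together with the word pairs that produced it,
    so the result can be inspected.
    """
    pairs = [
        (sims[(a, b)], a, b)
        for a in set(doc_a)
        for b in set(doc_b)
        if (a, b) in sims  # skip words without a vector
    ]
    pairs.sort(reverse=True)
    best = pairs[:top_n]
    score = sum(s for s, _, _ in best) / len(best) if best else 0.0
    return score, best


sims = word_similarities(TOY_VECTORS)
score_related, evidence_related = match_score(["solar", "energy"], ["power"], sims)
score_unrelated, evidence_unrelated = match_score(["leaf", "plant"], ["power"], sims)
```

Because `match_score` returns the contributing word pairs alongside the score, a binary match decision (for example, comparing the score against a tuned threshold) can be traced back to specific words. The similarity table is computed once and reused for every document pair.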


