Parameterizing Kterm Hashing

08/02/2022
by   Dominik Wurzer, et al.

Kterm Hashing provides an innovative approach to novelty detection on massive data streams. Previous research focused on maximizing the efficiency of Kterm Hashing and succeeded in scaling First Story Detection to Twitter-size data streams without sacrificing detection accuracy. In this paper, we focus on improving the effectiveness of Kterm Hashing. Traditionally, all kterms are considered equally important when calculating a document's degree of novelty with respect to the past. We believe that certain kterms are more important than others and hypothesize that uniform kterm weights are sub-optimal for determining novelty in data streams. To validate our hypothesis, we parameterize Kterm Hashing by assigning weights to kterms based on their characteristics. Our experiments apply Kterm Hashing in a First Story Detection setting and reveal that parameterized Kterm Hashing can surpass state-of-the-art detection accuracy and significantly outperform the uniformly weighted approach.
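To make the contrast between uniform and weighted kterms concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a kterm is any combination of up to k distinct terms from a document, that novelty is the weighted fraction of a document's kterms not yet seen in the stream, and that a Python set of hash values can stand in for the hashed memory. The function names (`kterms`, `novelty`, `update`) and the length-based weight used in the usage example are hypothetical illustrations; the abstract does not specify which kterm characteristics determine the weights.

```python
from itertools import combinations


def kterms(tokens, k=2):
    """Return all combinations of up to k distinct terms (assumed kterm definition)."""
    vocab = sorted(set(tokens))
    return [c for n in range(1, k + 1) for c in combinations(vocab, n)]


def novelty(tokens, seen, weight=lambda kt: 1.0, k=2):
    """Weighted fraction of the document's kterms whose hash is not in `seen`.

    With the default weight (1.0 for every kterm) this is the uniform variant;
    passing another `weight` function gives a parameterized variant.
    """
    terms = kterms(tokens, k)
    total = sum(weight(kt) for kt in terms)
    unseen = sum(weight(kt) for kt in terms if hash(kt) not in seen)
    return unseen / total if total else 0.0


def update(tokens, seen, k=2):
    """Record the document's kterm hashes so later documents are compared against it."""
    seen.update(hash(kt) for kt in kterms(tokens, k))


if __name__ == "__main__":
    seen = set()  # stand-in for the hashed memory of past kterms
    update(["quake", "hits", "city"], seen)
    doc = ["quake", "hits", "coast"]
    print(novelty(doc, seen))               # uniform weights
    print(novelty(doc, seen, weight=len))   # hypothetical weight: kterm length
```

In this sketch, swapping the `weight` function is the only change needed to move from the uniform to a parameterized score, which is why the weighting is passed in as a parameter rather than hard-coded.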

research
04/30/2016

An Improved System for Sentence-level Novelty Detection in Textual Streams

Novelty detection in news events has long been a difficult problem. A nu...
research
01/24/2019

Note on distance matrix hashing

Hashing algorithm of dynamical set of distances is described. Proposed h...
research
02/18/2015

Cross-Modality Hashing with Partial Correspondence

Learning a hashing function for cross-media search is very desirable due...
research
09/11/2019

How to detect novelty in textual data streams? A comparative study of existing methods

Since datasets with annotation for novelty at the document and/or word l...
research
06/08/2020

Procrustean Orthogonal Sparse Hashing

Hashing is one of the most popular methods for similarity search because...
research
10/10/2018

CRH: A Simple Benchmark Approach to Continuous Hashing

In recent years, the distinctive advancement of handling huge data promo...
research
07/16/2018

A Lyra2 FPGA Implementation for Lyra2REv2-Based Cryptocurrencies

Lyra2REv2 is a hashing algorithm that consists of a chain of individual ...