Novel Keyword Extraction and Language Detection Approaches

by Malgorzata Pikies, et al.

Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines, and this paper provides advances in both areas. We propose a fast, novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6% decrease in processing time with an estimated improvement in recall of 3.1% [...] a 2.6% [...] are subdivided into multiple words, without needing to scan character-to-character. So far there has been little work on using metadata to enhance language classification algorithms. We provide observational data and find that the Accept-Language header is 14% more likely to match the classification than the IP address.
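To make the metadata comparison concrete, the sketch below shows one way an Accept-Language header could be checked against a classifier's output. It is an illustration only and not the authors' method: the function names `parse_accept_language` and `header_matches_classification` are hypothetical, the classifier is assumed to return a primary language code such as `pl`, and the parsing handles only the common `tag;q=value` form of the header.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language_tag, quality) pairs sorted by descending quality."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as lowest priority
        entries.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their order in the header
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def header_matches_classification(header: str, detected_language: str) -> bool:
    """True if the top-ranked header language has the same primary subtag
    as the language code returned by a (hypothetical) text classifier."""
    ranked = parse_accept_language(header)
    if not ranked:
        return False
    primary_subtag = ranked[0][0].split("-")[0]
    return primary_subtag == detected_language.lower()
```

A call such as `header_matches_classification("pl-PL,pl;q=0.9,en;q=0.8", "pl")` compares only the highest-weighted header language with the classifier result; counting such matches over many requests is one simple way to obtain an agreement rate like the one the abstract reports.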





Contrastive String Representation Learning using Synthetic Data

String representation Learning (SRL) is an important task in the field o...

Fuzzy Segmentations of a String

This article discusses a particular case of the data clustering problem,...

Fuzzy Classification of Multi-intent Utterances

Current intent classification approaches assign binary intent class memb...

A Clustering Framework for Lexical Normalization of Roman Urdu

Roman Urdu is an informal form of the Urdu language written in Roman scr...

Scout Algorithm For Fast Substring Matching

Exact substring matching is a common task in many software applications....

Combining a Context Aware Neural Network with a Denoising Autoencoder for Measuring String Similarities

Measuring similarities between strings is central for many established a...

Language Detection Engine for Multilingual Texting on Mobile Devices

More than 2 billion mobile users worldwide type in multiple languages in...