Novel Keyword Extraction and Language Detection Approaches

09/24/2020
by Malgorzata Pikies, et al.

Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines; this paper provides advances in both areas. We propose a fast novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6% decrease in processing time with an estimated improvement in recall of 3.1% at the cost of a 2.6% decrease in precision. This approach is able to work even where keywords are subdivided into multiple words, without needing to scan character-to-character. So far there has been little work considering using metadata to enhance language classification algorithms. We provide observational data and find the Accept-Language header is 14% more likely to match the classification than the IP Address.
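
To make the metadata signal mentioned in the abstract concrete, here is a minimal sketch of how an Accept-Language header value might be turned into a ranked list of language hints. It is an illustration only, not the paper's method: the function names `parse_accept_language` and `primary_language` are hypothetical, and the parsing handles the common `tag;q=weight` form loosely (a malformed weight is treated as 0).

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language_tag, weight) pairs sorted by descending weight."""
    results = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        # Split "fr;q=0.9" into the tag and its optional parameters.
        tag, _, params = part.partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        weight = 1.0  # default when no q parameter is present
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(value.strip())
                except ValueError:
                    weight = 0.0  # assumption: ignore malformed weights
        results.append((tag, weight))
    # sorted() is stable, so equally weighted tags keep header order.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def primary_language(header: str) -> Optional[str]:
    """Return the base language of the top-ranked concrete tag, if any."""
    for tag, weight in parse_accept_language(header):
        if tag != "*" and weight > 0:
            return tag.split("-")[0]
    return None
```

A hint obtained this way could then be compared with the output of a text-based language classifier, which is the kind of comparison the abstract reports for the Accept-Language header and the IP address.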

