How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining

06/25/2015
by Nicolas Bourgeois, et al.

This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We present a combination of factorial projections, the SOM algorithm, and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts that were used to teach arithmetic techniques to merchants. Among data analysis techniques, those used in lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts, because they focus on the deviation from independence between words and manuscripts. We also want to discover and characterize the vocabulary common to the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhoods between inputs in a non-deterministic way, we highlight the words that seem to play a special role in the vocabulary. We call these words fickle and use them to improve both the robustness of the Kohonen maps and the significance of the FCA visualization. Finally, we use graph algorithms to exploit this fickleness for the classification of words.
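
The abstract does not state how fickle words are detected, so the following is only an illustrative sketch of one possible reading, not the authors' procedure. It assumes that a stochastic SOM has already been trained several times and that each run yields a grid position for every word. A pair of words is counted as neighbors in a run when their units are within a given grid radius, and a word is flagged as fickle when a large share of its pairs are neighbors in some runs but not in others. The function names, the thresholds (`low`, `high`, `min_share`), the radius, and the toy positions are all hypothetical.

```python
from itertools import combinations


def neighbor_frequency(runs, radius=1):
    """Return, for each pair of words, the fraction of runs in which
    the two words fall on neighboring map units (assumed definition:
    Chebyshev distance between grid positions <= radius)."""
    words = sorted(runs[0])
    counts = {pair: 0 for pair in combinations(words, 2)}
    for positions in runs:
        for a, b in counts:
            (ra, ca), (rb, cb) = positions[a], positions[b]
            if max(abs(ra - rb), abs(ca - cb)) <= radius:
                counts[(a, b)] += 1
    return {pair: n / len(runs) for pair, n in counts.items()}


def fickle_words(freq, low=0.2, high=0.8, min_share=0.5):
    """Flag words for which at least `min_share` of their pairs have an
    unstable neighbor frequency, i.e. strictly between `low` and `high`.
    All three thresholds are hypothetical illustration values."""
    total, unstable = {}, {}
    for (a, b), f in freq.items():
        for w in (a, b):
            total[w] = total.get(w, 0) + 1
            if low < f < high:
                unstable[w] = unstable.get(w, 0) + 1
    return sorted(w for w in total if unstable.get(w, 0) / total[w] >= min_share)


# Toy data: four hypothetical runs, each mapping a word to a (row, col) unit.
# "a" and "b" always sit together, "d" always sits far away, "c" moves around.
runs = [
    {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (5, 5)},
    {"a": (0, 0), "b": (0, 1), "c": (4, 4), "d": (5, 5)},
    {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (5, 5)},
    {"a": (0, 0), "b": (0, 1), "c": (4, 4), "d": (5, 5)},
]

freq = neighbor_frequency(runs)
print(fickle_words(freq))
```

In this toy setting only the word whose neighborhood changes from run to run is flagged. With real maps, the thresholds would have to be chosen with respect to the map size and the number of runs.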
