"hasSignification()": a new distance function to support the detection of personal data

06/14/2022
by Amine Mrabet, et al.

Today, with Big Data and data lakes, we face a mass of data that is very difficult to manage manually. Protecting personal data in this context requires automatic analysis for data discovery. Storing the names of attributes that have already been analyzed in a knowledge base could optimize this automatic discovery. To keep the knowledge base useful, we should not store attributes whose names are meaningless. In this article, to check whether an attribute name has a meaning, we propose a solution that calculates the distances between this name and the words of a dictionary. Our studies of distance functions such as N-Gram, Jaro-Winkler and Levenshtein show that they make it difficult to set an acceptance threshold for admitting an attribute into the knowledge base. To overcome these limitations, our solution strengthens the score calculation with an exponential function based on the longest sequence. In addition, we propose a double scan of the dictionary to handle attributes with compound names.
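The abstract does not give the exact formulas, so the following is only a minimal Python sketch under our own assumptions. We read "longest sequence" as the longest common substring between the attribute name and a dictionary word, and we use a hypothetical exponential score (e^(k·r) − 1)/(e^k − 1), where r is the share of the longer string covered by that substring. The steepness `k = 4`, the threshold `0.6`, the name normalization (lowercasing, dropping non-letters) and the helper names `exp_score` and `best_word` are all illustrative assumptions. The double scan is interpreted as: scan once for the whole name; if that fails, split off the part shared with the best word and scan the dictionary again for the remainder. Only the name `hasSignification()` comes from the paper; `has_signification` below is a hypothetical stand-in, not the authors' implementation.

```python
import math
from typing import Sequence


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def exp_score(name: str, word: str, k: float = 4.0) -> float:
    """HYPOTHETICAL exponential score in [0, 1] based on the longest shared sequence.

    r is the fraction of the longer string covered by the common substring;
    the exponential keeps partial matches low and lets near-complete ones jump to 1.
    """
    if not name or not word:
        return 0.0
    r = len(longest_common_substring(name, word)) / max(len(name), len(word))
    return (math.exp(k * r) - 1) / (math.exp(k) - 1)


def best_word(name: str, dictionary: Sequence[str]) -> str:
    """Dictionary word with the highest exponential score against name."""
    return max(dictionary, key=lambda w: exp_score(name, w))


def has_signification(name: str, dictionary: Sequence[str], threshold: float = 0.6) -> bool:
    """Sketch of a meaningfulness check with a double dictionary scan (assumed design)."""
    name = "".join(c for c in name.lower() if c.isalpha())  # e.g. "Last_Name" -> "lastname"
    if not name or not dictionary:
        return False
    # First scan: does the whole name match a dictionary word?
    word = best_word(name, dictionary)
    if exp_score(name, word) >= threshold:
        return True
    # Second scan (compound names): the part shared with the best word must itself
    # match that word, and the remainder must match some dictionary word.
    common = longest_common_substring(name, word)
    if exp_score(common, word) < threshold:
        return False
    rest = name.replace(common, "", 1)
    return bool(rest) and exp_score(rest, best_word(rest, dictionary)) >= threshold
```

With the dictionary `["last", "name", "first"]`, `has_signification("last_name", ...)` fails the first scan (only half of "lastname" is covered by "last") but succeeds on the second scan, because both "last" and the remainder "name" are dictionary words; `has_signification("xqzt", ...)` returns `False`. The exponential is one way to widen the gap between near-complete and partial matches so that a single threshold separates them more cleanly; the paper's actual function may differ.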
