It's AI Match: A Two-Step Approach for Schema Matching Using Embeddings

03/08/2022
by Benjamin Hättasch, et al.

Since data is often stored in different sources, it must be integrated to obtain the global view required to create value and derive knowledge from it. A critical step in data integration is schema matching, which aims to find semantic correspondences between elements of two schemata. To reduce the manual effort involved in schema matching, many solutions for automatically determining schema correspondences have been developed. In this paper, we propose a novel end-to-end approach for schema matching based on neural embeddings. The main idea is a two-step approach consisting of a table matching step followed by an attribute matching step. Both steps use embeddings at different levels, representing either a whole table or a single attribute. Our results show that the approach determines correspondences robustly and reliably and, compared to traditional schema matching approaches, can find non-trivial correspondences.
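The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the embedding model or the matching rule, so this sketch assumes a toy character-trigram vector as a stand-in for a neural embedding, cosine similarity as the score, and a simple best-score assignment. All function names and the example schemata are hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a neural embedding: counts of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def table_vector(name, attributes):
    """Table-level vector: the table name plus all of its attribute names."""
    vec = embed(name)
    for attr in attributes:
        vec += embed(attr)
    return vec


def best_match(query_vec, candidates):
    """Return the candidate name whose vector is most similar to query_vec."""
    return max(candidates, key=lambda name: cosine(query_vec, candidates[name]))


def match_schemas(source, target):
    """source/target map table names to lists of attribute names."""
    target_tables = {t: table_vector(t, attrs) for t, attrs in target.items()}
    result = {}
    for s_table, s_attrs in source.items():
        # Step 1: table matching on table-level vectors.
        t_table = best_match(table_vector(s_table, s_attrs), target_tables)
        # Step 2: attribute matching, restricted to the matched table.
        target_attrs = {a: embed(a) for a in target[t_table]}
        attr_map = {a: best_match(embed(a), target_attrs) for a in s_attrs}
        result[s_table] = (t_table, attr_map)
    return result


# Hypothetical example schemata, invented for illustration only.
source = {
    "customers": ["customer_id", "full_name", "email"],
    "orders": ["order_id", "order_date"],
}
target = {
    "customer": ["cust_id", "name", "email_address"],
    "order": ["id_order", "date_order"],
}
matches = match_schemas(source, target)
```

Restricting the attribute step to the table chosen in the first step shrinks the candidate set for each attribute. A trigram vector only captures surface similarity of names, so finding non-trivial correspondences as the abstract describes would require swapping `embed` for a learned embedding.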

