Putting Self-Supervised Token Embedding on the Tables

by Marc Szafraniec et al.
École Polytechnique

Distributing information by electronic message is a privileged transmission channel for many businesses and individuals, often in the form of plain-text tables. As their number grows, it becomes necessary to extract the text and numbers algorithmically rather than by hand. Usual methods rely on regular expressions or on a strict structure in the data, but they are not efficient when the data exhibits many variations, fuzzy structure, or implicit labels. In this paper we introduce SC2T, a totally self-supervised model that addresses these issues by constructing vector representations of tokens in semi-structured messages, using both the character and context levels. These representations can then be used for unsupervised token labeling, or as the basis for a semi-supervised information extraction system.
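The abstract does not detail the SC2T architecture, but the core idea of combining a character-level view of each token with a view of its surrounding context can be sketched in a few lines. The sketch below is a toy illustration, not the paper's method: it stands in hashed character n-grams for a learned character encoder, and a mean of neighbor vectors for a learned context encoder; the dimension, window size, and example tokens are all arbitrary assumptions.

```python
import zlib

DIM = 64  # arbitrary embedding dimension for this sketch

def char_vector(token, n=3):
    """Character-level view: hash each character n-gram of the token into a
    fixed-size bag-of-ngrams vector (a crude stand-in for a learned
    character encoder)."""
    padded = f"<{token}>"  # boundary markers so prefixes/suffixes are distinct
    vec = [0.0] * DIM
    for i in range(max(1, len(padded) - n + 1)):
        gram = padded[i:i + n]
        vec[zlib.crc32(gram.encode()) % DIM] += 1.0
    return vec

def token_vectors(tokens, window=1):
    """Combine each token's character vector with the mean of its
    neighbours' character vectors (the 'context level')."""
    chars = [char_vector(t) for t in tokens]
    out = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        ctx = [chars[j] for j in range(lo, hi) if j != i]
        mean_ctx = ([sum(col) / len(ctx) for col in zip(*ctx)]
                    if ctx else [0.0] * DIM)
        # character view + context view, concatenation replaced by a sum
        # here purely to keep the sketch short
        out.append([c + m for c, m in zip(chars[i], mean_ctx)])
    return out

# A plain-text table row, as might appear in an electronic message
row = ["Invoice", "2023-04-01", "Total", "1,250.00"]
vecs = token_vectors(row)
print(len(vecs), len(vecs[0]))  # one 64-dimensional vector per token
```

In a self-supervised setup like the one the abstract describes, such vectors would be trained rather than hand-crafted, so that tokens with a similar character shape (dates, amounts) and a similar context (same column, same neighbors) land close together, which is what makes unsupervised labeling possible.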




