SpreadsheetCoder: Formula Prediction from Semi-structured Context

06/26/2021
by Xinyun Chen, et al.

Spreadsheet formula prediction has been an important program synthesis problem with many real-world applications. Previous works typically use input-output examples as the specification for spreadsheet formula synthesis, where each input-output pair simulates a separate row in the spreadsheet. However, this formulation does not fully capture the rich context in real-world spreadsheets. First, spreadsheet data entries are organized as tables, so rows and columns are not necessarily independent of each other. Second, many spreadsheet tables include headers, which provide high-level descriptions of the cell data, yet previous synthesis approaches do not consider headers as part of the specification. In this work, we present the first approach for synthesizing spreadsheet formulas from tabular context, which includes both headers and semi-structured tabular data. In particular, we propose SpreadsheetCoder, a BERT-based model architecture that represents the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%, which is a considerable improvement over baselines that do not employ rich tabular context. Compared to the rule-based system, SpreadsheetCoder assists 82% more users in composing formulas on Google Sheets.
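To make the idea of row-based and column-based tabular context concrete, here is a minimal sketch of how the cells around a target cell might be gathered in both orientations. It is not the encoding used by SpreadsheetCoder, which the abstract does not specify: the function name `tabular_context`, the `window` parameter, the list-of-lists table layout, and the convention that the first row holds the headers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def tabular_context(
    grid: Sequence[Sequence[str]],
    target: Tuple[int, int],
    window: int = 2,
) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (row_view, col_view) around the target cell.

    Assumption of this sketch: grid[0] is the header row and every
    row has the same number of cells.
    """
    r, c = target
    n_rows, n_cols = len(grid), len(grid[0])

    # Row-based view: the header row, then the data rows near the target,
    # each read left to right.
    row_lo, row_hi = max(1, r - window), min(n_rows - 1, r + window)
    row_view = [list(grid[0])] + [list(grid[i]) for i in range(row_lo, row_hi + 1)]

    # Column-based view: the columns near the target, each read top to
    # bottom, so every sequence starts with its own header cell.
    col_lo, col_hi = max(0, c - window), min(n_cols - 1, c + window)
    col_view = [[grid[i][j] for i in range(n_rows)] for j in range(col_lo, col_hi + 1)]

    return row_view, col_view


# Hypothetical table; the empty cell at row 1, column 3 is where a
# formula would be predicted.
grid = [
    ["Item", "Price", "Qty", "Total"],
    ["Pen", "2", "3", ""],
    ["Book", "10", "1", "10"],
]
row_view, col_view = tabular_context(grid, target=(1, 3), window=1)
```

In this sketch both views carry the header text, which is the part of the specification that approaches based only on input-output examples leave out. A model could then encode the two views separately before predicting the formula for the target cell.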


