
Auctus: A Dataset Search Engine for Data Augmentation

by Fernando Chirigati, et al.

Machine learning models are increasingly being adopted in many applications. The quality of these models depends critically on the data on which they are trained, and augmenting that input data with external data offers an opportunity to create better models. However, the massive number of datasets available on the Web makes it challenging to find data suitable for augmentation. In this demo, we present our ongoing efforts to develop a dataset search engine tailored for data augmentation. Our prototype, named Auctus, automatically discovers datasets on the Web and, unlike existing dataset search engines, infers consistent metadata for indexing and supports join and union search queries. Auctus is already being used in a real deployment environment to improve the performance of ML models. The demonstration will include various real-world data augmentation examples, and visitors will be able to interact with the system.
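To make the two augmentation styles named in the abstract concrete, here is a minimal sketch of what a join and a union do to a training table once a suitable dataset has been found. It is not Auctus code and does not use the Auctus API; the function names `join_augment` and `union_augment`, the column names, and the toy rows are all hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: hypothetical helpers and toy data,
# not the Auctus implementation or its API.

def join_augment(base, candidate, key):
    """Left join: add new columns from `candidate` to each base row, matched on `key`."""
    if not base:
        return []
    base_cols = set(base[0])
    # Only bring in columns the base table does not already have.
    new_cols = [c for c in (candidate[0] if candidate else {}) if c not in base_cols]
    lookup = {row[key]: row for row in candidate}
    augmented = []
    for row in base:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in new_cols:
            merged[col] = match.get(col)  # None when the candidate has no matching row
        augmented.append(merged)
    return augmented


def union_augment(base, candidate):
    """Union: append candidate rows whose columns match the base schema exactly."""
    if not base:
        return list(candidate)
    schema = set(base[0])
    return base + [row for row in candidate if set(row) == schema]


base = [{"zip": "10001", "price": 500}, {"zip": "10002", "price": 300}]
income = [{"zip": "10001", "income": 70}]
more_rows = [{"zip": "10003", "price": 400}, {"zip": "10004", "rent": 9}]

joined = join_augment(base, income, "zip")   # adds an "income" feature column
unioned = union_augment(base, more_rows)     # adds one more training row
```

A join widens the table with new features, while a union lengthens it with new examples; a search engine that supports both query types can therefore surface candidates for either kind of augmentation.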




ARDA: Automatic Relational Data Augmentation for Machine Learning

Automatic machine learning is a family of techniques to automate the ...

Augmentation Backdoors

Data augmentation is used extensively to improve model generalisation. H...

Automatic Data Augmentation via Invariance-Constrained Learning

Underlying data structures, such as symmetries or invariances to transfo...

UniformAugment: A Search-free Probabilistic Data Augmentation Approach

Augmenting training datasets has been shown to improve the learning effe...

Experimenting with an Evaluation Framework for Imbalanced Data Learning (EFIDL)

Data imbalance is one of the crucial issues in big data ana...

LidarAugment: Searching for Scalable 3D LiDAR Data Augmentations

Data augmentations are important in training high-performance 3D object ...

On the Social and Technical Challenges of Web Search Autosuggestion Moderation

Past research shows that users benefit from systems that support them in...