Leveraging Organizational Resources to Adapt Models to New Data Modalities

08/23/2020
by Sahaana Suri, et al.

As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video). To solve this problem, organizations typically create ML pipelines from scratch. However, this fails to utilize the domain expertise and data they have cultivated from developing tasks for existing modalities. We demonstrate how organizational resources, in the form of aggregate statistics, knowledge bases, and existing services that operate over related tasks, enable teams to construct a common feature space that connects new and existing data modalities. This allows teams to apply methods for training data curation (e.g., weak supervision and label propagation) and model training (e.g., forms of multi-modal learning) across these different data modalities. We study how this use of organizational resources composes at production scale in over 5 classification tasks at Google, and demonstrate how it reduces the time needed to develop models for new modalities from months to weeks to days.


