Towards Better Semantic Understanding of Mobile Interfaces

10/06/2022
by Srinivas Sunkara, et al.

Improving the accessibility and automation capabilities of mobile devices can have a significant positive impact on the daily lives of countless users. To stimulate research in this direction, we release a human-annotated dataset with approximately 500k unique annotations aimed at increasing the understanding of the functionality of UI elements. This dataset augments images and view hierarchies from RICO, a large dataset of mobile UIs, with annotations for icons based on their shapes and semantics, and with associations between different elements and their corresponding text labels. The result is a significant increase in the number of UI elements and the categories assigned to them. We also release models using image-only and multimodal inputs; we experiment with various architectures and study the benefits of using multimodal inputs on the new dataset. Our models demonstrate strong performance on an evaluation set of unseen apps, indicating that they generalize to newer screens. These models, combined with the new dataset, can enable new functionality such as referring to UI elements by their labels, as well as improved coverage and better semantics for icons, which would go a long way toward making UIs more usable for everyone.


