R. Manmatha

07/16/2023

DocTr: Document Transformer for Structured Information Extraction in Documents

We present a new formulation for structured information extraction (SIE)...

Haofu Liao, et al.

06/02/2023

DocFormerv2: Local Features for Document Understanding

We propose DocFormerv2, a multi-modal transformer for Visual Document Un...

Srikar Appalaraju, et al.

02/14/2023

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

In this work, instead of directly predicting the pixel-level segmentatio...

Jiang Liu, et al.

11/15/2022

YORO – Lightweight End to End Visual Grounding

We present YORO - a multi-modal transformer encoder-only architecture fo...

Chih-Hui Ho, et al.

08/05/2022

GLASS: Global to Local Attention for Scene-Text Spotting

In recent years, the dominant paradigm for text spotting is to combine t...

Roi Ronen, et al.

02/11/2022

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Text spotting end-to-end methods have recently gained attention in the l...

Yair Kittenplon, et al.

12/23/2021

LaTr: Layout-Aware Transformer for Scene-Text VQA

We propose a novel multimodal architecture for Scene Text Visual Questio...

Ali Furkan Biten, et al.

06/22/2021

DocFormer: End-to-End Transformer for Document Understanding

We present DocFormer – a multi-modal transformer based architecture for ...

Srikar Appalaraju, et al.

08/20/2020

Document Visual Question Answering Challenge 2020

This paper presents results of Document Visual Question Answering Challe...

Minesh Mathew, et al.

07/01/2020

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering on document image...

Minesh Mathew, et al.

04/30/2020

Improving Semantic Segmentation via Self-Training

Deep learning usually achieves the best results with complete supervisio...

Yi Zhu, et al.

04/19/2020

ResNeSt: Split-Attention Networks

While image classification models have recently continued to advance, mo...

Hang Zhang, et al.

03/25/2020

SCATTER: Selective Context Attentional Scene Text Recognizer

Scene Text Recognition (STR), the task of recognizing text against compl...

Ron Litman, et al.

02/12/2020

Hierarchical Auto-Regressive Model for Image Compression Incorporating Object Saliency and a Deep Perceptual Loss

We propose a new end-to-end trainable model for lossy image compression ...

Yash Patel, et al.

08/09/2019

Human Perceptual Evaluations for Image Compression

Recently, there has been much interest in deep learning techniques to do...

Yash Patel, et al.

07/18/2019

Deep Perceptual Compression

Several deep learned lossy compression techniques have been proposed in ...

Yash Patel, et al.

07/04/2019

Searching for Apparel Products from Images in the Wild

In this age of social media, people often look at what others are wearin...

Son Tran, et al.

12/02/2017

Compressed Video Action Recognition

Training robust deep video representations has proven to be much more ch...

Chao-Yuan Wu, et al.

06/23/2017

Sampling Matters in Deep Embedding Learning

Deep embeddings answer one simple question: How similar are two images? ...

Chao-Yuan Wu, et al.