C. V. Jawahar

research

∙ 09/04/2023

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

Researchers have extensively studied the field of vision and language, d...

0 Soumya Jahagirdar, et al. ∙

research

∙ 08/23/2023

Towards Real-Time Analysis of Broadcast Badminton Videos

Analysis of player movements is a crucial subset of sports analysis. Exi...

0 Nitin Nilesh, et al. ∙

research

∙ 07/08/2023

Reading Between the Lanes: Text VideoQA on the Road

Text and signs around roads provide crucial information for drivers, vit...

0 George Tom, et al. ∙

research

∙ 03/05/2023

CueCAn: Cue Driven Contextual Attention For Identifying Missing Traffic Signs on Unconstrained Roads

Unconstrained Asian roads often involve poor infrastructure, affecting o...

0 Varun Gupta, et al. ∙

research

∙ 12/30/2022

A Fine-Grained Vehicle Detection (FGVD) Dataset for Unconstrained Roads

The previous fine-grained datasets mainly focus on classification and ar...

0 Prafful Kumar Khoba, et al. ∙

research

∙ 12/17/2022

Towards Robust Handwritten Text Recognition with On-the-fly User Participation

Long-term OCR services aim to provide high-quality output to their users...

0 Ajoy Mondal, et al. ∙

research

∙ 12/15/2022

Enhancing Indic Handwritten Text Recognition Using Global Semantic Information

Handwritten Text Recognition (HTR) is more interesting and challenging t...

0 Ajoy Mondal, et al. ∙

research

∙ 12/02/2022

Information Retrieval from the Digitized Books

Extracting the relevant information out of a large number of documents i...

0 Riya Gupta, et al. ∙

research

∙ 11/10/2022

Watching the News: Towards VideoQA Models that can Read

Video Question Answering methods focus on commonsense reasoning and visu...

0 Soumya Jahagirdar, et al. ∙

research

∙ 10/29/2022

Unsupervised Audio-Visual Lecture Segmentation

Over the last decade, online lecture videos have become increasingly pop...

0 Darshan Singh S, et al. ∙

research

∙ 10/23/2022

IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes

Autonomous driving and assistance systems rely on annotated data from tr...

0 Shubham Dokania, et al. ∙

research

∙ 10/19/2022

Grounded Video Situation Recognition

Dense video understanding requires answering several questions such as w...

0 Zeeshan Khan, et al. ∙

research

∙ 10/06/2022

Audio-Visual Face Reenactment

This work proposes a novel method to generate realistic talking head vid...

11 Madhav Agarwal, et al. ∙

research

∙ 09/01/2022

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

In this work, we address the problem of generating speech from silent li...

0 Sindhu B Hegde, et al. ∙

research

∙ 08/17/2022

Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors

In this paper, we explore an interesting question of what can be obtaine...

0 Sindhu B Hegde, et al. ∙

research

∙ 08/16/2022

TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments

High-quality structured data with rich annotations are critical componen...

0 Shubham Dokania, et al. ∙

research

∙ 07/22/2022

My View is the Best View: Procedure Learning from Egocentric Videos

Procedure learning involves identifying the key-steps and determining th...

0 Siddhant Bansal, et al. ∙

research

∙ 04/18/2022

Detecting, Tracking and Counting Motorcycle Rider Traffic Violations on Unconstrained Roads

In many Asian countries with unconstrained road traffic conditions, driv...

7 Aman Goyal, et al. ∙

research

∙ 01/21/2022

Classroom Slide Narration System

Slide presentations are an effective and efficient tool used by the teac...

0 Jobin K. V., et al. ∙

research

∙ 01/17/2022

Automatic Quantification and Visualization of Street Trees

Assessing the number of street trees is essential for evaluating urban g...

4 Arpit Bahety, et al. ∙

research

∙ 01/10/2022

Towards Boosting the Accuracy of Non-Latin Scene Text Recognition

Scene-text recognition is remarkably better in Latin languages than the ...

1 Sanjana Gunna, et al. ∙

research

∙ 01/10/2022

Transfer Learning for Scene Text Recognition in Indian Languages

Scene text recognition in low-resource Indian languages is challenging b...

9 Sanjana Gunna, et al. ∙

research

∙ 11/13/2021

Visual Understanding of Complex Table Structures from Document Images

Table structure recognition is necessary for a comprehensive understandi...

0 Sachin Raja, et al. ∙

research

∙ 11/10/2021

ICDAR 2021 Competition on Document Visual Question Answering

In this report we present results of the ICDAR 2021 edition of the Docum...

0 Ruben Tito, et al. ∙

research

∙ 11/02/2021

Personalized One-Shot Lipreading for an ALS Patient

Lipreading or visually recognizing speech from the mouth movements of a ...

0 Bipasha Sen, et al. ∙

research

∙ 10/23/2021

Multi-Domain Incremental Learning for Semantic Segmentation

Recent efforts in multi-domain learning for semantic segmentation attemp...

0 Prachi Garg, et al. ∙

research

∙ 10/16/2021

Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor

This paper proposes a video editor based on OpenShot with several state-...

0 Anchit Gupta, et al. ∙

research

∙ 09/11/2021

Evaluating Computer Vision Techniques for Urban Mobility on Large-Scale, Unconstrained Roads

Conventional approaches for addressing road safety rely on manual interv...

10 Harish Rithish, et al. ∙

research

∙ 08/06/2021

Efficient and Generic Interactive Segmentation Framework to Correct Mispredictions during Clinical Evaluation of Medical Images

Semantic segmentation of medical images is an essential first step in co...

0 Bhavani Sambaturu, et al. ∙

research

∙ 07/20/2021

More Parameters? No Thanks!

This work studies the long-standing problems of model capacity and negat...

0 Zeeshan Khan, et al. ∙

research

∙ 06/24/2021

Towards Automatic Speech to Sign Language Generation

We aim to solve the highly challenging task of generating continuous sig...

16 Parul Kapoor, et al. ∙

research

∙ 05/04/2021

Canonical Saliency Maps: Decoding Deep Face Models

As Deep Neural Network models for face processing tasks approach human-l...

10 Thrupthi Ann John, et al. ∙

research

∙ 04/26/2021

InfographicVQA

Infographics are documents designed to effectively communicate informati...

12 Minesh Mathew, et al. ∙

research

∙ 03/18/2021

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

Scanned receipts OCR and key information extraction (SROIE) represent th...

0 Zheng Huang, et al. ∙

research

∙ 12/26/2020

Few Shot Learning With No Labels

Few-shot learners aim to recognize new categories given only a small num...

0 Aditya Bharti, et al. ∙

research

∙ 12/20/2020

Visual Speech Enhancement Without A Real Visual Stream

In this work, we re-think the task of speech enhancement in unconstraine...

0 Sindhu B Hegde, et al. ∙

research

∙ 12/10/2020

Exploring Pair-Wise NMT for Indian Languages

In this paper, we address the task of improving pair-wise machine transl...

0 Kartheek Akella, et al. ∙

research

∙ 10/27/2020

Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

We propose a novel scheme for improving the word recognition accuracy us...

0 Siddhant Bansal, et al. ∙

research

∙ 10/09/2020

Table Structure Recognition using Top-Down and Bottom-Up Cues

Tables are information-rich structured objects in document images. While...

0 Sachin Raja, et al. ∙

research

∙ 08/25/2020

Graphical Object Detection in Document Images

Graphical elements: particularly tables and figures contain a visual sum...

0 Ranajit Saha, et al. ∙

research

∙ 08/25/2020

CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images

Localizing page elements/objects such as tables, figures, equations, etc...

0 Madhav Agarwal, et al. ∙

research

∙ 08/23/2020

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

In this work, we investigate the problem of lip-syncing a talking face v...

20 K R Prajwal, et al. ∙

research

∙ 08/20/2020

Document Visual Question Answering Challenge 2020

This paper presents results of Document Visual Question Answering Challe...

2 Minesh Mathew, et al. ∙

research

∙ 08/11/2020

Revisiting Low Resource Status of Indian Languages in Machine Translation

Indian language machine translation performance is hampered due to the l...

0 Jerin Philip, et al. ∙

research

∙ 08/07/2020

Textual Description for Mathematical Equations

Reading of mathematical expression or equation in the document images is...

0 Ajoy Mondal, et al. ∙

research

∙ 08/06/2020

IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

We introduce a new dataset for graphical object detection in business do...

0 Ajoy Mondal, et al. ∙

research

∙ 07/18/2020

Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances

Recent approaches for weakly supervised instance segmentations depend on...

1 Aditya Arun, et al. ∙

research

∙ 07/15/2020

A Multilingual Parallel Corpora Collection Effort for Indian Languages

We present sentence aligned parallel corpora across 10 Indian Languages ...

0 Shashank Siripragada, et al. ∙

research

∙ 07/01/2020

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering on document image...

24 Minesh Mathew, et al. ∙

research

∙ 07/01/2020

Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

Recognition and retrieval of textual content from the large document col...

20 Siddhant Bansal, et al. ∙
