Bo Ren

research

∙ 05/12/2023

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

Visual information extraction (VIE), which aims to simultaneously perfor...

0 Jianfeng Kuang, et al. ∙

research

∙ 05/07/2023

Multi-Space Neural Radiance Fields

Existing Neural Radiance Fields (NeRF) methods suffer from the existence...

0 Ze-Xin Yin, et al. ∙

research

∙ 04/18/2023

Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections

Neural implicit methods have achieved high-quality 3D object surfaces un...

0 Jiaxiong Qiu, et al. ∙

research

∙ 03/26/2023

Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies

Movie highlights stand out of the screenplay for efficient browsing and ...

0 Bei Gan, et al. ∙

research

∙ 02/28/2023

Turning a CLIP Model into a Scene Text Detector

The recent large-scale Contrastive Language-Image Pretraining (CLIP) mod...

0 Wenwen Yu, et al. ∙

research

∙ 11/28/2022

SLAN: Self-Locator Aided Network for Cross-Modal Understanding

Learning fine-grained interplay between vision and language allows to a ...

1 Jiang-Tian Zhai, et al. ∙

research

∙ 11/14/2022

Grafting Pre-trained Models for Multimodal Headline Generation

Multimodal headline utilizes both video frames and transcripts to genera...

7 Lingfeng Qiao, et al. ∙

research

∙ 10/10/2022

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning

Recent language generative models are mostly trained on large-scale data...

0 Zhuoxuan Jiang, et al. ∙

research

∙ 08/22/2022

TaCo: Textual Attribute Recognition via Contrastive Learning

As textual attributes like font are core design elements of document for...

0 Chang Nie, et al. ∙

research

∙ 08/18/2022

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

Text-based person retrieval aims to find the query person based on a tex...

6 Xiujun Shu, et al. ∙

research

∙ 07/11/2022

GMN: Generative Multi-modal Network for Practical Document Information Extraction

Document Information Extraction (DIE) has attracted increasing attention...

0 Haoyu Cao, et al. ∙

research

∙ 07/05/2022

Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer

Real-world recognition system often encounters a plenty of unseen labels...

0 Sunan He, et al. ∙

research

∙ 07/04/2022

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

Scene segmentation and classification (SSC) serve as a critical step tow...

0 Ye Liu, et al. ∙

research

∙ 06/07/2022

RAAT: Relation-Augmented Attention Transformer for Relation Modeling in Document-Level Event Extraction

In document-level event extraction (DEE) task, event arguments always sc...

0 Yuan Liang, et al. ∙

research

∙ 06/06/2022

Contrastive Graph Multimodal Model for Text Classification in Videos

The extraction of text information in videos serves as a critical step t...

0 Ye Liu, et al. ∙

research

∙ 05/22/2022

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

The task of Grammatical Error Correction (GEC) has received remarkable a...

0 Jiquan Li, et al. ∙

research

∙ 05/11/2022

Scene Consistency Representation Learning for Video Scene Segmentation

A long-term video, such as a movie or TV show, is composed of various sc...

9 Haoqian Wu, et al. ∙

research

∙ 05/05/2022

Relational Representation Learning in Visually-Rich Documents

Relational understanding is critical for a number of visually-rich docum...

0 Xin Li, et al. ∙

research

∙ 04/18/2022

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

The self-supervised Masked Image Modeling (MIM) schema, following "mask-...

7 Hao Liu, et al. ∙

research

∙ 03/27/2022

Knowledge Mining with Scene Text for Fine-Grained Recognition

Recently, the semantics of scene text has been proven to be essential in...

10 Hao Wang, et al. ∙

research

∙ 03/25/2022

Interactive Style Transfer: All is Your Palette

Neural style transfer (NST) can create impressive artworks by transferri...

0 Zheng Lin, et al. ∙

research

∙ 01/08/2022

Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition

Capturing the dependencies between joints is critical in skeleton-based ...

0 Helei Qiu, et al. ∙

research

∙ 11/27/2021

Head and Body: Unified Detector and Graph Network for Person Search in Media

Person search in media has seen increasing potential in Internet applica...

0 Xiujun Shu, et al. ∙

research

∙ 11/26/2021

Neural Collaborative Graph Machines for Table Structure Recognition

Recently, table structure recognition has achieved impressive progress w...

3 Hao Liu, et al. ∙

research

∙ 11/25/2021

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

Recently, Vision Transformers (ViT), with the self-attention (SA) as the...

0 Hao Liu, et al. ∙

research

∙ 11/21/2021

FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection

Existing anchor-base oriented object detection methods have achieved ama...

8 Zhonghua Li, et al. ∙

research

∙ 02/26/2020

PuzzleNet: Scene Text Detection by Segment Context Graph Learning

Recently, a series of decomposition-based scene text detection methods h...

14 Hao Liu, et al. ∙

research

∙ 08/21/2019

Scoot: A Perceptual Metric for Facial Sketches

While it is trivial for humans to quickly assess the perceptual similari...

10 Deng-Ping Fan, et al. ∙

research

∙ 05/26/2018

Enhanced-alignment Measure for Binary Foreground Map Evaluation

The existing binary foreground map (FM) measures to address various type...

0 Deng-Ping Fan, et al. ∙

research

∙ 04/09/2018

Face Sketch Synthesis Style Similarity: A New Structure Co-occurrence Texture Measure

Existing face sketch synthesis (FSS) similarity measures are sensitive t...

0 Deng-Ping Fan, et al. ∙
