Cha Zhang

research

∙ 09/20/2023

Kosmos-2.5: A Multimodal Literate Model

We present Kosmos-2.5, a multimodal literate model for machine reading o...

0 Tengchao Lv, et al. ∙

research

∙ 05/23/2023

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

Current state-of-the-art models for natural language understanding requi...

0 Li Sun, et al. ∙

research

∙ 03/19/2023

Diffusion-based Document Layout Generation

We develop a diffusion-based approach for various document layout sequen...

0 Liu He, et al. ∙

research

∙ 10/06/2022

XDoc: Unified Pre-training for Cross-Format Document Understanding

The surge of pre-training has witnessed the rapid development of documen...

0 Jingye Chen, et al. ∙

research

∙ 03/04/2022

DiT: Self-supervised Pre-training for Document Image Transformer

Image Transformer has recently achieved significant progress for natural...

6 Junlong Li, et al. ∙

research

∙ 11/10/2021

Improving Structured Text Recognition with Regular Expression Biasing

We study the problem of recognizing structured text, i.e. text that foll...

0 Baoguang Shi, et al. ∙

research

∙ 09/21/2021

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Text recognition is a long-standing research problem for document digita...

0 Minghao Li, et al. ∙

research

∙ 04/18/2021

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Multimodal pre-training with text, layout, and image has achieved SOTA p...

0 Yiheng Xu, et al. ∙

research

∙ 12/29/2020

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

Pre-training of text and layout has proved effective in a variety of vis...

0 Yang Xu, et al. ∙

research

∙ 12/08/2020

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

In this paper, we propose Text-Aware Pre-training (TAP) for Text-VQA and...

0 Zhengyuan Yang, et al. ∙

research

∙ 02/10/2020

Multimodal active speaker detection and virtual cinematography for video conferencing

Active speaker detection (ASD) and virtual cinematography (VC) can signi...

0 Ross Cutler, et al. ∙

research

∙ 02/07/2020

Improving the Adversarial Robustness of Transfer Learning via Noisy Feature Distillation

Fine-tuning through knowledge transfer from a pre-trained model on a lar...

0 Ting-Wu Chin, et al. ∙

research

∙ 04/28/2019

LeGR: Filter Pruning via Learned Global Ranking

Filter pruning has shown to be effective for learning resource-constrain...

0 Ting-Wu Chin, et al. ∙

research

∙ 11/18/2018

RePr: Improved Training of Convolutional Filters

A well-trained Convolutional Neural Network can easily be pruned without...

0 Aaditya Prakash, et al. ∙

research

∙ 10/01/2018

Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks

Resource-efficient convolution neural networks enable not only the intel...

0 Ting-Wu Chin, et al. ∙

research

∙ 07/19/2017

Orthogonal and Idempotent Transformations for Learning Deep Neural Networks

Identity transformations, used as skip-connections in residual networks,...

0 Jingdong Wang, et al. ∙

research

∙ 08/03/2016

Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution

Crowd sourcing has become a widely adopted scheme to collect ground trut...

0 Emad Barsoum, et al. ∙

