Technical Report on Web-based Visual Corpus Construction for Visual Document Understanding

11/07/2022
by   Donghyun Kim, et al.
0

We present a dataset generator engine named Web-based Visual Corpus Builder (Webvicob). Webvicob can readily construct a large-scale visual corpus (i.e., images with text annotations) from a raw Wikipedia HTML dump. In this report, we validate that Webvicob-generated data can cover a wide range of context and knowledge and helps practitioners to build a powerful Visual Document Understanding (VDU) backbone. The proposed engine is publicly available at https://github.com/clovaai/webvicob.

READ FULL TEXT

page 2

page 9

page 10

page 11

research
07/18/2023

OxfordVGG Submission to the EGO4D AV Transcription Challenge

This report presents the technical details of our submission on the EGO4...
research
11/29/2022

ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information

ClueWeb22, the newest iteration of the ClueWeb line of datasets, provide...
research
07/04/2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Document understanding refers to automatically extract, analyze and comp...
research
06/01/2023

A big data approach towards sarcasm detection in Russian

We present a set of deterministic algorithms for Russian inflection and ...
research
01/25/2021

MadDog: A Web-based System for Acronym Identification and Disambiguation

Acronyms and abbreviations are the short-form of longer phrases and they...
research
10/03/2022

Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia

Corpora that contain tabular data such as WebTables are a vital resource...
research
07/01/2021

Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations

Electronic Theses and Dissertations (ETDs) contain domain knowledge that...

Please sign up or login with your details

Forgot password? Click here to reset