Inferring Missing Categorical Information in Noisy and Sparse Web Markup

03/01/2018
by Nicolas Tempelmeier, et al.

Embedded markup of Web pages has seen widespread adoption in recent years, driven by standards such as RDFa and Microdata and initiatives such as schema.org; recent studies show adoption by 39% of all Web pages already in 2016. While this markup constitutes an important information source for tasks such as Web search, Web page classification, and knowledge graph augmentation, individual markup nodes are usually sparsely described and often lack essential information. For instance, of the 26 million nodes describing events within the Common Crawl in 2016, 59% provide fewer than six statements and only 257,000 nodes (0.96%) are typed with more specific event subtypes. Nevertheless, given the scale and diversity of Web markup data, nodes that do provide the missing information can be obtained from the Web in large quantities, in particular for categorical properties. Such data constitutes potential training data for inferring missing information and thereby significantly augmenting sparsely described nodes. In this work, we introduce a supervised approach for inferring missing categorical properties in Web markup. Our experiments, conducted on properties of events and movies, show a performance of 79% and 83% F1 score, respectively, significantly outperforming existing baselines.
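
To make the idea of learning a missing categorical property from nodes that already carry it more concrete, the following is a minimal sketch only. It is not the approach of the paper: the abstract does not specify the features or the model, so a plain multinomial naive Bayes classifier over word tokens is swapped in here for illustration. The toy nodes, the property names (`name`, `location`), and the helper functions `tokens`, `train`, and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): each markup node is a dict of
# property -> value. Nodes that already carry the categorical property
# (here: an event subtype) serve as training data.
LABELED_NODES = [
    ({"name": "Jazz night at the harbour", "location": "Blue Note Club"}, "MusicEvent"),
    ({"name": "Symphony orchestra concert", "location": "City Concert Hall"}, "MusicEvent"),
    ({"name": "Cup final football match", "location": "National Stadium"}, "SportsEvent"),
    ({"name": "Marathon race", "location": "City Stadium"}, "SportsEvent"),
]


def tokens(node):
    """Lower-cased word tokens from all property values of a node."""
    return [w for value in node.values() for w in value.lower().split()]


def train(labeled_nodes):
    """Collect class frequencies, per-class token counts, and the vocabulary."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for node, label in labeled_nodes:
        class_counts[label] += 1
        for tok in tokens(node):
            token_counts[label][tok] += 1
            vocabulary.add(tok)
    return class_counts, token_counts, vocabulary


def predict(node, model):
    """Return the class with the highest Laplace-smoothed log-probability."""
    class_counts, token_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for tok in tokens(node):
            if tok in vocabulary:  # ignore words never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(LABELED_NODES)

# A sparsely described node that lacks the subtype we want to infer.
sparse_node = {"name": "Football match", "location": "Stadium"}
print(predict(sparse_node, model))
```

In this sketch the well-described nodes play the role of the training data mentioned in the abstract, and the sparsely described node is the one to be augmented. A realistic setup would need far more data and features that are robust to the noise of Web markup.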
