On Using Product-Specific Schema.org from Web Data Commons: An Empirical Set of Best Practices

07/27/2020
by Ravi Kiran Selvam et al.

Schema.org has grown rapidly in recent years. Structured descriptions of products embedded in HTML pages are now not uncommon, especially on e-commerce websites. The Web Data Commons (WDC) project has extracted schema.org data from webpages in the Common Crawl and made it available at scale as an RDF "knowledge graph". The portion of this data that specifically describes products offers a golden opportunity for researchers and small companies to use it for analytics and downstream applications. Yet, because of the broad and expansive scope of this data, it is not evident whether the data is usable in its raw form. In this paper, we conduct a detailed empirical study of the product-specific schema.org data made available by WDC. Rather than performing a simple analysis, our goal is to devise an empirically grounded set of best practices for consuming WDC product-specific schema.org data. Our studies reveal six best practices, each justified by experimental data and analysis.
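The abstract describes WDC data as an RDF graph of schema.org markup extracted from crawled pages. As a minimal, hedged sketch of a first consumption step, the Python below assumes the data arrives as N-Quads lines whose fourth (graph) term is the URL of the source page, and groups schema.org `Product` nodes by page. The function names `parse_quad` and `products_by_page` are hypothetical illustrations, not part of the paper or of any WDC tooling, and the regular expression is a simplified parser, not a conformant N-Quads implementation.

```python
import re
from collections import defaultdict

# Simplified N-Quads term pattern: an IRI <...>, a blank node _:id, or a
# literal "..." with an optional language tag or datatype. This is NOT a full
# N-Quads parser (e.g. escape sequences inside IRIs are not handled).
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
QUAD = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Assumption: extracted markup may use either the http or https schema.org namespace.
PRODUCT_TYPES = {"<http://schema.org/Product>", "<https://schema.org/Product>"}


def parse_quad(line):
    """Return (subject, predicate, object, graph), or None if the line does not parse."""
    m = QUAD.match(line)
    return m.groups() if m else None


def products_by_page(lines):
    """Map each source page (the quad's graph term) to the Product nodes typed on it."""
    pages = defaultdict(set)
    for line in lines:
        quad = parse_quad(line)
        if quad is None:
            continue  # skip malformed lines instead of failing on the whole file
        subject, predicate, obj, graph = quad
        if predicate == RDF_TYPE and obj in PRODUCT_TYPES:
            pages[graph].add(subject)
    return pages
```

Skipping unparseable lines rather than raising is a deliberate choice here: raw web-extracted markup is noisy, so a consumer will usually want to count and inspect rejects rather than abort on the first one.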


research
02/28/2020

An Empirical Study on the Design and Evolution of NoSQL Database Schemas

We study how software engineers design and evolve their domain model whe...
research
05/15/2018

Building an Ecosystem for the Tyrolean Tourism Knowledge Graph

The introduction of the schema.org vocabulary was a big step towards mak...
research
02/16/2018

Analysis of Schema.org Usage in the Tourism Domain

Schema.org is an initiative founded in 2011 by the four-big search engin...
research
06/12/2020

Google Dataset Search by the Numbers

Scientists, governments, and companies increasingly publish datasets on ...
research
10/11/2020

Exploiting Knowledge Graphs for Facilitating Product/Service Discovery

Most of the existing techniques to product discovery rely on syntactic a...
research
03/01/2018

Inferring Missing Categorical Information in Noisy and Sparse Web Markup

Embedded markup of Web pages has seen widespread adoption throughout the...
research
10/22/2020

What is Web Scraping: Introduction, Applications and Best Practices

Web scraping typically extracts large amounts of data from websites fo...
