Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia

02/12/2021
by Yiping Jin, et al.

Contextual advertising provides advertisers with the opportunity to target the context that is most relevant to their ads. However, its power cannot be fully utilized unless we can target the page content using fine-grained categories, e.g., "coupe" vs. "hatchback" instead of "automotive" vs. "sport". The widely used advertising content taxonomy (IAB taxonomy) consists of 23 coarse-grained categories and 355 fine-grained categories. With such a large number of categories, it becomes very challenging either to collect training documents to build a supervised classification model or to compose expert-written rules for a rule-based classification system. Moreover, in fine-grained classification, different categories often overlap or co-occur, which makes accurate classification harder. In this work, we propose wiki2cat, a method that tackles large-scale, fine-grained text classification by tapping into the Wikipedia category graph. The categories in the IAB taxonomy are first mapped to category nodes in the graph. The labels are then propagated across the graph to obtain a set of labeled Wikipedia documents, which are used to induce text classifiers. The method is well suited to large-scale classification problems because it does not require any manually labeled documents, hand-curated rules, or keywords. The proposed method is benchmarked against various learning-based and keyword-based baselines and yields competitive performance on both publicly available datasets and a new dataset containing more than 300 fine-grained categories.
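The two steps described above (mapping taxonomy categories to graph nodes, then propagating labels to documents) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy category graph, the article lists, the seed mapping, and the `decay` and `max_depth` parameters are all hypothetical, chosen only to show the general shape of label propagation over a category graph.

```python
from collections import deque

# Toy category graph (hypothetical): category -> child categories.
SUBCATEGORIES = {
    "Cars": ["Coupes", "Hatchbacks"],
    "Coupes": ["Sports coupes"],
    "Hatchbacks": [],
    "Sports coupes": [],
}

# Toy article membership (hypothetical): category -> article titles.
ARTICLES = {
    "Coupes": ["Coupe"],
    "Sports coupes": ["Audi TT"],
    "Hatchbacks": ["Hatchback", "Volkswagen Golf"],
}

# Hypothetical mapping from taxonomy labels to seed category nodes.
SEEDS = {"coupe": ["Coupes"], "hatchback": ["Hatchbacks"]}


def propagate_labels(seeds, subcategories, articles, decay=0.5, max_depth=2):
    """Walk breadth-first from each seed; the score shrinks by `decay` per hop."""
    labeled = {}  # article title -> {label: score}
    for label, roots in seeds.items():
        queue = deque((root, 0) for root in roots)
        seen = set(roots)
        while queue:
            category, depth = queue.popleft()
            score = decay ** depth
            # Every article in this category receives the label with this score.
            for article in articles.get(category, []):
                scores = labeled.setdefault(article, {})
                scores[label] = max(scores.get(label, 0.0), score)
            # Descend into subcategories until the depth limit is reached.
            if depth < max_depth:
                for child in subcategories.get(category, []):
                    if child not in seen:
                        seen.add(child)
                        queue.append((child, depth + 1))
    return labeled


training_set = propagate_labels(SEEDS, SUBCATEGORIES, ARTICLES)
```

In this sketch, articles closer to a seed category receive a higher score for that label, and the resulting weakly labeled articles could then serve as training data for a standard text classifier.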


