Information Extraction in Illicit Domains

03/09/2017
by Mayank Kejriwal, et al.
USC Information Sciences Institute

Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have "long tails", and suffer from concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18% F-Measure on five annotated sets of real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is robust to concept drift, and can be efficiently bootstrapped even in a serial computing environment.
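The headline result above is reported in F-Measure. As a reminder of what that metric captures, the following is a minimal sketch of precision, recall, and F-Measure over sets of extracted values. It assumes exact-match scoring of (page, value) pairs, which the abstract does not specify; the function name `f_measure` and the sample data are hypothetical and are not taken from the paper.

```python
def f_measure(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of extractions.

    Assumption: an extraction counts as correct only on an exact match
    with a gold annotation. The paper's own scoring protocol may differ.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical (page id, extracted city) pairs for a single attribute.
gold = {("page1", "Seattle"), ("page2", "Portland"),
        ("page3", "Reno"), ("page4", "Austin")}
predicted = {("page1", "Seattle"), ("page2", "Portland"), ("page3", "Boise")}

precision, recall, f1 = f_measure(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With `beta=1.0` this is the harmonic mean of precision and recall, so a system cannot score well by extracting very few values with high precision or by over-extracting to boost recall.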

08/15/2016

Attribute Extraction from Product Titles in eCommerce

This paper presents a named entity extraction system for detecting attri...
10/02/2019

Concept Drift Detection and Adaptation with Weak Supervision on Streaming Unlabeled Data

Concept drift in learning and classification occurs when the statistical...
02/25/2019

Bootstrapping Domain-Specific Content Discovery on the Web

The ability to continuously discover domain-specific content from the We...
12/21/2022

ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

Large language models have ushered in a golden age of semantic parsing. ...
06/01/2018

OpenTag: Open Attribute Value Extraction from Product Profiles

Extraction of missing attribute values is to find values describing an a...
12/10/2020

Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition

In many scenarios, named entity recognition (NER) models severely suffer...