New/s/leak 2.0 - Multilingual Information Extraction and Visualization for Investigative Journalism

07/13/2018
by Gregor Wiedemann, et al.

In recent years, investigative journalism has been confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multilingual data due to intensified global cooperation and communication in politics, business and civil society. Faced with these challenges, journalists are increasingly cooperating in international networks. To support such collaborations, we present new/s/leak 2.0, the new version of our open-source software for content-based searching of leaks. It includes three novel main features: 1) automatic language detection and language-dependent information extraction for 40 languages, 2) entity and keyword visualization for efficient exploration, and 3) decentralized deployment for analysis of confidential data from various formats. We illustrate the new analysis capabilities with an exemplary case study.


