Objects of violence: synthetic data for practical ML in human rights investigations

04/01/2020
by Lachlan Kermode, et al.

We introduce a machine learning workflow to search for, identify, and meaningfully triage videos and images of munitions, weapons, and military equipment, even when limited training data exists for the object of interest. This workflow is designed to expedite the work of OSINT ("open source intelligence") researchers in human rights investigations. It consists of three components: automatic rendering and annotation of synthetic datasets that make up for a lack of training data; training of image classifiers on combined sets of photographic and synthetic data; and mtriage, open source software that orchestrates these classifiers' deployment to triage public domain media and visualises their predictions in a web interface. We show that synthetic data helps to train classifiers more effectively, and that the approach yielding the best results varies by architecture. We then demonstrate our workflow in two real-world human rights investigations: the use of the Triple-Chaser tear gas grenade against civilians, and the verification of allegations of military presence in Ukraine in 2014.
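The second component, training on combined photographic and synthetic data, can be illustrated with a minimal sketch. The abstract does not say how the two sources are mixed, so everything here is an assumption made for illustration: the `Sample` record, the `mix_training_set` function, and the `synthetic_fraction` parameter are hypothetical names, and the strategy shown (keep every photograph, subsample the renders to hit a target ratio) is one plausible option rather than the authors' method.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    path: str        # hypothetical path to an image file
    label: str       # class name, e.g. "triple_chaser"
    synthetic: bool  # True for a rendered image, False for a photograph


def mix_training_set(photos, renders, synthetic_fraction, seed=0):
    """Combine photographs with a random subset of renders.

    All photographs are kept. Renders are subsampled so that, when enough
    renders are available, they make up `synthetic_fraction` of the result.
    This mixing rule is an illustrative assumption, not the paper's recipe.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_s / (n_s + n_p) = f for n_s, the number of renders to draw.
    wanted = round(len(photos) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_renders = min(wanted, len(renders))
    combined = list(photos) + rng.sample(list(renders), n_renders)
    rng.shuffle(combined)
    return combined


photos = [Sample(f"photo_{i}.jpg", "triple_chaser", False) for i in range(10)]
renders = [Sample(f"render_{i}.png", "triple_chaser", True) for i in range(100)]
train = mix_training_set(photos, renders, synthetic_fraction=0.75)
```

Varying `synthetic_fraction` across training runs is one simple way to compare how much synthetic data a given classifier architecture benefits from.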


