The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF

04/10/2013
by Niko Brümmer, et al.

The change of two orders of magnitude in the 'new DCF' of NIST's SRE'10, relative to the 'old DCF' evaluation criterion, posed a difficult challenge for participants and evaluator alike. Initially, participants were at a loss as to how to calibrate their systems, while the evaluator underestimated the required number of evaluation trials. After the fact, it is now obvious that both calibration and evaluation require very large sets of trials. This poses the challenges of (i) how to decide what number of trials is enough, and (ii) how to process such large data sets with reasonable memory and CPU requirements. After SRE'10, at the BOSARIS Workshop, we built solutions to these problems into the freely available BOSARIS Toolkit. This paper explains the principles and algorithms behind this toolkit. The main contributions of the toolkit are:

1. The Normalized Bayes Error-Rate Plot, which analyses likelihood-ratio calibration over a wide range of DCF operating points. These plots also help in judging the adequacy of the sizes of calibration and evaluation databases.
2. Efficient algorithms to compute DCF and minDCF for large score files, over the range of operating points required by these plots.
3. A new score file format, which facilitates working with very large trial lists.
4. A faster logistic regression optimizer for fusion and calibration.
5. A principled way to define EER (equal error rate), which is of practical interest when the absolute error count is small.
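To make the quantities named in the abstract concrete, the following is a minimal sketch of how a normalized Bayes error-rate and its minimum over thresholds could be computed at a single operating point. It is not the toolkit's code or API: the function names are hypothetical, the scores are assumed to be natural-log likelihood ratios, the costs are assumed to be folded into an effective target prior `p_target`, and the minimum is found by brute force rather than by the efficient algorithms the paper describes.

```python
import math


def bayes_threshold(p_target):
    """Bayes decision threshold on a log-likelihood-ratio for an effective target prior."""
    return -math.log(p_target / (1.0 - p_target))


def _normalized_error(target_llrs, nontarget_llrs, p_target, threshold):
    # Accept a trial when its score is at or above the threshold (an assumed convention).
    p_miss = sum(s < threshold for s in target_llrs) / len(target_llrs)
    p_fa = sum(s >= threshold for s in nontarget_llrs) / len(nontarget_llrs)
    bayes_error = p_target * p_miss + (1.0 - p_target) * p_fa
    # Normalize by the error-rate of the best decision that ignores the scores.
    return bayes_error / min(p_target, 1.0 - p_target)


def normalized_bayes_error(target_llrs, nontarget_llrs, p_target):
    """Actual normalized error-rate: scores are trusted as calibrated log-likelihood-ratios."""
    threshold = bayes_threshold(p_target)
    return _normalized_error(target_llrs, nontarget_llrs, p_target, threshold)


def min_normalized_bayes_error(target_llrs, nontarget_llrs, p_target):
    """Minimum normalized error-rate over all thresholds (brute force, for illustration only)."""
    candidates = sorted(set(target_llrs) | set(nontarget_llrs)) + [math.inf]
    return min(
        _normalized_error(target_llrs, nontarget_llrs, p_target, thr)
        for thr in candidates
    )


# Toy scores, invented purely for illustration.
targets = [2.0, 5.0, 8.0, -1.0]
nontargets = [-6.0, -3.0, 0.5, -9.0]
actual = normalized_bayes_error(targets, nontargets, 0.5)
minimum = min_normalized_bayes_error(targets, nontargets, 0.5)
```

Evaluating both functions over a grid of `p_target` values would give the two curves of a plot of this kind; the gap between the actual and minimum values at each operating point indicates how much is lost to miscalibration under these assumptions.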


