Roos' Matrix Permanent Approximation Bounds for Data Association Probabilities

07/17/2018, by Lingji Chen

Matrix permanent plays a key role in data association probability calculations. Exact algorithms (such as Ryser's) scale exponentially with matrix size. Fully polynomial time randomized approximation schemes exist but are quite complex. This letter introduces to the tracking community a simple approximation algorithm with error bounds, recently developed by Bero Roos, and illustrates its potential use for estimating probabilities of data association hypotheses.

I Introduction

In multi-target tracking, the (normalized) likelihoods of the associations between tracks and measurements are calculated using motion and sensor models, and for some tracking algorithms these suffice to define a maximum likelihood solution. However, there are situations in which the probabilities of association hypotheses are also important, or even required by the algorithms. One example is the Joint Probabilistic Data Association Filter (JPDAF) as described by [1], where the evaluation of target-measurement association probabilities is a necessary part of the algorithm. Another example is the Generalized Labeled Multi-Bernoulli (GLMB) filter as described in [2]; here the probabilities are not required, but it would be good to know, quantitatively, how much truncation error [3] has occurred: when we keep, for example, only the top 100 hypotheses, are we keeping 90% of the probability mass, or just 50%?

To get such probabilities we need to normalize by the sum of the likelihoods of all permissible association hypotheses, whose number grows combinatorially. If we construct a "likelihood matrix" whose entries are derived from pairwise target-measurement likelihoods, and augment it with diagonal blocks for missed detections and target deaths, as is done in [2], then under an independence assumption each hypothesis likelihood is a product of "non-conflicting" terms from this matrix, and the normalizing factor we seek is the permanent of the matrix [4].
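As a minimal illustration (not taken from [2]), consider two targets and two measurements, with $a_{ij}$ denoting the likelihood that target $i$ generated measurement $j$ and with missed detections ignored. The two permissible hypotheses are $\{1\!\to\!1, 2\!\to\!2\}$ and $\{1\!\to\!2, 2\!\to\!1\}$, so the normalizing factor is

\[
a_{11} a_{22} + a_{12} a_{21} = \operatorname{per} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} .
\]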

Exact matrix permanent algorithms, such as Ryser's [5, 1], scale exponentially with the matrix size [6, 7]. For a matrix with nonnegative entries, a fully polynomial time randomized approximation scheme (FPRAS) based on Markov Chain Monte Carlo (MCMC) is presented in [8]; it can compute the permanent to within a factor of $1+\varepsilon$ for any given $\varepsilon > 0$. This algorithm is, however, quite complex to analyze and implement. On the other hand, as is shown in [9], even "crude" approximations may turn out to be useful for estimating various probabilities. With this motivation, this letter brings to the attention of the tracking community a recent result by Bero Roos [10] that provides first order and second order approximations to the permanent of a rectangular matrix, both with error bounds (for higher order approximations, see for example [11]).
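For reference, a textbook (unoptimized) Matlab implementation of Ryser's formula for a square matrix is sketched below; the function name is ours, and this is not the code used for the experiments in Section IV. The thin matrices used later in this letter require either the rectangular generalization of Ryser's formula or the definition-based sum in Section II.

function p = ryser_permanent(A)
% Permanent of a square n-by-n matrix via Ryser's formula:
%   per(A) = (-1)^n * sum over nonempty column subsets S of
%            (-1)^|S| * prod_i sum_{j in S} A(i,j).
% The cost is O(2^n * n^2), i.e., exponential in the matrix size.
n = size(A, 1);
assert(size(A, 2) == n, 'ryser_permanent expects a square matrix');
p = 0;
for mask = 1:(2^n - 1)                 % enumerate nonempty column subsets
    cols = find(bitget(mask, 1:n));    % columns selected by this subset
    rowSums = sum(A(:, cols), 2);      % per-row sum over the selected columns
    p = p + (-1)^numel(cols) * prod(rowSums);
end
p = (-1)^n * p;
end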

Section II presents Roos' approximations with simplified notation. Section III illustrates a potential use for estimating association hypothesis probabilities, and Section IV points out the issue of computation time. Section V concludes and discusses future research.

II The Roos approximations

We use, for concreteness, the matrix layout in Figure 1 of [2] for (normalized) likelihoods (without taking the logarithm): each row corresponds to either an existing target or a potential new-born target from a Labeled Multi-Bernoulli birth model, and each column corresponds to one of the following situations: (1) a measurement for a survived and detected target, (2) a survived but undetected target, and (3) a dead or unborn target. An association hypothesis essentially "picks" likelihood entries from the matrix, such that exactly one entry is picked in each row and at most one entry is picked in each column. Measurements that are not picked automatically become clutter and need not be explicitly dealt with (which may explain why contemporary GLMB filters are more efficient than the classical Hypothesis-Oriented Multiple Hypothesis Tracker, HO-MHT [12]).
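As a toy illustration of this layout, the following Matlab snippet assembles such a likelihood matrix for a small random scenario; the variable names and entry values are made up for illustration and are not the expressions from [2].

% Toy construction of the augmented likelihood matrix described above.
% Variable names and entry values are illustrative only.
nTargets = 4;                          % existing plus potential new-born targets
nMeas    = 6;                          % number of measurements
psiD     = rand(nTargets, nMeas);      % likelihoods: target i generated measurement j
missLik  = 0.3 * ones(nTargets, 1);    % likelihood of surviving but being undetected
deathLik = 0.1 * ones(nTargets, 1);    % likelihood of death / not being born
% One row per target; the three column blocks correspond to situations (1)-(3).
L = [psiD, diag(missLik), diag(deathLik)];   % nTargets-by-(nMeas + 2*nTargets)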

Thus the matrix always has more columns than rows. However, in order to follow the presentation in [10] closely, we will describe the algorithm using a “thin” matrix with more rows than columns; this means that we will apply Roos’ algorithm to the transpose of our likelihood matrix for computation.

Let $\mathcal{P}_{n,k}$ denote the set of all $k$-permutations of $\{1, \dots, n\}$, i.e., the ordered arrangements of a $k$-element subset of an $n$-set, and let $\mathcal{C}_{n,k}$ denote the set (to iterate over such sets in a memory-efficient way, see [13]) of all $k$-combinations of $\{1, \dots, n\}$, i.e., the unordered $k$-element subsets of an $n$-set. Then the permanent of a thin $n \times k$ matrix $A = [a_{ij}]$ with $n \ge k$ is defined as

\[
\operatorname{per}(A) = \sum_{\pi \in \mathcal{P}_{n,k}} \prod_{j=1}^{k} a_{\pi(j),\, j} . \tag{1}
\]
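For small matrices, Eq. (1) can be evaluated by brute force, which is useful for checking approximations; a minimal Matlab sketch (the function name is ours) is

function p = permanent_thin(A)
% Permanent of a thin n-by-k matrix (n >= k) directly from Eq. (1):
% sum over all ordered selections of k distinct rows, one per column.
% The cost grows as n!/(n-k)!, so this is only for small sanity checks.
[n, k] = size(A);
assert(n >= k, 'permanent_thin expects at least as many rows as columns');
rowSubsets = nchoosek(1:n, k);          % all k-element subsets of rows
p = 0;
for s = 1:size(rowSubsets, 1)
    ords = perms(rowSubsets(s, :));     % all orderings of this subset
    for t = 1:size(ords, 1)
        idx = sub2ind([n, k], ords(t, :), 1:k);   % entries a_{pi(j), j}
        p = p + prod(A(idx));
    end
end
end

For a square matrix, permanent_thin(A) agrees with the Ryser implementation above.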

For $j \in \{1, \dots, k\}$, set the column average to

\[
\bar{a}_j = \frac{1}{n} \sum_{i=1}^{n} a_{ij} ,
\]

and define, for an index subset $J \subseteq \{1, \dots, k\}$, the product

\[
\bar{a}(J) = \prod_{j \in J} \bar{a}_j .
\]

Using a Matlab-type notation ":" to denote consecutive integers, we write $j_1\!:\!j_2$ for $\{j_1, j_1 + 1, \dots, j_2\}$ and hence $\bar{a}(j_1\!:\!j_2)$ for the product of the corresponding column averages.
With this notation, [10] gives a first order approximation $h_1(A)$ and a second order approximation $h_2(A)$ to $\operatorname{per}(A)$, each accompanied by a computable error bound; the quantities entering the error bounds are described below. To save space, we skip special cases and describe only the generic case.

The error bounds are built from row differences. For rows $i_1, i_2 \in \{1, \dots, n\}$ and a column $j$, define the shorthand

\[
d_{i_1 i_2}(j) = a_{i_1 j} - a_{i_2 j} .
\]

The bounds on $|\operatorname{per}(A) - h_1(A)|$ and $|\operatorname{per}(A) - h_2(A)|$ are then sums of products of such row differences and column averages, weighted by explicit combinatorial constants and auxiliary functions whose expressions are given in [10].
III An example of ideal usage

We will illustrate one use of Roos' permanent approximations in the framework of GLMB [2]. For ease of exposition we consider the case where there is only one hypothesis at the current time step, which gives rise to a set of child hypotheses at the next step, and we assume that we can enumerate them all. The weight of each hypothesis is proportional to the product of likelihoods inside the summation in Equation (1), noting that the likelihood matrix has been transposed. After normalizing the weights by the permanent, we obtain the probability of each hypothesis.

However, in any practical application of GLMB we cannot enumerate all child hypotheses, and have to truncate at some number, say the top $K$. The weights are then normalized by the sum of these $K$ weights, not by the sum of all weights, which is the permanent. The truncation error is analyzed in [3], which confirms our intuition that we should keep the hypotheses with the highest weights (or the best weights we can find within a computation budget, using for example Gibbs sampling). If we take the negative log of the likelihood matrix, then the top assignments can be enumerated in order by Murty's algorithm [14, 15], which calls as a subroutine the Munkres algorithm that finds the best bipartite matching [16, 17, 18].
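A minimal sketch of the single-best-assignment step that Murty's algorithm calls repeatedly, using Matlab's built-in matchpairs in place of a dedicated Munkres implementation (the full hypothesis enumeration of Murty's algorithm is not shown):

% Best single association hypothesis from the likelihood matrix L
% (targets in rows; columns are measurements, missed detections, deaths),
% obtained by minimizing the total negative log likelihood. Murty's
% algorithm repeats this step on modified cost matrices for the 2nd, 3rd,
% ... best assignments.
C = -log(L);                           % assignment costs
bigCost = 1e9;                         % effectively forbid leaving a row unmatched
M = matchpairs(C, bigCost);            % M(i,:) = [row, column] of a matched pair
bestWeight = prod(L(sub2ind(size(L), M(:, 1), M(:, 2))));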

It would be quite useful, even if done offline, to know quantitatively what the truncation error is: Do these hypotheses represent 90% of the probability mass, or only 50%?
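This is where the permanent bounds help. Given the cumulative weight of the top $K$ hypotheses and any lower and upper bounds on the permanent (from Roos' approximations or otherwise), the captured probability mass can be bracketed as in the sketch below; perLower and perUpper are placeholders for whatever bounds are available.

% Bracket the probability mass captured by the top-K hypotheses.
% 'weights' holds the unnormalized hypothesis weights found so far, sorted
% in decreasing order; perLower/perUpper are bounds on the permanent.
capturedWeight = sum(weights(1:K));
perLower = max(perLower, capturedWeight);            % permanent >= enumerated sum
massLowerBound = min(capturedWeight / perUpper, 1);  % guaranteed fraction
massUpperBound = min(capturedWeight / perLower, 1);  % optimistic fraction
fprintf('Top-%d hypotheses hold between %.1f%% and %.1f%% of the mass.\n', ...
        K, 100 * massLowerBound, 100 * massUpperBound);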

To illustrate the point, we create a toy example with a random likelihood matrix of size 4 by 12 and run Murty's algorithm on it, recording the cumulative likelihood with each increment of $K$. This is shown as the blue curve in Figure 1 (if we use Gibbs sampling instead of Murty's algorithm, the curve will still be increasing, but not necessarily concave). We also calculate the exact permanent by Ryser's algorithm and mark it as the black dotted line. The approximate permanent calculated by Roos' second approximation, together with its upper and lower bounds, is marked by the red, cyan, and green lines.

Fig. 1: A toy example showing the use of Roos' approximation bounds for estimating the probabilities of the best $K$ hypotheses.

It can be seen from the figure that if we stop at about the top 25 hypotheses, we are guaranteed to capture about of the total probability, and there is even hope that the percentage can be as high as . If we continue to obtain the top 100 hypotheses, the lower bound is no longer informative, but the upper bound guarantees that we have about of the probability. Empirically we have observed that Roos’ second approximation is often quite accurate but with conservative bounds, so the percentage may be close to the truth, which we know is .

IV The issue of computation time

Ryser’s algorithm scales exponentially while Roos’ approximation scales polynomially, so for large matrices the latter should be faster to compute than the former. However, for “mid-sized” matrices, Roos’s first approximation is fast but conservative while Roos’ second approximation is more useful but slow, often slower than Ryser’s algorithm. This point is illustrated by an experiment shown in Figure 2, where computation times for random likelihood matrices are plotted, based on unoptimized Matlab code. The structure of the likelihood matrix corresponds to 10 targets (existing and birthing) and 10 to 15 measurements, i.e., with 10 rows and 30 to 35 columns.

Fig. 2: Computation times of unoptimized Matlab code for random likelihood matrices with 10 rows and 30 to 35 columns (10 to 15 for the measurement block).
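A timing comparison of this kind can be reproduced along the following lines; first_order_sketch is the function defined above, and roos_second_order is a placeholder name for an implementation of the second order approximation, which is not listed in this letter.

% Rough timing harness for random likelihood matrices, in the spirit of Fig. 2.
nTargets = 10;
for nMeas = 10:15
    L = [rand(nTargets, nMeas), diag(rand(nTargets, 1)), diag(rand(nTargets, 1))];
    A = L.';                                          % thin matrix for the approximations
    t1 = tic; h1 = first_order_sketch(A); tFirst = toc(t1);
    % t2 = tic; h2 = roos_second_order(A); tSecond = toc(t2);   % placeholder
    fprintf('%d columns: first order value %.3g computed in %.2e s\n', ...
            size(L, 2), h1, tFirst);
end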

The fact that both Ryser's algorithm and Roos' second approximation are unbearably slow for a matrix of this size indicates the following possible avenues for improvement:

  • optimized implementation in a compiled language;

  • parallelized implementation;

  • exploitation of the diagonal structure of the second and third blocks of the likelihood matrix (used in GLMB filtering);

  • better approximation algorithms.

V Conclusions

In this letter we have presented Roos' approximation algorithms, with error bounds, for the matrix permanent. We illustrated their use in estimating data association probabilities, such as in GLMB filtering where only the top $K$ hypotheses are kept. We pointed out the challenge of computation time and proposed directions for improvement.

Acknowledgment

The author would like to thank Professor Bero Roos for discussions and clarifications, and Professor Jeffrey Uhlmann for providing a cited paper. He would also like to thank Ms. Emily Polson for support.

References