Network Flow Methods for the Minimum Covariates Imbalance Problem
The problem of balancing covariates arises in observational studies where one is given a group of control samples and another group, disjoint from the control group, of treatment samples. Each sample, in either group, has several observed nominal covariates. The values, or categories, of each covariate partition the treatment and control samples to a number of subsets referred to as levels where the samples at every level share the same covariate value. We address here a problem of selecting a subset of the control group so as to balance, to the best extent possible, the sizes of the levels between the treatment group and the selected subset of control group, the min-imbalance problem. It is proved here that the min-imbalance problem, on two covariates, is solved efficiently with network flow techniques. We present an integer programming formulation of the problem where the constraint matrix is totally unimodular, implying that the linear programming relaxation to the problem has all basic solutions, and in particular the optimal solution, integral. This integer programming formulation is linked to a minimum cost network flow problem which is solvable in O(n· (n' + nlog n)) steps, for n the size of the treatment group and n' the size of the control group. A more efficient algorithm is further devised based on an alternative, maximum flow, formulation of the two-covariate min-imbalance problem, that runs in O(n'^3/2log^2n) steps.
READ FULL TEXT