1 Introduction
Imperfect-information games model strategic interactions between agents that do not have perfect knowledge of their current situation, such as auctions, negotiations, and recreational games like poker and battleship. Linear programming (LP) can be used to solve—that is, to find a Nash equilibrium in—imperfect-information two-player zero-sum perfect-recall games (Koller et al., 1994). However, due mostly to memory usage (see e.g., Zinkevich et al., 2007 or Brown & Sandholm, 2019), it has generally been thought of as impractical for solving large games. Thus, a series of other techniques has been developed for solving such games. Most prominent among these is the counterfactual regret minimization (CFR) family of algorithms (Zinkevich et al., 2007). These algorithms work by iteratively improving both players' strategies until their time averages converge to an equilibrium. They have a worst-case bound of $O(1/\epsilon^2)$ iterations needed to reach accuracy $\epsilon$, and more recent improvements, most notably CFR+ (Tammelin, 2014) and discounted CFR (DCFR) (Brown & Sandholm, 2019), mean that algorithms from the CFR family remain the state of the art in practice for solving large games. They have been used as an important part of the computational pipelines to achieve superhuman performance in benchmark cases such as heads-up limit (Bowling et al., 2015) and no-limit (Brown & Sandholm, 2017) Texas hold'em.
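To make the self-play principle concrete, the following toy sketch (our own illustration, using plain regret matching on a normal-form game rather than counterfactual regrets on a game tree; all names are ours) shows time-averaged strategies converging to equilibrium:

```python
import numpy as np

def regret_matching_strategy(regrets):
    """Map cumulative regrets to a strategy: play positive regrets proportionally."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(regrets, 1.0 / len(regrets))

def solve_matrix_game(A, iters=20000):
    """Self-play regret matching on the zero-sum game max_x min_y x^T A y.
    The *time-averaged* strategies converge to a Nash equilibrium."""
    m, n = A.shape
    r1, r2 = np.zeros(m), np.zeros(n)
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x = regret_matching_strategy(r1)
        y = regret_matching_strategy(r2)
        x_sum += x
        y_sum += y
        u1 = A @ y          # payoffs to player 1's pure actions
        u2 = -(A.T @ x)     # payoffs to player 2's pure actions (zero-sum)
        r1 += u1 - x @ u1   # regret for not having played each pure action
        r2 += u2 - y @ u2
    return x_sum / iters, y_sum / iters

# Rock-paper-scissors: the unique equilibrium is uniform play.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x_avg, y_avg = solve_matrix_game(A)
```

The current strategies cycle; only the averages converge, at the $O(1/\sqrt{T})$ rate that motivates the $O(1/\epsilon^2)$ iteration bound above.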
Several families of algorithms have theoretically faster convergence rates than those of the CFR family. First-order methods (Hoda et al., 2010; Kroer et al., 2015) have a theoretically better convergence guarantee of $O(1/\epsilon)$ (or even $O(\ln(1/\epsilon))$ with a condition-number dependence (Gilpin et al., 2012)), but in practice perform worse than the newest algorithms in the CFR family (Kroer et al., 2018; Brown & Sandholm, 2019). Standard algorithms for LP are known that converge at rate $O(\ln(1/\epsilon))$, but for the most part, these algorithms require storage of the whole payoff matrix explicitly, which the CFR family does not, and often require superlinear time per iteration (with respect to the number of nonzero entries of the LP constraint matrix), which is prohibitive when the game is extremely large.
In this paper we investigate how to reduce the space requirements of LP solvers by factoring the possibly dense payoff matrix of an extensive-form game. A long body of work investigates the problem of decomposing, or factoring, a given matrix as the product of other matrices, with some objective in mind. Studied objectives include the speedup of certain operations, as in the LU or Cholesky factorizations, and approximation of the matrix in a certain norm, as in the singular value decomposition (SVD). Our objective in this work is sparsity: we investigate the problem of factoring a matrix $A$ into the product of two matrices $U$ and $V$ that are together much sparser than $A$. This differs from the usual low-rank approximation in that the optimization objective is different ($\ell_0$ norm, that is, number of nonzero entries, instead of the $\ell_2$ norm, that is, the square root of the sum of squares), and in that the matrices $U$ and $V$ might not be low-rank (in fact, in poker they will have rank linear in the number of sequences in the game). We are not aware of any prior application-independent work that addresses this problem. The SVD approximates in the wrong norm for this purpose: $\ell_2$-norm approximations will in the general case have a fully dense residual, which is not desirable. The body of work on sparse PCA (e.g., Zou & Xue, 2018) focuses on low-rank sparse factorizations. That work still mostly focuses on $\ell_2$-norm approximations, and the runtime of the algorithms usually scales with the rank of the factorization as well as the size of $A$. In our case, an optimal factorization may have high or even full rank (and yet be sparse), and our matrices are large enough that quadratic (or worse) dependence on the dimension, which is often seen in these algorithms, is unacceptable. Our goal is to find such a factorization efficiently. Neyshabur & Panigrahy (2013) address the same problem, but restrict their attention to matrices that are known a priori to be the product of sparse matrices with entries drawn independently from a nice distribution. This is not the case in our setting. Richard et al. (2014) attack a related but still substantially different problem: finding a sparse factorization when a good bound on the sparsity of each row or column of the factors is known a priori. This, too, is not true in our setting: no such bound may even exist, much less be known.
Our main technical contribution is a novel practical matrix factorization algorithm that greatly reduces the size of the game payoff matrix in many cases. This factorization allows LP algorithms to run in far less memory and time than previously known, bringing the memory requirement close to that of CFR. We demonstrate in practice that this method can reduce the size of a payoff matrix by multiple orders of magnitude, yielding improvements in both the time and space efficiency of solving the resulting LP. This makes our approach—automated matrix sparsification followed by LP—superior to domain-independent versions of the fastest CFR variant. If high accuracy is desired, our domain-independent approach in many cases outperforms even a highly customized poker-specific implementation of the fastest CFR variant (Brown & Sandholm, 2019).
We show experiments with primal simplex, dual simplex, and the barrier method as the LP solver. The barrier method runs in polynomial time, but each of its iterations is heavy in both memory and time. For that reason, we also present techniques that significantly speed up a recent LP algorithm (Yen et al., 2015) whose iteration time and memory are linear in the number of nonzero entries in the constraint matrix, and show experiments with that as well. Our experiments show interesting performance differences among the LP solvers.
2 Preliminaries
Extensive-form games. We study the standard representation of games that can include sequential and simultaneous moves, as well as imperfect information: the extensive-form game. It consists of the following. (1) A set of players, usually identified with positive integers. Random chance, or "nature", is also considered a player, and will be referred to as player 0. (2) A finite tree of histories or nodes, rooted at some initial state. Each node is labeled with the player (possibly nature) who acts at that node. The set of leaves, or terminal states, will be denoted $Z$. The edges connecting any node to its children are labeled with actions. (3) For each player $i$, a utility function $u_i : Z \to \mathbb{R}$. (4) For each player $i$, a partition of the nodes at which player $i$ acts into a collection of information sets. In each information set $I$, every pair of nodes must have the same set of actions. (5) For each node at which nature acts, a distribution over the actions available at that node.
For any history $h$ and any player $i$, the sequence $\sigma_i(h)$ of player $i$ at node $h$ is the sequence of information sets reached and actions played by player $i$ on the path from the root node to $h$. The set of sequences for player $i$ is denoted $\Sigma_i$. A player has perfect recall if $\sigma_i(h) = \sigma_i(h')$ whenever $h$ and $h'$ are in the same information set. In this work, we focus our attention on two-player zero-sum games of perfect recall; i.e., games with two players ($i \in \{1, 2\}$) in which $u_1 = -u_2$ and both players have perfect recall. For simplicity of notation, the opponent of player $i$ will be denoted $-i$.
A behavior strategy (hereafter simply strategy) $s_i$ for player $i$ is, for each information set $I$ at which player $i$ acts, a distribution $s_i(I)$ over the actions available at that infoset. When an agent reaches information set $I$, it chooses an action according to $s_i(I)$. A pair of behavior strategies, one for each player, is a strategy profile. The reach probability $\eta^{s_i}(h)$ is the probability that node $h$ will be reached, assuming that player $i$ plays according to strategy $s_i$, and all other players (including nature) always choose actions leading to $h$ when possible. This definition extends to sets of nodes or to sequences by summing the reach probabilities. The best response value for player $i$ against an opponent strategy $s_{-i}$ is the largest value player $i$ can achieve; i.e., in a two-player game, $\max_{s_i} u_i(s_i, s_{-i})$. A strategy $s_i$ is an $\epsilon$-best response to opponent strategy $s_{-i}$ if $u_i(s_i, s_{-i}) \ge \max_{s_i'} u_i(s_i', s_{-i}) - \epsilon$. A strategy profile is an $\epsilon$-Nash equilibrium if its Nash gap, the sum of the two players' incentives to deviate, is at most $\epsilon$. Best responses and Nash equilibria are respectively $0$-best responses and $0$-Nash equilibria. The exploitability of a strategy $s_i$ is how far $s_i$ is from a Nash equilibrium: $u_i(s_i^*, s_{-i}^*) - \min_{s_{-i}} u_i(s_i, s_{-i})$, where $(s_1^*, s_2^*)$ is a Nash equilibrium. In a zero-sum game, the Nash value is the same for every Nash equilibrium strategy, so exploitability is well-defined.
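In the normal-form special case, best responses and the Nash gap reduce to simple linear algebra; a small illustrative sketch (the code and names are ours, not the paper's):

```python
import numpy as np

def best_response_value(A, y):
    """Player 1's best-response value against y in the zero-sum game x^T A y:
    some pure action is always a best response, so take the max entry of A y."""
    return float(np.max(A @ y))

def nash_gap(A, x, y):
    """Sum of both players' incentives to deviate from the profile (x, y);
    it is zero exactly at a Nash equilibrium."""
    return best_response_value(A, y) - float(np.min(A.T @ x))

# Rock-paper-scissors: the uniform profile is the unique equilibrium.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
uniform = np.full(3, 1/3)
pure_rock = np.array([1., 0., 0.])
```

Against the uniform opponent, the uniform strategy has gap 0, while always playing the first action has gap 1 (the opponent's best response wins outright).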
Equilibrium finding via linear programming. Nash equilibrium finding in an extensive-form game can be cast as an LP in the following fashion (von Stengel, 1996). Consider mapping behavior strategies $s_1$ and $s_2$ to vectors $x$ and $y$ by setting $x[\sigma] = \eta^{s_1}(\sigma)$ for every sequence $\sigma \in \Sigma_1$ (and analogously for $y$). We will refer to the vector $x$ as a strategy. Under this framework, equilibrium finding can be cast as the bilinear saddle-point problem
$$\max_{Ex = e,\, x \ge 0}\ \min_{Fy = f,\, y \ge 0}\ x^\top A y \tag{1}$$
where the matrices $E$ and $F$ and vectors $e$ and $f$ encode the constraints that make $x$ and $y$ valid sequence-form strategies. $A$ is the payoff matrix whose entry $A[\sigma_1, \sigma_2]$ is the expected payoff for Player 1 when Player 1 plays to reach sequence $\sigma_1$ and Player 2 plays to reach sequence $\sigma_2$; equivalently, $A[\sigma_1, \sigma_2] = \mathbf{1}_{\sigma_1}^\top A\, \mathbf{1}_{\sigma_2}$, where $\mathbf{1}_\sigma$ is the $\sigma$th unit vector. The number of nonzero entries of $A$ is at most $|Z|$. Now taking the dual of the inner minimization yields the LP
$$\max_{x, v}\ f^\top v \quad \text{s.t.} \quad A^\top x \ge F^\top v,\ \ Ex = e,\ \ x \ge 0. \tag{2}$$
Expressed in any LP standard form, the constraint matrix of this LP has $O(\|A\|_0 + |\Sigma_1| + |\Sigma_2|)$ nonzero entries. The LP can be solved with any standard solver.
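For a normal-form game, the constraints $Ex = e$ collapse to the simplex constraint $\mathbf{1}^\top x = 1$, and LP (2) can be handed directly to an off-the-shelf solver. A minimal sketch using SciPy (our own illustration; the paper's experiments use Gurobi and LP-sparse′):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum_lp(A):
    """Solve max_x min_y x^T A y by LP: the normal-form special case of (2)."""
    m, n = A.shape
    # Variables: [x_1, ..., x_m, v].  Maximize v  <=>  minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # v <= (A^T x)_j for all j  <=>  -A^T x + v*1 <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Simplex constraint: 1^T x = 1 (v unconstrained).
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], -res.fun  # equilibrium strategy and game value

A = np.array([[1., -1.], [-1., 1.]])   # matching pennies
x_star, value = solve_zero_sum_lp(A)
```

Matching pennies has value 0 with the unique equilibrium strategy (0.5, 0.5).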
Sparse linear programming. Yen et al. (2015) give a generic algorithm for solving LPs in the standard form (3).¹

¹We use the subscript LP everywhere due to the clash of variable naming conventions between LP (where $A_{\mathrm{LP}}$ is the constraint matrix) and equilibrium finding (where $A$ is the payoff matrix).
$$\min_{x \ge 0}\ c^\top x \quad \text{s.t.} \quad A_{\mathrm{LP}} x \ge b \tag{3}$$
We first give a brief overview of the algorithm. We are interested in LPs of the standard form (3) and their duals
$$\max_{y \ge 0}\ b^\top y \quad \text{s.t.} \quad A_{\mathrm{LP}}^\top y \le c \tag{4}$$
where $A_{\mathrm{LP}} \in \mathbb{R}^{m \times n}$. Consider the convex subproblem
$$\min_{x \ge 0,\ A_{\mathrm{LP}} x \ge b}\ c^\top x + \frac{1}{2\eta}\|x - x_t\|_2^2 \tag{5}$$
for some given initial solution $x_t$ and real number $\eta > 0$. The dual of this subproblem is
$$\max_{y \ge 0}\ \min_{x \ge 0}\ c^\top x + \frac{1}{2\eta}\|x - x_t\|_2^2 + y^\top (b - A_{\mathrm{LP}} x) \tag{6}$$
$$=\ \max_{y \ge 0}\ b^\top y - \frac{1}{2\eta}\left(\big\|[x_t - \eta(c - A_{\mathrm{LP}}^\top y)]_+\big\|_2^2 - \|x_t\|_2^2\right). \tag{7}$$
The approach is shown in Algorithm 1. In Line 2, the solution to Problem (7) is computed via either a randomized coordinate descent (RC) or a projected Newton-CG (PG) algorithm; the details are not important here. The breakthrough of Yen et al. (2015) is an implementation of these inner loops in time linear in the number of nonzero entries of $A_{\mathrm{LP}}$, rather than quadratic or worse. At each iteration, the primal iterate is infeasible in the original problem, since the quadratic regularization term in (7) barely punishes slightly infeasible solutions; the dual iterate is infeasible since it is a suboptimal solution to (7). Thus, Algorithm 1 works with infeasible solutions to the LP, which must be projected back into the feasible space.
Theorem 1 (Theorem 3 in Yen et al., 2015).
After $O(\ln(1/\epsilon))$ outer iterations of Algorithm 1, each of which is run for $O(\ln(1/\epsilon))$ inner iterations, we have $d(y_t, Y^*) \le \epsilon$, where $Y^*$ is the set of dual-optimal solutions and $d$ is Euclidean distance.
The $O(\cdot)$ in the above theorem hides problem-dependent constants such as condition numbers. This theoretical guarantee applies to the dual solution, not the primal. Thus, to find a primal-dual solution pair, in theory we must run Algorithm 1 twice: on the primal (to find a dual solution) and then on the dual (to find a primal solution). In practice, the primal solution from the first run already has extremely low exploitability, so the second run is unnecessary.
The rest of the paper covers our new contributions.
3 Adapting the Sparse LP Solver
In order to make the above LP algorithm fast for game solving, we had to make a modification, and also deal with the caveat of eternally infeasible solutions $x_t$ and $y_t$.
3.1 Limiting the Number of Inner Iterations
Yen et al. (2015) give an implementation of their algorithm, which they call LP-sparse. In it, the inner loop runs until either (1) it converges to a sufficiently small error tolerance, or (2) some prescribed iteration limit is hit. The iteration limit is set to increase exponentially every time it is hit. In practice, we found this to be far too aggressive, leading to inner loops that take prohibitively long (an hour or more on two-player no-limit Texas hold'em endgames). Thus, we instead allow the number of inner iterations to grow only linearly with the number of outer iterations. Since both the outer and inner loop lengths are bounded by $O(\ln(1/\epsilon))$ in Theorem 1, this does not change the theoretical performance guarantee of the algorithm, and it leads to a significant speedup in practice.
3.2 Normalizing Infeasible Solutions
Algorithm 1 will output an infeasible solution pair. To retrieve a valid behavior strategy (feasible solution), we first project into the positive orthant (i.e., zero out any negative entries), and then normalize each information set in topological order, starting with the root. This results in a strategy pair whose Nash gap we can evaluate. This normalization step roughly maintains the guarantee of Theorem 1:
Theorem 2.
Suppose $x$ is an infeasible solution to (2) such that $d(x, X^*) \le \epsilon$, where $X^*$ is the set of optimal solutions to (2). Then the above normalization yields a (feasible) strategy with exploitability at most $O(\epsilon n)$, where $n$ is the total number of sequences between the two players.
A proof is in the appendix. The above bound is loose, but improving it is unnecessary for the theoretical punchline: combining Theorems 1 and 2, we see that the LP algorithm converges to a strategy with exploitability $\epsilon$ in $O(\log^2(1/\epsilon))$ total inner iterations (where the $O(\cdot)$ possibly hides problem-dependent constants), assuming the payoff matrix is normalized (i.e., its largest entry is fixed to, say, 1).
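The normalization step above can be sketched on a toy sequence-form vector as follows (the data layout and helper names are our own illustrative encoding, not the paper's):

```python
import numpy as np

def normalize_sequence_form(x, infosets):
    """Project an infeasible sequence-form vector back to a valid strategy.

    x[0] is the empty sequence; `infosets` lists, in topological order (root
    first), pairs (parent_seq, child_seqs).  We zero out negative entries,
    then rescale each infoset so its children sum to the parent's realization
    weight, falling back to uniform play if everything was zeroed out."""
    x = np.maximum(np.asarray(x, dtype=float), 0.0)
    x[0] = 1.0
    for parent, children in infosets:
        total = sum(x[c] for c in children)
        for c in children:
            behavior = x[c] / total if total > 0 else 1.0 / len(children)
            x[c] = x[parent] * behavior
    return x

# Root infoset with sequences 1, 2; sequence 1 leads to an infoset with 3, 4.
infosets = [(0, [1, 2]), (1, [3, 4])]
x = normalize_sequence_form([1.0, 0.6, 0.2, -0.1, 0.3], infosets)
```

Processing infosets top-down matters: each child's weight is rescaled against a parent weight that has already been made feasible.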
4 Sparse Factorization
In many games, the payoff matrix $A$ is somewhat dense. This occurs when the number of terminal game tree nodes, $|Z|$, is large compared to the total number of sequences, that is, when a significant fraction of the sequence pairs represent valid terminal nodes. In most normal-form (a.k.a. matrix-form) games, $A$ is fully dense, whereas in extensive-form games of perfect information, $A$ is extremely sparse (because the number of terminal nodes equals the total number of terminal sequences between the players). In most real games, the value of each entry can be computed in constant time from the indices $\sigma_1$ and $\sigma_2$ alone based on the rules of the game, with a minimal amount of auxiliary memory, so $A$ can be stored implicitly. In these cases, the sparse LP solver is at a disadvantage compared to the CFR family of algorithms. CFR can run with only implicit access to $A$; its memory usage is thus only linear in the number of sequences. LP solvers, on the other hand, require a full description of $A$, which here will have size $\Theta(\|A\|_0)$. Our idea here is to make LP solvers practical by carefully compressing $A$ in a way that standard LP solvers can still handle.
This leads to our main idea. If we can write $A$ approximately as the product of two matrices, that is, $A = UV + R$ where the residual $R$ is sparse, then we can reformulate the LP (2) as
$$\max_{x, w, v}\ f^\top v \quad \text{s.t.} \quad V^\top w + R^\top x \ge F^\top v, \quad w = U^\top x, \tag{8}$$
$$Ex = e, \quad x \ge 0, \tag{9}$$
which, in standard form, has $O(\|U\|_0 + \|V\|_0 + \|R\|_0 + |\Sigma_1| + |\Sigma_2|)$ nonzero constraint matrix entries. In this formulation, we demand that not only $U$ and $V$ but also the residual $R$ be sparse. Depending on the density of $A$ and the quality of the factorization, a good factorization could yield a quadratic improvement in both the time and space used by the LP solver.
When $A$ is low-rank, the SVD would provide such a factorization. However, in many cases, $A$ is neither sparse nor well approximated by a low-rank factorization. Further, even when $A$ is low-rank, it is possible that, for example, $A - A_1$ is a dense matrix (where $A_1$ is the best rank-one approximation to $A$), which means that the algorithm would take time and memory proportional to the full matrix size per iteration starting from the second outer iteration. We now give examples of matrices for which finding a sparse factorization in our style is superior to finding a standard low-rank factorization (SVD), both in speed and in resulting sparsity. An additional example can be found in the appendix.
Example 1.
Let $A = uv^\top$ be a rank-one $n \times n$ matrix, and let $B$ be $A$, except its lower-triangular half has been zeroed out. In general, $B$ will now be full-rank, and the SVD of $B$ will not be sparse. However, we can express $B$ sparsely as follows. Set $\bar u$ to be $u$ except with its bottom half zeroed out, and set $\bar v$ to be $v$ except with its left half zeroed out. Then $\bar u \bar v^\top$ matches the upper-right quadrant of the matrix $B$, as shown in Figure 1. Moreover, $B - \bar u \bar v^\top$ is block diagonal, where the two blocks have size $n/2$ and have the same structure as $B$ itself. Thus, we may recursively factor the two blocks. The vectors $\bar u$ and $\bar v$ both have $n/2$ nonzero entries, so the total number of nonzero entries in the factorization is expressed by the recurrence $f(n) = n + 2f(n/2)$, which solves to $f(n) = O(n \log n)$. The matrices $U$ and $V$ will both have $O(n)$ columns. This example shows up in practice: the payoff matrix of poker endgames is block diagonal, where the blocks have essentially this form.
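The recursive construction of Example 1 can be sketched as follows (our own code; `factor_triangular_rank1` is a hypothetical name):

```python
import numpy as np

def factor_triangular_rank1(u, v):
    """Sparsely factor B = triu(u v^T) as B = U @ V with O(n log n) nonzeros.
    Each recursion peels off the upper-right quadrant, which is exactly rank
    one, leaving two half-size diagonal blocks of the same form."""
    n = len(u)
    cols_U, rows_V = [], []

    def recurse(lo, hi):
        if hi - lo <= 1:
            if hi - lo == 1:          # 1x1 block: a single diagonal entry
                cu = np.zeros(n); cv = np.zeros(n)
                cu[lo] = u[lo]; cv[lo] = v[lo]
                cols_U.append(cu); rows_V.append(cv)
            return
        mid = (lo + hi) // 2
        cu = np.zeros(n); cv = np.zeros(n)
        cu[lo:mid] = u[lo:mid]        # top half of u on this block
        cv[mid:hi] = v[mid:hi]        # right half of v on this block
        cols_U.append(cu); rows_V.append(cv)
        recurse(lo, mid); recurse(mid, hi)

    recurse(0, n)
    return np.column_stack(cols_U), np.vstack(rows_V)

rng = np.random.default_rng(0)
u, v = rng.standard_normal(8), rng.standard_normal(8)
B = np.triu(np.outer(u, v))
U, V = factor_triangular_rank1(u, v)
```

Each entry of $B$ is produced by exactly one rank-one term, so the product is exact. At $n = 8$ the factorization is not yet sparser than $B$ itself; the $O(n \log n)$ versus $O(n^2)$ gap only shows at larger sizes.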
Example 2.
Let $B$ be a sparsely-factorable matrix, and let $A = B + R$, where the residual $R$ may be high-rank but is sparse. For example, perhaps $A$ is $B$ with some entries around its diagonal zeroed out. Then $A$ itself is also sparsely factorable. This example may seem trivial, but the SVD does not share a similar property. For example, if $B$ is the matrix from Example 1, and $R$ is a general sparse matrix (even the zero matrix), then the SVD of $A$ will be dense, but $A$ will still have a sparse factorization.

5 Factorization Algorithm
In this section, we develop a general algorithm for factoring an arbitrary sparse matrix into the product of two possibly sparser—and never denser—matrices. For this section, we let $A \in \mathbb{R}^{m \times n}$ be the matrix to be factored. We follow the general strategy used by the power iteration SVD algorithm (e.g., Golub & Van Loan, 1996). Algorithm 2 reduces the factorization problem to solving, for a given matrix $A$, the subproblem
$$\min_{u, v}\ \|A - uv^\top\| \tag{10}$$
When $\|\cdot\|$ is the 2-norm, this problem can be solved using the standard power iteration algorithm. However, when $\|\cdot\|$ is the 0-norm, the problem is not so easy, and even using the 1-norm as a convex substitute for the 0-norm does not help:
Theorem 3 (Gillis & Vavasis, 2018).
When $\|\cdot\|$ is the 1-norm or the 0-norm, the optimization problem (10) is NP-hard.
We thus resort to an algorithm that may not yield the optimal solution but works extremely well in practice. Algorithm 3 reduces (10) to solving the subproblem
$$\min_{v}\ \|A - uv^\top\| \tag{11}$$
for a given matrix $A$ and now a fixed vector $u$ (the other subproblem is analogous by transposing $A$ and flipping the roles of $u$ and $v$). Again, when $\|\cdot\|$ is the 2-norm, the optimizer of (11) is just $v = A^\top u / \|u\|_2^2$, so that the full algorithm is just the standard power iteration algorithm for SVD.
When $\|\cdot\|$ is instead the 0-norm, as seen in Algorithm 4, the optimizer of (11) is the vector $v$ whose $j$th element is the mode of $A_{ij}/u_i$ over all $i$ for which $u_i \neq 0$. Since the objective function (10) cannot increase during the alternating minimization, and the objective values are integral, Algorithm 3 terminates in finitely many iterations at a local optimum. Algorithm 2 is an anytime algorithm. In practice, we terminate it when the number of unsuccessful iterations (the number of iterations in which the condition on line 4 is false) exceeds the number of successful iterations.
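A toy sketch of the 0-norm alternating step (our own simplified code, not the paper's Algorithms 3–4; here the mode is taken over all rows in the support of $u$, zeros included, which likewise minimizes the column's residual nonzeros):

```python
import numpy as np
from collections import Counter

def best_v_l0(A, u):
    """Given fixed u, minimize ||A - u v^T||_0 over v, column by column.
    Only rows with u[i] != 0 can change, and entry (i, j) is zeroed exactly
    when v[j] = A[i, j] / u[i]; so the best v[j] is the mode of those ratios."""
    rows = np.flatnonzero(u)
    v = np.zeros(A.shape[1])
    for j in range(A.shape[1]):
        mode, _count = Counter(A[rows, j] / u[rows]).most_common(1)[0]
        v[j] = mode
    return v

def rank1_l0(A, i0, iters=10):
    """Alternating minimization of ||A - u v^T||_0, initialized from the
    basis vector e_{i0} as in Section 5.1."""
    u = np.zeros(A.shape[0])
    u[i0] = 1.0
    v = best_v_l0(A, u)
    for _ in range(iters):
        if not np.any(v):            # degenerate: nothing left to peel off
            break
        u = best_v_l0(A.T, v)
        v = best_v_l0(A, u)
    return u, v

# A sparse rank-one matrix is recovered exactly from a basis-vector start.
A = np.outer([1., 2., 0.], [3., 0., 4.])
u, v = rank1_l0(A, 0)
```

The objective is a nonnegative integer that never increases, so the alternation stops at a local optimum after finitely many steps.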
Algorithms 2–4 constitute an approximate algorithm for sparse matrix factorization. It is not exact for two reasons. First, Algorithm 2 is greedy: at each step of the loop, it chooses the immediate best rank-one matrix and greedily appends it to $U$ and $V$. This is not always optimal, and in fact in the worst case can already doom the algorithm to a trivial approximation factor. Second, Algorithm 3 is not exact when $\|\cdot\|$ is the 0-norm or 1-norm, as expected from our Theorem 3. Nevertheless, the method works remarkably well, as we will show experimentally.²

²We also experimented with using the 1-norm as a convex relaxation of the 0-norm. Here, the exact solution to (11) is given by Meng & Xu (2012). This seemed to make no difference in practice, so in the experiments we use the 0-norm.
5.1 Initialization
For the SVD, the initial guess for $u$ in Algorithm 3 is usually chosen to point in a random direction (i.e., its entries are drawn i.i.d.). In our case, this does not work: if we draw $u$ that way, then as long as each column of $A$ has at least two nonzero entries, the mode computation in Algorithm 4 degenerates with probability 1, since the ratios $A_{ij}/u_i$ are pairwise distinct with probability 1. This causes the subroutine of Algorithm 2 to degenerate, leading to a trivial output. This is troubling for us, since the payoff matrices of essentially all extensive-form games trigger this degenerate case. Fortunately, one small change yields an initialization that works well. Instead of initializing $u$ to a random unit vector, we initialize $u$ to a random basis vector $e_i$. This circumvents the above problem, and leads to remarkably strong performance in practice.
5.2 Implementation with Implicit Matrices
A major problem with the above algorithm is that a straightforward implementation of it would store and modify the matrix $A$ in order to factor it. In the setting we are considering, $A$ is often too big to store in memory: the number of nonzero entries in $A$ may be several orders of magnitude greater than the number of rows or columns. In these settings, we would like to be able to implement the algorithm with only implicit access to $A$. Formally, we assume access to $A$ via only an immutable oracle that, given an index $i$, retrieves the list of nonzero entries, and their indices, in the $i$th row or $i$th column of $A$.
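A sketch of what such an oracle might look like, with a toy payoff rule standing in for a real game's rules (the class and rule are our own illustration):

```python
import numpy as np

class ImplicitPayoffMatrix:
    """Immutable oracle for a payoff matrix defined by rules, never
    materialized.  The toy rule A[i, j] = sign(i - j) ('higher index wins')
    stands in for a real game's rules; only O(m + n) memory is ever held."""
    def __init__(self, m, n):
        self.shape = (m, n)

    def entry(self, i, j):
        return float(np.sign(i - j))

    def row(self, i):
        """Nonzero entries of row i, with their column indices."""
        return [(j, self.entry(i, j))
                for j in range(self.shape[1]) if self.entry(i, j) != 0.0]

    def matvec(self, y):
        """A @ y (the gradient CFR needs), streamed one row at a time."""
        out = np.zeros(self.shape[0])
        for i in range(self.shape[0]):
            out[i] = sum(val * y[j] for j, val in self.row(i))
        return out

A = ImplicitPayoffMatrix(4, 4)
g = A.matvec(np.ones(4))
```

This is exactly the access pattern CFR exploits; the challenge of Section 5.2 is making the factorization algorithm live with the same restricted interface.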
The immutability of $A$ is the biggest roadblock here. Several changes need to be made to Algorithms 2–4 to accommodate it. First, Line 7 of Algorithm 2 is no longer possible. Thus, Line 3 of Algorithm 2 must be revised to work with the residual $A - UV$ implicitly, and the matrices $U$ and $V$ must be passed through to Algorithms 3 and 4. On Line 3 of Algorithm 4, querying the $i$th row of the residual then requires a matrix multiplication $A_i - U_i V$, where $U_i$ is the $i$th row of $U$.
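The implicit residual-row query can be sketched as follows (our own illustration; `row_oracle` plays the role of the immutable oracle):

```python
import numpy as np

def residual_row(row_oracle, U, V, i):
    """Row i of the implicit residual A - U @ V, computed on the fly:
    the immutable oracle supplies row i of A, and the correction U[i, :] @ V
    costs one (1 x k) times (k x n) multiplication, where k = #columns of U."""
    return row_oracle(i) - U[i, :] @ V

# Toy check against an explicit matrix with one rank-one term peeled off.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
u = rng.standard_normal(5)
v = rng.standard_normal(4)
U, V = u.reshape(-1, 1), v.reshape(1, -1)
r2 = residual_row(lambda i: A[i], U, V, 2)
```

The residual itself is never stored; only $A$'s rows (via the oracle) and the thin factors $U$ and $V$ live in memory.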
5.3 Runtime Analysis
The run time of the algorithm depends heavily on the structure of $A$. The worst-case run time is polynomial, since every inner iteration runs in at most quadratic time and removes at least one nonzero entry from the residual. In practice it runs dramatically faster than that, and we will now present a rough analysis, valid in most typical cases. For simplicity, assume $A$ is square. This doesn't change the analysis in any meaningful way, and makes for easier exposition, since we do not need to distinguish whether $A$ has been transposed in Algorithm 4.
As stated above, the run time of the algorithm is dominated by the matrix multiplication $A_i - U_i V$, which must be performed for every row $i$ in the support of $u$ on the current iteration. On the $k$th outer iteration of the algorithm, $U$ and $V$ have at most $k$ columns each; therefore each such multiplication takes time $O(kn)$, and up to $n$ of these are needed per inner iteration. Thus, if the algorithm runs for a total of $K$ outer loop iterations, each of which takes $T$ inner-loop iterations, it will take time $O(K^2 T n^2)$.
In practice, the number of inner iterations per outer iteration is usually very small, say, 3. As an example, if the algorithm correctly factors a matrix with the structure of Example 1, the $k$th outer loop iteration will find a block of size roughly $n/k$. Thus each inner loop iteration takes time roughly $O((n/k)^2)$, and there will be $O(n)$ outer iterations, so that the whole algorithm runs in time $\tilde O(n^2)$.
In most extensive-form games, the payoff matrix $A$ is block diagonal. In this case, running the factorization algorithm on $A$ is equivalent to running it on each of the blocks individually, and has the same run time as running the algorithm on each block separately. Indeed, if $u$ is initialized to a random basis vector $e_i$, the algorithm's entire run, and all its operations—including the critical matrix multiplication $A_i - U_i V$—will not escape the block to which row $i$ of matrix $A$ belongs. Thus, for example, running the algorithm on a matrix with $k$ diagonal blocks of size $n$, each of which has the structure of Example 1, will still take time $\tilde O(kn^2)$.
6 Experiments
Game                    | Gap    | size      | nnz        | fnnz    | Simplex | Barrier | LP-sparse′ | CFR

9-card Leduc poker      | .0001  | 5,798     | 30,924     | 13,712  | .5      | .08     | 7          | 901
13-card Leduc poker     | .0001  | 12,014    | 95,056     | 31,522  | 2.4     | .24     | 14         | 1,823
5x2 battleship m=2 n=1  | .0001  | 230,778   | 33,124     | —       | 8.7     | .44     | 5          | 2,451
4x3 battleship m=2 n=1  | .0001  | 639,984   | 82,076     | —       | 81.0    | 1.47    | 14         | 4,059
3x2 battleship m=4 n=1  | .0001  | 3,236,158 | 1,201,284  | —       | (T)     | 16.90   | 659        | 86,284
3x2 battleship m=3 n=2  | .0001  | 1,658,904 | 3,345,408  | —       | (T)     | 20.22   | 202        | 55,040
sheriff N=10000 B=100   | .0001  | 1,020,306 | 2,020,101  | —       | 3.0     | 52.56   | 12         | 7,912
sheriff N=1000 B=1000   | .0001  | 1,005,006 | 2,003,501  | —       | 2.8     | 208.35  | 9          | 1,728
sheriff N=100 B=10000   | .0001  | 1,030,206 | 2,020,151  | —       | 5.2     | 66.71   | 19         | 287
4-rank goofspiel        | .0001  | 42,478    | 11,136     | —       | .7      | .39     | 42         | 51,857
5-rank goofspiel        | 1.74   | 5,332,052 | 1,574,400  | —       | (T)     | 267.46  | 7,200      | 1,081
NLH river endgame A     | .00684 | 129,222   | 53,585,621 | 481,967 | 294.9   | (T)     | 7,200      | 11,893
NLH river endgame B     | .00178 | 61,062    | 25,240,149 | 229,454 | 54.4    | (T)     | 7,200      | 3,350
Table 1: The last four columns are the wall-clock times, in seconds, that the four algorithms took to achieve the desired Nash gap. All times greater than 2 hours (7,200 seconds) are estimated via linear regression on the log-log convergence plot. Gurobi was time-limited to half an hour (1,800 seconds) because each game had at least one method that solved the game well within this limit. Since it is difficult to estimate the convergence rate of Gurobi's solver, Gurobi timeouts are simply indicated with a (T).
[Table 2: no-limit Texas hold'em endgame results. Rows: eight river endgames (River 1–8) and one small turn endgame, each listed with its starting pot (bb) and its factorization's nonzero count, factorization time (s), and factorization memory (MB). Columns: time (s), memory (MB), and Nash gap (bb) for factored simplex, factored barrier, and the poker-specific DCFR implementation, and for unfactored simplex. Unfactored simplex timed out on River 3 and the small turn endgame.]
We compared state-of-the-art commercial implementations (Gurobi Optimization, 2019) of the common LP solving algorithms (primal simplex, dual simplex, and barrier) and our modified version of the LP-sparse algorithm (Yen et al., 2015) (which we call LP-sparse′), combined with our factorization algorithm, to the newest, fastest variants of CFR (Brown & Sandholm, 2019).
6.1 Experiments with All Solvers
In the first set of experiments, we studied the setting where the payoff matrix $A$ is given explicitly. In this setting, the factorization algorithm can be allowed to modify $A$, and CFR variants must load the whole matrix into memory. In this experiment, we use the game-independent CFR implementation, built in the Rust programming language for speed. In each game, the largest entry of the payoff matrix in absolute value was normalized to 1. We ran LP-sparse′ four times on each game; in particular, for each combination of (i) which player is chosen to be the $x$-player in (2), in other words, whether (2) is solved via the primal or the dual, and (ii) the choice of inner iteration algorithm (RC or PG). We tested four different variants of CFR: CFR+, CFR+ with quadratic averaging, DCFR, and LCFR, all instances of the DCFR family. These variants are introduced and analyzed in depth by Brown & Sandholm (2019) and represent the current state of the art in large-scale game solving. The best of those variants for each game is shown in Table 1. We ran LP-sparse′ and CFR to target precision (Nash gap) $10^{-4}$, or for two hours, whichever threshold was hit first. We ran primal and dual simplex to optimality (machine precision), and barrier with default settings except crossover off. We ran all solvers on a single core. The games that we tested on are standard benchmarks; they are described in the appendix.
On most games, all LP solvers outperformed CFR. This marks, to our knowledge, the first time that LP (or, indeed, any fundamentally different algorithm) has been shown to be competitive against leading CFR variants on large games.
The matrix factorization algorithm performs remarkably well in practice when it is needed. On 9-card and 13-card Leduc poker, it led to a compression ratio of 2–3. More impressively, the algorithm compresses both no-limit endgames by a factor of more than 100. This brings savings of nearly the same factor in convergence rate in both games, and enables the LP algorithms to be competitive against the CFR variants in these large games. On payoff matrices that are already sparse, the factorization algorithm fails to find a sparser factorization, and terminates immediately.
On a few games, the choice of which player to make the $x$-player in LP (2), that is, the choice between primal and dual solves, made a significant difference. For example, in the sheriff family of games, setting the $x$-player to be the smuggler yields much better results. This is because the optimal strategy for the smuggler in the sheriff games is very sparse. Indeed, Yen et al. (2015) note that their algorithm performs significantly better when the optimal primal solution is sparse, since in this case the inner loop does not need to loop over the entire constraint matrix.
6.2 Experiments on Nolimit Texas Hold’em Endgames
In the experiment described above, Gurobi's LP solvers consistently outperformed LP-sparse′ despite the theoretical guarantees of the latter. Thus, in the second set of experiments, we focus on Gurobi and DCFR.
The implicit implementation of our factorization algorithm (Section 5.2) allows us to scale our method to larger games than previously possible. We hence ran experiments testing this implementation on heads-up no-limit Texas hold'em poker endgames encountered by the superhuman agent Libratus (Brown & Sandholm, 2017). To align with Brown & Sandholm (2019), we used a simple action abstraction: the bets are half-pot, full-pot, and all-in, and the raises are full-pot and all-in. All results are expressed in terms of the standard metric, big blinds (bb). The starting stacks are 200 big blinds per player, as in the Libratus match against humans. We tested on eight real river endgames (i.e., endgames that start on the fourth betting round) and a single manually generated small turn endgame (i.e., an endgame that starts on the third betting round) where the pot already has half of the players' wealth, so only a single additional bet or raise is possible.
In this experiment we used an optimized poker-specific C++ implementation of DCFR. This implementation includes optimizations such as those of Johanson et al. (2011), which shave a large factor off the runtime of CFR—on the order of the number of possible hands a player may have in Texas hold'em—and those of Brown & Sandholm (2015), which prune game lines that are dynamically determined not to be part of the optimal solution. For the LP solver, we use Gurobi's simplex and barrier methods. Both primal and dual simplex were run, and only the better of the two results is shown in Table 2. We also tested Gurobi without the factorization algorithm. In this case, we do not include results for the barrier method, because it timed out or ran out of memory in all cases. All algorithms were again restricted to a single core. DCFR was run for the amount of time taken by the fastest LP variant that used factoring: if Gurobi took $t_1$ seconds to solve a game and the factorization algorithm took $t_2$ seconds, CFR was run for $t_1 + t_2$ seconds. The results are in Table 2, and representative convergence plots showing anytime performance are in the appendix.
The factorization algorithm reduced the size of the game by a factor of 52–80, and the resulting payoff matrix had density (i.e., nonzeros divided by rows plus columns) 7.8–9.5. This is expected: poker payoff matrices are block diagonal, where the blocks are rank one with the lower-triangular half negated. Thus, they have essentially the structure of Example 1, in which we saw a compression from density $\Theta(n)$ to density $\Theta(\log n)$, which is exactly the compression we are seeing here.
The DCFR implementation we tested against is especially optimized to solve no-limit turn endgames. It thus may have some inefficiencies when handling river endgames. We estimate that these inefficiencies lose a factor of approximately 20 in time and space on river endgames relative to a river-optimized implementation. However, importantly, these inefficiencies pale in comparison to the speedups gained by game-specific poker techniques (e.g., Johanson et al., 2011), which save a factor of several hundred in time (but not space). This strongly suggests that our method would be significantly faster than any non-game-specific implementation of CFR or any modern variant. Furthermore, the memory usage of simplex, after factorization, is only a small constant factor worse than that of the game-specific CFR (which stores the payoff matrix implicitly) on all of these endgames, which means it is often practical to use LP solvers even on extremely large games with dense payoff matrices, as long as the payoff matrix is factorable.
Since primal simplex and dual simplex produce, respectively, only primal-feasible and dual-feasible solutions, anytime performance of simplex is measured by running both simultaneously and measuring the Nash gap between the current primal and dual solutions at each time checkpoint, using Gurobi's reported objective values. Gurobi does not allow retrieval of these anytime solutions when its presolver is turned on; in principle they could be recovered easily via the presolver's mapping, but Gurobi unfortunately does not expose that mapping to the end user. The convergence plots in Figure 2 show roughly what we would expect. CFR has a very stable convergence curve until it reaches very high precision, at which point numerical stability issues kick in and the convergence plot becomes erratic. The LP solvers start slowly (especially due to the sometimes nontrivial time requirements of the factorization algorithm), but catch up with, and often eventually exceed, the performance of DCFR, before in many cases also stopping due to numerical issues. Even on turn endgames, the LP algorithms consistently outperform a hypothetical non-game-specific implementation of DCFR (which we define to be 500 times slower than the poker-specific DCFR), due to the much higher density of the payoff matrix, and hence the higher cost of the gradient computation, in DCFR.
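The Nash gap measured at each checkpoint can be illustrated in miniature for a matrix game. This sketch is our own and is not tied to Gurobi or to the sequence-form LP; the function name is hypothetical:

```python
import numpy as np

def nash_gap(A, x, y):
    """Nash gap of a mixed-strategy pair (x, y) in the zero-sum matrix game
    where the row player maximizes x^T A y.

    gap = (best response value against y) - (best response value against x)
        = max_i (A y)_i - min_j (x^T A)_j

    The gap is always >= 0, and it is 0 exactly at a Nash equilibrium.
    """
    return (A @ y).max() - (x @ A).min()
```

For matching pennies, the uniform pair has gap 0, while any pure-strategy pair is exploitable and has a positive gap, mirroring how the primal and dual simplex solutions bracket the equilibrium value.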
7 Conclusion and Future Research
We presented a matrix factorization algorithm that yields a significant increase in sparsity. We showed how the factored matrix can be used in an LP to solve zero-sum games, reducing both the time and space needed by LP solvers. On explicitly represented games, this significantly outperforms the prior state-of-the-art algorithm, DCFR. It also makes LP solvers competitive on large games that are implicitly defined by a compact set of rules, even against an optimized game-specific DCFR implementation. There are many interesting directions for future research, such as (1) further improving the factorization algorithm, (2) investigating the explicit form of an optimal factorization in special cases, and (3) parallelizing the factorization algorithm.
Acknowledgements
This material is based on work supported by the National Science Foundation under grants IIS1718457, IIS1617590, IIS1901403, and CCF1733556, and the ARO under awards W911NF1710082 and W911NF2010081.
References
 Bowling et al. (2015) Bowling, M., Burch, N., Johanson, M., and Tammelin, O. Heads-up limit hold'em poker is solved. Science, 347(6218), January 2015.
 Brown & Sandholm (2015) Brown, N. and Sandholm, T. Regret-based pruning in extensive-form games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2015.
 Brown & Sandholm (2017) Brown, N. and Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, pp. eaao1733, Dec. 2017. Print version 359(6374):418–424, 2018.

 Brown & Sandholm (2019) Brown, N. and Sandholm, T. Solving imperfect-information games via discounted regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2019.
 Farina et al. (2019) Farina, G., Ling, C. K., Fang, F., and Sandholm, T. Correlation in extensive-form games: Saddle-point formulation and benchmarks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
 Gillis & Vavasis (2018) Gillis, N. and Vavasis, S. A. On the complexity of robust PCA and ℓ1-norm low-rank matrix approximation. Mathematics of Operations Research, 43(4):1072–1084, 2018.
 Gilpin et al. (2012) Gilpin, A., Peña, J., and Sandholm, T. First-order algorithm with O(ln(1/ε)) convergence for ε-equilibrium in two-person zero-sum games. Mathematical Programming, 133(1–2):279–298, 2012. Conference version appeared in AAAI-08.
 Golub & Van Loan (1996) Golub, G. H. and Van Loan, C. F. Matrix Computations. Johns Hopkins University Press, 1996.
 Gurobi Optimization (2019) Gurobi Optimization, L. Gurobi optimizer reference manual, 2019.
 Hoda et al. (2010) Hoda, S., Gilpin, A., Peña, J., and Sandholm, T. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research, 35(2), 2010.
 Johanson et al. (2011) Johanson, M., Waugh, K., Bowling, M., and Zinkevich, M. Accelerating best response calculation in large extensive games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2011.

 Koller et al. (1994) Koller, D., Megiddo, N., and von Stengel, B. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the 26th ACM Symposium on Theory of Computing (STOC), 1994.
 Kroer et al. (2015) Kroer, C., Waugh, K., Kılınç-Karzan, F., and Sandholm, T. Faster first-order methods for extensive-form game solving. In Proceedings of the ACM Conference on Economics and Computation (EC), 2015.
 Kroer et al. (2018) Kroer, C., Farina, G., and Sandholm, T. Solving large sequential games with the excessive gap technique. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
 Meng & Xu (2012) Meng, D. and Xu, Z. Divide-and-conquer method for L1-norm matrix factorization in the presence of outliers and missing data. arXiv, abs/1202.5844, 2012.
 Neyshabur & Panigrahy (2013) Neyshabur, B. and Panigrahy, R. Sparse matrix factorization. arXiv, abs/1311.3315, 2013.
 Richard et al. (2014) Richard, E., Obozinski, G., and Vert, J. Tight convex relaxations for sparse matrix factorization. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2014.
 Southey et al. (2005) Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., Billings, D., and Rayner, C. Bayes’ bluff: Opponent modelling in poker. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI), July 2005.
 Tammelin (2014) Tammelin, O. Solving large imperfect information games using CFR+. CoRR, abs/1407.5042, 2014.
 von Stengel (1996) von Stengel, B. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220–246, 1996.
 Yen et al. (2015) Yen, I. E., Zhong, K., Hsieh, C., Ravikumar, P., and Dhillon, I. S. Sparse linear programming via primal and dual augmented coordinate descent. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 2368–2376, 2015.
 Zinkevich et al. (2007) Zinkevich, M., Bowling, M., Johanson, M., and Piccione, C. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2007.

 Zou & Xue (2018) Zou, H. and Xue, L. A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8):1311–1320, 2018.
Appendix A Proof of Theorem 2
The key to the proof is to bound how much this naive normalization changes the point x. Let x* be the result of projecting x into the optimal set.
Lemma.
Let x̂ be the result of normalizing x according to the given scheme (with ε the feasibility slack from the theorem statement), and let I be an information set at depth d (with the root defined to be at depth 0). Then the value of each sequence at I changes by at most dε under the normalization.
Proof.
By induction on the sequence-form strategy tree for the player, starting at the root. At the root sequence, the claim is clearly true because in any feasible solution the root sequence has value 1. Now consider any information set I with parent sequence p whose child sequences are at depth d. From the theorem statement, the children's values under x sum to within ε of x_p, and since x̂ is feasible, the children's values under x̂ sum exactly to x̂_p. It follows that the children's values under x sum to within dε of x̂_p,
by the triangle inequality and the inductive hypothesis, noting that x_p and x̂_p differ by at most (d−1)ε. But the normalization acts by picking a single scaling so that the children's values sum to x̂_p, and it moves all the children's values in the same direction; thus, each one can move by at most dε, completing the induction. ∎
With this lemma in hand, we now prove the theorem.
Proof of Theorem.
Since each depth must contain at least one information set, the depth of the tree is at most the number of information sets, and it follows from the lemma that the total change in x under normalization is bounded accordingly. But the best response value (with feasibility constraints on the opponent's strategy) is a pointwise minimum, over feasible opponent strategies, of Lipschitz functions, and hence is itself Lipschitz, with Lipschitz constant at most the sum of the magnitudes of the nonzero entries of the payoff matrix. The desired theorem follows. ∎
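The step asserting that a pointwise minimum of Lipschitz functions is itself Lipschitz can be spelled out as follows (the notation here is generic, not necessarily the paper's):

```latex
% If each g_y is L-Lipschitz and f(x) = \min_{y \in Y} g_y(x), then f is L-Lipschitz:
\begin{align*}
f(x) = \min_{y \in Y} g_y(x)
     \le g_{y^\star}(x)
     \le g_{y^\star}(x') + L\,\lVert x - x'\rVert
     = f(x') + L\,\lVert x - x'\rVert,
\end{align*}
% where $y^\star$ attains the minimum at $x'$. Exchanging the roles of $x$ and $x'$
% gives $\lvert f(x) - f(x')\rvert \le L\,\lVert x - x'\rVert$.
```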
Appendix B Another Example of the Utility of Sparse Factorization
Example 3.
Let A be the matrix formed from the identity matrix together with an appended column of zeros, as described above. By direct computation, the singular-vector factors in the SVD of this matrix are fully dense, and the SVD is unique (in the usual sense, that is, up to signs and permutations) since all the singular values are distinct. Thus, taking an SVD would increase the number of nonzeros rather than decrease it, which is the opposite of what we want. So, although in this case there is no good sparse factorization to be found, using the SVD would actively make the problem worse.
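Example 3's phenomenon is easy to reproduce numerically. The bidiagonal matrix below is our own stand-in, not the example's exact matrix: it is very sparse, yet its singular-vector factors come out dense, so storing the SVD takes far more nonzeros than storing the matrix itself:

```python
import numpy as np

n = 8
# A very sparse bidiagonal matrix: ones on the diagonal and the superdiagonal,
# for a total of 2n - 1 nonzeros.
A = np.eye(n) + np.diag(np.ones(n - 1), k=1)

U, s, Vt = np.linalg.svd(A)

def nnz(M, tol=1e-9):
    """Count entries whose magnitude exceeds a small numerical tolerance."""
    return int((np.abs(M) > tol).sum())

# The singular-vector factors U and Vt are (generically) fully dense,
# so the SVD representation uses many more nonzeros than A does.
print(nnz(A), nnz(U) + nnz(Vt) + len(s))
```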
Appendix C Benchmark Games in Experiment 1
We tested on the following benchmark games from the literature:

Leduc poker (Southey et al., 2005) is a small variant of poker, played with one hole card and one community card.

Battleship (Farina et al., 2019) is the classic targeting game, with two parameters: the number of moves (shots) a player may take, and the number of ships on the board. All ships have length 2. A player scores a point only for sinking a full ship.

Sheriff (Farina et al., 2019) is a simplified Sheriff of Nottingham game, modified to be zero-sum, played between a smuggler and a sheriff. The smuggler selects a bribe amount and a number of illegal items to try to smuggle past the sheriff. The sheriff then decides whether to inspect the cargo; the smuggler's payoff depends on the bribe, on whether the sheriff inspects, and on the number of illegal items found. The smuggler has far more sequences than the sheriff in this game.

No-limit hold'em (NLH) river endgames are endgames encountered by the poker-playing agent Libratus (Brown & Sandholm, 2017), using the action abstraction used by Libratus. They both begin on the last betting round, when all five community cards are known. The payoffs are normalized so that in these endgames a Nash gap of 1 corresponds to 0.075 big blinds. Due to the explicit storage of the payoff matrix in this experiment, only extremely small no-limit endgames can be tested. In particular, endgame A here is the same as endgame 7 in the next experiment (with a finer abstraction), and endgame B is the same as endgame A except with the starting pot size doubled to make the game smaller.