# Parameterized Low-Rank Binary Matrix Approximation

We provide a number of algorithmic results for the following family of problems: for a given binary $m\times n$ matrix $A$ and integer $k$, decide whether there is a "simple" binary matrix $B$ which differs from $A$ in at most $k$ entries. For an integer $r$, the "simplicity" of $B$ is characterized as follows.

• Binary $r$-Means: Matrix $B$ has at most $r$ different columns. This problem is known to be NP-complete already for $r=2$. We show that the problem is solvable in time $2^{O(k\log k)}\cdot(nm)^{O(1)}$ and thus is fixed-parameter tractable parameterized by $k$. We prove that the problem admits a polynomial kernel when parameterized by $r$ and $k$, but it has no polynomial kernel when parameterized by $k$ only unless NP $\subseteq$ coNP/poly. We also complement these results by showing that, when parameterized by $r$ and $k$, the problem admits an algorithm of running time $2^{O(r\cdot\sqrt{k\log(k+r)})}(nm)^{O(1)}$, which is subexponential in $k$ for $r\in O(k^{1/2-\epsilon})$ for any $\epsilon>0$.

• Low GF(2)-Rank Approximation: Matrix $B$ is of GF(2)-rank at most $r$. This problem is known to be NP-complete already for $r=1$. It is also known to be W[1]-hard when parameterized by $k$. Interestingly, when parameterized by $r$ and $k$, the problem is not only fixed-parameter tractable, but it is solvable in time $2^{O(r^{3/2}\cdot\sqrt{k\log k})}(nm)^{O(1)}$, which is subexponential in $k$.

• Low Boolean-Rank Approximation: Matrix $B$ is of Boolean rank at most $r$. The problem is known to be NP-complete for $k=0$ as well as for $r=1$. We show that it is solvable in time $2^{O(r2^r\cdot\sqrt{k\log k})}(nm)^{O(1)}$, which is subexponential in $k$.

## 1 Introduction

In this paper we consider the following generic problem. Given a binary matrix, that is, a matrix with entries from the domain $\{0,1\}$,

$$A=\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn}\end{pmatrix}=(a_{ij})\in\{0,1\}^{m\times n},$$

the task is to find a “simple” binary matrix $B$ which approximates $A$ subject to some specified constraints. One of the most widely studied error measures is the Frobenius norm, which for a matrix $A$ is defined as

$$\|A\|_F=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2}.$$

Here the sums are taken over $\mathbb{R}$. Then for a given nonnegative integer $k$, we want to decide whether there is a matrix $B$ with certain properties such that

$$\|A-B\|_F^2\le k.$$

We consider the binary matrix approximation problems where, for a given integer $r$, the approximation binary matrix $B$

• (A1) has at most $r$ pairwise-distinct columns,

• (A2) is of GF(2)-rank at most $r$, and

• (A3) is of Boolean rank at most $r$.

Each of these variants is very well-studied. Before defining each of the problems formally and providing an overview of the relevant results, the following observation is in order. Since we approximate a binary matrix by a binary matrix, minimizing the Frobenius norm of $A-B$ is equivalent to minimizing the $\ell_0$-norm of $A-B$, where the measure $\|A\|_0$ is the number of non-zero entries of matrix $A$. We will also be using another equivalent way of measuring the quality of approximation of a binary matrix by a binary matrix: the sum of the Hamming distances between their columns. Let us recall that the Hamming distance between two vectors $x,y\in\{0,1\}^m$, where $x=(x_1,\dots,x_m)^\intercal$ and $y=(y_1,\dots,y_m)^\intercal$, is $d_H(x,y)=\sum_{i=1}^{m}|x_i-y_i|$ or, in words, the number of positions $i\in\{1,\dots,m\}$ where $x_i$ and $y_i$ differ. Then for a binary $m\times n$ matrix $A$ with columns $a^1,\dots,a^n$ and a matrix $B$ with columns $b^1,\dots,b^n$, we define

$$d_H(A,B)=\sum_{i=1}^{n}d_H(a^i,b^i).$$

In other words, $d_H(A,B)$ is the number of positions with different entries in matrices $A$ and $B$. Then we have the following.

$$\|A-B\|_F^2=\|A-B\|_0=d_H(A,B)=\sum_{i=1}^{n}d_H(a^i,b^i).\qquad (1)$$
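The chain of equalities in (1) is easy to check on a small example. The following Python sketch computes the three quantities for two binary matrices stored as lists of rows; the function names are illustrative and do not come from the paper.

```python
def squared_frobenius(A, B):
    """Squared Frobenius norm of A - B, computed over the reals."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


def l0_norm_of_difference(A, B):
    """Number of non-zero entries of A - B."""
    return sum(1
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b)
               if a - b != 0)


def hamming_distance(A, B):
    """Sum of the Hamming distances between corresponding columns."""
    total = 0
    for col_a, col_b in zip(zip(*A), zip(*B)):  # zip(*M) iterates over columns
        total += sum(x != y for x, y in zip(col_a, col_b))
    return total


A = [[1, 0, 1],
     [0, 1, 1]]
B = [[1, 1, 1],
     [0, 0, 0]]

# For 0/1 matrices every entry of A - B is -1, 0 or 1, so the three measures agree.
print(squared_frobenius(A, B), l0_norm_of_difference(A, B), hamming_distance(A, B))
```

The agreement relies only on the fact that $|a-b|^2=|a-b|\in\{0,1\}$ for $a,b\in\{0,1\}$; it fails for general real matrices.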

Problem (A1): Binary $r$-Means. By (1), the problem of approximating a binary $m\times n$ matrix $A$ by a binary matrix $B$ with at most $r$ different columns (problem (A1)) is equivalent to the following clustering problem. Given a set of $n$ binary $m$-dimensional vectors $a^1,\dots,a^n$ (which constitute the columns of matrix $A$) and a positive integer $r$, Binary $r$-Means aims to partition the vectors into at most $r$ clusters so as to minimize the sum of within-cluster sums of Hamming distances to their binary means. More formally, given $A$, $r$ and a nonnegative integer $k$, the task is to decide whether there are a positive integer $r'\le r$, a partition $\{I_1,\dots,I_{r'}\}$ of $\{1,\dots,n\}$ and vectors $c^1,\dots,c^{r'}\in\{0,1\}^m$ such that

$$\sum_{i=1}^{r'}\sum_{j\in I_i}d_H(c^i,a^j)\le k.$$

To see the equivalence of Binary $r$-Means and problem (A1), it is sufficient to observe that the pairwise different columns of an approximate matrix $B$ such that $\|A-B\|_0\le k$ can be used as vectors $c^i$, $i\in\{1,\dots,r'\}$. Once the mean vectors are selected, a partition $\{I_1,\dots,I_{r'}\}$ of the columns of $A$ can be obtained by assigning each column-vector $a^j$ to its closest mean vector $c^i$ (breaking ties arbitrarily). Then for such a clustering the total sum of distances from vectors within clusters to their centers does not exceed $k$. Similarly, a solution to Binary $r$-Means can be used as columns (with possible repetitions) of a matrix $B$ such that $\|A-B\|_0\le k$. For that we put $b^j=c^i$, where $c^i$ is the closest vector to $a^j$.

This problem was introduced by Kleinberg, Papadimitriou, and Raghavan [39] as one of the examples of segmentation problems. Approximation algorithms for optimization versions of this problem were given by Alon and Sudakov [3] and Ostrovsky and Rabani [58], who referred to it as clustering in the Hamming cube. In bioinformatics, the case $r=2$ is known under the name Binary-Constructive-MEC (Minimum Error Correction) and was studied as a model for the Single Individual Haplotyping problem [15]. Miettinen et al. [51] studied this problem under the name Discrete Basis Partitioning Problem.

Binary $r$-Means can be seen as a discrete variant of the well-known $k$-Means Clustering. (Since in problems (A2) and (A3) we use $r$ for the rank of the approximation matrix, we also use $r$ in (A1) to denote the number of clusters, which is commonly denoted by $k$ in the literature on means clustering.) This problem has been studied thoroughly, particularly in the areas of computational geometry and machine learning. We refer to [1, 6, 41] for further references to the works on $k$-Means Clustering.

Problem (A2): Low GF(2)-Rank Approximation. Let $A$ be an $m\times n$ binary matrix. In this case we view the elements of $A$ as elements of GF(2), the Galois field of two elements. Then the GF(2)-rank of $A$ is the minimum $r$ such that $A=U\cdot V$, where $U$ is an $m\times r$ and $V$ is an $r\times n$ binary matrix, and arithmetic operations are over GF(2). Equivalently, this is the minimum number of binary vectors such that every column (row) of $A$ is a linear combination (over GF(2)) of these vectors. Then (A2) is the following problem: given $A$, a positive integer $r$ and a nonnegative integer $k$, decide whether there is a binary $m\times n$ matrix $B$ of GF(2)-rank at most $r$ such that $\|A-B\|_F^2\le k$.

Low GF(2)-Rank Approximation arises naturally in applications involving binary data sets and serves as an important tool in dimension reduction for high-dimensional data sets with binary attributes; see [19, 37, 33, 40, 59, 62, 69] for further references and numerous applications of the problem.

Low GF(2)-Rank Approximation can be rephrased as a special variant (over GF(2)) of the problem of finding the rigidity of a matrix. (For a target rank $r$, the rigidity of a matrix $A$ over a field $\mathbb{F}$ is the minimum Hamming distance between $A$ and a matrix of rank at most $r$.) Rigidity is a classical concept in Computational Complexity Theory studied due to its connections with lower bounds for arithmetic circuits [31, 32, 65, 60]. We refer to [43] for an extensive survey on this topic.

Low GF(2)-Rank Approximation is also a special case of a general class of problems approximating a matrix by a matrix with a small non-negative rank. Already Non-negative Matrix Factorization (NMF) is a nontrivial problem and it appears in many settings. In particular, in machine learning, approximation by a low non-negative rank matrix has gained extreme popularity after the influential article in Nature by Lee and Seung [42]. NMF is a ubiquitous problem and, besides machine learning, it has been independently introduced and studied in combinatorial optimization [23, 68] and communication complexity [2, 44]. An extended overview of applications of NMF in statistics, quantum mechanics, biology, economics, and chemometrics can be found in the work of Cohen and Rothblum [17] and recent books [14, 55, 26].

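Problem (A3): Low Boolean-Rank Approximation. Let $A$ be a binary $m\times n$ matrix. This time we view the elements of $A$ as Boolean variables. The Boolean rank of $A$ is the minimum $r$ such that $A=U\wedge V$ for a Boolean $m\times r$ matrix $U$ and a Boolean $r\times n$ matrix $V$, where the product is Boolean, that is, the logical $\wedge$ plays the role of multiplication and $\vee$ the role of sum. Here $0\wedge 0=0$, $0\wedge 1=0$, $1\wedge 1=1$, $0\vee 0=0$, $0\vee 1=1$, and $1\vee 1=1$. Thus the matrix product is over the Boolean semi-ring $(0,1,\wedge,\vee)$. This can be equivalently expressed as the normal matrix product with addition defined as $1+1=1$. Binary matrices equipped with such algebra are called Boolean matrices. Equivalently, $A=(a_{ij})$ has Boolean rank $1$ if $A=x\wedge y^\intercal$, where $x\in\{0,1\}^m$ and $y\in\{0,1\}^n$ are nonzero vectors and the product is Boolean, that is, $a_{ij}=x_i\wedge y_j$. Then the Boolean rank of $A$ is the minimum integer $r$ such that $A=A^{(1)}\vee\dots\vee A^{(r)}$, where $A^{(1)},\dots,A^{(r)}$ are matrices of Boolean rank 1; the zero matrix is the unique matrix with Boolean rank $0$. Then Low Boolean-Rank Approximation is defined as follows: given $A$, a positive integer $r$ and a nonnegative integer $k$, decide whether there is a binary $m\times n$ matrix $B$ of Boolean rank at most $r$ such that $d_H(A,B)\le k$.

For $r=1$ Low Boolean-Rank Approximation coincides with Low GF(2)-Rank Approximation, but for $r>1$ these are different problems.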
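The difference between the two products can be made concrete with a small Python sketch. The helper names `gf2_product` and `boolean_product` are hypothetical and only illustrate the two arithmetic conventions ($1+1=0$ over GF(2), $1+1=1$ over the Boolean semi-ring) on matrices stored as lists of rows.

```python
def gf2_product(U, V):
    """Matrix product over GF(2): multiplication is AND, addition is XOR."""
    inner, cols = len(V), len(V[0])
    return [[sum(row[t] & V[t][j] for t in range(inner)) % 2
             for j in range(cols)]
            for row in U]


def boolean_product(U, V):
    """Matrix product over the Boolean semi-ring: multiplication is AND, addition is OR."""
    inner, cols = len(V), len(V[0])
    return [[int(any(row[t] & V[t][j] for t in range(inner)))
             for j in range(cols)]
            for row in U]


# With inner dimension 1 no addition takes place, so the two products coincide.
x = [[1], [0], [1]]
y = [[1, 1, 0]]
print(gf2_product(x, y) == boolean_product(x, y))

# With inner dimension 2 they can differ: 1 + 1 is 0 over GF(2) but 1 in Boolean algebra.
U = [[1, 1],
     [0, 1]]
V = [[1, 0],
     [1, 1]]
print(gf2_product(U, V))
print(boolean_product(U, V))
```

This is only an illustration of why the two notions of rank agree for $r=1$ and may diverge for larger $r$; it does not compute either rank.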

Boolean low-rank approximation has attracted much attention, especially in the data mining and knowledge discovery communities. In data mining, matrix decompositions are often used to produce concise representations of data. Since much of the real data such as word-document data is binary or even Boolean in nature, Boolean low-rank approximation could provide a deeper insight into the semantics associated with the original matrix. There is a big body of work done on Low Boolean-Rank Approximation, see e.g. [7, 9, 19, 46, 51, 52, 63]. In the literature the problem appears under different names like Discrete Basis Problem [51] or Minimal Noise Role Mining Problem [64, 46, 53].

$P$-Matrix Approximation. While at first glance Low GF(2)-Rank Approximation and Low Boolean-Rank Approximation look very similar, algorithmically the latter problem is more challenging. The fact that GF(2) is a field allows us to play with different equivalent definitions of rank, like row rank and column rank. We exploit this strongly in our algorithm for Low GF(2)-Rank Approximation. For Low Boolean-Rank Approximation the matrix product is over the Boolean semi-ring, and nice properties of the GF(2)-rank cannot be used here (see, e.g., [34]). Our algorithm for Low Boolean-Rank Approximation is based on solving an auxiliary $P$-Matrix Approximation problem, where the task is to approximate a matrix $A$ by a matrix $B$ whose block structure is defined by a given pattern matrix $P$. It turns out that $P$-Matrix Approximation is also an interesting problem in its own right.

More formally, let $P=(p_{ij})\in\{0,1\}^{p\times q}$ be a binary $p\times q$ matrix. We say that a binary $m\times n$ matrix $B=(b_{ij})$ is a $P$-matrix if there is a partition $\{I_1,\dots,I_p\}$ of $\{1,\dots,m\}$ and a partition $\{J_1,\dots,J_q\}$ of $\{1,\dots,n\}$ such that for every $i\in\{1,\dots,p\}$, $j\in\{1,\dots,q\}$, $s\in I_i$ and $t\in J_j$, $b_{st}=p_{ij}$. In words, the columns and rows of $B$ can be permuted such that the block structure of the resulting matrix is defined by $P$.

The notion of $P$-matrix was implicitly defined by Wulff et al. [67] as an auxiliary tool for their approximation algorithm for the related monochromatic biclustering problem. $P$-Matrix Approximation is also closely related to the problems arising in tiling transaction databases (i.e., binary matrices), where the task is to find a tiling that covers a given binary matrix with a small number of submatrices full of 1s, see [27].

Since Low GF(2)-Rank Approximation remains NP-complete for $r=1$ [28], we have that $P$-Matrix Approximation is NP-complete already for a very simple pattern matrix $P$.

### 1.1 Related work

In this subsection we give an overview of previous related algorithmic and complexity results for problems (A1)–(A3), as well as related problems. Since each of the problems has many practical applications, there is a tremendous amount of literature on heuristics and implementations. In this overview we concentrate on known results about algorithms with proven guarantee, with emphasis on parameterized complexity.

Problem (A1): Binary $r$-Means. Binary $r$-Means is trivially solvable in polynomial time for $r=1$ and, as was shown by Feige in [22], is NP-complete for every $r\ge 2$.

PTAS (polynomial time approximation schemes) for optimization variants of Binary $r$-Means were developed in [3, 58]. Approximation algorithms for the more general $k$-Means Clustering are a thoroughly studied topic [1, 6, 41]. Inaba et al. [35] have shown that the general $k$-Means Clustering is solvable in time $n^{mr+1}$ (here $n$ is the number of vectors, $m$ is the dimension and $r$ the number of required clusters). We are not aware of any exact algorithm for Binary $r$-Means prior to our work, except the trivial brute-force one.

Problem (A2): Low GF(2)-Rank Approximation. When the low-rank approximation matrix is not required to be binary, the optimal Frobenius norm rank-$r$ approximation of a (not necessarily binary) matrix $A$ can be efficiently found via the singular value decomposition (SVD). This is an extremely well-studied problem and we refer to surveys for an overview of algorithms for low rank approximation [38, 47, 66]. However, SVD does not guarantee to find an optimal solution when additional structural constraints on the low-rank approximation matrix (like being non-negative or binary) are imposed.

In fact, most of these constrained variants of low-rank approximation are NP-hard. In particular, Gillis and Vavasis [28] and Dan et al. [19] have shown that Low GF(2)-Rank Approximation is NP-complete for every $r\ge 1$. Approximation algorithms for the optimization version of Low Boolean-Rank Approximation were considered in [36, 37, 19, 40, 62, 12] among others.

Most of the known results about the parameterized complexity of the problem follow from the results for Matrix Rigidity. Fomin et al. have proved in [25] that for every finite field, and in particular GF(2), Matrix Rigidity over a finite field is W[1]-hard when parameterized by $k$. This implies that Low GF(2)-Rank Approximation is W[1]-hard when parameterized by $k$. However, when parameterized by $k$ and $r$, the problem becomes fixed-parameter tractable. For Low GF(2)-Rank Approximation, the algorithm from [25] runs in time $2^{O(f(r)\sqrt{k}\log k)}(nm)^{O(1)}$, where $f$ is some function of $r$. While the function $f(r)$ is not specified in [25], the algorithm in [25] invokes enumeration of all $2^r\times 2^r$ binary matrices of rank $r$, and thus the running time is at least double-exponential in $r$.

Meesum, Misra, and Saurabh [49], and Meesum and Saurabh [50] considered parameterized algorithms for related problems about editing of the adjacencies of a graph (or directed graph) targeting a graph with adjacency matrix of small rank.

Problem (A3): Low Boolean-Rank Approximation. It follows from the rank definitions that a matrix is of Boolean rank $1$ if and only if its GF(2)-rank is $1$. Thus by the results of Gillis and Vavasis [28] and Dan et al. [19], Low Boolean-Rank Approximation is NP-complete already for $r=1$. Lu et al. [45] gave a formulation of Low Boolean-Rank Approximation as an integer programming problem with an exponential number of variables and constraints.

While computing the GF(2)-rank (or rank over any other field) of a matrix can be performed in polynomial time, deciding whether the Boolean rank of a given matrix is at most $r$ is already an NP-complete problem. Thus Low Boolean-Rank Approximation is NP-complete already for $k=0$. This follows from the well-known relation between the Boolean rank and covering edges of a bipartite graph by bicliques [30]. Let us briefly describe this equivalence. For a Boolean matrix $A$, let $G_A$ be the corresponding bipartite graph, i.e., the bipartite graph whose biadjacency matrix is $A$. By the equivalent definition of the Boolean rank, $A$ has Boolean rank $r$ if and only if it is the logical disjunction of $r$ Boolean matrices of rank $1$. But for every bipartite graph whose biadjacency matrix is a Boolean matrix of rank at most $1$, its edges can be covered by at most one biclique (complete bipartite graph). Thus deciding whether a matrix is of Boolean rank at most $r$ is exactly the same as deciding whether the edges of a bipartite graph can be covered by at most $r$ bicliques. The latter Biclique Cover problem is known to be NP-complete [57]. Biclique Cover is solvable in time $2^{2^{O(r)}}(nm)^{O(1)}$ [29] and, unless the Exponential Time Hypothesis (ETH) fails, it cannot be solved in time $2^{2^{o(r)}}(nm)^{O(1)}$ [13].

For the special case of Low Boolean-Rank Approximation with $r=1$, Bringmann, Kolev and Woodruff gave an exact algorithm of running time $2^{O(k/\sqrt{\|A\|_0})}(nm)^{O(1)}$ [12]. (Let us recall that the $\ell_0$-norm of a matrix is the number of its non-zero entries.) More generally, exact algorithms for NMF were studied by Cohen and Rothblum in [17]. Arora et al. [5] and Moitra [54] showed that for a fixed value of $r$, NMF is solvable in polynomial time. Related are also the works of Razenshteyn et al. [61] on weighted low-rank approximation, Clarkson and Woodruff [16] on robust subspace approximation, and Basu et al. [8] on PSD factorization.

Observe that all the problems studied in this paper could be seen as matrix editing problems. For Binary $r$-Means, we can assume that as otherwise we have a trivial NO-instance. Then the problem asks whether it is possible to edit at most $k$ entries of the input matrix, that is, replace some 0s by 1s and some 1s by 0s, in such a way that the obtained matrix has at most $r$ pairwise-distinct columns. Respectively, Low GF(2)-Rank Approximation asks whether it is possible to edit at most $k$ entries of the input matrix to obtain a matrix of rank at most $r$. In $P$-Matrix Approximation, we ask whether we can edit at most $k$ elements to obtain a $P$-matrix. A lot of work in graph algorithms has been done on graph editing problems; in particular, parameterized subexponential time algorithms were developed for a number of problems, including various cluster editing problems [21, 24].

### 1.2 Our results and methods

We study the parameterized complexity of Binary $r$-Means, Low GF(2)-Rank Approximation and Low Boolean-Rank Approximation. We refer to the recent books of Cygan et al. [18] and Downey and Fellows [20] for an introduction to Parameterized Algorithms and Complexity. Our results are summarized in Table 1.

Our first main result concerns Binary $r$-Means. We show (Theorem 1) that the problem is solvable in time $2^{O(k\log k)}\cdot(nm)^{O(1)}$. Therefore, Binary $r$-Means is FPT parameterized by $k$. Since Low GF(2)-Rank Approximation parameterized by $k$ is W[1]-hard and Low Boolean-Rank Approximation is NP-complete for any fixed $k$, we find Theorem 1 quite surprising. The proof of Theorem 1 is based on a fundamental result of Marx [48] about the complexity of a problem on strings, namely Consensus Patterns. We solve Binary $r$-Means by constructing a two-stage FPT Turing reduction to Consensus Patterns. First, we use the color coding technique of Alon, Yuster, and Zwick from [4] to reduce Binary $r$-Means to a special auxiliary problem, and then we show that this problem can be reduced to Consensus Patterns, which allows us to apply the algorithm of Marx [48]. We also prove (Theorem 2) that Binary $r$-Means admits a polynomial kernel when parameterized by $r$ and $k$. That is, we give a polynomial time algorithm that for a given instance of Binary $r$-Means outputs an equivalent instance whose numbers of columns and rows are bounded polynomially in $r$ and $k$. For parameterization by $k$ only, we show in Theorem 4 that Binary $r$-Means has no polynomial kernel unless NP $\subseteq$ coNP/poly, the standard complexity assumption.

Our second main result concerns Low Boolean-Rank Approximation. As we mentioned above, the problem is NP-complete for $k=0$, as well as for $r=1$, and hence is intractable when parameterized by $k$ or by $r$ only. On the other hand, the simpler Low GF(2)-Rank Approximation is not only FPT parameterized by $k+r$; by [25] it is solvable in time $2^{O(f(r)\sqrt{k}\log k)}(nm)^{O(1)}$, where $f$ is some function of $r$, and thus is subexponential in $k$. It is natural to ask whether a similar complexity behavior could be expected for Low Boolean-Rank Approximation. Our second main result, Theorem 8, shows that this is indeed the case: Low Boolean-Rank Approximation is solvable in time $2^{O(r2^r\cdot\sqrt{k\log k})}(nm)^{O(1)}$. The proof of this theorem is technical and consists of several steps. We first develop a subexponential algorithm for solving the auxiliary $P$-Matrix Approximation, and then construct an FPT Turing reduction from Low Boolean-Rank Approximation to $P$-Matrix Approximation.

Let us note that due to the relation of Boolean rank computation to Biclique Cover, the result of [13] implies that unless the Exponential Time Hypothesis (ETH) fails, Low Boolean-Rank Approximation cannot be solved in time $2^{2^{o(r)}}f(k)(nm)^{O(1)}$ for any function $f$. Thus the dependence on $r$ in our algorithm cannot be improved significantly unless ETH fails.

Interestingly, the technique developed for solving $P$-Matrix Approximation can be used to obtain algorithms of running times $2^{O(r\cdot\sqrt{k\log(k+r)})}(nm)^{O(1)}$ for Binary $r$-Means and $2^{O(r^{3/2}\cdot\sqrt{k\log k})}(nm)^{O(1)}$ for Low GF(2)-Rank Approximation (Theorems 5 and 6 respectively). For Binary $r$-Means, Theorem 5 provides a much better running time than Theorem 1 for sufficiently small values of $r$ relative to $k$.

For Low GF(2)-Rank Approximation, comparing Theorem 6 and the running time from [25], let us note that Theorem 6 not only slightly improves the exponential dependence on $k$ by a factor of $\sqrt{\log k}$; it also drastically improves the exponential dependence on $r$, from at least double-exponential to $r^{3/2}$.

The remaining part of the paper is organized as follows. In Section 2 we introduce basic notation and obtain some auxiliary results. In Section 3 we show that Binary $r$-Means is FPT when parameterized by $k$ only. In Section 4 we discuss kernelization for Binary $r$-Means. In Section 5 we construct algorithms for Binary $r$-Means and Low GF(2)-Rank Approximation parameterized by $k$ and $r$ that are subexponential in $k$. In Section 6 we give a subexponential algorithm for Low Boolean-Rank Approximation. We conclude our paper in Section 7 by stating some open problems.

## 2 Preliminaries

In this section we introduce the terminology used throughout the paper and obtain some properties of the solutions to our problems.

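All matrices and vectors considered in this paper are assumed to be $(0,1)$-matrices and vectors respectively unless explicitly specified otherwise. Let $A=(a_{ij})\in\{0,1\}^{m\times n}$ be an $m\times n$-matrix. Thus $a_{ij}$, $i\in\{1,\dots,m\}$ and $j\in\{1,\dots,n\}$, are the elements of $A$. For $I\subseteq\{1,\dots,m\}$ and $J\subseteq\{1,\dots,n\}$, we denote by $A[I,J]$ the $|I|\times|J|$-submatrix of $A$ with the elements $a_{ij}$ where $i\in I$ and $j\in J$. We say that two matrices $A$ and $B$ are isomorphic if $B$ can be obtained from $A$ by permutations of rows and columns. We use “$+$” and “$\sum$” to denote sums and summations over $\mathbb{R}$, and we use “$\oplus$” and “$\bigoplus$” for sums and summations over GF(2).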

We also consider string of symbols. For two strings and , we denote by their concatenation. For a positive integer , denotes the concatenation of copies of ; is assumed to be the empty string. Let be a string over an alphabet . Recall that a string is said to be a substring of if for some ; we write that in this case. Let and be strings of the same length over . Similar to the the definition of Hamming distance between two -vectors, the Hamming distance between two strings is defined as the number of position where the strings differ. We would like to mention that for Hamming distance (for vectors and strings), the triangular inequality holds. That is, for any three strings of length each, .

### 2.1 Properties of Binary r-Means

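Let $(A,r,k)$ be an instance of Binary $r$-Means where $A$ is a matrix with columns $a^1,\dots,a^n$. We say that a partition $\{I_1,\dots,I_{r'}\}$ of $\{1,\dots,n\}$ for $r'\le r$ is a solution for $(A,r,k)$ if there are vectors $c^1,\dots,c^{r'}\in\{0,1\}^m$ such that $\sum_{i=1}^{r'}\sum_{j\in I_i}d_H(c^i,a^j)\le k$. We say that each $I_i$ or, equivalently, the multiset of columns $\{a^j\mid j\in I_i\}$ (some columns could be the same) is a cluster and call $c^i$ the mean of the cluster. Observe that given a cluster $I\subseteq\{1,\dots,n\}$, one can easily compute an optimal mean $c=(c_1,\dots,c_m)^\intercal$ as follows. Let $a^j=(a_{1j},\dots,a_{mj})^\intercal$ for $j\in\{1,\dots,n\}$. For each $i\in\{1,\dots,m\}$, consider the multiset $S_i=\{a_{ij}\mid j\in I\}$ and put $c_i=0$ or $c_i=1$ according to the majority of elements in $S_i$, that is, $c_i=0$ if at least half of the elements in $S_i$ are 0s and $c_i=1$ otherwise. We refer to this construction of $c$ as the majority rule.

In the opposite direction, given a set of means $c^1,\dots,c^{r'}$, we can construct clusters $\{I_1,\dots,I_{r'}\}$ as follows: for each column $a^j$, find the closest $c^i$, $i\in\{1,\dots,r'\}$, such that $d_H(c^i,a^j)$ is minimum and assign $j$ to $I_i$. Note that this procedure does not guarantee that all clusters are nonempty but we can simply delete empty clusters. Hence, we can define a solution as a set of means $C=\{c^1,\dots,c^{r'}\}$. These arguments also imply the following observation.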
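The two directions just described, computing a mean from a cluster by the majority rule and computing clusters from means by nearest assignment, can be sketched in Python as follows. Columns are represented as lists of 0/1 entries; the function names are illustrative, and breaking ties toward the lowest-indexed mean is an assumption of this sketch (the text allows arbitrary tie-breaking).

```python
def hamming(u, v):
    """Hamming distance between two 0/1 vectors of equal length."""
    return sum(x != y for x, y in zip(u, v))


def majority_mean(columns):
    """Majority rule for a non-empty cluster: entry i is 0 if at least half of
    the i-th entries of the cluster are 0, and 1 otherwise."""
    size = len(columns)
    dimension = len(columns[0])
    return [0 if 2 * sum(col[i] == 0 for col in columns) >= size else 1
            for i in range(dimension)]


def assign_to_closest(columns, means):
    """Assign every column index to a closest mean; return (clusters, total cost)."""
    clusters = [[] for _ in means]
    cost = 0
    for j, col in enumerate(columns):
        distances = [hamming(c, col) for c in means]
        best = min(range(len(means)), key=distances.__getitem__)  # first minimum
        clusters[best].append(j)
        cost += distances[best]
    return clusters, cost


columns = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
means = [[1, 1, 0], [0, 0, 0]]
clusters, cost = assign_to_closest(columns, means)
print(clusters, cost)
print([majority_mean([columns[j] for j in cluster]) for cluster in clusters if cluster])
```

Empty clusters are filtered out before the majority rule is applied, matching the remark that empty clusters can simply be deleted.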

###### Observation 1.

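The task of Binary $r$-Means can equivalently be stated as follows: decide whether there exist a positive integer $r'\le r$ and vectors $c^1,\dots,c^{r'}\in\{0,1\}^m$ such that $\sum_{i=1}^{n}\min\{d_H(c^j,a^i)\mid 1\le j\le r'\}\le k$.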

###### Definition 1 (Initial cluster and regular partition).

Let $A$ be an $m\times n$-matrix with columns $a^1,\dots,a^n$. An initial cluster is an inclusion-maximal set $I\subseteq\{1,\dots,n\}$ such that all the columns in the multiset $\{a^j\mid j\in I\}$ are equal.

We say that a partition $\{I_1,\dots,I_{r'}\}$ of the columns of matrix $A$ is regular if for every initial cluster $I$, there is $i\in\{1,\dots,r'\}$ such that $I\subseteq I_i$.

By the definition of the regular partition, every initial cluster of $A$ is in some set $I_i$ but the set $I_i$ may contain many initial clusters.
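As a small illustration of these two notions, the following Python sketch groups equal columns into initial clusters and tests whether a given partition of column indices is regular. Indices are 0-based and the function names are hypothetical, introduced only for this example.

```python
from collections import defaultdict


def initial_clusters(columns):
    """Inclusion-maximal sets of indices of pairwise equal columns."""
    groups = defaultdict(list)
    for j, col in enumerate(columns):
        groups[tuple(col)].append(j)
    return [frozenset(indices) for indices in groups.values()]


def is_regular(partition, columns):
    """A partition is regular if every initial cluster lies inside one of its parts."""
    parts = [set(part) for part in partition]
    return all(any(cluster <= part for part in parts)
               for cluster in initial_clusters(columns))


columns = [[1, 0], [1, 0], [0, 1], [1, 1]]  # columns 0 and 1 are equal
print(initial_clusters(columns))
print(is_regular([[0, 1, 3], [2]], columns))   # keeps equal columns together
print(is_regular([[0, 2], [1, 3]], columns))   # splits the equal columns 0 and 1
```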

###### Lemma 1.

Let $(A,r,k)$ be a yes-instance of Binary $r$-Means. Then there is a solution $\{I_1,\dots,I_{r'}\}$, $r'\le r$, which is regular (i.e., for any initial cluster $I$ of $A$, there is $i\in\{1,\dots,r'\}$ such that $I\subseteq I_i$).

###### Proof.

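Let $a^1,\dots,a^n$ be the columns of $A$. By Observation 1, there are vectors $c^1,\dots,c^{r'}$ for some $r'\le r$ such that $\sum_{i=1}^{n}\min\{d_H(c^j,a^i)\mid 1\le j\le r'\}\le k$. Once we have the vectors $c^1,\dots,c^{r'}$, a solution can be obtained by assigning each vector $a^i$ to a closest vector in $\{c^1,\dots,c^{r'}\}$. Since equal columns have the same distance to every mean, ties can be broken in the same way for equal columns, so all columns of an initial cluster are assigned to the same set. This implies the conclusion of the lemma. ∎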

### 2.2 Properties of Low GF(2)-Rank Approximation

For Low GF(2)-Rank Approximation, we need the following folklore observation. We provide a proof for completeness.

###### Observation 2.

Let $A$ be a matrix over GF(2) with $\mathrm{rank}(A)\le r$. Then $A$ has at most $2^r$ pairwise-distinct columns and at most $2^r$ pairwise-distinct rows.

###### Proof.

We show the claim for columns; the proof for the rows is similar. Assume that $\mathrm{rank}(A)=r$ and let $e^1,\dots,e^r$ be a basis of the column space of $A$. Then every column $a^i$ of $A$ is a linear combination of $e^1,\dots,e^r$. Since $A$ is a matrix over GF(2), it implies that for every column $a^i$, there is $I\subseteq\{1,\dots,r\}$ such that $a^i=\bigoplus_{j\in I}e^j$. As the number of distinct subsets of $\{1,\dots,r\}$ is $2^r$, the claim follows. ∎
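The counting argument can be checked directly by enumerating all GF(2)-linear combinations of a basis. The sketch below assumes a non-empty list of basis vectors given as lists of 0/1 entries; `gf2_span` is an illustrative name.

```python
from itertools import product


def gf2_span(basis):
    """All GF(2)-linear combinations of the given non-empty list of vectors."""
    dimension = len(basis[0])
    span = set()
    # Each 0/1 coefficient tuple corresponds to a subset I of the basis.
    for coefficients in product((0, 1), repeat=len(basis)):
        vector = [0] * dimension
        for use, e in zip(coefficients, basis):
            if use:
                vector = [x ^ y for x, y in zip(vector, e)]  # addition over GF(2)
        span.add(tuple(vector))
    return span


basis = [[1, 0, 1], [0, 1, 1]]  # r = 2 linearly independent vectors
span = gf2_span(basis)
print(sorted(span))
print(len(span) <= 2 ** len(basis))
```

Every column of a matrix whose column space is spanned by `basis` must be one of these vectors, which is why such a matrix has at most $2^r$ distinct columns.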

By making use of Observation 2, we can reformulate Low GF(2)-Rank Approximation as follows: given an $m\times n$ matrix $A$ over GF(2) with the columns $a^1,\dots,a^n$, a positive integer $r$ and a nonnegative integer $k$, we ask whether there is a positive integer $r'\le 2^r$, a partition $\{I_1,\dots,I_{r'}\}$ of $\{1,\dots,n\}$ and vectors $c^1,\dots,c^{r'}\in\{0,1\}^m$ such that

$$\sum_{i=1}^{r'}\sum_{j\in I_i}d_H(c^i,a^j)\le k$$

and the dimension of the linear space spanned by the vectors $c^1,\dots,c^{r'}$ is at most $r$. Note that given a partition $\{I_1,\dots,I_{r'}\}$ of $\{1,\dots,n\}$, we cannot select $c^1,\dots,c^{r'}$ using the majority rule as in the case of Binary $r$-Means because of the rank condition on these vectors. But given $c^1,\dots,c^{r'}$, one can construct an optimal partition with respect to these vectors in the same way as before for Binary $r$-Means. Similar to Observation 1, we can restate the task of Low GF(2)-Rank Approximation.

###### Observation 3.

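The task of Low GF(2)-Rank Approximation of a binary matrix $A$ with columns $a^1,\dots,a^n$ can equivalently be stated as follows: decide whether there is a positive integer $r'\le r$ and linearly independent vectors $c^1,\dots,c^{r'}\in\{0,1\}^m$ over GF(2) such that $\sum_{i=1}^{n}\min\{d_H(s,a^i)\mid s=\bigoplus_{j\in I}c^j,\ I\subseteq\{1,\dots,r'\}\}\le k$.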

Recall that it was proved by Fomin et al. [25] that Low GF(2)-Rank Approximation is FPT when parameterized by $k$ and $r$. To demonstrate that the total dependency on $k$ and $r$ could be relatively small, we observe the following.

###### Proposition 1.

Low GF(2)-Rank Approximation is solvable in time $2^{O(k\log r)}\cdot(nm)^{O(1)}$.

###### Proof.

In what follows, by rank we mean the GF(2)-rank of a matrix. It is more convenient for this algorithm to interpret Low GF(2)-Rank Approximation as a matrix editing problem: given a matrix $A$ over GF(2), a positive integer $r$ and a nonnegative integer $k$, decide whether it is possible to obtain from $A$ a matrix $B$ with $\mathrm{rank}(B)\le r$ by editing at most $k$ elements, i.e., by replacing 0s by 1s and 1s by 0s. We use this to construct a recursive branching algorithm for the problem.

Let $(A,r,k)$ be an instance of Low GF(2)-Rank Approximation. The algorithm for $(A,r,k)$ works as follows.

• If $\mathrm{rank}(A)\le r$, then return YES and stop.

• If $k=0$, then return NO and stop.

• Since the rank of $A$ is more than $r$, there are $r+1$ columns $J\subseteq\{1,\dots,n\}$ and $r+1$ rows $I\subseteq\{1,\dots,m\}$ such that the induced submatrix $A[I,J]$ is of rank $r+1$. We branch into $(r+1)^2$ subproblems: for each $i\in I$ and $j\in J$ we do the following:

  • construct matrix $A'$ from $A$ by replacing $a_{ij}$ with $a_{ij}\oplus 1$,

  • call the algorithm for $(A',r,k-1)$, and

  • if the algorithm returns YES, then return YES and stop.

• Return NO and stop.

To show the correctness of the algorithm, we observe the following. Let $B$ be an $m\times n$-matrix of rank at most $r$. If $\mathrm{rank}(A[I,J])>r$ for some $I\subseteq\{1,\dots,m\}$ and $J\subseteq\{1,\dots,n\}$, then $A[I,J]\ne B[I,J]$, i.e., $A[I,J]$ and $B[I,J]$ differ in at least one element. To evaluate the running time, notice that we can compute $\mathrm{rank}(A)$ in polynomial time, and if $\mathrm{rank}(A)>r$, then we can find in polynomial time an $(r+1)\times(r+1)$-submatrix of $A$ of rank $r+1$. Then we have $(r+1)^2$ branches in our algorithm. Since we decrease the parameter $k$ in every recursive call, the depth of the recurrence tree is at most $k$. It implies that the algorithm runs in time $(r+1)^{2k}\cdot(nm)^{O(1)}=2^{O(k\log r)}\cdot(nm)^{O(1)}$. ∎
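The branching algorithm from the proof can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the helper names are hypothetical, vectors are encoded as integer bit masks, and the full-rank $(r+1)\times(r+1)$ submatrix is found by greedily choosing $r+1$ independent columns and then $r+1$ independent rows of the submatrix they induce. The input matrix is edited in place during the search and restored before returning.

```python
def to_mask(bits):
    """Encode a 0/1 vector as an integer bit mask."""
    return sum(bit << i for i, bit in enumerate(bits))


def independent_indices(vectors, limit):
    """Greedily pick indices of up to `limit` GF(2)-independent bit-mask vectors."""
    basis, chosen = [], []
    for index, v in enumerate(vectors):
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
            chosen.append(index)
            if len(chosen) == limit:
                break
    return chosen


def high_rank_submatrix(A, r):
    """Rows and columns of an (r+1)x(r+1) submatrix of rank r+1, or None if rank(A) <= r."""
    m, n = len(A), len(A[0])
    cols = independent_indices([to_mask([A[i][j] for i in range(m)]) for j in range(n)], r + 1)
    if len(cols) < r + 1:
        return None
    # The chosen columns have rank r+1, so r+1 independent rows exist among them.
    rows = independent_indices([to_mask([A[i][j] for j in cols]) for i in range(m)], r + 1)
    return rows, cols


def low_gf2_rank_approximation(A, r, k):
    """Decide whether at most k entry flips turn A into a matrix of GF(2)-rank <= r."""
    witness = high_rank_submatrix(A, r)
    if witness is None:
        return True           # rank(A) <= r
    if k == 0:
        return False
    rows, cols = witness
    for i in rows:            # (r+1)^2 branches, one per entry of the submatrix
        for j in cols:
            A[i][j] ^= 1
            found = low_gf2_rank_approximation(A, r, k - 1)
            A[i][j] ^= 1      # undo the edit
            if found:
                return True
    return False


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(low_gf2_rank_approximation(identity, 1, 1))
print(low_gf2_rank_approximation(identity, 1, 2))
```

On the $3\times 3$ identity matrix, a single edit changes the rank by at most one, so one edit cannot reach rank $1$, whereas clearing two diagonal entries does.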

### 2.3 Properties of P-Matrix Approximation

We will be using the following observation, which follows directly from the definition of a $P$-matrix.

###### Observation 4.

Let $P$ be a binary $p\times q$ matrix. Then every $P$-matrix has at most $p$ pairwise-distinct rows and at most $q$ pairwise-distinct columns.

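In our algorithm for $P$-Matrix Approximation, we need a subroutine for checking whether a matrix $A$ is a $P$-matrix. For that we employ the following brute-force algorithm. Let $A$ be an $m\times n$-matrix. Let $a_1,\dots,a_m$ be the rows of $A$, and let $a^1,\dots,a^n$ be the columns of $A$. Let $\{I_1,\dots,I_s\}$ be the partition of $\{1,\dots,m\}$ into inclusion-maximal sets of indices such that for every $i\in\{1,\dots,s\}$ the rows $a_j$ for $j\in I_i$ are equal. Similarly, let $\{J_1,\dots,J_t\}$ be the partition of $\{1,\dots,n\}$ into inclusion-maximal sets such that for every $i\in\{1,\dots,t\}$, the columns $a^j$ for $j\in J_i$ are equal. We say that $(\{I_1,\dots,I_s\},\{J_1,\dots,J_t\})$ is the block partition of $A$.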

###### Observation 5.

There is an algorithm which, given an $m\times n$-matrix $A$ and a $p\times q$-matrix $P$, runs in time $2^{O(p\log p+q\log q)}\cdot(nm)^{O(1)}$ and decides whether $A$ is a $P$-matrix or not.

###### Proof.

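Let $(\{I_1,\dots,I_s\},\{J_1,\dots,J_t\})$ be the block partition of $A$ and let $(\{X_1,\dots,X_{p'}\},\{Y_1,\dots,Y_{q'}\})$ be the block partition of $P$. Observe that $A$ is a $P$-matrix if and only if $s=p'$, $t=q'$, and there are permutations $\alpha$ and $\beta$ of $\{1,\dots,s\}$ and $\{1,\dots,t\}$, respectively, such that the following holds for every $i\in\{1,\dots,s\}$ and $j\in\{1,\dots,t\}$:

• (i) $|I_i|\ge|X_{\alpha(i)}|$ and $|J_j|\ge|Y_{\beta(j)}|$,

• (ii) $a_{i'j'}=p_{i''j''}$ for $i'\in I_i$, $j'\in J_j$, $i''\in X_{\alpha(i)}$ and $j''\in Y_{\beta(j)}$.

Thus, in order to check whether $A$ is a $P$-matrix, we check whether $s=p'$ and $t=q'$, and if it holds, we consider all possible permutations $\alpha$ and $\beta$ and verify (i) and (ii). Note that the block partitions of $A$ and $P$ can be constructed in polynomial time. Since there are $s!\le p!$ and $t!\le q!$ permutations of $\{1,\dots,s\}$ and $\{1,\dots,t\}$, respectively, and (i)–(ii) can be verified in polynomial time, we obtain that the algorithm runs in time $2^{O(p\log p+q\log q)}\cdot(nm)^{O(1)}$. ∎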
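A brute-force check in the spirit of this proof can be sketched in Python as follows. The sketch rests on two assumptions that should be read as our interpretation rather than the authors' code: blocks of $A$ are matched to blocks of $P$ by permutations, and each block of $A$ must be at least as large as the block of $P$ it is matched to (so that every row and column of $P$ receives a non-empty part). Since all entries inside a block are equal, comparing one representative entry per pair of blocks suffices. The names `blocks` and `is_p_matrix` are hypothetical.

```python
from itertools import permutations


def blocks(vectors):
    """Group the indices of equal vectors (one side of a block partition)."""
    groups = {}
    for index, v in enumerate(vectors):
        groups.setdefault(tuple(v), []).append(index)
    return list(groups.values())


def is_p_matrix(A, P):
    """Decide by brute force over block permutations whether A is a P-matrix."""
    rows_A, cols_A = blocks(A), blocks(zip(*A))
    rows_P, cols_P = blocks(P), blocks(zip(*P))
    if len(rows_A) != len(rows_P) or len(cols_A) != len(cols_P):
        return False
    s, t = len(rows_A), len(cols_A)
    for alpha in permutations(range(s)):
        # Size condition for rows: each block of A covers its matched block of P.
        if any(len(rows_A[i]) < len(rows_P[alpha[i]]) for i in range(s)):
            continue
        for beta in permutations(range(t)):
            if any(len(cols_A[j]) < len(cols_P[beta[j]]) for j in range(t)):
                continue
            # Entry condition, checked on one representative per block pair.
            if all(A[rows_A[i][0]][cols_A[j][0]]
                   == P[rows_P[alpha[i]][0]][cols_P[beta[j]][0]]
                   for i in range(s) for j in range(t)):
                return True
    return False


P = [[0, 0],
     [0, 1]]
print(is_p_matrix([[0, 0, 0], [0, 1, 1], [0, 1, 1]], P))  # a block of 1s, zeros elsewhere
print(is_p_matrix([[0, 1], [1, 0]], P))                   # two 1s cannot fit the pattern
```

The number of permutation pairs tried is at most $s!\cdot t!$, which is where the $2^{O(p\log p+q\log q)}$ factor in the running time comes from.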

We conclude the section by showing that $P$-Matrix Approximation is FPT when parameterized by $k$ and the size of $P$.

###### Proposition 2.

-Matrix Approximation can be solved in time .

###### Proof.

As with Low GF(2)-Rank Approximation in Proposition 1, we consider $P$-Matrix Approximation as a matrix editing problem. The task now is to obtain from the input matrix $A$ a $P$-matrix by at most $k$ editing operations. We construct a recursive branching algorithm for this. Let $(A,P,k)$ be an instance of $P$-Matrix Approximation, where $A=(a_{ij})\in\{0,1\}^{m\times n}$ and $P=(p_{ij})\in\{0,1\}^{p\times q}$. Then the algorithm works as follows.

• Check whether $A$ is a $P$-matrix using Observation 5. If it is so, then return YES and stop.

• If $k=0$, then return NO and stop.

• Find the block partition of . Let and . Set and . For each and do the following:

• construct a matrix $A'$ from $A$ by replacing the value of an arbitrary $a_{i'j'}$ for $i'\in I_i$ and $j'\in J_j$ by the opposite value, i.e., set it to 1 if it was 0 and to 0 otherwise,

• call the algorithm recursively for $(A',P,k-1)$, and

• if the algorithm returns YES, then return YES and stop.

• Return NO and stop.

For the correctness of the algorithm, let us assume that the algorithm did not stop in the first two steps. That is, is not a -matrix and . Consider and . Let be a -matrix such that . Observe that and differ in at least one element. Hence, there is and such that for and . Note that for any choice of and , the matrices and obtained from by changing the elements and respectively, are isomorphic. This implies that is a yes-instance of -Matrix Approximation if and only if is a yes-instance for one of the branches of the algorithm.

For the running time evaluation, recall that by Observation 5, the first step can be done in time . Then the block partition of can be constructed in polynomial time and we have at most recursive calls of the algorithm in the third step. The depth of recursion is at most . Hence, we conclude that the total running time is . ∎
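The recursive branching algorithm from this proof might look as follows in code. This is a hedged sketch under stated assumptions: the pattern matrix `P` is taken to have pairwise distinct rows and columns, the second step is read as "the budget `k` is exhausted", and the function names are hypothetical. The membership test is the brute-force permutation check from Observation 5.

```python
from itertools import permutations

def block_partition(M):
    """Group indices of equal rows and of equal columns of a 0/1 matrix M."""
    rows, cols = {}, {}
    for i, row in enumerate(M):
        rows.setdefault(tuple(row), []).append(i)
    for j, col in enumerate(zip(*M)):
        cols.setdefault(tuple(col), []).append(j)
    return list(rows.values()), list(cols.values())

def is_pattern_matrix(B, P):
    """Assumption: P has pairwise distinct rows and columns."""
    rbs, cbs = block_partition(B)
    p, q = len(P), len(P[0])
    if len(rbs) != p or len(cbs) != q:
        return False
    R = [[B[rb[0]][cb[0]] for cb in cbs] for rb in rbs]
    return any(all(R[a[i]][b[j]] == P[i][j] for i in range(p) for j in range(q))
               for a in permutations(range(p)) for b in permutations(range(q)))

def approximate(A, P, k):
    """Return True if at most k entry flips turn A into a P-matrix."""
    if is_pattern_matrix(A, P):      # step 1
        return True
    if k == 0:                       # step 2 (assumed reading: budget exhausted)
        return False
    row_blocks, col_blocks = block_partition(A)
    for rb in row_blocks:            # step 3: one branch per pair of blocks
        for cb in col_blocks:
            h, l = rb[0], cb[0]      # any representative gives an isomorphic matrix
            A[h][l] ^= 1             # flip the entry
            found = approximate(A, P, k - 1)
            A[h][l] ^= 1             # restore A before the next branch
            if found:
                return True
    return False                     # step 4
```

Flipping in place and restoring afterwards avoids copying the matrix at every node of the search tree; the recursion depth is bounded by `k`, as in the running time analysis.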

## 3 Binary r-Means parameterized by k

In this section we prove that Binary r-Means is FPT when parameterized by k. That is, we prove the following theorem.

###### Theorem 1.

Binary r-Means is solvable in time 2^O(k log k)·(nm)^O(1).

The proof of Theorem 1 consists of two FPT Turing reductions. First, we define a new auxiliary problem, Cluster Selection, and show how to reduce it to the Consensus Patterns problem. We can then use the algorithm of Marx [48] for Consensus Patterns as a black box. The second reduction is from Binary r-Means to Cluster Selection and is based on the color coding technique of Alon, Yuster, and Zwick [4].

From Cluster Selection to Consensus Patterns. In the Cluster Selection problem we are given a regular partition of columns of matrix . Our task is to select from each set exactly one initial cluster such that the total deviation of all the vectors in these clusters from their mean is at most . More formally,

If is a yes-instance of Cluster Selection, then we say that the corresponding set of initial clusters together with the vector (or just , as can be computed by the majority rule from the set of clusters) is a solution for the instance. We show that Cluster Selection is FPT when parameterized by . To this end, we use the result of Marx [48] on the Consensus Patterns problem.
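As an illustration of the majority rule mentioned above, the following minimal sketch computes a center vector for a set of binary columns and the total deviation from it. It assumes that deviation is measured by Hamming distance and that ties are broken toward 0; the function names are hypothetical.

```python
def hamming(u, v):
    """Number of positions in which two 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def majority_center(columns):
    """Coordinate-wise majority of the given 0/1 columns (ties give 0)."""
    n = len(columns)
    m = len(columns[0])
    return [1 if 2 * sum(col[i] for col in columns) > n else 0 for i in range(m)]

def total_deviation(columns, center):
    """Sum of Hamming distances from all columns to the center."""
    return sum(hamming(col, center) for col in columns)

cols = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
center = majority_center(cols)
print(center, total_deviation(cols, center))
```

Each coordinate is chosen independently, so the majority vector minimizes the total Hamming deviation over all binary centers; this is why a solution is determined by the chosen clusters alone.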

Marx proved that Consensus Patterns can be solved in time where and is the total length of all the strings in the input [48]. This gives us the following lemma.

###### Lemma 2 ([48]).

Consensus Patterns can be solved in time , where is the total length of all the strings in the input if the size of is bounded by a constant.

Now we are ready to show the following result for Cluster Selection.

###### Lemma 3.

Cluster Selection can be solved in time .

###### Proof.

Let be an instance of Cluster Selection. Let be the columns of . First, we check whether there are initial clusters and a vector for some such that for and . To this end, we consider all possible choices of for . Suppose that is given. For every , we find an initial cluster such that is minimum. If , then we return the corresponding solution, i.e., the set of initial clusters and . Otherwise, we discard the choice of . It is straightforward to see that this procedure is correct and can be performed in polynomial time. From now on we assume that this is not the case. That is, if is a yes-instance, then for any solution. In particular, it means that for every solution , for . If , we obtain that is a no-instance. In this case we return the answer and stop. Hence, from now on we assume that . Moreover, observe that for any solution .
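The preliminary polynomial-time check at the start of this proof, where the center is restricted to be one of the input columns, can be sketched as follows. The data layout is an assumption made for illustration: `partition` is a list of sets, each set is a list of initial clusters, and each cluster is a list of column indices. The budget name `d`, the use of Hamming distance, and the function names are likewise hypothetical.

```python
def hamming(u, v):
    """Number of positions in which two 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def cluster_cost(columns, cluster, c):
    """Total distance from the columns of one initial cluster to the center c."""
    return sum(hamming(columns[j], c) for j in cluster)

def try_column_centers(columns, partition, d):
    """Try every input column as the center; return (center, clusters) or None."""
    for c in columns:
        chosen, total = [], 0
        for clusters in partition:
            # Greedily pick the cheapest initial cluster of this set for center c.
            best = min(clusters, key=lambda I: cluster_cost(columns, I, c))
            chosen.append(best)
            total += cluster_cost(columns, best, c)
        if total <= d:
            return c, chosen
    return None
```

Because the sets of the partition are handled independently once the center is fixed, picking the cheapest cluster from each set is optimal for that center, which is what makes this step polynomial.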

We consider all -tuples of positive integers such that and for each -tuple check whether there is a solution with for . Note that there are at most such -tuples. If we find a solution for one of the -tuples, we return it and stop. If we have no solution for any -tuple, we conclude that we have a no-instance of the problem.

Assume that we are given a -tuple . If there is such that there is no initial cluster with , then we discard the current choice of the -tuple. Otherwise, we reduce the instance of the problem using the following rule: if there is and an initial cluster such that , then delete columns for from the matrix and set . By this rule, we can assume that each contains only initial clusters of size . Let where are initial clusters for .

We reduce the problem of checking the existence of a solution with for to the Consensus Patterns problem. Towards that, we first define the alphabet