Cross-identification of stellar catalogs with multiple stars: Complexity and Resolution

by Daniel Severin, et al.
Universidad Nacional de Rosario

In this work, I present an optimization problem which consists of assigning entries of a stellar catalog to multiple entries of another stellar catalog such that the probability of such assignment is maximum. I prove that the problem is NP-Hard and show a way of modeling this problem as a maximum weighted stable set problem. A real application is solved in this way through integer programming.





1 Introduction

In astronomy, it is common to record the position and other physical quantities of stellar objects in astronomical catalogs. These catalogs are of great importance for various disciplines, such as navigation, space research and geodesy. Naturally, a single star has a different designation in each catalog, which identifies it uniquely within that catalog. Suppose that $A$ and $B$ are star catalogs, and $a \in A$, $b \in B$ are the designations of the same star in $A$ and $B$ respectively. It is often necessary to know $b$ given $a$. This kind of cross-identification can be performed by software tools available on the Internet, such as Xmatch or the web-based CDS X-Match Service, which usually use heuristic algorithms. It was not until recently, however, that exact approaches began to be proposed. For instance, in [1], a cross-identification problem is solved through assignment problems via the Hungarian Algorithm.

The correspondence between two catalogs does not need to be one-to-one. Some stars appearing as single ones in one catalog could correspond to multiple stars in the other. Although some catalogs, such as SAO and PPM, indicate whether a certain star is double or not, available cross-matching tools do not take this information into account.

Consider the following cross-identification problem. Given two catalogs $A$ and $B$ covering the same region of the sky, with $B$ denser than $A$, the problem consists of finding the “most probable” assignment such that every star $a \in A$ is assigned up to $m_a$ stars of $B$, where $m_a$ is the multiplicity of $a$ reported by catalog $A$.

The original motivation to study this novel matching problem arose during a collaboration with astrophysicist Diego Sevilla [2], whose objective was the development of a new digital version of the Cordoba Durchmusterung, a star catalog widely used in the twentieth century.

In this work, I describe an optimization problem which I call the $k$-Matching Problem and I give a polynomial-time reduction to the Maximum Weighted Stable Set Problem (MWSSP). This reduction is further used for solving a real instance. I also present an open question concerning the forbidden subgraphs of the family of graphs that arise in that reduction and I identify two of the forbidden subgraphs. Then, I prove that the $k$-Matching Problem is NP-Hard for a given $k \geq 2$.

2 Problem description and resolution

Consider two star catalogs where each star is represented as an element of a set $A$ or $B$. Let $n_A$ and $n_B$ be the cardinalities of $A$ and $B$ respectively.

For a given entry $a \in A$, let $m_a$ be the multiplicity of $a$ in the first catalog. That is, if $a$ represents a single star then $m_a = 1$, if $a$ represents a double one then $m_a = 2$, and so on. Also, let $k$ be the largest multiplicity.

The resolution of our problem is divided into two phases:

  • Phase 1: From the astrometric and photometric data available from catalogs, generate an instance of the $k$-Matching Problem.

  • Phase 2: Reduce that instance to an instance of the MWSSP and solve it.

The first phase depends on the structure of both catalogs and involves criteria from the field of Astronomy, which can be separated from the mathematical description of the problem. For that reason, it is discussed in the Online Appendix. In this section, only the second phase is addressed.

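During the first phase, candidate sets of stars $P_a \subseteq \mathcal{P}(B)$ are generated for each $a \in A$. For instance, the set $P_a = \{\emptyset, \{b_1\}, \{b_2\}, \{b_1, b_2\}\}$ indicates that $a$ can be assigned to $b_1$, $b_2$, the pair $\{b_1, b_2\}$ or no one (indicated by the presence of $\emptyset$) with positive probability. Naturally, every $S \in P_a$ must satisfy $|S| \leq m_a$. For a given star $a \in A$ and a set $S \in P_a$, denote the event that “$a$ corresponds to $S$” by $a \to S$ and its probability by $p_a(S)$, which is computed during the first phase. Also, $\sum_{S \in P_a} p_a(S) = 1$.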

An assignment $f : A \to \mathcal{P}(B)$ is valid when it satisfies $f(a) \in P_a$ for all $a \in A$ and, for any $a, a' \in A$ such that $a \neq a'$, $f(a) \cap f(a') = \emptyset$, i.e. the candidates assigned to $a$ and $a'$ must not share common stars. Let $\mathcal{F}$ be the space of valid assignments. Each $f \in \mathcal{F}$ has a corresponding probability $p(f) = P\bigl(\bigcap_{a \in A} a \to f(a)\bigr)$. We are interested in finding the most probable assignment: $\arg\max_{f \in \mathcal{F}} p(f)$. Since the number of assignments is exponential, it makes little sense to compute the real probability of each one. Thus, let us simplify the problem at this point by making the following assumption:

for all and such that , events and are independent of each other.

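Let $\tilde{p}(f) = \prod_{a \in A} p_a(f(a))$. If the previous assumption holds, we would have $p(f) = \tilde{p}(f)$. Although it usually does not hold, the assignment that maximizes $\tilde{p}$ is good enough for practical purposes. Denote $w_{aS} = -\ln p_a(S)$ for $a \in A$ and $S \in P_a$, and let $w(f) = \sum_{a \in A} w_{a f(a)}$. Since the logarithm is increasing, maximizing $\tilde{p}(f)$ is equivalent to minimizing $w(f) = -\ln \tilde{p}(f)$, which is linear. The problem is defined as follows: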

$k$-Matching Problem
such that , ;
such that for all , for all ;
for all such that , for all .
OBJECTIVE: Obtain a valid assignment $f$ such that $w(f)$ is minimum.
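To make the definition concrete, the following Python sketch solves a toy instance by exhaustive enumeration. It is only an illustration under stated assumptions: the star names, the candidate sets and their probabilities are invented, and the weight of a candidate is taken to be the negative logarithm of its probability, so that minimizing the total weight is the same as maximizing the product of the probabilities. Enumeration is exponential and is not the method used in this work.

```python
import itertools
import math

# Hypothetical toy instance: every star of the first catalog maps to
# {candidate set of stars of the second catalog: probability}.
candidates = {
    "a1": {frozenset(): 0.1, frozenset({"b1"}): 0.6, frozenset({"b2"}): 0.3},
    "a2": {frozenset(): 0.1, frozenset({"b1"}): 0.2, frozenset({"b1", "b2"}): 0.7},
}


def is_valid(choice):
    """True when no star of the second catalog is used by two different entries."""
    used = set()
    for chosen in choice:
        if used & chosen:
            return False
        used |= chosen
    return True


def best_assignment(candidates):
    """Enumerate all assignments; keep the valid one of minimum total weight."""
    stars = sorted(candidates)
    best, best_weight = None, math.inf
    for choice in itertools.product(*(candidates[a] for a in stars)):
        if not is_valid(choice):
            continue
        # Assumed weight: negative log-probability of each chosen candidate.
        weight = sum(-math.log(candidates[a][s]) for a, s in zip(stars, choice))
        if weight < best_weight:
            best, best_weight = dict(zip(stars, choice)), weight
    return best, best_weight


assignment, weight = best_assignment(candidates)
```

In this toy instance the entry "a2" takes the pair of stars and "a1" is left without counterpart, because assigning "b1" to "a1" would block the more probable pair.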

Below, I show that this problem can be polynomially transformed to the MWSSP. Recall that, given a graph and weights , the MWSSP consists of finding a stable set of such that is maximum. Let be the graph such that ,

and consider weights where . Let be an optimal stable set of the MWSSP. The $k$-Matching Problem is feasible if and only if and, in that case, for all is an optimal assignment of the $k$-Matching Problem. If the $k$-Matching Problem is feasible, there exists a valid assignment . Let such that if and only if . It is easy to see that is a stable set of whose weight is greater than . Since is optimal, .

Conversely, assume that and let for all . First, let us prove that is a valid assignment. Suppose that there exists such that for every . Then, which leads to a contradiction. Then, is defined for all . In addition, if then so is assigned to a unique . Furthermore, if and such that and for some then so is assigned to at most one star of . Now, let us prove that is optimal. Suppose that there exists a valid assignment such that . Again, let such that if and only if . It is easy to see that is a stable set of whose weight is . Then, , which is absurd.
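The reduction can be sketched in Python on a toy instance. This is a hedged illustration, not the construction of this work verbatim: it assumes one vertex per pair (star, candidate set), an edge between two vertices when they belong to the same star or when their candidate sets intersect, and vertex weights of the form `big_m - w`, where `big_m` is a hypothetical constant chosen large enough that a stable set covering every star always beats one that leaves a star out. The stable set is found by brute force, which is only viable for very small graphs.

```python
import itertools
import math

# Hypothetical toy instance: star -> {candidate set: probability}.
candidates = {
    "a1": {frozenset(): 0.1, frozenset({"b1"}): 0.6, frozenset({"b2"}): 0.3},
    "a2": {frozenset(): 0.1, frozenset({"b1"}): 0.2, frozenset({"b1", "b2"}): 0.7},
}


def build_graph(candidates):
    """Build the conflict graph and the (assumed) weights big_m - w."""
    vertices = [(a, s) for a in sorted(candidates) for s in candidates[a]]
    # big_m exceeds the largest possible total of negative log-probabilities.
    big_m = 1.0 + sum(max(-math.log(p) for p in cand.values())
                      for cand in candidates.values())
    weights = {(a, s): big_m + math.log(candidates[a][s]) for a, s in vertices}
    edges = {frozenset((u, v)) for u, v in itertools.combinations(vertices, 2)
             if u[0] == v[0] or (u[1] & v[1])}
    return vertices, edges, weights, big_m


def max_weight_stable_set(vertices, edges, weights):
    """Brute-force search over all vertex subsets (tiny graphs only)."""
    best, best_weight = (), 0.0
    for size in range(len(vertices) + 1):
        for subset in itertools.combinations(vertices, size):
            if any(frozenset(pair) in edges
                   for pair in itertools.combinations(subset, 2)):
                continue
            total = sum(weights[v] for v in subset)
            if total > best_weight:
                best, best_weight = subset, total
    return best, best_weight


vertices, edges, weights, big_m = build_graph(candidates)
stable, total = max_weight_stable_set(vertices, edges, weights)
assignment = dict(stable)  # each chosen vertex (a, S) reads as "a is assigned S"
feasible = total > (len(candidates) - 1) * big_m
```

Under these assumptions, the instance is feasible exactly when the optimal stable set contains one vertex per star, which is what the comparison against `(len(candidates) - 1) * big_m` checks.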

Based on this reduction, an exact algorithm (which can be consulted in the Online Appendix) was implemented for solving instances of the 2-Matching Problem. Then, a real catalog of 52313 stars (of which 568 are doubles) was cross-identified against another of 83397 stars in less than a minute of CPU time. The algorithm, auxiliary files and the resulting catalog are available in [3].

Now, define as the family of graphs obtained by the previous reduction for any instance of the $k$-Matching Problem. Clearly, the 1-Matching Problem, i.e. when no multiple stars are present in catalog , can be trivially reduced to the classic Maximum Weighted Matching Problem (MWMP) over a bipartite graph . Indeed, our reduction gives the line graph of . Therefore, is the family of line graphs of bipartite graphs. It is known from Graph Theory that, if belongs to such a family, then the claw, the diamond and the odd holes are forbidden induced subgraphs of . This leads to the following:

Open question. Which are the forbidden induced subgraphs that characterize those graphs from for ?

Although none of the mentioned subgraphs are forbidden for the case (they can be generated from instances of the 2-Matching Problem, as shown in Figure 1), the claw can be generalized as follows:

Figure 1: Instances for: a) claw, b) diamond, c) odd hole

For , let . Then, is -free.

Suppose that the star is an induced subgraph of . Let be the central vertex of the star and , , , the remaining vertices. W.l.o.g., we can assume that , , , , for some . If , we would obtain that and then and would be adjacent, which is absurd. Therefore, . Since and are adjacent and for all , then . On the other hand, and are not adjacent for all , then . Therefore, should have at least elements, which leads to a contradiction.

Another forbidden subgraph of the 2-Matching Problem is given as follows. Let be the graph of Figure 2(a). Note that the instance of the 2-Matching Problem given in Figure 2(b) corresponds to the subgraph of induced by vertices . A drawback emerges when is considered. Hence, .

Figure 2: A graph not in : a) , b) partial construction

From the complexity point of view, the $k$-Matching Problem for $k = 1$ is polynomial due to the existence of efficient algorithms for the MWMP such as the Hungarian Algorithm. When , Lemma 2 says that graphs from are -free, and the MWSSP for -free graphs is known to be NP-Hard. Nevertheless, this does not mean that our matching problem is hard since has other forbidden subgraphs. Its complexity is addressed in the next section.

3 Complexity of the problem

In this section, I prove that the $k$-Matching Problem is NP-hard for $k \geq 2$. Moreover, I consider a more restricted problem where every star of $A$ has multiplicity exactly $k$. The decision problem is as follows:

$k$-Matching Decision Problem ($k$-MDP)
INSTANCE: such that , such that for all , for all for all such that , for all .
QUESTION: Is there a valid assignment such that ?

Let us first introduce two auxiliary problems. Given , let and be disjoint sets such that . A perfect matching (p.m. for short) is a set such that and every element of occurs in exactly one pair of . The first problem, which is NP-complete [4], is defined below:

Disjoint Matchings (DM)
INSTANCE: disjoint sets , such that .
QUESTION: Are there p.m. such that ?
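As an illustration of the kind of question DM asks, the sketch below checks a tiny instance by brute force. It rests on an assumption about the instance format that should be checked against [4]: two sets of allowed pairs, `e1` and `e2`, over the same two disjoint vertex sets, and the question of whether a perfect matching inside `e1` and another inside `e2` exist with no pair in common. All names are hypothetical, and the enumeration over permutations is exponential.

```python
import itertools


def perfect_matchings(u, v, allowed):
    """Yield every perfect matching between u and v using only allowed pairs."""
    for permutation in itertools.permutations(v):
        pairs = frozenset(zip(u, permutation))
        if pairs <= allowed:
            yield pairs


def has_disjoint_matchings(u, v, e1, e2):
    """True when some p.m. within e1 and some p.m. within e2 share no pair."""
    second = list(perfect_matchings(u, v, e2))
    return any(m1.isdisjoint(m2)
               for m1 in perfect_matchings(u, v, e1)
               for m2 in second)


u = ["u1", "u2"]
v = ["v1", "v2"]
straight = {("u1", "v1"), ("u2", "v2")}
crossed = {("u1", "v2"), ("u2", "v1")}
yes_instance = has_disjoint_matchings(u, v, straight, crossed)
no_instance = has_disjoint_matchings(u, v, straight, straight)
```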

The second auxiliary problem is given below. It differs from the 2-Matching Decision Problem in that values do not come from probabilities:

2-Matching Decision Problem with Arbitrary Weights (2-MDPAW)
INSTANCE: sets such that and such that for all , for all , .
QUESTION: Is there a valid assignment such that ?

2-MDPAW is NP-complete. First of all, it is clearly in NP. Below, a polynomial transformation from DM is proposed. Consider an instance , , of DM. We construct an instance of 2-MDPAW as follows. Let and

Hence, and . For every , let . For and , let

where denotes the symmetric difference operator between sets. Finally, let .

We prove that, given disjoint p.m. , there exists a valid assignment such that . Consider when for some , and otherwise. The validity of is straightforward. Also, .

Conversely, we prove that, for a given valid assignment such that , there exist disjoint p.m. . Consider for all . Since is a function, . It is also straightforward that . Now, suppose that there exists an element in occurring in two pairs of . W.l.o.g., suppose . Then, which is absurd. Therefore, every element in occurs at most once in any pair of and once in . It is easy to see that and . Suppose that there exists an element in which does not occur in any pair of . Again, w.l.o.g., suppose that such element does not occur in . Then, and . Absurd! Therefore, and are both p.m. and .

$k$-MDP is NP-complete for all $k \geq 2$. We propose a polynomial transformation from 2-MDPAW. Consider an instance , , , , and of 2-MDPAW. We construct an instance of $k$-MDP as follows. Let and . For all , let where (if , we just have ). Take an that maximizes . Let and for all , . Then, . Let for all . We obtain . For all , let and where . Finally, let .

Now we prove that there is an of 2-MDPAW such that if and only if there is an of $k$-MDP such that . In order to be valid, for all . We propose for all . Clearly, if is valid then is valid too, and conversely. Since , .


  • [1] Budavári T. and A. Basu, Probabilistic Cross-Identification in Crowded Fields as an Assignment Problem, Astron. J. 152 (2016), 86B.
  • [2] Severín D. E., and D. J. Sevilla, Development of a new digital version of “Cordoba Durchmusterung” stellar catalog, Revista Académica Electrónica de la U.N.R., 1 (2015), 2250–2260.
  • [3] Severín D. E., Cross-identification between Cordoba Durchmusterung catalog (declinations -22, -23 and -24) and PPMX catalog, Mendeley Data, v1 (2018).
  • [4] Frieze A. M., Complexity of a 3-dimensional assignment problem, Eur. J. Oper. Res. 13 (1983), 161–164.

Online Appendix of “Cross-identification of stellar catalogs with multiple stars: Complexity and Resolution”

Example of a 2-Matching Problem

Consider an instance of the 2-Matching Problem where and . Here, are single stars and are double. Suppose that the first phase yields the following sets:
A scheme that includes probabilities is displayed in Figure 3(a). Here, the optimal assignment is , , , with probability .

The reduction to the MWSSP gives , weights
(letters “” and “” are omitted for the sake of readability), and the graph is shown in Figure 3(b).

Example of the reduction of Lemma 3 and Theorem 3

Consider the instance of DM given in Figure 4(a) where and . The corresponding instance of 2-MDPAW is shown in Figure 4(b) where .

Also, for and the given instance of 2-MDPAW, the corresponding instance of 3-MDP is shown in Figure 5 where . Vertices for all are displayed as unlabeled circles filled with white color.


Here, an exact algorithm for the $k$-Matching Problem is proposed and the resolution of a cross-identification between two catalogs based on real data is presented.

The algorithm is given below.

  1. For each such that , do the following. If there is an element such that , remove from (if is an optimal assignment then since is a better choice than ).

  2. Generate the graph as stated in Section 2.

  3. Find the connected components of .

  4. For each component of , solve the problem restricted to .
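Steps 2 to 4 rely on splitting the graph into connected components so that each one can be solved separately. A minimal Python sketch of that decomposition follows. It assumes the graph has one vertex per pair (star, candidate set) and an edge whenever two vertices share the star or a star of the second catalog; the instance and all names are hypothetical.

```python
import itertools
from collections import defaultdict

# Hypothetical candidate sets (probabilities are not needed to decompose).
candidates = {
    "a1": [frozenset(), frozenset({"b1"})],
    "a2": [frozenset(), frozenset({"b1"}), frozenset({"b2"})],
    "a3": [frozenset(), frozenset({"b3"})],
}


def conflict_graph(candidates):
    """One vertex per (star, candidate set); edges join conflicting vertices."""
    vertices = [(a, s) for a in sorted(candidates) for s in candidates[a]]
    edges = [(u, v) for u, v in itertools.combinations(vertices, 2)
             if u[0] == v[0] or (u[1] & v[1])]
    return vertices, edges


def connected_components(vertices, edges):
    """Depth-first search returning the vertex set of each component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


components = connected_components(*conflict_graph(candidates))
stars_per_component = sorted(sorted({a for a, _ in c}) for c in components)
```

Here "a1" and "a2" compete for "b1" and end up in the same component, while "a3" forms a component of its own and can be solved independently.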

Let and be the stars involved in a component of , i.e.  and . In the last step of our algorithm, three cases can arise:

  • Unique star. If , then the solution is straightforward: .

  • Only single stars. If and for all , then the problem restricted to can be solved via the Hungarian Algorithm in polynomial time. In that case, the instance of the MWMP is: a bipartite graph such that and , weights for each edge and weights for each edge .

  • Multiple stars. If and there is such that , then it can be solved with an exact algorithm for the MWSSP (see, for instance, S. Rebennack, M. Oswald, D. O. Theis, H. Seitz, G. Reinelt and P. M. Pardalos, A Branch and Cut solver for the maximum stable set problem, J. Comb. Optim. 21 (2011), 434–457). If such an algorithm is not available, solving the following integer linear programming formulation is a reasonably fast alternative:

    subject to

    Constraints (1) guarantee that each star of is assigned to exactly one element of . Constraints (2) forbid any star of from being assigned to two or more stars of . For the sake of readability, the latter constraints are presented for all , but one has to keep in mind that some of them can be removed if: (i) the constraint has just one variable on the left-hand side, or (ii) it is repeated, i.e. if, for some , there exists another such that and occur in exactly the same tuples of .
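A formulation consistent with the description of constraints (1) and (2) can be written as follows. The notation is an assumption made for this sketch: $A_C$ and $B_C$ stand for the stars of each catalog involved in the component, $P_a$ for the candidate sets of $a$, $w_{aS}$ for the weight of assigning $S$ to $a$, and $x_{aS}$ for a binary variable that equals 1 when $S$ is assigned to $a$. The objective is taken to be the minimization of the total weight; an equivalent maximization of stable set weights would serve as well.

```latex
\begin{align*}
\min \quad & \sum_{a \in A_C} \sum_{S \in P_a} w_{aS}\, x_{aS} \\
\text{subject to} \quad
  & \sum_{S \in P_a} x_{aS} = 1
      && \forall\, a \in A_C, \tag{1} \\
  & \sum_{a \in A_C} \; \sum_{S \in P_a :\, b \in S} x_{aS} \le 1
      && \forall\, b \in B_C, \tag{2} \\
  & x_{aS} \in \{0, 1\}
      && \forall\, a \in A_C,\ S \in P_a.
\end{align*}
```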

An instance of the 2-Matching Problem is obtained once the first phase is completed. Table 1 reports some highlights about the optimization of that instance.

As we can see from the table, is highly decomposable and just 111 integer linear problems need to be solved. Moreover, these integer problems turned out to be very easy to solve since the solver did not branch (all of them were solved at the root node). The hardest one has 339 variables and 118 constraints, and took 0.0015 seconds of CPU time. The optimization was performed on a computer equipped with an Intel i7-7700 at 3.60 GHz and Gurobi 6.5.2 as the MIP solver. The overall process took 41.6 seconds of CPU time.

Description of the first phase

This section summarizes how to obtain a set of candidate stars for a given star of the former catalog and the probabilities involved. Recall that such computations depend heavily on the structure and data availability of both catalogs as well as on the underlying physical model used to establish the relationship between them. It is beyond the scope of this work to analyze such scenarios or to give a formal treatment, so a simplified but reasonable model is considered, which is enough for presenting our approach. In this model, stars of both catalogs should not be near the celestial poles in order to avoid certain distortions, and stars with high variability in their brightness should be avoided; this can be done by pre-identifying such stars and removing them from both catalogs. A more robust and general probabilistic model is discussed in T. Budavári and A. S. Szalay, Probabilistic Cross-Identification of Astronomical Sources, Astrophys. J. 679 (2008), 301–309.

Consider catalogs and , and let be the set of stars from catalog marked as “double”. Our goal is to propose an instance of the 2-Matching Problem.

Let us first present some basic elements of Positional Astronomy. Usually, position is given in a well established reference frame where two spherical coordinates are used: right ascension, denoted by $\alpha$, and declination, denoted by $\delta$, similar to longitude and latitude coordinates on Earth. In fact, a pair $(\alpha, \delta)$ represents a point on the unit sphere. For two given points $p_1, p_2$, denote their angular distance by $\theta(p_1, p_2)$. A known property is that, if $p_1, p_2$ have the same right ascension, $\theta(p_1, p_2)$ is given by the difference in their declinations. However, if $p_1, p_2$ have the same declination, $\theta(p_1, p_2)$ depends on the difference in right ascensions and the cosine of the declination of both points. For this reason, it is convenient to work with the quantity $\alpha \cos \delta$ instead of $\alpha$ directly.
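The two properties above can be checked numerically. The following Python sketch uses the haversine formula for the angular distance, which is a standard choice but an assumption here, since this text does not say which formula is used; the function name and the sample coordinates are hypothetical.

```python
import math


def angular_distance(ra1, dec1, ra2, dec2):
    """Angular distance in radians between two points given in radians."""
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))


# Same right ascension: the distance is the difference in declination.
same_ra = angular_distance(1.0, 0.2, 1.0, 0.5)

# Same declination and a small offset in right ascension: the distance is
# approximately the offset scaled by the cosine of the declination.
dec = math.radians(60.0)
offset = 1e-4
same_dec = angular_distance(1.0, dec, 1.0 + offset, dec)
scaled_offset = offset * math.cos(dec)
```

The second check is what motivates scaling right ascension by the cosine of the declination: at a declination of 60 degrees, an offset in right ascension covers only half as much sky as the same offset on the celestial equator.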

Catalogs usually give the right ascension , declination and visual magnitude (a measure of brightness) of each star. These parameters are modeled as a multivariate normal distribution. However, in several catalogs, the parameters are considered independent of each other. Therefore, for a given star we have , , , where , and are the expected values of the parameters and , and their standard errors.

Positions provided in a catalog are valid for a certain epoch , which is a specific moment in time. However, there exist transformations for translating positions from one epoch to another, such as precession and nutation. In addition, stars have their own apparent motion across the sky, called proper motion. Some catalogs also provide additional coefficients for computing the correction in proper motion. These coefficients have their own standard errors. Therefore, it is possible to compute the position of a star and its uncertainty for a new epoch by means of the mentioned transformations and the propagation of the error (details of these transformations are treated in J. Kovalevsky and P. K. Seidelmann, Fundamentals of Astrometry, Cambridge University Press, UK, 2004). This is the case of the catalog PPMX (see S. Roeser, E. Schilbach, H. Schwan, N. V. Kharchenko, A. E. Piskunov and R.-D. Scholz, PPM-Extended (PPMX), a catalogue of positions and proper motions, Astron. Astrophys. 488 (2008), 401–408), where position for epoch , brightness, proper motions and their uncertainties are available, among other parameters.

Naturally, older catalogs handle less information. For instance, the Cordoba Durchmusterung (CD) does not report standard errors for each star, but a mean standard error over several stars from the same region of the sky (see pages XXIX-XXX of J. M. Thome, Cordoba Durchmusterung (-22 to -32), Resultados del Observatorio Nacional Argentino 16 (1892)), e.g. for stars whose declinations are between and we have and .

Some extra parameters must be determined before performing the cross-identification. Therefore, the input of our problem consists of catalogs , and these extra parameters. They will be introduced throughout this section.

Treatment of single stars. Let and . Observe that, if and are far from each other, it makes little sense that both represent the same star. Usually, a criterion based on the angular distance between them can be used to keep those “close” pairs. Consider a candidate for to every star such that where is a given threshold. Hence, let us define

Note that the set is added to since it could happen that a star of catalog has no counterpart in .
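The threshold criterion for single stars can be sketched in Python as follows. The catalog contents, the function names and the use of a strict inequality are assumptions made for illustration; positions are given in degrees and the angular distance is computed with the haversine formula.

```python
import math


def angular_distance(p, q):
    """Angular distance in degrees between p and q, each (ra, dec) in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p, *q))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def single_star_candidates(position, other_catalog, threshold):
    """Candidate sets for a single star: the empty set plus every close star."""
    result = [frozenset()]  # the star may have no counterpart at all
    for name, other_position in sorted(other_catalog.items()):
        if angular_distance(position, other_position) < threshold:
            result.append(frozenset({name}))
    return result


# Hypothetical second catalog: star name -> (right ascension, declination).
catalog_b = {"b1": (10.001, -23.0), "b2": (11.0, -23.0)}
found = single_star_candidates((10.0, -23.0), catalog_b, threshold=0.01)
```

Only "b1" lies within the threshold, so the candidate list holds the empty set and the singleton containing "b1".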

Let and , with their corresponding values , , , , , and , , , , , respectively. A way to measure the probability that and are the same star is through the distribution of the 3-dimensional random vector , which is known to behave as a multivariate normal distribution whose probability density function is


and is the well known probability density function of . Now, define the probability that corresponds to some as follows:


is an estimate of the probability that a star from does not have a counterpart in (usually very low).
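One plausible way to turn densities into probabilities is sketched below in Python. It is an assumption, not necessarily the formula of this work: the difference between two independent normal measurements is treated as normal with zero mean and variance equal to the sum of both variances, the three components are taken as independent, and the resulting densities are normalized so that, together with a hypothetical probability `p_empty` of having no counterpart, they add up to one.

```python
import math


def normal_pdf(x, sigma):
    """Density at x of a normal distribution with zero mean and deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def difference_density(difference, sigma_a, sigma_b):
    """Density of the observed difference vector, component by component."""
    density = 1.0
    for x, sa, sb in zip(difference, sigma_a, sigma_b):
        # The variances of independent measurements add up.
        density *= normal_pdf(x, math.hypot(sa, sb))
    return density


def candidate_probabilities(densities, p_empty):
    """Assumed normalization: the empty set gets p_empty, the rest is shared."""
    if not densities:
        return {frozenset(): 1.0}
    total = sum(densities.values())
    probabilities = {frozenset(): p_empty}
    for name, density in densities.items():
        probabilities[frozenset({name})] = (1.0 - p_empty) * density / total
    return probabilities


# Hypothetical differences (two position components and magnitude) and errors.
errors = (1.0, 1.0, 0.5)
close = difference_density((0.1, 0.1, 0.0), errors, errors)
far = difference_density((2.0, 2.0, 1.0), errors, errors)
probabilities = candidate_probabilities({"b1": close, "b2": far}, p_empty=0.01)
```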

This treatment generalizes the criterion based on the “normalized distance” (see, for instance, W. Sutherland and W. Saunders, On the likelihood ratio for source identification, Mon. Not. R. Astron. Soc. 259 (1992), 413–420) for assigning stars from to , that is, to assign and in a way that

is minimized, where and are the lengths of the axes of the error ellipse: If , , is almost zero, , and for all and , visual magnitudes are not considered (i.e.  and for all and ) and is an optimal assignment then is a minimum of . Note that, for each , since for all and . Let . Then,